On 25 Oct 2006, at 18:26, Andrzej Bialecki wrote:
> Eelco Lempsink wrote:
>> Of course, for high volumes of data first indexing, and afterwards removing it, doesn't sound like a good option in my case where only a small part of the fetched data needs to be indexed.
>>
>> Has anyone solved this problem (elegantly)? I mainly wonder if it's feasible to do it only using plugins, since I suspect I must implement my own Indexer.
>
> Plugins may also return null doc. Standard Indexer would have to be modified to handle this gracefully, but it's trivial:
Thank you, that's indeed a good solution. The only thing that bothers me is that plugins _may_ return null docs, but the Indexer doesn't handle that case well. (In other words, from reading the code I didn't get the idea that returning a null doc would be okay.) I submitted a bug report for this (https://issues.apache.org/jira/browse/NUTCH-393).
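
For illustration, the null-doc idea discussed above might look roughly like the sketch below. This is not the actual Nutch code or the change proposed in the quoted message: `DocFilter`, `runFilters`, `index`, and the use of a plain `Map` as a document are all hypothetical stand-ins, chosen only to show a plugin chain in which any plugin may return null and the indexer then skips that document instead of failing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class NullDocSketch {

    /** Hypothetical stand-in for an indexing plugin; NOT the Nutch plugin API. */
    public interface DocFilter {
        /** Returns the (possibly modified) document, or null to reject it. */
        Map<String, String> filter(Map<String, String> doc);
    }

    /** Runs every filter in order; stops as soon as one of them returns null. */
    public static Map<String, String> runFilters(Map<String, String> doc,
                                                 List<DocFilter> filters) {
        for (DocFilter f : filters) {
            doc = f.filter(doc);
            if (doc == null) {
                return null; // a plugin rejected this document
            }
        }
        return doc;
    }

    /** Hypothetical indexer loop: documents rejected by a plugin are skipped. */
    public static List<Map<String, String>> index(List<Map<String, String>> fetched,
                                                  List<DocFilter> filters) {
        List<Map<String, String>> indexed = new ArrayList<>();
        for (Map<String, String> doc : fetched) {
            Map<String, String> result = runFilters(doc, filters);
            if (result == null) {
                continue; // handle the null doc gracefully: do not index it
            }
            indexed.add(result);
        }
        return indexed;
    }

    public static void main(String[] args) {
        // Assumed example filter: keep only PDF documents.
        DocFilter onlyPdf = doc -> "application/pdf".equals(doc.get("type")) ? doc : null;

        List<Map<String, String>> fetched = List.of(
            Map.of("url", "http://example.org/a.pdf", "type", "application/pdf"),
            Map.of("url", "http://example.org/b.html", "type", "text/html"));

        System.out.println(index(fetched, List.of(onlyPdf)).size()); // prints 1
    }
}
```

With this shape, only the small part of the fetched data that passes the plugins ever reaches the index, so nothing has to be indexed first and removed afterwards.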
-- Regards, Eelco Lempsink
_______________________________________________ Nutch-general mailing list Nutch-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nutch-general