Hi Armel, thanks for your quick reply!
> I have been working on a similar project for the last couple of months but I
> am taking a slightly different approach. Because fetching - parsing -
> indexing can be time consuming and in my case, I also need the unclassified
> indexes. Using classification algorithm and the Lucene API, I build
> classified indexes by using the first index as corpus.
>

This is definitely a good idea and a somewhat different approach, as it moves the classification task out of Nutch and into Lucene. Are there any frameworks or plugins already available for applying document classification within Lucene?

On the other hand, the much faster parsing and indexing in Nutch when no "online" classification takes place has to be weighed against the disk space consumption, which is some thousand times greater when all parsed documents are indexed instead of only the positively classified ones.

> Maybe we should discuss together on skype or MSN let me know. My skype is
> etapix.
>

That would be really nice, thanks for the offer! I'll let you know my MSN number after I've created an account.

Best regards
Bastian

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers