Kashif Khadim wrote:
Hi,
How can I remove URLs whose domain ends with ".org" using this tool? If I try it with the URL, it also deletes sites like http://somesites.com/org/, and these sites' domains do not end with ".org". I want to have only ".com" sites for some index.

The current index structure doesn't lend itself easily to such operations. It is still possible to do it, but with a performance hit: add a PruneChecker implementation which retrieves the "url" field and checks it with String.endsWith().
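
The check itself could look roughly like the sketch below. It is not the PruneChecker interface (its method signatures are not reproduced here); it is only a standalone helper that such an implementation might call on the stored "url" value. The class and method names (HostSuffixCheck, hostEndsWith) are hypothetical. It also assumes the suffix test should be applied to the host part of the URL rather than to the whole string, since a full URL such as http://example.org/ ends with "/" and not with ".org".

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: tests the host part of a URL.
final class HostSuffixCheck {

    /**
     * Returns true if the host of the given URL ends with the suffix,
     * e.g. ".org". Unparseable URLs and URLs without a host yield false.
     */
    static boolean hostEndsWith(String url, String suffix) {
        try {
            String host = new URI(url).getHost();
            if (host == null) {
                return false;
            }
            // Host names are case-insensitive, so compare in lower case.
            return host.toLowerCase(Locale.ROOT)
                       .endsWith(suffix.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            // Assumption: a malformed URL is treated as "no match".
            return false;
        }
    }
}
```

With this helper, http://somesites.com/org/ is not matched by ".org" because only the host "somesites.com" is inspected, not the path.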


You could also change the index fields, e.g. by changing the index-more plugin, and then re-index your segments. For example, add a field called "TLD" containing "com", "org", etc.
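
A minimal sketch of how the value for such a "TLD" field might be derived from a URL at indexing time. How the value is added to the document inside the plugin is not shown, because that depends on the plugin API. The names TldField and tldOf are hypothetical, and the code simply takes the last dot-separated label of the host, so an IP-address host would give a meaningless result.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: derives a value for a "TLD" field.
final class TldField {

    /**
     * Returns the last label of the URL's host in lower case, e.g. "com"
     * or "org", or an empty string if no such label can be determined.
     */
    static String tldOf(String url) {
        try {
            String host = new URI(url).getHost();
            if (host == null) {
                return "";
            }
            int dot = host.lastIndexOf('.');
            // No dot at all, or a trailing dot: nothing usable after it.
            if (dot < 0 || dot == host.length() - 1) {
                return "";
            }
            return host.substring(dot + 1).toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            // Assumption: a malformed URL gets an empty field value.
            return "";
        }
    }
}
```

Once the field is indexed, selecting only ".com" sites becomes an exact match on that field instead of a suffix test on every URL.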


--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com



_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
