Jason Camp wrote:
Doh, I think I found the problem. After using Luke to dig through the indexed segments, it looks like all of the segments that I generated contain exactly the same URLs. When you generate a segment with the top 100k URLs, I'm guessing they are not marked in any way to prevent the next generate from grabbing the same URLs? I'd like to generate multiple segments in a row and send them off to another server. Is this possible using the local file system?
No, at the moment they are not marked in any way. This is on my TODO list, but not with a high priority.
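
To make the effect concrete, here is a toy model in Python. It is not Nutch code: the in-memory `db`, the `generated` flag, and the `generate` function are all hypothetical names invented for this sketch. It only shows why two consecutive top-N selections return the same URLs when nothing is marked, and how a per-URL marker would make them disjoint.

```python
def make_db():
    # Hypothetical stand-in for the web database: URL -> score and a marker flag.
    return {
        f"http://example.com/page{i}": {"score": 10 - i, "generated": False}
        for i in range(6)
    }


def generate(db, top_n, mark=False):
    """Return the top_n highest-scoring URLs that are not already marked."""
    candidates = [url for url, entry in db.items() if not entry["generated"]]
    candidates.sort(key=lambda url: db[url]["score"], reverse=True)
    segment = candidates[:top_n]
    if mark:
        # Assumed fix: remember that these URLs already belong to a segment.
        for url in segment:
            db[url]["generated"] = True
    return segment


# Without marking, both runs select the same top-scoring URLs.
db = make_db()
unmarked_a = generate(db, 3)
unmarked_b = generate(db, 3)

# With marking, the second run moves on to the next-best URLs.
db = make_db()
marked_a = generate(db, 3, mark=True)
marked_b = generate(db, 3, mark=True)

print(unmarked_a == unmarked_b)
print(set(marked_a).isdisjoint(marked_b))
```

In this sketch the marker would have to be cleared (or expire) once a segment has been fetched and merged back, otherwise marked URLs would never be selected again.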
--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
