Anyway, just try it and gain experience with your use case; in general there is no best practice, only different use cases.
HTH Stefan
On 09.03.2005 at 19:54, nutuser2056 wrote:
Dear All, We need to crawl a number of sites amounting to around 2 million pages and several GB of data. I know this is well within the ability of Nutch, but I have a question about the right strategy to pursue so we can crawl and index with the least amount of bother.
Obviously I would need a single database, create several manageably sized
segments, run the fetcher on each one, and subsequently update the
database; no problem so far. But what would then be the right strategy
to get an index which could be searched?
Should I:
1) merge all the segments and then index them, or
2) index each segment individually and then merge the indexes, keeping the segments separate, or
3) index each segment separately, keep both segments and indexes separate, and search across multiple indexes (but I have heard there are issues with the ranking)?
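The three options could be sketched with the 0.x-era Nutch command-line tools roughly as below. This is an assumption-laden sketch, not taken from this thread: the exact command names and arguments (`mergesegs`, `index`, `merge`) varied between Nutch releases of that generation, so verify them against `bin/nutch` in your installation. It assumes an existing WebDB in `./db` and already-fetched segments under `./segments/`.

```shell
# Sketch only: assumes a Nutch 0.x install with bin/nutch on PATH,
# a WebDB in ./db, and fetched segments in ./segments/.
# Command names are 0.x-era assumptions; check your release.

# Option 1: merge all segments into one, then build a single index.
bin/nutch mergesegs segments-merged -dir segments
bin/nutch index segments-merged

# Option 2: index each segment individually, then merge the
# per-segment indexes into one, keeping the segments separate.
for seg in segments/*; do
  bin/nutch index "$seg"
done
bin/nutch merge merged-index segments/*/index

# Option 3: index each segment individually (loop above) and skip
# the merge; point the searcher at the segments directory so it
# searches across the per-segment indexes at query time.
```

The trade-off the options encode: option 1 pays the cost up front in segment merging, option 2 keeps fetch data modular but still yields one index to search, and option 3 avoids all merging at the price of searching many indexes at once, which is where the ranking concern in 3) comes in.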
Please let me know your views!!
Thanks a lot!! Regards,
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
