Hi,
I am looking for a way to merge the results of different crawls.
Say I have two URL sets in two different files.
I run the following commands:

bin/nutch crawl URLs1.txt -dir test1 -depth 0 >& test1.log
bin/nutch crawl URLs2.txt -dir test2 -depth 0 >& test2.log
Then I have two folders test1 and test2.
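Each crawl folder ends up with a structure roughly like this (assuming the 0.7-style layout with a WebDB; newer versions may differ):

  test1/
    db/        <- the WebDB of pages and links
    segments/  <- one segment per fetch round
    index/     <- the Lucene index for this crawl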
My questions are:
1. Is there a way to merge the two result sets above? If so, what is the command?
2. If the two sets contain duplicate URLs, how can I make the merged result unique?
Or maybe there is a different approach altogether: what I really want is accumulative (incremental) indexing, so that I do not have to redo everything from the beginning each time I add URLs.
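To be concrete, this is roughly what I am hoping exists. I am only guessing at the tool names and syntax from class names I have seen mentioned (IndexMerger and DeleteDuplicates), so please correct me if the real usage is different:

  # merge the per-crawl indexes into one combined index (guessed syntax)
  bin/nutch merge merged-index test1/index test2/index

  # remove duplicate URLs from the merged result (guessed syntax)
  bin/nutch dedup merged-index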
Can someone help out?
Thanks a lot.
Benny