Hi,
I am looking to see if there is a way to merge results
from different crawls.
Let's say I have two URL sets in two different files,
and I run the following commands:
bin/nutch crawl URLs1.txt -dir test1 -depth 0 >&
test1.log
bin/nutch crawl URLs2.txt -dir test2 -depth 0 >&
test2.log
Then I have two folders test1 and test2.
My questions are:
1. Is there a way to merge the two result sets above?
If so, what is the command?
2. If the two sets contain duplicate URLs, how can I
make the merged results unique?
Or maybe there is a different approach altogether: I
want to do incremental (accumulating) indexing without
having to redo everything from the beginning each time.
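To make my question concrete, here is a sketch of the kind of workflow I am hoping exists. The tool names below (mergedb, mergesegs, dedup) and the "merged" output directory are my guesses based on names I have seen mentioned, not commands I have verified -- they may not exist in this form, or their arguments may differ by Nutch version:

```shell
# Hypothetical sketch -- tool names and arguments are guesses,
# not verified against any particular Nutch release.

# Merge the two crawl databases into a single one
bin/nutch mergedb merged/crawldb test1/crawldb test2/crawldb

# Merge the fetched segments from both crawls
bin/nutch mergesegs merged/segments -dir test1/segments -dir test2/segments

# De-duplicate before (re)indexing, so repeated URLs appear once
bin/nutch dedup merged/indexes
```

If something along these lines is possible, pointers to the real commands would be much appreciated.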
Can someone help out?
Thanks a lot.
Benny