Hi everyone. I have 3 separate indexes on HDFS, each covering about 250 million URLs. I've been trying to merge them using the script for merging indexes, which I modified to work on HDFS. The script seems to have issues: it chokes at the end, when it tries to build the merged index.
I'm sure it's a fairly simple matter of 3-4 individual commands to merge these 3 indexes. Are there any examples out there of how to use the merge commands, and how to merge indexes on HDFS? Do I need to move them onto a local disk to merge them? I'll keep poking around here and see what I can find. I'm about 2 crawls away from having 1.5 billion URLs and need to get these indexes merged and deduped somehow. Thanks for any help.
Axel
