Hi,
Hadoop is integrated with Nutch. Hence you don't need to specifically import
Hadoop into Nutch in order to use it.
The patch nutch-249 is meant for Nutch-0.8. That's why you are having
difficulties using it with Nutch-0.9. Nutch-0.8 uses an older version of
Hadoop than Nutch-0.9. You
I am getting warnings in hadoop.log that segments.gen and segments_2 are not
directories, and as you can see from the listing, they are in fact files, not
directories. I'm not sure what stage of the process this is happening in, as
I just now stumbled on them, but it concerns me that it says it is
actually searcher.dir is still the default crawl. The warnings are showing
up either while indexing segments or merging indexes. I need to spend some
time figuring out just where it is happening. I will look into it later
tonight; work doesn't like my hobbies intruding. :)
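
For what it's worth, segments.gen and segments_N are file names that Lucene
itself writes inside an index directory, so seeing them as plain files is
normal. A quick way to confirm which entries under a path are files and which
are directories is sketched below. This is only a minimal illustration, not
part of Nutch or Hadoop: it assumes a local copy of the directory, and the
helper names and the "part-00000" directory are hypothetical.

```python
import os
import tempfile


def split_entries(path):
    """Return (directories, files) found directly under path, sorted by name."""
    dirs, files = [], []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isdir(full):
            dirs.append(name)
        else:
            files.append(name)
    return dirs, files


def demo():
    """Build a throwaway layout resembling the listing and classify it."""
    with tempfile.TemporaryDirectory() as root:
        # Plain files, named like the ones in the warnings.
        for name in ("segments.gen", "segments_2"):
            open(os.path.join(root, name), "w").close()
        # Hypothetical subdirectory, only here for contrast.
        os.mkdir(os.path.join(root, "part-00000"))
        return split_entries(root)


if __name__ == "__main__":
    dirs, files = demo()
    print("directories:", dirs)
    print("plain files:", files)
```

If the entries named in the warnings show up under "plain files", then whatever
step logs the warning is probably being pointed at an index directory where it
expects a directory of subdirectories.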
I may need some more
What are segments.gen and segments_2?
The warning I am getting happens when I dedup two indexes.
I create index1 and index2 through generate/fetch/index/... etc.
index1 is an index of half the segments; index2 is an index of the other half.
The warning is happening on both datanodes.
The command I