Hi all,
I've downloaded the latest Nutch 0.8.0 and Hadoop (05-08-06) versions, and I
want to study how Nutch puts information into the filesystem across all the
datanodes (for example, if I already have my data indexed, how do I put it
into the Hadoop filesystem?). I've searched online, but there isn't much
information about it. I've studied the Nutch and Hadoop code, but it just
makes me more confused. I need some experts to give me the big picture or
guide me on where to start.
Thanks to Andrzej for the previous reply, which mentioned that using hadoop
dfs copyFromLocal will work. So here are my other questions:
1) After I use the copyFromLocal command, will the search get the right data
by itself? I assume I'll still need to do some more work to make it work.
2) Since I don't have the big picture yet, what kind of input data does
copyFromLocal require? I have my own modified Lucene that indexes all the
data for me, and its data types may not be the same as Nutch's. If I use that
command to put the data into the filesystem, what else should I implement in
order to search or update?
3) Which files should I study that are related to this part of the work?
Any help will be appreciated.
William