I have a few requests that would make NDFS run better for those of us who
plan to use Nutch for multi-billion-page indexes.
1.) Redundant namenodes; load-balancing namenodes would be nice too.

2.) A folder structure inside the "data" folder, so there isn't a single
directory containing thousands of chunk files. I think a directory layout
like Squid's would work fine for this. The reason I ask is that I have a
couple of multi-terabyte arrays that are going to be running Nutch; if each
chunk is 32 MB, I'll have a data directory with 625,000 files in it! That's
not too cool if you ever need to browse it. I don't know how well reiser3
will fare with that; reiser4 should do fine, but who's running that yet?
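
To make the idea concrete, here is a rough sketch of the kind of layout I
mean, loosely modeled on Squid's two-level cache directories. Everything in
it is an assumption for illustration: the 16 x 256 directory counts, the
chunk_path helper, the "chunk-<id>" file name, and the idea that each chunk
has a non-negative integer id. None of it is taken from the NDFS code.

```python
import os

# Assumed fan-out, borrowed from Squid's usual 16 x 256 layout (hypothetical here).
L1_DIRS = 16
L2_DIRS = 256


def chunk_path(data_root, chunk_id):
    """Map an integer chunk id to data_root/XX/YY/chunk-<id>.

    The low digits of the id pick the second-level directory and the next
    digits pick the first-level one, so consecutive ids spread evenly over
    all L1_DIRS * L2_DIRS leaf directories.
    """
    level2 = chunk_id % L2_DIRS
    level1 = (chunk_id // L2_DIRS) % L1_DIRS
    return os.path.join(
        data_root, f"{level1:02X}", f"{level2:02X}", f"chunk-{chunk_id}"
    )
```

With 16 x 256 = 4096 leaf directories, 625,000 sequentially numbered chunks
would come out to at most 153 files per directory instead of 625,000 in one.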

3.) Make the namenode function properly without a datanode running on the
same machine. I want my most powerful machines running only the namenode, to
replicate data and handle requests, so that I get a higher I/O rate.

4.) Multiple directory start points, so you don't have to run multiple
datanodes on a machine with more than one logical drive. The datanode is very
processor intensive: running 40 datanodes on one machine will kill even a
quad-processor machine, but one datanode with 40 starting points for storing
data would leave the other processors free for other tasks. Round robin
would work really well here for data storage, even on computers with
different-size volumes.
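
A minimal sketch of the round-robin idea, assuming the datanode keeps its own
count of free bytes per start point. The RoundRobinVolumes class, its pick
method, and the directory names are all hypothetical, not anything that
exists in NDFS; a real version would probably ask the OS for free space
(for example with shutil.disk_usage) rather than track it by hand.

```python
class RoundRobinVolumes:
    """Rotate chunk placement over several storage directories.

    Volumes of different sizes are handled by skipping any directory that
    no longer has room for the chunk being stored.
    """

    def __init__(self, free_bytes):
        # free_bytes: dict of directory -> free bytes (hypothetical bookkeeping)
        self.free = dict(free_bytes)
        self.dirs = list(self.free)
        self.next_index = 0

    def pick(self, chunk_size):
        """Return the next directory in rotation that can hold chunk_size bytes."""
        for offset in range(len(self.dirs)):
            i = (self.next_index + offset) % len(self.dirs)
            directory = self.dirs[i]
            if self.free[directory] >= chunk_size:
                self.free[directory] -= chunk_size
                # Resume the rotation just past the directory we used.
                self.next_index = (i + 1) % len(self.dirs)
                return directory
        raise OSError("no start point has room for this chunk")
```

Small volumes simply drop out of the rotation once they fill up, and the
larger ones keep taking chunks, all inside a single datanode process.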

5.) Make it scalable to a very large size (petabytes).

6.) Anyone looking for a programming job? I need someone who knows Nutch
inside and out, someone who could fix the NDFS problems and also build a
couple of other things for the web UI and plugins to extend Nutch's features.
E-mail me a salary quote, please, if you have these skills.
Thanks,
Jay Pound
[EMAIL PROTECTED]


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
