The idea of NDFS is to provide cheap, decentralized
storage. Has there been any thought given to
cheap, "brokered" processing of the WebDB?

My concern is that at 100 million pages it takes days to
process on a dual/quad Xeon machine. To compete with
the likes of Google in keeping up with generating
segments, update db requests and the like 24/7, I'm not
sure a single "processing" node could work.

Is it feasible to think of a "broker" server that
takes a WebDB request, sends it to the correct
"bucket/client" server for processing, then
takes the results from the bucket servers and makes a
decision based upon them (much like the distributed
index processes)?
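
To make the broker idea concrete, here is a minimal sketch of the
scatter/gather step I have in mind. Nothing here is existing Nutch
code: BucketServer, Broker, and the CRC32-based routing are all
hypothetical names and choices, and the buckets are plain in-memory
objects standing in for remote servers.

```python
import zlib


class BucketServer:
    """Hypothetical bucket server holding one slice of the WebDB."""

    def __init__(self, name):
        self.name = name
        self.pages = {}  # url -> inlink count (assumed per-page statistic)

    def handle(self, urls):
        # Answer only for the URLs passed in; unknown URLs report 0.
        return {u: self.pages.get(u, 0) for u in urls}


class Broker:
    """Hypothetical broker that routes requests and merges the answers."""

    def __init__(self, buckets):
        self.buckets = buckets

    def bucket_for(self, url):
        # Stable hash, so the same URL always lands on the same bucket.
        index = zlib.crc32(url.encode("utf-8")) % len(self.buckets)
        return self.buckets[index]

    def request(self, urls):
        # Scatter: group the URLs by the bucket that owns them.
        groups = {}
        for u in urls:
            groups.setdefault(self.bucket_for(u).name, []).append(u)
        # Gather: ask each bucket for its part and merge the results.
        by_name = {b.name: b for b in self.buckets}
        merged = {}
        for name, group in groups.items():
            merged.update(by_name[name].handle(group))
        return merged
```

In a real deployment the handle() call would be a network request and
the broker would apply its decision logic to the merged result.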

The idea is that update db would be streamed to a
broker server that could compute link statistics
almost in real time by sending a simple query to the
db servers asking for the mechanics of the incoming
document, and then distributing the decision to
the appropriate bucket.  The premise is that the
bucket servers would maintain themselves on a smaller
scale than the "whole" and would communicate
diffs/changes/updates and inserts/deletes to each
other.
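
Roughly, the streaming update could look like the sketch below. It
assumes the only link statistic is an inlink count and that each
bucket keeps a log of changes it has not yet shared. LinkBucket,
stream_updates, and the route function are made-up names for
illustration, not part of the WebDB.

```python
from collections import defaultdict


class LinkBucket:
    """Hypothetical bucket that tracks inlink counts for its own URLs."""

    def __init__(self):
        self.inlinks = defaultdict(int)
        self.pending_diffs = []  # changes not yet sent to other buckets

    def add_link(self, src, dst):
        # Update the statistic immediately and remember the change.
        self.inlinks[dst] += 1
        self.pending_diffs.append(("insert", src, dst))
        return self.inlinks[dst]

    def drain_diffs(self):
        # Hand over the accumulated changes and start a fresh log.
        diffs, self.pending_diffs = self.pending_diffs, []
        return diffs


def stream_updates(links, buckets, route):
    """Send each (src, dst) link to the bucket that owns dst.

    route(url) is an assumed function returning an index into buckets.
    """
    for src, dst in links:
        buckets[route(dst)].add_link(src, dst)
```

Each bucket would periodically call drain_diffs() and ship the result
to its peers, so no single node has to process the whole db.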

One way to manage the process and scale it for
cost-effective use of CPU and funds would be to sort
the "buckets" by a rank method, so that as the db
grows and is analyzed it naturally distributes
itself according to the ranks of the documents.
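
As a sketch of the rank-based distribution, assuming each document
already has a numeric score and that the bucket boundaries are fixed
thresholds (the values below are invented, not measured):

```python
import bisect

# Assumed score boundaries between buckets; purely illustrative.
THRESHOLDS = [1.0, 10.0, 100.0]


def rank_bucket(score, thresholds=THRESHOLDS):
    # Bucket 0 holds the lowest-ranked documents, the last bucket the highest.
    return bisect.bisect_right(thresholds, score)


def rebalance(scores):
    """Group URLs by rank bucket; scores maps url -> score."""
    buckets = {}
    for url, score in scores.items():
        buckets.setdefault(rank_bucket(score), []).append(url)
    return buckets
```

Re-running rebalance() after each analysis pass is what would let
documents drift between buckets as their ranks change.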

Thus you could possibly build segments to be fetched
per "bucket", cutting down on analyze time
as well, and send the fetched segments' WebDB updates to
the broker servers, which would repeat the
computational/ranking process in top-down fashion
through a somewhat "natural selection" process :)
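
Per-bucket segment generation could then be as simple as the sketch
below. It assumes each page record is a (url, score, next_fetch_time)
tuple, which is my own simplification and not the actual WebDB
layout; generate_segment is a hypothetical helper.

```python
import heapq


def generate_segment(bucket_pages, now, top_n):
    """Pick the top_n highest-scored pages in one bucket that are due.

    bucket_pages is an iterable of (url, score, next_fetch_time).
    """
    due = [(score, url)
           for url, score, next_fetch in bucket_pages
           if next_fetch <= now]
    # nlargest avoids sorting the whole bucket when top_n is small.
    return [url for score, url in heapq.nlargest(top_n, due)]
```

Since each bucket only scans its own pages, the generate step for a
high-rank bucket never has to touch the long tail of low-rank pages.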

I may be off the wall, but I'm just trying to think of
ideas here.  It is sort of like partitioning an Oracle
database on specific ranges, querying only
the necessary ranges for your "heavy" tasks, and then
having a process by which you manage/massage and update
the partitioned data accordingly.

_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers