Hello Guys,

I have been trying to integrate NutchRest with my web application and it
seems to be working, but I realize that a crawl invoked from the
application is going to take a long time, so it is really a candidate for
batch processing.

I have been thinking of having:

1) N NutchRest servers.
2) A queuing system which receives a message whenever Nutch completes a
Job.
3) The queuing system should be smart enough to send the subsequent Job
to one of the NutchRest servers. Maybe we can have a pluggable algorithm
for choosing the server, with round robin as the default.
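
To make point 3 a bit more concrete, here is a rough sketch of what I mean by
a pluggable algorithm with round robin as the default. Nothing here is
existing Nutch code: ServerSelector, RoundRobinSelector and the server
addresses are names I made up for illustration.

```python
from abc import ABC, abstractmethod
from itertools import cycle
from typing import Sequence


class ServerSelector(ABC):
    """Hypothetical plug-in point: picks the NutchRest server for the next Job."""

    @abstractmethod
    def select(self) -> str:
        ...


class RoundRobinSelector(ServerSelector):
    """Default strategy: hand out the servers in a fixed rotating order."""

    def __init__(self, servers: Sequence[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(list(servers))

    def select(self) -> str:
        return next(self._servers)


# Placeholder addresses, not real endpoints.
selector = RoundRobinSelector(["http://nutch-1:8081", "http://nutch-2:8081"])
picks = [selector.select() for _ in range(3)]
```

Another strategy (least loaded server, for example) would just be another
subclass of the same interface.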

This way we could scale things. However, it would require code changes in
Nutch: each Job, when completed, would need to send an event to the
queuing system and to the NutchRest server. The queuing system would then
need to manage the workflow.
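
A minimal sketch of the event flow I have in mind, with an in-process queue
standing in for the real queuing system. The Workflow class, the step order
in CRAWL_STEPS and the server names are my own assumptions, and the actual
call to a NutchRest server is left out.

```python
import queue
from typing import Optional

# Assumed order of one crawl round; adjust to the Jobs you actually run.
CRAWL_STEPS = ["INJECT", "GENERATE", "FETCH", "PARSE", "UPDATEDB", "INDEX"]


def next_step(finished: str) -> Optional[str]:
    """Return the step that follows `finished`, or None after the last one."""
    i = CRAWL_STEPS.index(finished)
    return CRAWL_STEPS[i + 1] if i + 1 < len(CRAWL_STEPS) else None


class Workflow:
    """Hypothetical workflow manager driven by Job-completion events."""

    def __init__(self, servers):
        self.events = queue.Queue()  # stand-in for the real queuing system
        self.servers = list(servers)
        self.turn = 0                # round-robin position
        self.dispatched = []         # (server, crawl_id, step) records

    def dispatch(self, crawl_id, step):
        server = self.servers[self.turn % len(self.servers)]
        self.turn += 1
        # A real implementation would call the chosen NutchRest server here.
        self.dispatched.append((server, crawl_id, step))

    def on_job_finished(self, crawl_id, step):
        """What a modified Nutch Job would call when it completes."""
        self.events.put((crawl_id, step))

    def drain(self):
        """Consume pending completion events and schedule the follow-up Jobs."""
        while not self.events.empty():
            crawl_id, step = self.events.get()
            following = next_step(step)
            if following is not None:
                self.dispatch(crawl_id, following)


wf = Workflow(["nutch-1", "nutch-2"])
wf.dispatch("crawl-01", "INJECT")
wf.on_job_finished("crawl-01", "INJECT")
wf.drain()
```

After the drain, the GENERATE step for crawl-01 has been handed to the second
server, so consecutive steps of one crawl can land on different machines.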

I am hoping someone has already thought along these lines and may even have
implemented it. I would be interested to know the opinion of the folks here
about it.

Hoping to hear more about it.

Thanks,
Vicky