Kelvin Tan wrote:
Interesting. I haven't tried it myself. Do you have any code/benchmarks for 
this?

I never committed it anywhere. I initially tried to write Nutch's IPC mechanism with nio and it was slow and buggy. One problem was that I needed to switch streams to blocking mode in order to read arbitrarily large objects, then switch them back to non-blocking mode in order to select() on them. But you can't change this state and remove them from the selector without going through the scheduler. So the benefit of skipping the scheduler wasn't there. If I had been willing to fragment objects into fixed-size chunks then it might have worked, but that's a lot of work. It's a strange limitation, since with native sockets one can select and then perform arbitrary stream i/o, not limited to a single buffer.
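To make the blocking-mode restriction concrete, here is a minimal sketch, not the Nutch IPC code (which was never committed). It assumes a java.nio Pipe as a stand-in for a socket, and the class and method names are made up for illustration. It shows that a channel has to be non-blocking while it is registered with a Selector, and that getting back to blocking mode means cancelling the key and letting the selector run a selection pass first.

```java
import java.io.IOException;
import java.nio.channels.IllegalBlockingModeException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical demo class; a Pipe stands in for a socket so no network is needed.
final class BlockingModeDemo {

    /** Returns true if a registered channel refuses to switch to blocking mode. */
    static boolean blockingRejectedWhileRegistered() throws IOException {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open();
             Pipe.SourceChannel source = pipe.source();
             Pipe.SinkChannel sink = pipe.sink()) {
            source.configureBlocking(false);              // required before register()
            source.register(selector, SelectionKey.OP_READ);
            try {
                source.configureBlocking(true);           // still registered
                return false;
            } catch (IllegalBlockingModeException expected) {
                return true;
            }
        }
    }

    /** Cancels the key, runs a selection pass, then switches to blocking mode. */
    static boolean blockingAllowedAfterDeregistration() throws IOException {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open();
             Pipe.SourceChannel source = pipe.source();
             Pipe.SinkChannel sink = pipe.sink()) {
            source.configureBlocking(false);
            SelectionKey key = source.register(selector, SelectionKey.OP_READ);
            key.cancel();            // request deregistration
            selector.selectNow();    // a selection pass processes the cancelled key
            source.configureBlocking(true);
            return source.isBlocking();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("rejected while registered: " + blockingRejectedWhileRegistered());
        System.out.println("allowed after deregistration: " + blockingAllowedAfterDeregistration());
    }
}
```

In a real server the selection pass runs on the selector thread, so every switch between the two modes involves a handoff to that thread.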

Also, there's an nio version of Lucene's Directory that's a bit slower than the non-nio version, but this is not using select() or anything.

Are you aware of others facing the same problem?

How much non-blocking nio code do you find in real Java code? I have not seen a lot.

I did find that Sun has implemented a high-performance HTTP server using nio. This is documented at:

http://blogs.sun.com/roller/resources/fp/grizzly.pdf

From what I can tell the primary benefit is in the number of simultaneous clients, not in throughput. Does a crawler require thousands of simultaneous connections? If so, then it looks like careful use of nio could offer some real benefits.
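As a rough illustration of what "many simultaneous connections" looks like with nio, the sketch below has one thread wait on many channels through a single Selector. It is only a sketch under assumptions: in-process Pipes stand in for network connections, each one carries a single byte, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class: one thread multiplexes many channels via one Selector.
final class OneThreadManyChannels {

    /** Opens channelCount pipes, writes one byte to each, and reads them all on the calling thread. */
    static int serveAll(int channelCount) throws IOException {
        List<Pipe> pipes = new ArrayList<>();
        int channelsServed = 0;
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < channelCount; i++) {
                Pipe pipe = Pipe.open();
                pipes.add(pipe);
                pipe.source().configureBlocking(false);
                pipe.source().register(selector, SelectionKey.OP_READ);
                pipe.sink().write(ByteBuffer.wrap(new byte[] {(byte) i}));
            }
            ByteBuffer buffer = ByteBuffer.allocate(16);
            while (channelsServed < channelCount) {
                selector.select();                        // blocks until some channel is readable
                Iterator<SelectionKey> ready = selector.selectedKeys().iterator();
                while (ready.hasNext()) {
                    SelectionKey key = ready.next();
                    ready.remove();                       // selected keys must be removed by hand
                    buffer.clear();
                    int n = ((Pipe.SourceChannel) key.channel()).read(buffer);
                    if (n > 0) {
                        channelsServed++;
                        key.cancel();                     // done with this channel
                    }
                }
            }
        } finally {
            for (Pipe pipe : pipes) {
                pipe.source().close();
                pipe.sink().close();
            }
        }
        return channelsServed;
    }
}
```

Calling OneThreadManyChannels.serveAll(100) handles 100 channels without creating a thread per channel, which is the scaling property in question here; it says nothing about throughput.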

Doug


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
