2006/11/13, carmmello <[EMAIL PROTECTED]>:
Hi, Nutch, from version 0.8 on, is really very, very slow at processing data after the crawl when using a single machine. Compared with Nutch 0.7.2 I would say, ... this series. I don't believe that there are many Nutch users, in the real world of searching, with a farm of computers. I, for myself, have already
Ditto, on both points. Furthermore, I'd say I'm much more likely to deliver 10 single-machine Nutch setups than a single system with 10 nodes. I believe the same goes for a number of other users. I had a look at the Hadoop code and, well, it'd take a week (probably an optimistic estimate) just to get acquainted with selected points of interest, leaving a lot unknown. And this is just to get started. At the moment, I can't justify a possibly high-risk, multi-week effort to investigate where the bottleneck is and find a workable solution. I can only imagine how this problem would look to someone without any prior knowledge of distributed systems and/or indexing technology... In the meantime, I suspect we might see something that seems much more reasonable in the mid-term: a lot of useful code back-ported to 0.7.2, doing an excellent job on installations with one or a handful of servers.
t.n.a.
