2006/11/13, carmmello <[EMAIL PROTECTED]>:
> Hi,
> Nutch, from version 0.8 is, really, very, very slow, using a single machine,
> to process data, after the crawling.  Compared with Nutch 0.7.2 I would say,
> ...
> this series.  I don`t believe that there are many Nutch users, in the real
> world of searching, with a farm of computers.  I, for myself, have already

Ditto, on both points.
Furthermore, I'd say I'm much more likely to deliver 10 single-machine
Nutch setups than a single system with 10 nodes. I believe the same
goes for a number of other users.

I had a look at the Hadoop code and, well, it'd take a week (probably
an optimistic estimate) just to get acquainted with selected points of
interest, leaving a lot unknown. And this is just to get started. At
the moment, I can't justify a possible high-risk, multi-week effort to
investigate where the bottleneck is and find a workable solution - I
can only imagine how this problem would look to someone without any
prior knowledge about distributed systems and/or indexing
technology...
...in the meantime, I suspect we might see something that seems much
more reasonable in the mid-term: a lot of useful code back-ported to
0.7.2, doing an excellent job on installations with one or a
handful of servers.

t.n.a.

_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general
