Hi Julien,

>
>
> you seem to imply that this is not the case but some of us do (or have
> done) large crawls :-)
>

Not at all, and I unreservedly take my words back if they came across like
that; I am well aware of the work some of you guys are doing. Please let me
rephrase: when was the last time we said we were able to benchmark Nutch
performance, given some environmental factors X and some Hadoop
distribution Y? Although this might not directly benefit the community in
terms of improving the Nutch codebase, it might give us an indication of
where we can say Nutch works and where it doesn't. Take NUTCH-839 for
example.

>
>
>
> We'd probably end up bogged down in endless discussions about parameters
> tuning, feature comparison etc...
>

OK, I agree with you here. Although this would not be mission critical for
Nutch, it would be nice to try to set up a 'consistent' Nutch deployment on
top of many Hadoop distros and build recursively.



> I did not know about bigtop, thanks for the pointer!
>
> Who would provide the cluster for running the tests? Doing large scale
> crawls is not just about setting it up and watching it work : it does
> involve a fair amount of monitoring (unless you don't mind having 90% of
> your crawlDB filled by porn/junk etc... ). Not sure who would find the time
> to do that.
>

OK, so the actual testing is hosted by Cloudera on their own Jenkins
instance. From speaking to the guys here, they mentioned that the Apache
infrastructure was not sufficient to handle the testing environment, so
they are now working from Cloudera's infrastructure. From my discussions so
far, they want as many applications as possible running in a distributed
fashion on the platform, as this will also help them identify where bugs
lie in their own code. We all know Nutch is an ideal candidate for this
type of job, so hopefully we can find some common ground beneficial to both
projects. I will post the Cloudera Jenkins URL when I find Roman and get it
from him. In terms of maintenance, I really don't know how that would work,
Julien, but I know that I've got another two days of face-to-face
opportunity with these guys, so there will be no better time to try and
sort this kind of thing out.

Thank you
