Hi Mo,

On Thu, Aug 28, 2014 at 3:33 PM, <[email protected]> wrote:

>
> Sorry for the late reply.


Me included. This email was lost in the pile!

I use Nutch 2.x as it enables me to do analytics over the data I am
crawling. This is my justification for trying to maintain and further the
development on that branch over the last while.
I am also extremely interested in the technologies supported within the
Nutch 2.X stack and I like keeping up with their development and using them
to fix my problems if and when the problems arise.
I like having fine-grained control over my storage architecture. This is
also a pro for me.
The performance Julien talks about (and please correct me if I am wrong
Julien) is not so much Nutch related as it is Gora. Different Gora backends
perform differently, which is itself driven by who wishes to maintain them.

On another note, we've identified that for users, Nutch 2.X is a bloody
pain to provision and get running. This is a problem for this branch and
for the people that invest and possibly waste time trying to determine
revisions, etc.

It is my intention to build different Vagrant flavours for each Nutch 2.X
stack.
https://issues.apache.org/jira/browse/NUTCH-1812

If ANYONE on this list is interested in helping with this effort then I
would dedicate some time to document the process on the wiki so that it can
be reproduced for everyone's benefit. I feel that this would be a huge move
forward for the 2.X branch.

Lewis
