Hi Mo,

On Thu, Aug 28, 2014 at 3:33 PM, <[email protected]> wrote:
> > Sorry for the late reply.

Me included. This email was lost in the pile!

I use Nutch 2.X because it enables me to do analytics over the data I am crawling. This is my justification for trying to maintain and further the development of that branch over the last while. I am also extremely interested in the technologies supported within the Nutch 2.X stack, and I like keeping up with their development and using them to fix my problems if and when they arise. I also like having fine-grained control over my storage architecture, which is another pro for me.

The performance Julien talks about (and please correct me if I am wrong, Julien) is not so much Nutch-related as it is Gora-related. Different Gora backends perform differently, and this is itself driven by who wishes to maintain them.

On another note, we've identified that for users, Nutch 2.X is a bloody pain to provision and get running. This is a problem for this branch and for the people who invest, and possibly waste, time trying to determine revisions, etc. It is my intention to build different Vagrant flavours for each Nutch 2.X stack:

https://issues.apache.org/jira/browse/NUTCH-1812

If ANYONE on this list is interested in helping with this effort, then I would dedicate some time to documenting the process on the wiki so that it can be reproduced for everyone's benefit. I feel that this would be a huge move forward for the 2.X branch.

Lewis

