Hi Ed,

Disappointing to hear that this really got under your skin... never nice to hear that frustration becomes the outcome rather than successfully running the software. I've provided comments below.

On Sat, Jun 3, 2017 at 12:27 PM, <[email protected]> wrote:

>
> From: Edward Capriolo <[email protected]>
> To: "[email protected]" <[email protected]>
> Cc:
> Bcc:
> Date: Sat, 3 Jun 2017 15:27:06 -0400
> Subject: What up with 2.3.1 ?
>
> Nutch 2.3.1, I have to say, I do not even understand it as a release.
>

This could be understood... as a previous (historical) user of the Nutch 1.X series, you seem to have prior expectations which are/were based on a simplified technology stack. Nutch 2.X is aimed at a different stack and focuses on the use of more modern storage solutions, as you've found out. It has never really been touted as the go-to Nutch branch... you will notice that Nutch 1.X is the mainstream (master) branch. You'll also see that, over a number of years, the message has been consistent... Nutch 1.X is the go-to software both for users of source and of release artifacts.

>
> First, I attempted to ...

</snip>

If you want to use Nutch 2.3.1 with HBase, you should use the backend datastore versions listed in the release announcement. They are as follows:

    Apache Avro 1.7.6
    Apache Hadoop 1.2.1 and 2.5.2
    Apache HBase 0.98.8-hadoop2 (although also tested with 1.X)
    Apache Cassandra 2.0.2
    Apache Solr 4.10.3
    MongoDB 2.6.X
    Apache Accumulo 1.5.1
    Apache Spark 1.4.1

I've tried my best, alongside several others over at the Gora community, to ensure all of these datastores are documented over at http://gora.apache.org/current/index.html#gora-modules. It should be noted that since then, the Gora master branch has picked up datastore version upgrades for nearly every datastore.

>
> I just do not get the entire 2.3.1 release. It is very frustrating.

Yes, as I said, it is disappointing to see that you struggled so much with this. I've made best efforts to ensure our Nutch2 tutorial is up-to-date: https://wiki.apache.org/nutch/Nutch2Tutorial

> The webui's tend to fire blank pages with no stack traces.

Please feel free to log issues... if it is broken then we can try to fix it. Without a Jira issue or some debug information, we don't know it is broken.

> Its unclear why backends that do not work are even documented.

HBase is most widely used, followed by MongoDB... on the other end of the spectrum, Cassandra is least used and broken. It has not been maintained for quite some time... and yes, this is reflected by its use of Super Columns. We are currently re-writing the backend as part of a GSoC project.

> How can even the file/avro support not even work?
>

Please log your issue(s) in Jira and I can try to reproduce them using the 2.x branch. I do not use this backend in my own 2.X deployments, so I was not aware that it was broken.

Lewis

--
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney

