RE: What up with 2.3.1 ?

2017-06-05 Thread lewis john mcgibbney
Forwarding with correct thread name.

-- Forwarded message --
From: lewis john mcgibbney 
Date: Mon, Jun 5, 2017 at 2:50 PM
Subject: Re: user Digest 3 Jun 2017 19:27:20 -0000 Issue 2758
To: "user@nutch.apache.org" 


Hi Ed,
Disappointing to hear that this really got under your skin... it is never
nice to hear that frustration, rather than successfully running the
software, was the outcome. I've provided comments below.

On Sat, Jun 3, 2017 at 12:27 PM,  wrote:

>
> From: Edward Capriolo 
> To: "user@nutch.apache.org" 
> Cc:
> Bcc:
> Date: Sat, 3 Jun 2017 15:27:06 -0400
> Subject: What up with 2.3.1 ?
>
> Nutch 2.3.1, I have to say, I do not even understand it as a release.
>

This is understandable... as a previous (historical) user of the Nutch
1.X series, you seem to have expectations based on a simpler technology
stack. Nutch 2.X targets a different stack and focuses on more modern
storage solutions, as you've found out. It has never really been touted as
the go-to Nutch branch... you will notice that Nutch 1.X is the mainstream
(master) branch. You'll also see that, over a number of years, the message
has been consistent... Nutch 1.X is the go-to software for users of both
source and release artifacts.


>
> First, I attempted to ...



If you want to use Nutch 2.3.1 with HBase, you should use the backend
datastore versions listed in the release announcement. They are as
follows:

Apache Avro 1.7.6
Apache Hadoop 1.2.1 and 2.5.2
Apache HBase 0.98.8-hadoop2 (although also tested with 1.X)
Apache Cassandra 2.0.2
Apache Solr 4.10.3
MongoDB 2.6.X
Apache Accumulo 1.5.1
Apache Spark 1.4.1
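
As a quick sanity check before deploying, the list above can be written down as a small lookup table. The following is only a minimal sketch: `TESTED_VERSIONS` and `is_tested` are hypothetical names, not part of Nutch or Gora, and the sketch assumes that a trailing "X" in a listed version stands for "any value in this position and after".

```python
# Hypothetical helper, not part of Nutch or Gora. The versions are copied
# from the list in this message; the "X" wildcard handling is an assumption.
TESTED_VERSIONS = {
    "avro": ["1.7.6"],
    "hadoop": ["1.2.1", "2.5.2"],
    "hbase": ["0.98.8-hadoop2", "1.X"],  # "1.X": "also tested with 1.X"
    "cassandra": ["2.0.2"],
    "solr": ["4.10.3"],
    "mongodb": ["2.6.X"],
    "accumulo": ["1.5.1"],
    "spark": ["1.4.1"],
}


def _matches(pattern, version):
    """Compare dot-separated segments; an 'X' segment matches the rest."""
    p_parts = pattern.split(".")
    v_parts = version.split(".")
    for i, p in enumerate(p_parts):
        if p.upper() == "X":
            # Wildcard: the version must have at least one segment here.
            return len(v_parts) > i
        if i >= len(v_parts) or v_parts[i] != p:
            return False
    return len(p_parts) == len(v_parts)


def is_tested(component, version):
    """Return True if `version` of `component` appears in the list above."""
    patterns = TESTED_VERSIONS.get(component.lower(), [])
    return any(_matches(p, version) for p in patterns)


if __name__ == "__main__":
    for name, ver in [("hbase", "0.98.8-hadoop2"), ("cassandra", "3.0.0")]:
        status = "listed" if is_tested(name, ver) else "not listed"
        print(f"{name} {ver}: {status}")
```

A version reported as "not listed" is not necessarily broken; it only means it is outside the set named in the release announcement.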

I've tried my best, alongside several others over at the Gora community, to
ensure all of these datastores are documented over at
http://gora.apache.org/current/index.html#gora-modules.
It should be noted that since then, the Gora master branch has picked up
version upgrades for nearly every datastore.


>
>
>
> I just do not get the entire 2.3.1 release. It is very frustrating.


Yes, as I said, it is disappointing to see that you struggled so much with
this. I've made best efforts to keep our Nutch2 tutorial up to date:
https://wiki.apache.org/nutch/Nutch2Tutorial


> The
> webui's tend to fire blank pages with no stack traces.


Please feel free to log issues... if it is broken then we can try to fix
it. Without a Jira issue or some debug information, we don't know that it
is broken.


> Its unclear why
> backends that do not work are even documented.


HBase is the most widely used, followed by MongoDB... at the other end of
the spectrum, Cassandra is the least used and is broken. It has not been
maintained for quite some time... and yes, this is reflected in its use of
Super Columns. We are currently rewriting the backend as part of a GSoC
project.


> How can even the file/avro
> support not even work?
>

Please log your issue(s) in Jira and I can try to reproduce them using the
2.x branch. I do not use this backend in my own 2.X deployments, so I was
not aware that it was broken.
Lewis


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney




