[jira] [Commented] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-06-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355396#comment-15355396
 ] 

ASF GitHub Bot commented on NUTCH-2234:
---

Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/118


> Upgrade to elasticsearch 2.1.1
> --
>
> Key: NUTCH-2234
> URL: https://issues.apache.org/jira/browse/NUTCH-2234
> Project: Nutch
>  Issue Type: Improvement
>  Components: indexer
>Affects Versions: 1.11
>Reporter: Tien Nguyen Manh
>Assignee: Markus Jelsma
> Fix For: 1.13
>
> Attachments: NUTCH-2234.patch
>
>
> Currently we use Elasticsearch 1.x. We should upgrade to 2.x.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301990#comment-15301990
 ] 

ASF GitHub Bot commented on NUTCH-2234:
---

Github user naegelejd commented on a diff in the pull request:

https://github.com/apache/nutch/pull/118#discussion_r64734526
  
--- Diff: ivy/ivy.xml ---
@@ -105,6 +105,10 @@



+   
--- End diff --

The tomcat dependencies were previously pulled in by Hadoop 2.4.0. They are 
needed for the protocol-http[client] JUnit tests using JSP.




[jira] [Commented] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301536#comment-15301536
 ] 

ASF GitHub Bot commented on NUTCH-2234:
---

Github user lewismc commented on a diff in the pull request:

https://github.com/apache/nutch/pull/118#discussion_r64692054
  
--- Diff: ivy/ivy.xml ---
@@ -105,6 +105,10 @@



+   
--- End diff --

Why are these Tomcat dependencies added?




[jira] [Commented] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300627#comment-15300627
 ] 

ASF GitHub Bot commented on NUTCH-2234:
---

GitHub user naegelejd opened a pull request:

https://github.com/apache/nutch/pull/118

fix for NUTCH-2234 and NUTCH-2236

Upgrade Elasticsearch and Lucene dependencies, which, in turn, requires 
updates to Guava and Hadoop dependencies:

- Elasticsearch 1.4.1 -> Elasticsearch 2.3.3
- Lucene 4.10.2 -> 5.5.0
- Guava 16.0.1 -> Guava 18.0
- Hadoop 2.4.0 -> 2.7.2

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/naegelejd/nutch NUTCH-2234

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/118.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #118


commit 31e738a014576d8a4d4c8e8d3a0fc8d9fe5f8077
Author: Joseph Naegele 
Date:   2016-05-25T18:27:31Z

fix for NUTCH-2234 and NUTCH-2236

upgrades Elasticsearch and Lucene dependencies, which, in turn,
requires updates to Guava and Hadoop dependencies:

- Elasticsearch 1.4.1 -> Elasticsearch 2.3.3
- Lucene 4.10.2 -> 5.5.0
- Guava 16.0.1 -> Guava 18.0
- Hadoop 2.4.0 -> 2.7.2






[jira] [Commented] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-05-25 Thread Joseph Naegele (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300590#comment-15300590
 ] 

Joseph Naegele commented on NUTCH-2234:
---

Understood. The update to Lucene analyzers requires minor programmatic API 
changes in scoring-similarity, but nothing big. None of the indexers have 
tests, so I'm testing indexer-elastic manually for now. Unfortunately, updating 
Elasticsearch breaks the plugin due to differences in Guava versions: 
indexer-elastic depends on guava-18.0, which it declares in its plugin.xml, but 
guava-16.0.1 is a Nutch-wide dependency (for Hadoop). We avoided this issue in 
the past by also updating Nutch's Hadoop dependency from 2.4.0 to 2.7.1, which 
is why Tien created NUTCH-2246. I'll open the PR with all of the aforementioned 
dependency updates.
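
When two Guava versions end up visible to the same JVM, a quick way to see which one wins is to ask where a Guava class was loaded from. The following is only a generic JVM diagnostic sketch, not something taken from the patch: the class name {{JarLocator}} and its {{locate}} method are hypothetical, and in Nutch the answer may depend on which class loader performs the lookup, since plugins are loaded separately from the core classpath.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical diagnostic helper; not part of Nutch or the attached patch. */
final class JarLocator {

  /** Returns the location (typically a jar URL) that the named class is loaded from. */
  static String locate(String className) {
    try {
      // Look the class up without running its static initializers.
      Class<?> cls = Class.forName(className, false, JarLocator.class.getClassLoader());
      CodeSource source = cls.getProtectionDomain().getCodeSource();
      URL location = (source == null) ? null : source.getLocation();
      // JDK classes loaded by the bootstrap loader have no code source.
      return (location == null) ? "(bootstrap or unknown)" : location.toString();
    } catch (ClassNotFoundException | LinkageError e) {
      return "(not on classpath)";
    }
  }

  public static void main(String[] args) {
    // Preconditions is a long-standing Guava class; the printed path shows
    // which guava jar was actually picked up, if any.
    System.out.println(locate("com.google.common.base.Preconditions"));
  }
}
```

Running this with the same classpath as the failing job shows whether guava-16.0.1 or guava-18.0 is the jar actually in use.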



[jira] [Commented] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-05-24 Thread Joseph Naegele (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298924#comment-15298924
 ] 

Joseph Naegele commented on NUTCH-2234:
---

Hmm, I'm a bit confused. ES 2.3.3 depends on Lucene 5.5.0 libraries. It appears 
that indexer-solr does not depend on Lucene, only on Solrj. 
lucene-analyzers-common 4.10.2 is a Nutch-wide dependency in ivy/ivy.xml, but it 
appears to be used only by three plugins: indexer-elastic, 
parsefilter-naivebayes, and scoring-similarity. Of these, indexer-elastic and 
parsefilter-naivebayes specify their Lucene dependencies in their own 
plugin.xml, while scoring-similarity appears to rely on lucene-core 4.10.2 
being a transitive dependency through lucene-analyzers-common. Changing the 
Lucene version in ivy/ivy.xml requires changes to the scoring-similarity 
plugin, which I think should be its own issue.





[jira] [Commented] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-05-24 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298842#comment-15298842
 ] 

Lewis John McGibbney commented on NUTCH-2234:
-

bq. I can update the patch or open a PR on Github.
Please do. Please make sure that you run the tests, as the dependencies have 
caught us out before. Please also bear in mind that we want indexer-elastic, 
indexer-solr, and any other indexers to rely on the same underlying version of 
Lucene if possible.



[jira] [Commented] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-05-24 Thread Joseph Naegele (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298791#comment-15298791
 ] 

Joseph Naegele commented on NUTCH-2234:
---

Since this also adds support for multiple, comma-separated Elasticsearch hosts 
in {{elastic.host}}, the description in {{nutch-default.xml}} should be updated 
accordingly. Is there any reason not to update this to use the most recent 
version of Elasticsearch (2.3.3)? I can update the patch or open a PR on Github.
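
To illustrate what a comma-separated {{elastic.host}} value implies for the reader of that property, here is a minimal sketch of splitting such a value into individual host names. It is an assumption about how the value could be handled, not the code from the attached patch; the class {{ElasticHostList}}, its {{parse}} method, and the example host names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of Nutch or the attached patch. */
final class ElasticHostList {

  /** Splits a comma-separated value into trimmed, non-empty host names. */
  static List<String> parse(String value) {
    List<String> hosts = new ArrayList<>();
    if (value == null) {
      return hosts;
    }
    for (String part : value.split(",")) {
      String host = part.trim();
      // Skip empty entries caused by stray or doubled commas.
      if (!host.isEmpty()) {
        hosts.add(host);
      }
    }
    return hosts;
  }

  public static void main(String[] args) {
    // Prints: [es1.example.org, es2.example.org, es3.example.org]
    System.out.println(parse("es1.example.org, es2.example.org,es3.example.org"));
  }
}
```

Trimming whitespace and dropping empty entries keeps a value such as "host1, host2," usable, which is worth stating in the property description if the plugin behaves that way.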



[jira] [Commented] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-02-28 Thread Tien Nguyen Manh (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171264#comment-15171264
 ] 

Tien Nguyen Manh commented on NUTCH-2234:
-

Elasticsearch 2.1.1 uses httpclient 4.3.6.

> Upgrade to elasticsearch 2.1.1
> --
>
> Key: NUTCH-2234
> URL: https://issues.apache.org/jira/browse/NUTCH-2234
> Project: Nutch
>  Issue Type: Improvement
>  Components: indexer
>Affects Versions: 1.11
>Reporter: Tien Nguyen Manh
>Assignee: Markus Jelsma
> Fix For: 1.12
>
> Attachments: NUTCH-2234.patch
>
>
> Currently we use Elasticsearch 1.x. We should upgrade to 2.x.





[jira] [Commented] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-02-26 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169521#comment-15169521
 ] 

Lewis John McGibbney commented on NUTCH-2234:
-

Out of curiosity, what versions of httpcore and httpclient does 2.x use?



[jira] [Commented] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-02-26 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169182#comment-15169182
 ] 

Markus Jelsma commented on NUTCH-2234:
--

Nice! I'll get this in once I have that Git thing sorted out.



[jira] [Commented] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-02-26 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169167#comment-15169167
 ] 

Otis Gospodnetic commented on NUTCH-2234:
-

+1, works for us.

> Upgrade to elasticsearch 2.1.1
> --
>
> Key: NUTCH-2234
> URL: https://issues.apache.org/jira/browse/NUTCH-2234
> Project: Nutch
>  Issue Type: Improvement
>  Components: indexer
>Affects Versions: 1.11
>Reporter: Tien Nguyen Manh
> Attachments: NUTCH-2234.patch
>
>
> Currently we use Elasticsearch 1.x. We should upgrade to 2.x.


