[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-06 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696868#comment-17696868
 ] 

Hudson commented on NUTCH-2920:
---

SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #95 (See 
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/95/])
NUTCH-2920 -- first working attempt at migrating ElasticsearchIndexWriter to 
OpenSearch (snagel: 
[https://github.com/apache/nutch/commit/ca3824fd98290dd7806752decfab6eb9e3b3b569])
* (add) src/plugin/indexer-opensearch-1x/plugin.xml
* (add) src/plugin/indexer-opensearch-1x/ivy.xml
* (add) src/plugin/indexer-opensearch-1x/src/java/org/apache/nutch/indexwriter/opensearch1x/OpenSearch1xIndexWriter.java
* (add) src/plugin/indexer-opensearch-1x/src/java/org/apache/nutch/indexwriter/opensearch1x/package-info.java
* (add) src/plugin/indexer-opensearch-1x/src/java/org/apache/nutch/indexwriter/opensearch1x/OpenSearch1xConstants.java
* (edit) src/plugin/build.xml
* (edit) conf/index-writers.xml.template
* (add) src/plugin/indexer-opensearch-1x/build.xml
* (edit) LICENSE-binary
* (add) src/plugin/indexer-opensearch-1x/build-ivy.xml
* (add) src/plugin/indexer-opensearch-1x/README.md
* (edit) NOTICE-binary
NUTCH-2920 -- fix imports (snagel: 
[https://github.com/apache/nutch/commit/6e149f4954a0b7b21120b8e1467a07a82c60e66e])
* (edit) src/plugin/indexer-opensearch-1x/src/java/org/apache/nutch/indexwriter/opensearch1x/OpenSearch1xIndexWriter.java
NUTCH-2920 -- add keystore for 2-way tls; add back in no-tls option with a 
stern warning and possibly helpful links. (snagel: 
[https://github.com/apache/nutch/commit/f6b17177ad6049b5642d9510cb60fe0a1d3b5f1c])
* (edit) src/plugin/indexer-opensearch-1x/src/java/org/apache/nutch/indexwriter/opensearch1x/OpenSearch1xConstants.java
* (edit) src/plugin/indexer-opensearch-1x/src/java/org/apache/nutch/indexwriter/opensearch1x/OpenSearch1xIndexWriter.java
NUTCH-2920 -- improve handling for missing trust.store.path in the 
index-writers.xml (snagel: 
[https://github.com/apache/nutch/commit/5fc2839c447a1b3695b4bcb507d428d32ff27281])
* (edit) src/plugin/indexer-opensearch-1x/src/java/org/apache/nutch/indexwriter/opensearch1x/OpenSearch1xIndexWriter.java
NUTCH-2920 -- improve username/pw logic and update README.md (snagel: 
[https://github.com/apache/nutch/commit/71fabb2a87ff81b78997133ab7c790afa4ea6157])
* (edit) src/plugin/indexer-opensearch-1x/src/java/org/apache/nutch/indexwriter/opensearch1x/OpenSearch1xIndexWriter.java
* (edit) src/plugin/indexer-opensearch-1x/README.md


> Implement a indexer-opensearch plugin
> -
>
> Key: NUTCH-2920
> URL: https://issues.apache.org/jira/browse/NUTCH-2920
> Project: Nutch
> Issue Type: New Feature
> Components: plugin
> Affects Versions: 1.20
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Major
> Fix For: 1.20
>
>
> We will be moving to AWS-managed OpenSearch in the near term and I would like 
> to index our content there.
> As of writing, the OpenSearch project has published two plugin versions under 
> the Apache License v2 so far:
> https://github.com/opensearch-project/opensearch-java/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696856#comment-17696856
 ] 

ASF GitHub Bot commented on NUTCH-2920:
---

sebastian-nagel commented on PR #761:
URL: https://github.com/apache/nutch/pull/761#issuecomment-1455893629

   Merged. After reading about the [OpenSearch 
releases](https://opensearch.org/releases.html), I agree to include the version 
number in the plugin's name. It would also allow us to start working on a 2x 
plugin now. But maybe we keep naming the plugin supporting the latest version 
without using a specific version number?






[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696836#comment-17696836
 ] 

ASF GitHub Bot commented on NUTCH-2920:
---

sebastian-nagel merged PR #761:
URL: https://github.com/apache/nutch/pull/761






[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696325#comment-17696325
 ] 

ASF GitHub Bot commented on NUTCH-2920:
---

sebastian-nagel commented on PR #761:
URL: https://github.com/apache/nutch/pull/761#issuecomment-1454129142

   Well, don't really know. Depends on how long we want to maintain it - 
upgrading and testing (manually, as of now). And also, how long users may want 
to keep it running.
   
   For the Solr and Elastic indexers we always supported only one version (a 
recent one, but rarely the latest).
   
   > using testcontainers over on Tika to test integration with OpenSearch and 
Solr
   
   Same for [StormCrawler](https://github.com/Digitalpebble/storm-crawler/). 
Might be worth having a look at the OpenSearch (2.5) module over there.
   






[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696320#comment-17696320
 ] 

ASF GitHub Bot commented on NUTCH-2920:
---

tballison commented on PR #761:
URL: https://github.com/apache/nutch/pull/761#issuecomment-1454110886

   Sorry, what I meant by my question was: are you all ok with version 
numbers in the module name?  The current code is deprecated for 3x so I think 
we'll need to have a 3x at some point.  Or, do we want to target, say, 2.x as 
'indexer-opensearch' and hope it supports 1.x, etc.?
   






[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696318#comment-17696318
 ] 

ASF GitHub Bot commented on NUTCH-2920:
---

tballison commented on PR #761:
URL: https://github.com/apache/nutch/pull/761#issuecomment-1454107912

   I've run this with both the docker "getting started" OpenSearch example  
(e.g. `docker run -d -p 127.0.0.1:9200:9200 -p 127.0.0.1:9600:9600 -e 
"discovery.type=single-node" opensearchproject/opensearch:1.3.8` with 
username=admin and pw=admin) and with the example for running OpenSearch 
locally with the trust store in the [post 
above](https://opensearch.org/blog/connecting-java-high-level-rest-client-with-opensearch-over-https/)
   
   The more tests the merrier! 
   
   We're using testcontainers over on Tika to test integration with OpenSearch 
and Solr.  Resource and time intensive, but really, really valuable to have 
those.  I can try to draft some testcontainers tests in a separate PR.






[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696315#comment-17696315
 ] 

ASF GitHub Bot commented on NUTCH-2920:
---

sebastian-nagel commented on PR #761:
URL: https://github.com/apache/nutch/pull/761#issuecomment-145413

   > ok with the *-1x design?
   
   Yes. As said, I haven't tested it with a running OpenSearch instance. If you 
can confirm that docs/fields appear in the index as expected, that's fine. 
Otherwise, I can try to run an OpenSearch instance (would be my first time), 
index a bulk of docs using various index filter plugins (add to the property 
plugin.includes `index-(basic|more|anchor|geoip)`). This would be a more 
realistic test scenario.






[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696296#comment-17696296
 ] 

ASF GitHub Bot commented on NUTCH-2920:
---

tballison commented on PR #761:
URL: https://github.com/apache/nutch/pull/761#issuecomment-1454059036

   To confirm, you're all ok with the *-1x design?  Less than ideal, but I 
think it is useful.






[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696294#comment-17696294
 ] 

ASF GitHub Bot commented on NUTCH-2920:
---

tballison commented on code in PR #761:
URL: https://github.com/apache/nutch/pull/761#discussion_r1124942828


##
NOTICE-binary:
##
@@ -1021,6 +1021,10 @@ mapdb (http://www.mapdb.org)
 webarchive-commons (https://github.com/iipc/webarchive-commons)
 - license: The Apache Software License, Version 2.0
 
+# org.opensearch.client:opensearch-rest-high-level-client

Review Comment:
   +1, nothing to do, tho, on this PR, right?







[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696293#comment-17696293
 ] 

ASF GitHub Bot commented on NUTCH-2920:
---

tballison commented on code in PR #761:
URL: https://github.com/apache/nutch/pull/761#discussion_r1124942563


##
src/plugin/build.xml:
##
@@ -54,6 +54,7 @@
 
 
 
+

Review Comment:
   Done.







[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696284#comment-17696284
 ] 

ASF GitHub Bot commented on NUTCH-2920:
---

sebastian-nagel commented on code in PR #761:
URL: https://github.com/apache/nutch/pull/761#discussion_r1124754751


##
src/plugin/build.xml:
##
@@ -54,6 +54,7 @@
 
 
 
+

Review Comment:
   (if `ant clean` is called in the folder `src/plugin`)
   
   In addition, the plugin should be added to the build.xml (targets javadoc, 
eclipse and release), same as other plugins.







[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696224#comment-17696224
 ] 

ASF GitHub Bot commented on NUTCH-2920:
---

sebastian-nagel commented on code in PR #761:
URL: https://github.com/apache/nutch/pull/761#discussion_r1124705690


##
src/plugin/build.xml:
##
@@ -54,6 +54,7 @@
 
 
 
+

Review Comment:
   Should also add the new plugin to the target "clean".



##
src/plugin/build.xml:
##
@@ -54,6 +54,7 @@
 
 
 
+

Review Comment:
   (if `ant clean` is called in the folder `src/plugin`)
   
   In addition, the plugin should be added to the build.xml (targets javadoc, 
eclipse and release), same as other plugins.



##
NOTICE-binary:
##
@@ -1021,6 +1021,10 @@ mapdb (http://www.mapdb.org)
 webarchive-commons (https://github.com/iipc/webarchive-commons)
 - license: The Apache Software License, Version 2.0
 
+# org.opensearch.client:opensearch-rest-high-level-client

Review Comment:
   Excellent! - but we should seriously automate keeping the licenses 
up-to-date, see NUTCH-2981.







[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695267#comment-17695267
 ] 

ASF GitHub Bot commented on NUTCH-2920:
---

tballison commented on PR #761:
URL: https://github.com/apache/nutch/pull/761#issuecomment-1450693830

   K, I think this is ready for review.  I'm happy for any and all input!






[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-01 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695152#comment-17695152
 ] 

Tim Allison commented on NUTCH-2920:


Current proposal is to go with the high level rest client for 1.x for now and 
cheer on the successful completion of 
https://github.com/opensearch-project/opensearch-java/issues/181.



[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-01 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695148#comment-17695148
 ] 

Tim Allison commented on NUTCH-2920:


Well, that was a funny notion...

Turns out there is no BulkProcessor currently in the regular java-client (only 
exists in the high level java client) -- 
https://github.com/opensearch-project/opensearch-java/issues/181

So, we can make bulk requests with the basic java client, but we'd have to 
cache the bulk operations and have logic for when to run the operations.

The BulkProcessor takes care of all of this and has triggers for when to send 
the bulk data (size or time) and has retry logic and some other useful things.

This means that we'd have to reimplement that functionality, which I did on 
Tika ... and I don't want to do again. LOL...
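
To make concrete what "cache the bulk operations and have logic for when to run 
the operations" would involve, here is a minimal sketch of a size/time-triggered 
buffer. It is an illustration only: `BulkBuffer`, its parameters and the sink 
callback are hypothetical names, not part of any OpenSearch client, and the 
sketch leaves out the retry logic that the BulkProcessor provides. The time 
trigger is checked only when an operation is added, so a real implementation 
would also need a background timer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not an OpenSearch API): buffers operations and hands
 * them to a sink once a size or time threshold is reached.
 */
final class BulkBuffer<T> {
  private final int maxActions;
  private final long flushIntervalMillis;
  private final Consumer<List<T>> sink;
  private final List<T> pending = new ArrayList<>();
  private long lastFlushMillis = System.currentTimeMillis();

  BulkBuffer(int maxActions, long flushIntervalMillis, Consumer<List<T>> sink) {
    if (maxActions < 1) {
      throw new IllegalArgumentException("maxActions must be >= 1");
    }
    this.maxActions = maxActions;
    this.flushIntervalMillis = flushIntervalMillis;
    this.sink = sink;
  }

  /** Queues one operation and flushes if the size or time trigger fires. */
  synchronized void add(T operation) {
    pending.add(operation);
    boolean sizeReached = pending.size() >= maxActions;
    boolean timeReached =
        System.currentTimeMillis() - lastFlushMillis >= flushIntervalMillis;
    if (sizeReached || timeReached) {
      flush();
    }
  }

  /** Sends whatever is pending; call once more on shutdown. */
  synchronized void flush() {
    if (!pending.isEmpty()) {
      sink.accept(new ArrayList<>(pending)); // copy so the sink owns its batch
      pending.clear();
    }
    lastFlushMillis = System.currentTimeMillis();
  }
}
```

In this sketch the sink would be whatever code turns a batch into one bulk 
request; everything beyond that (backoff, partial-failure handling, 
concurrency limits) is exactly the functionality that would have to be 
reimplemented.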



[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-01 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695096#comment-17695096
 ] 

Tim Allison commented on NUTCH-2920:


My initial PR was a simple copy+paste with a few modifications of the 
ElasticsearchIndexWriter.  Part of that was to make review easier, and part of 
that was that I saw that the lower level java rest client was in beta and that 
OpenSearch was recommending still using the high-level rest client 
(https://opensearch.org/docs/1.2/clients/java/). 

In thinking about this more, I realize that this "beta" message was for 1.2.  
It is gone in 1.3 (https://opensearch.org/docs/1.3/clients/java/). Further, the 
high level rest client is deprecated in 2.x and will be removed in 3.x.

I'm going to rework the PR to use the more modern client.  This will make 
migrating to 2.x easier and hopefully require far fewer dependencies in 1.x?



[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693459#comment-17693459
 ] 

ASF GitHub Bot commented on NUTCH-2920:
---

tballison commented on PR #761:
URL: https://github.com/apache/nutch/pull/761#issuecomment-1445065255

   Needs more work on tls vs basic auth etc. Converting to draft.  Will update 
on Monday.






[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-02-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693335#comment-17693335
 ] 

ASF GitHub Bot commented on NUTCH-2920:
---

tballison commented on PR #761:
URL: https://github.com/apache/nutch/pull/761#issuecomment-1444379112

   I'm less than entirely thrilled with using stored strings for credentials, 
but that's where we were with Elasticsearch.  Again, if there's a better way, 
please let me know.






[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-02-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1769#comment-1769
 ] 

ASF GitHub Bot commented on NUTCH-2920:
---

tballison commented on PR #761:
URL: https://github.com/apache/nutch/pull/761#issuecomment-1444377846

   The fiddly part (for me) was setting up the rest client to deal with a trust 
store.
   
   I followed 
https://opensearch.org/blog/connecting-java-high-level-rest-client-with-opensearch-over-https/
 
   
   I had to update the `keytool` command in that blog post like so: `keytool 
-importcert -file root_ca.der -keystore my-store.jks -storepass mystorepass 
-alias opensearch`
   
   If there's a better way of handling this, let me know.
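
   For readers following along, the trust-store step can be sketched with 
JDK classes only. This is a minimal sketch under stated assumptions, not the 
code in the PR: `TrustStoreSslContext` and `fromTrustStore` are hypothetical 
names, and the store type is passed in because the type `keytool` writes 
depends on the JDK version.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical helper: builds an SSLContext that trusts the given store. */
final class TrustStoreSslContext {
  private TrustStoreSslContext() {}

  static SSLContext fromTrustStore(Path storePath, char[] storePassword,
      String storeType) throws GeneralSecurityException, IOException {
    // Load the trust store created with keytool -importcert.
    KeyStore trustStore = KeyStore.getInstance(storeType);
    try (InputStream in = Files.newInputStream(storePath)) {
      trustStore.load(in, storePassword);
    }
    // Derive trust managers from the certificates in the store.
    TrustManagerFactory tmf =
        TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
    tmf.init(trustStore);
    // No key managers: this covers server verification only, not 2-way TLS.
    SSLContext context = SSLContext.getInstance("TLS");
    context.init(null, tmf.getTrustManagers(), null);
    return context;
  }
}
```

   With the command above, a call might look like 
`TrustStoreSslContext.fromTrustStore(Paths.get("my-store.jks"), 
"mystorepass".toCharArray(), KeyStore.getDefaultType())`. The resulting 
context would then be handed to the HTTP client underneath the rest client; 
that wiring is not shown here.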






[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-02-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693330#comment-17693330
 ] 

ASF GitHub Bot commented on NUTCH-2920:
---

tballison opened a new pull request, #761:
URL: https://github.com/apache/nutch/pull/761

   …iter to OpenSearch
   
   Thanks for your contribution to [Apache Nutch](https://nutch.apache.org/)! 
Your help is appreciated!
   
   Before opening the pull request, please verify that
   * there is an open issue on the [Nutch issue 
tracker](https://issues.apache.org/jira/projects/NUTCH) which describes the 
problem or the improvement. We cannot accept pull requests without an issue 
because the change wouldn't be listed in the release notes.
   * the issue ID (`NUTCH-2920`)
     - is referenced in the title of the pull request
     - and placed in front of your commit messages surrounded by square 
brackets (`[NUTCH-2920] Issue or pull request title`)
   * commits are squashed into a single one (or few commits for larger changes)
   * Java source code follows [Nutch Eclipse Code Formatting 
rules](https://github.com/apache/nutch/blob/master/eclipse-codeformat.xml)
   * Nutch is successfully built and unit tests pass by running `ant clean 
runtime test`
   * there should be no conflicts when merging the pull request branch into the 
*recent* master branch. If there are conflicts, please try to rebase the pull 
request branch on top of a freshly pulled master branch.
   * if new dependencies are added,
     - are these dependencies licensed in a way that is compatible for 
inclusion under [ASF 
2.0](https://www.apache.org/legal/resolved.html#category-a)?
     - are `LICENSE-binary` and `NOTICE-binary` updated accordingly?
   
   We will be able to integrate your pull request faster if these conditions 
are met. If you have any questions about how to fix your problem or about using 
Nutch in general, please sign up for the [Nutch mailing 
list](https://nutch.apache.org/mailing_lists.html). Thanks!
   



