[jira] [Created] (NUTCH-2989) Can't have username/pw AND https on elastic-indexer?!
Tim Allison created NUTCH-2989:
----------------------------------

Summary: Can't have username/pw AND https on elastic-indexer?!
Key: NUTCH-2989
URL: https://issues.apache.org/jira/browse/NUTCH-2989
Project: Nutch
Issue Type: Task
Reporter: Tim Allison

While working on NUTCH-2920, I copied+pasted the elastic indexer. As part of that process, I noticed that basic auth doesn't work with https.

{code:java}
if (auth) {
  restClientBuilder
      .setHttpClientConfigCallback(new HttpClientConfigCallback() {
        @Override
        public HttpAsyncClientBuilder customizeHttpClient(
            HttpAsyncClientBuilder arg0) {
          return arg0.setDefaultCredentialsProvider(credentialsProvider);
        }
      });
}

// In case of HTTPS, set the client up for ignoring problems with self-signed
// certificates and stuff
if ("https".equals(scheme)) {
  try {
    SSLContextBuilder sslBuilder = SSLContexts.custom();
    sslBuilder.loadTrustMaterial(null, new TrustSelfSignedStrategy());
    final SSLContext sslContext = sslBuilder.build();

    restClientBuilder.setHttpClientConfigCallback(new HttpClientConfigCallback() {
      @Override
      public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpClientBuilder) {
        // ignore issues with self-signed certificates
        httpClientBuilder.setSSLHostnameVerifier(NoopHostnameVerifier.INSTANCE);
        return httpClientBuilder.setSSLContext(sslContext);
      }
    });
  } catch (Exception e) {
    LOG.error("Error setting up SSLContext because: " + e.getMessage(), e);
  }
}
{code}

On NUTCH-2920, I fixed this for the opensearch-indexer by adding another {{if (auth)}} statement under the {{https}} branch.

If this is an actual issue, I'm happy to open a PR. If I've misunderstood the code or the design, please close as "not a problem".

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
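A note on why the snippet quoted in NUTCH-2989 loses the credentials: as far as I can tell, {{setHttpClientConfigCallback}} stores a single callback, so the second registration in the {{https}} branch replaces the one made in the {{if (auth)}} branch rather than adding to it. The sketch below models that behaviour with plain JDK types only. {{FakeHttpClientBuilder}}, {{FakeRestClientBuilder}} and {{CallbackDemo}} are hypothetical stand-ins invented for illustration and are not part of the Elasticsearch or Apache HttpClient APIs. The single-callback variant is just one possible shape for a fix, not necessarily the change made on NUTCH-2920.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in for HttpAsyncClientBuilder: records which options were set. */
final class FakeHttpClientBuilder {
    final List<String> applied = new ArrayList<>();

    FakeHttpClientBuilder set(String option) {
        applied.add(option);
        return this;
    }
}

/** Hypothetical stand-in for RestClientBuilder: it holds exactly one callback. */
final class FakeRestClientBuilder {
    private UnaryOperator<FakeHttpClientBuilder> callback = UnaryOperator.identity();

    FakeRestClientBuilder setHttpClientConfigCallback(UnaryOperator<FakeHttpClientBuilder> cb) {
        this.callback = cb; // a later call replaces the earlier callback
        return this;
    }

    /** Runs the stored callback and reports which options ended up applied. */
    List<String> build() {
        return callback.apply(new FakeHttpClientBuilder()).applied;
    }
}

final class CallbackDemo {
    /** Same structure as the quoted code: two independent registrations. */
    static List<String> separateCallbacks(boolean auth, boolean https) {
        FakeRestClientBuilder builder = new FakeRestClientBuilder();
        if (auth) {
            builder.setHttpClientConfigCallback(h -> h.set("credentials"));
        }
        if (https) {
            builder.setHttpClientConfigCallback(h -> h.set("sslContext"));
        }
        return builder.build();
    }

    /** One registration that applies every setting that is enabled. */
    static List<String> singleCallback(boolean auth, boolean https) {
        FakeRestClientBuilder builder = new FakeRestClientBuilder();
        builder.setHttpClientConfigCallback(h -> {
            if (auth) {
                h.set("credentials");
            }
            if (https) {
                h.set("sslContext");
            }
            return h;
        });
        return builder.build();
    }

    public static void main(String[] args) {
        System.out.println(separateCallbacks(true, true)); // [sslContext]
        System.out.println(singleCallback(true, true));    // [credentials, sslContext]
    }
}
```

With both flags set, the two-registration version ends up with only the SSL setting, which matches the symptom described in the report. Registering once and branching inside the callback keeps both.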
[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin
[ https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695267#comment-17695267 ]

ASF GitHub Bot commented on NUTCH-2920:
---------------------------------------

tballison commented on PR #761:
URL: https://github.com/apache/nutch/pull/761#issuecomment-1450693830

K, I think this is ready for review. I'm happy for any and all input!

> Implement a indexer-opensearch plugin
> -------------------------------------
>
> Key: NUTCH-2920
> URL: https://issues.apache.org/jira/browse/NUTCH-2920
> Project: Nutch
> Issue Type: New Feature
> Components: plugin
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Major
> Fix For: 1.20
>
> We will be moving to AWS-managed OpenSearch in the near term and I would like
> to index our content there.
> As of writing the OpenSearch project has published two plugin versions under
> the Apache License v2 so far
> https://github.com/opensearch-project/opensearch-java/
[GitHub] [nutch] tballison commented on pull request #761: NUTCH-2920 -- add an OpenSearchIndexWriter
tballison commented on PR #761:
URL: https://github.com/apache/nutch/pull/761#issuecomment-1450693830

K, I think this is ready for review. I'm happy for any and all input!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@nutch.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Resolved] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?
[ https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved NUTCH-2988.
-------------------------------
Resolution: Duplicate

Duplicate. Sorry!

> Elasticsearch 7.13.2 compatible with ASL 2.0?
> ---------------------------------------------
>
> Key: NUTCH-2988
> URL: https://issues.apache.org/jira/browse/NUTCH-2988
> Project: Nutch
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Minor
> Attachments: LICENSE.txt
>
> In the latest release of at least the 1.x branch of Nutch, the elasticsearch
> high level java client is at 7.13.2, which is after the great schism. Or,
> the last purely ASL 2.0 license was in 7.10.2.
> So, do we need to downgrade to 7.10.2 or is Elasticsearch's new licensing
> plan suitable to be released within an ASF project?
> Or, is the client as opposed to the main search project still actually ASL
> 2.0?
> Ref: https://github.com/elastic/elasticsearch/blob/v7.13.2/LICENSE.txt
[jira] [Comment Edited] (NUTCH-2927) indexer-elastic: use Java API client
[ https://issues.apache.org/jira/browse/NUTCH-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695217#comment-17695217 ]

Tim Allison edited comment on NUTCH-2927 at 3/1/23 5:26 PM:
------------------------------------------------------------

Over on NUTCH-2920, I stumbled into the blocker that [BulkProcessor doesn't yet exist for this client|https://issues.apache.org/jira/browse/NUTCH-2920?focusedCommentId=17695148&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17695148] in OpenSearch.

This is also the case for Elasticsearch: https://github.com/elastic/elasticsearch-java/issues/108

See the link on NUTCH-2920 for why this is important. It is.

was (Author: talli...@mitre.org):
Over on NUTCH-2920, I stumbled into the blocker that [BulkProcessor doesn't yet exist for this client in OpenSearch|https://issues.apache.org/jira/browse/NUTCH-2920?focusedCommentId=17695148&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17695148].

This is also the case for Elasticsearch: https://github.com/elastic/elasticsearch-java/issues/108

See the link on NUTCH-2920 for why this is important. It is.

> indexer-elastic: use Java API client
> ------------------------------------
>
> Key: NUTCH-2927
> URL: https://issues.apache.org/jira/browse/NUTCH-2927
> Project: Nutch
> Issue Type: Improvement
> Components: indexer, plugin
> Affects Versions: 1.18
> Reporter: Sebastian Nagel
> Priority: Major
> Labels: help-wanted
> Fix For: 1.20
>
> See Lewis comment in [PR
> #713|https://github.com/apache/nutch/pull/703#issuecomment-1008159052]
> (NUTCH-2903): "High Level REST Client was deprecated in ES 7.15.0 in favor of
> the [Java API
> Client|https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/index.html]"
[jira] [Commented] (NUTCH-2927) indexer-elastic: use Java API client
[ https://issues.apache.org/jira/browse/NUTCH-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695217#comment-17695217 ]

Tim Allison commented on NUTCH-2927:
------------------------------------

Over on NUTCH-2920, I stumbled into the blocker that [BulkProcessor doesn't yet exist for this client in OpenSearch|https://issues.apache.org/jira/browse/NUTCH-2920?focusedCommentId=17695148&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17695148].

This is also the case for Elasticsearch: https://github.com/elastic/elasticsearch-java/issues/108

See the link on NUTCH-2920 for why this is important. It is.
[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin
[ https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695152#comment-17695152 ]

Tim Allison commented on NUTCH-2920:
------------------------------------

Current proposal is to go with the high level rest client for 1.x for now and cheer on the successful completion of https://github.com/opensearch-project/opensearch-java/issues/181.
[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin
[ https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695148#comment-17695148 ]

Tim Allison commented on NUTCH-2920:
------------------------------------

Well, that was a funny notion... Turns out there is currently no BulkProcessor in the regular java-client (it only exists in the high level java client) -- https://github.com/opensearch-project/opensearch-java/issues/181

So, we can make bulk requests with the basic java client, but we'd have to cache the bulk operations and have logic for when to run the operations. The BulkProcessor takes care of all of this: it has triggers for when to send the bulk data (size or time), retry logic, and some other useful things.

This means that we'd have to reimplement that functionality, which I did on Tika ... and I don't want to do again. LOL...
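To make concrete what "cache the bulk operations and have logic for when to run the operations" would involve, here is a minimal sketch using only JDK types. {{SimpleBulkBuffer}} and {{BulkBufferDemo}} are hypothetical names invented for illustration; this is not the OpenSearch or Elasticsearch BulkProcessor and not the Tika implementation mentioned above. It assumes a size trigger and an age trigger that is only checked when an operation is added. It leaves out retry logic and any background timer, both of which a real replacement would need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical sketch: buffers operations and sends them by count or by age. */
final class SimpleBulkBuffer<T> {
    private final int maxActions;          // size trigger
    private final long maxAgeMillis;       // time trigger, checked on add() only
    private final LongSupplier clock;      // injectable so the demo is deterministic
    private final Consumer<List<T>> sender;
    private final List<T> pending = new ArrayList<>();
    private long firstAddedAt;

    SimpleBulkBuffer(int maxActions, long maxAgeMillis,
                     LongSupplier clock, Consumer<List<T>> sender) {
        this.maxActions = maxActions;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
        this.sender = sender;
    }

    synchronized void add(T operation) {
        if (pending.isEmpty()) {
            firstAddedAt = clock.getAsLong(); // age is measured from the oldest pending item
        }
        pending.add(operation);
        boolean full = pending.size() >= maxActions;
        boolean old = clock.getAsLong() - firstAddedAt >= maxAgeMillis;
        if (full || old) {
            flush();
        }
    }

    /** Sends whatever is pending; call once more at shutdown. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = new ArrayList<>(pending);
        pending.clear();
        sender.accept(batch); // no retry here: a failed send would lose the batch
    }
}

final class BulkBufferDemo {
    /** Returns the size of each batch that was sent, in order. */
    static List<Integer> batchSizes() {
        List<Integer> sizes = new ArrayList<>();
        long[] now = {0L}; // fake clock in milliseconds
        SimpleBulkBuffer<String> buffer = new SimpleBulkBuffer<String>(
            3, 1000L, () -> now[0], batch -> sizes.add(batch.size()));

        buffer.add("a");
        buffer.add("b");
        buffer.add("c");   // third item: size trigger fires
        buffer.add("d");
        now[0] = 1500L;
        buffer.add("e");   // oldest pending item is 1500 ms old: age trigger fires
        buffer.add("f");
        buffer.flush();    // final flush for the leftover item
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(batchSizes()); // [3, 2, 1]
    }
}
```

Even this stripped-down version needs synchronization, a clock, and an explicit final flush, which gives a sense of why reusing an existing BulkProcessor is attractive.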
[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin
[ https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695096#comment-17695096 ]

Tim Allison commented on NUTCH-2920:
------------------------------------

My initial PR was a simple copy+paste with a few modifications of the ElasticsearchIndexWriter. Part of that was to make review easier, and part of that was that I saw that the lower level java rest client was in beta and that OpenSearch was recommending still using the high-level rest client (https://opensearch.org/docs/1.2/clients/java/).

In thinking about this more, I realize that this "beta" message was for 1.2. It is gone in 1.3 (https://opensearch.org/docs/1.3/clients/java/). Further, the high level rest client is deprecated in 2.x and will be removed in 3.x.

I'm going to rework the PR to use the more modern client. This will make migrating to 2.x easier and hopefully require far fewer dependencies in 1.x?
[jira] [Commented] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?
[ https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694949#comment-17694949 ]

Sebastian Nagel commented on NUTCH-2988:
----------------------------------------

Yes, this is a blocker for 1.20.

There are at least two ways to address it (I'd opt for the second solution):
- remove indexer-elastic from the binary package (NUTCH-2960)
- use the Java API client (NUTCH-2927)