[jira] [Created] (NUTCH-2989) Can't have username/pw AND https on elastic-indexer?!

2023-03-01 Thread Tim Allison (Jira)
Tim Allison created NUTCH-2989:
--

 Summary: Can't have username/pw AND https on elastic-indexer?!
 Key: NUTCH-2989
 URL: https://issues.apache.org/jira/browse/NUTCH-2989
 Project: Nutch
  Issue Type: Task
Reporter: Tim Allison


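While working on NUTCH-2920, I copied+pasted the elastic indexer.  As part of 
that process, I noticed that basic auth doesn't appear to work with https: the 
https branch calls {{setHttpClientConfigCallback}} a second time, which looks 
like it replaces the callback that sets the credentials provider.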


{code:java}
if (auth) {
  restClientBuilder
      .setHttpClientConfigCallback(new HttpClientConfigCallback() {
        @Override
        public HttpAsyncClientBuilder customizeHttpClient(
            HttpAsyncClientBuilder arg0) {
          return arg0.setDefaultCredentialsProvider(credentialsProvider);
        }
      });
}

// In case of HTTPS, set the client up for ignoring problems with
// self-signed certificates and stuff
if ("https".equals(scheme)) {
  try {
    SSLContextBuilder sslBuilder = SSLContexts.custom();
    sslBuilder.loadTrustMaterial(null, new TrustSelfSignedStrategy());
    final SSLContext sslContext = sslBuilder.build();

    restClientBuilder.setHttpClientConfigCallback(new HttpClientConfigCallback() {
      @Override
      public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpClientBuilder) {
        // ignore issues with self-signed certificates
        httpClientBuilder.setSSLHostnameVerifier(NoopHostnameVerifier.INSTANCE);
        return httpClientBuilder.setSSLContext(sslContext);
      }
    });
  } catch (Exception e) {
    LOG.error("Error setting up SSLContext because: " + e.getMessage(), e);
  }
}
{code}

On NUTCH-2920, I fixed this for the opensearch-indexer by adding another {{if 
(auth)}} statement under the {{https}} branch.
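
A minimal sketch of why the second call would drop the credentials, using only the JDK. The classes {{ClientBuilder}}, {{ConfigCallback}} and {{RestBuilder}} are hypothetical stand-ins for {{HttpAsyncClientBuilder}}, {{HttpClientConfigCallback}} and the REST client builder; the sketch assumes the real builder keeps a single callback, so a later {{setHttpClientConfigCallback}} call replaces an earlier one. {{configureCombined}} is one way to express the fix, not necessarily the code in the PR.

```java
import java.util.ArrayList;
import java.util.List;

public class CallbackOverrideDemo {

  /** Hypothetical stand-in for HttpAsyncClientBuilder: records what was configured. */
  static class ClientBuilder {
    final List<String> settings = new ArrayList<>();

    ClientBuilder set(String name) {
      settings.add(name);
      return this;
    }
  }

  /** Hypothetical stand-in for HttpClientConfigCallback. */
  interface ConfigCallback {
    ClientBuilder customizeHttpClient(ClientBuilder builder);
  }

  /** Hypothetical stand-in for the REST client builder. ASSUMPTION: it keeps one callback. */
  static class RestBuilder {
    private ConfigCallback callback = b -> b;

    RestBuilder setHttpClientConfigCallback(ConfigCallback cb) {
      this.callback = cb; // replaces any callback set earlier
      return this;
    }

    List<String> build() {
      return callback.customizeHttpClient(new ClientBuilder()).settings;
    }
  }

  /** Mirrors the snippet in this issue: two independent calls. */
  static List<String> configureSeparately(boolean auth, boolean https) {
    RestBuilder rest = new RestBuilder();
    if (auth) {
      rest.setHttpClientConfigCallback(b -> b.set("credentials"));
    }
    if (https) {
      rest.setHttpClientConfigCallback(b -> b.set("hostnameVerifier").set("sslContext"));
    }
    return rest.build();
  }

  /** A single callback that applies every setting that is needed. */
  static List<String> configureCombined(boolean auth, boolean https) {
    RestBuilder rest = new RestBuilder();
    if (auth || https) {
      rest.setHttpClientConfigCallback(b -> {
        if (auth) {
          b.set("credentials");
        }
        if (https) {
          b.set("hostnameVerifier").set("sslContext");
        }
        return b;
      });
    }
    return rest.build();
  }

  public static void main(String[] args) {
    // prints [hostnameVerifier, sslContext]: the credentials callback was replaced
    System.out.println(configureSeparately(true, true));
    // prints [credentials, hostnameVerifier, sslContext]
    System.out.println(configureCombined(true, true));
  }
}
```

With auth only or https only, both variants configure the same settings; they differ only when both are enabled.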

If this is an actual issue, I'm happy to open a PR.  If I've misunderstood the 
code or the design, please close as "not a problem".




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695267#comment-17695267
 ] 

ASF GitHub Bot commented on NUTCH-2920:
---

tballison commented on PR #761:
URL: https://github.com/apache/nutch/pull/761#issuecomment-1450693830

   K, I think this is ready for review.  I'm happy for any and all input!




> Implement a indexer-opensearch plugin
> -
>
> Key: NUTCH-2920
> URL: https://issues.apache.org/jira/browse/NUTCH-2920
> Project: Nutch
>  Issue Type: New Feature
>  Components: plugin
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> We will be moving to AWS-managed OpenSearch in the near term and I would like 
> to index our content there.
> As of writing, the OpenSearch project has published two plugin versions under 
> the Apache License v2 so far:
> https://github.com/opensearch-project/opensearch-java/





[GitHub] [nutch] tballison commented on pull request #761: NUTCH-2920 -- add an OpenSearchIndexWriter

2023-03-01 Thread via GitHub


tballison commented on PR #761:
URL: https://github.com/apache/nutch/pull/761#issuecomment-1450693830

   K, I think this is ready for review.  I'm happy for any and all input!





[jira] [Resolved] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?

2023-03-01 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved NUTCH-2988.

Resolution: Duplicate

Duplicate.  Sorry!

> Elasticsearch 7.13.2 compatible with ASL 2.0?
> -
>
> Key: NUTCH-2988
> URL: https://issues.apache.org/jira/browse/NUTCH-2988
> Project: Nutch
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Minor
> Attachments: LICENSE.txt
>
>
> In the latest release of at least the 1.x branch of Nutch, the elasticsearch 
> high level java client is at 7.13.2, which is after the great schism.  That 
> is, the last purely ASL 2.0 release was 7.10.2.
> So, do we need to downgrade to 7.10.2 or is Elasticsearch's new licensing 
> plan suitable to be released within an ASF project?
> Or, is the client as opposed to the main search project still actually ASL 
> 2.0?
> Ref: https://github.com/elastic/elasticsearch/blob/v7.13.2/LICENSE.txt





[jira] [Comment Edited] (NUTCH-2927) indexer-elastic: use Java API client

2023-03-01 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695217#comment-17695217
 ] 

Tim Allison edited comment on NUTCH-2927 at 3/1/23 5:26 PM:


Over on NUTCH-2920, I stumbled into the blocker that [BulkProcessor doesn't 
yet exist for this 
client|https://issues.apache.org/jira/browse/NUTCH-2920?focusedCommentId=17695148&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17695148]
 in OpenSearch.  This is also the case for Elasticsearch: 
https://github.com/elastic/elasticsearch-java/issues/108

See the link on NUTCH-2920 for why this is important.  It is. 


was (Author: talli...@mitre.org):
Over on NUTCH-2920 , I stumbled into the blocker that [BulkProcessor doesn't 
yet exist for this client in 
OpenSearch|https://issues.apache.org/jira/browse/NUTCH-2920?focusedCommentId=17695148&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17695148].
  This is also the case for Elasticsearch: 
https://github.com/elastic/elasticsearch-java/issues/108

See the link on NUTCH-2920 for why this is important.  It is. 

> indexer-elastic: use Java API client
> 
>
> Key: NUTCH-2927
> URL: https://issues.apache.org/jira/browse/NUTCH-2927
> Project: Nutch
>  Issue Type: Improvement
>  Components: indexer, plugin
>Affects Versions: 1.18
>Reporter: Sebastian Nagel
>Priority: Major
>  Labels: help-wanted
> Fix For: 1.20
>
>
> See Lewis comment in [PR 
> #713|https://github.com/apache/nutch/pull/703#issuecomment-1008159052] 
> (NUTCH-2903): "High Level REST Client was deprecated in ES 7.15.0 in favor of 
> the [Java API 
> Client|https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/index.html]"





[jira] [Commented] (NUTCH-2927) indexer-elastic: use Java API client

2023-03-01 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695217#comment-17695217
 ] 

Tim Allison commented on NUTCH-2927:


Over on NUTCH-2920, I stumbled into the blocker that [BulkProcessor doesn't 
yet exist for this client in 
OpenSearch|https://issues.apache.org/jira/browse/NUTCH-2920?focusedCommentId=17695148&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17695148].
  This is also the case for Elasticsearch: 
https://github.com/elastic/elasticsearch-java/issues/108

See the link on NUTCH-2920 for why this is important.  It is. 



[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-01 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695152#comment-17695152
 ] 

Tim Allison commented on NUTCH-2920:


Current proposal is to go with the high level rest client for 1.x for now and 
cheer on the successful completion of 
https://github.com/opensearch-project/opensearch-java/issues/181.



[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-01 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695148#comment-17695148
 ] 

Tim Allison commented on NUTCH-2920:


Well, that was a funny notion...

Turns out there is currently no BulkProcessor in the regular java-client (it 
exists only in the high level java client) -- 
https://github.com/opensearch-project/opensearch-java/issues/181

So, we can make bulk requests with the basic java client, but we'd have to 
cache the bulk operations and have logic for when to run the operations.

The BulkProcessor takes care of all of this and has triggers for when to send 
the bulk data (size or time) and has retry logic and some other useful things.

This means that we'd have to reimplement that functionality, which I did on 
Tika ... and I don't want to do again. LOL...
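
To give a sense of what reimplementing this would involve, here is a rough sketch of the buffering part only, using just the JDK. {{BulkBuffer}} and its parameters are hypothetical names, not part of any OpenSearch or Elasticsearch client. It assumes the caller passes the current time on each {{add}}, and it leaves out what the real BulkProcessor also provides: a background timer, retries with backoff, and concurrency limits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical buffer that batches operations and flushes on a size or age trigger. */
public class BulkBuffer<T> {
  private final int maxActions;
  private final long maxAgeMillis;
  private final Consumer<List<T>> sender;
  private final List<T> pending = new ArrayList<>();
  private long oldestMillis;

  /**
   * @param maxActions   flush once this many operations are queued
   * @param maxAgeMillis flush once the oldest queued operation is this old
   * @param sender       receives each batch, e.g. to issue one bulk request
   */
  public BulkBuffer(int maxActions, long maxAgeMillis, Consumer<List<T>> sender) {
    this.maxActions = maxActions;
    this.maxAgeMillis = maxAgeMillis;
    this.sender = sender;
  }

  /** Queues one operation, then flushes if the size or age trigger fires. */
  public synchronized void add(T operation, long nowMillis) {
    if (pending.isEmpty()) {
      oldestMillis = nowMillis; // remember when this batch started
    }
    pending.add(operation);
    if (pending.size() >= maxActions || nowMillis - oldestMillis >= maxAgeMillis) {
      flush();
    }
  }

  /** Sends whatever is queued; call this on close so nothing is left behind. */
  public synchronized void flush() {
    if (pending.isEmpty()) {
      return;
    }
    List<T> batch = new ArrayList<>(pending);
    pending.clear();
    sender.accept(batch);
  }
}
```

Because the age trigger is only checked inside {{add}}, a quiet period leaves operations queued until the next {{add}} or an explicit {{flush}}, which is one of the gaps a scheduled flush would have to close.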



[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-01 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695096#comment-17695096
 ] 

Tim Allison commented on NUTCH-2920:


My initial PR was a simple copy+paste with a few modifications of the 
ElasticsearchIndexWriter.  Part of that was to make review easier, and part of 
that was that I saw that the lower level java rest client was in beta and that 
OpenSearch was recommending still using the high-level rest client 
(https://opensearch.org/docs/1.2/clients/java/). 

In thinking about this more, I realize that this "beta" message was for 1.2.  
It is gone in 1.3 (https://opensearch.org/docs/1.3/clients/java/). Further, the 
high level rest client is deprecated in 2.x and will be removed in 3.x.

I'm going to rework the PR to use the more modern client.  This will make 
migrating to 2.x easier and hopefully require far fewer dependencies in 1.x?



[jira] [Commented] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?

2023-03-01 Thread Sebastian Nagel (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694949#comment-17694949
 ] 

Sebastian Nagel commented on NUTCH-2988:


Yes, this is a blocker for 1.20. There are at least two ways to address it (I'd 
opt for the second solution):
- remove indexer-elastic from the binary package (NUTCH-2960)
- use the Java API client (NUTCH-2927)
