[jira] [Comment Edited] (BEAM-3199) Upgrade to Elasticsearch 6.x

2018-05-29 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494141#comment-16494141
 ] 

Tim Robertson edited comment on BEAM-3199 at 5/29/18 7:53 PM:
--

This is fabulous to see.  I've also been working in the ES issues and have a few comments.

* I see your comment about the FluentBackoff not being serializable.  I'd 
suggest copying the approach from [SolrIO 
here|https://github.com/apache/beam/blob/master/sdks/java/io/solr/src/main/java/org/apache/beam/sdk/io/solr/SolrIO.java#L778]
 which is consistent with JdbcIO.
* Repeating [~echauchot] but can we make sure that the dynamic routing for 
index and document ID are included please as they are necessary for updates 
(upserts to be precise)? I know type is being dropped in ES so that can go 
[BEAM-3201] 
* Partial update support [is about to be merged | 
https://github.com/apache/beam/pull/5463] to fix [BEAM-4389] as well and is 
something I know one team rely on already
* We might want to consider the discussion on SolrIO versions 6&7 [BEAM-3947] 
when considering packaging (single/multiple modules) so we are consistent.

I'll be happy to help out of course - and thanks for sharing this.
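
To make the first bullet concrete, here is a minimal sketch of the general pattern, assuming the idea is to keep only plain, serializable retry settings on the transform and to build the non-serializable backoff object on the worker. All names below (RetryConfigSketch, RetryConfiguration, ExponentialBackoff, newBackoff) are hypothetical stand-ins and are not Beam's API; ExponentialBackoff only plays the role that FluentBackoff would play in the real IO.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.time.Duration;

/** Sketch only: every name here is hypothetical and not part of Beam. */
final class RetryConfigSketch {

  /** Stand-in for a non-serializable backoff (the role FluentBackoff plays). */
  static final class ExponentialBackoff {
    private final int maxAttempts;
    private final long initialMillis;
    private int attempt = 0;

    private ExponentialBackoff(int maxAttempts, long initialMillis) {
      this.maxAttempts = maxAttempts;
      this.initialMillis = initialMillis;
    }

    /** Returns the next delay in milliseconds, or -1 once attempts are used up. */
    long nextDelayMillis() {
      if (attempt >= maxAttempts) {
        return -1L;
      }
      long delay = initialMillis << attempt; // doubles on every attempt
      attempt++;
      return delay;
    }
  }

  /** Serializable settings; this is what would travel with the transform. */
  static final class RetryConfiguration implements Serializable {
    private static final long serialVersionUID = 1L;
    private final int maxAttempts;
    private final Duration initialDelay;

    private RetryConfiguration(int maxAttempts, Duration initialDelay) {
      this.maxAttempts = maxAttempts;
      this.initialDelay = initialDelay;
    }

    static RetryConfiguration create(int maxAttempts, Duration initialDelay) {
      if (maxAttempts < 1 || maxAttempts > 30) {
        throw new IllegalArgumentException("maxAttempts must be between 1 and 30");
      }
      if (initialDelay.isNegative() || initialDelay.isZero()) {
        throw new IllegalArgumentException("initialDelay must be positive");
      }
      return new RetryConfiguration(maxAttempts, initialDelay);
    }

    /** Builds a fresh backoff on demand, e.g. from a DoFn setup method. */
    ExponentialBackoff newBackoff() {
      return new ExponentialBackoff(maxAttempts, initialDelay.toMillis());
    }
  }

  /** Java-serializes and deserializes a value, as a runner would do. */
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("serialization round trip failed", e);
    }
  }

  public static void main(String[] args) {
    RetryConfiguration config =
        roundTrip(RetryConfiguration.create(3, Duration.ofMillis(100)));
    ExponentialBackoff backoff = config.newBackoff();
    for (long d = backoff.nextDelayMillis(); d >= 0; d = backoff.nextDelayMillis()) {
      System.out.println("retry after " + d + " ms");
    }
  }
}
```

The point of the design is that only RetryConfiguration has to survive serialization; the stateful backoff is created after deserialization, so it never needs to be serializable itself.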



was (Author: timrobertson100):
This is fabulous to see.  I've also been working in the ES issues and have a few comments.

* I see your comment about the FluentBackoff not being serializable.  I'd 
suggest copying the approach from [SolrIO 
here|https://github.com/apache/beam/blob/master/sdks/java/io/solr/src/main/java/org/apache/beam/sdk/io/solr/SolrIO.java#L778]
 which is consistent with JdbcIO.
* Repeating [~echauchot] but can we make sure that the dynamic routing for 
index and document ID are included please as they are necessary for updates 
(upserts to be precise)? I know type is being dropped in ES so that can go 
[BEAM-3201] 
* Partial update support [is about to be merged | 
https://github.com/apache/beam/pull/5463] to fix [BEAM-4389] as well and is 
something I know one team rely on already
* We might want to consider SolrIO v6&7 discussion [BEAM-3947] when considering 
packaging as one or several modules so we are consistent.

I'll be happy to help out of course - and thanks for sharing this.


> Upgrade to Elasticsearch 6.x
> 
>
> Key: BEAM-3199
> URL: https://issues.apache.org/jira/browse/BEAM-3199
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Jean-Baptiste Onofré
>Assignee: Jeroen Steggink
>Priority: Major
>
> Elasticsearch 6.x is now GA. As it's fully compatible with Elasticsearch 5.x, 
> it makes sense to upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-3199) Upgrade to Elasticsearch 6.x

2018-05-29 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494141#comment-16494141
 ] 

Tim Robertson edited comment on BEAM-3199 at 5/29/18 7:52 PM:
--

This is fabulous to see.  I've also been working in the ES issues and have a few comments.

* I see your comment about the FluentBackoff not being serializable.  I'd 
suggest copying the approach from [SolrIO 
here|https://github.com/apache/beam/blob/master/sdks/java/io/solr/src/main/java/org/apache/beam/sdk/io/solr/SolrIO.java#L778]
 which is consistent with JdbcIO.
* Repeating [~echauchot] but can we make sure that the dynamic routing for 
index and document ID are included please as they are necessary for updates 
(upserts to be precise)? I know type is being dropped in ES so that can go 
[BEAM-3201] 
* Partial update support [is about to be merged | 
https://github.com/apache/beam/pull/5463] to fix [BEAM-4389] as well and is 
something I know one team rely on already
* We might want to consider SolrIO v6&7 discussion [BEAM-3947] when considering 
packaging as one or several modules so we are consistent.

I'll be happy to help out of course - and thanks for sharing this.



was (Author: timrobertson100):
This is fabulous to see.  I've also been working in the ES issues and have a few comments.

* I see your comment about the FluentBackoff not being serializable.  I'd 
suggest copying the approach from [SolrIO 
here|https://github.com/apache/beam/blob/master/sdks/java/io/solr/src/main/java/org/apache/beam/sdk/io/solr/SolrIO.java#L778]
 which is consistent with JdbcIO.
* Repeating [~echauchot] but can we make sure that the dynamic routing for 
index and document ID are included please as they are necessary for updates 
(upserts to be precise)? I know type is being dropped in ES so that can go 
[BEAM-3201] 
* Partial update support [is about to be merged | 
https://github.com/apache/beam/pull/5463] to fix [BEAM-4389] as well and is 
something I know one team rely on already

I'll be happy to help out of course.




[jira] [Comment Edited] (BEAM-3199) Upgrade to Elasticsearch 6.x

2018-05-29 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493531#comment-16493531
 ] 

Ismaël Mejía edited comment on BEAM-3199 at 5/29/18 1:37 PM:
-

Please sync with [~echauchot] on this to maximize reuse between the modules and 
keep the APIs consistent.

For the integration tests (IT), I assigned you to BEAM-4398 (please unassign 
yourself if you don't plan to work on it). Please reuse the existing infra as 
much as possible; it is worth taking a look at:
https://beam.apache.org/documentation/io/testing/#i-o-transform-integration-tests

If in doubt, discuss it on Slack with [~ŁukaszG].



was (Author: iemejia):
Please sync with [~echauchot] on this to maximize reuse between the modules and 
keep the APIs consistent.

For the integration tests (IT), feel free to reassign BEAM-4398 to yourself; 
however, try to reuse the existing infra as much as possible. It is worth 
taking a look at:
https://beam.apache.org/documentation/io/testing/#i-o-transform-integration-tests

If in doubt, discuss it on Slack with [~ŁukaszG].




[jira] [Comment Edited] (BEAM-3199) Upgrade to Elasticsearch 6.x

2018-05-29 Thread Jeroen Steggink (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493494#comment-16493494
 ] 

Jeroen Steggink edited comment on BEAM-3199 at 5/29/18 12:45 PM:
-

I have been working on a specific IO for Elasticsearch 6.x. It can be found 
here:
 [https://github.com/jsteggink/beam/tree/BEAM-3199]

Since people would still use the old ES versions (2.x and 5.x), it's a separate 
Maven module (elasticsearch-6). Furthermore, it tries to use the 
RestHighLevelClient where it can. This means a lot can be abstracted and 
optimizations can be done by ES. There are no more strings containing JSON; it 
uses ES objects for both Read and Write.

I'm still working on the Read parts and the integration tests. The integration 
tests require a lot of refactoring, since the use of ESIntegTestCase with the 
new RestHighLevelClient is not ideal. I would rather just do my own integration 
tests based on ElasticsearchIO and a live ES cluster using Docker or with 
Kubernetes.

Any help and review is welcome!

P.S. Thanks to my colleagues Fokko and Vincent for the first review of the 
Write part! 


was (Author: jeroens):
I have been working on a specific IO for Elasticsearch 6.x. It can be found 
here:
[https://github.com/jsteggink/beam/tree/BEAM-3199]

Since people would still use the old ES versions (2.x and 5.x), it's a separate 
Maven module (elasticsearch-6). Furthermore, it tries to use the 
RestHighLevelClient where it can. This means a lot can be abstracted and 
optimizations can be done by ES. There are no more strings containing JSON; it 
uses ES objects for both Read and Write.

I'm still working on the Read parts and the integration tests. The integration 
tests require a lot of refactoring, since the use of ESIntegTestCase with the 
new RestHighLevelClient is not ideal. I would rather just do my own integration 
tests based on ElasticsearchIO and a live ES cluster using Docker or with 
Kubernetes.

Any help and review is welcome!

> Upgrade to Elasticsearch 6.x
> 
>
> Key: BEAM-3199
> URL: https://issues.apache.org/jira/browse/BEAM-3199
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>
> Elasticsearch 6.x is now GA. As it's fully compatible with Elasticsearch 5.x, 
> it makes sense to upgrade.


