[jira] [Commented] (BEAM-3199) Upgrade to Elasticsearch 6.x

2018-08-09 Thread Jeroen Steggink (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574451#comment-16574451
 ] 

Jeroen Steggink commented on BEAM-3199:
---

Hi [~timrobertson100],

Yes, we are using it in production successfully. However, it's undergoing some 
changes. We are moving away from BoundedSource, just like SolrIO did. That makes 
it simpler and makes more sense.

I'll try and commit something soon.

Jeroen

> Upgrade to Elasticsearch 6.x
> 
>
> Key: BEAM-3199
> URL: https://issues.apache.org/jira/browse/BEAM-3199
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Jean-Baptiste Onofré
>Assignee: Jeroen Steggink
>Priority: Major
>
> Elasticsearch 6.x is now GA. As it's fully compatible with Elasticsearch 5.x, 
> it makes sense to upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-1240) Create RabbitMqIO

2018-08-02 Thread Jeroen Steggink (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566869#comment-16566869
 ] 

Jeroen Steggink commented on BEAM-1240:
---

Furthermore, after some testing, I see the getWatermark() method can return 
null. When running with the direct-runner, I get an NPE because oldestTimeStamp 
is null the first time. Maybe you can add a null check and, if it is null, set 
it to Instant.now()?

[https://github.com/jbonofre/beam/blob/f1665f47ff10679a2a54c1879fce6d77151fa90a/sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java#L385]
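A minimal sketch of the suggested null check, not the actual RabbitMqIO code. The class and method names are hypothetical, and java.time.Instant is used as a stand-in so the snippet compiles on its own (Beam itself uses org.joda.time.Instant):

```java
import java.time.Instant;

// Hypothetical helper illustrating the null check; not part of RabbitMqIO.
final class WatermarkSketch {

  // Returns the oldest timestamp seen so far, or the supplied fallback
  // (for example Instant.now()) when no message has been read yet.
  static Instant watermarkOrFallback(Instant oldestTimestamp, Instant fallback) {
    return oldestTimestamp != null ? oldestTimestamp : fallback;
  }

  public static void main(String[] args) {
    Instant now = Instant.now();
    // Before the first message, oldestTimestamp is null, so the fallback is used.
    System.out.println(watermarkOrFallback(null, now));
    // Once a timestamp has been recorded, it is returned unchanged.
    System.out.println(watermarkOrFallback(Instant.EPOCH, now));
  }
}
```

With a guard like this, getWatermark() would never hand a null to the runner.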

Testing it with the Flink local runner, I never see it calling the 
finalizeCheckpoint method in RabbitMQCheckpointMark, which means the RabbitMQ 
messages are never acknowledged. Have you tried it? 
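
To illustrate why a missing finalizeCheckpoint call leaves messages unacknowledged, here is a minimal sketch of the ack-on-finalize pattern. It is not the RabbitMQCheckpointMark implementation: the class name, recordDelivery, pendingCount, and the LongConsumer standing in for the channel's ack call are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

// Hypothetical stand-in for a checkpoint mark; not the actual RabbitMqIO class.
final class AckOnFinalizeSketch {
  // Delivery tags read since the last finalized checkpoint.
  private final List<Long> pendingDeliveryTags = new ArrayList<>();
  // Stand-in for the broker acknowledgement call (an assumption for this sketch).
  private final LongConsumer acker;

  AckOnFinalizeSketch(LongConsumer acker) {
    this.acker = acker;
  }

  // Called by the reader for every message it hands to the pipeline.
  void recordDelivery(long deliveryTag) {
    pendingDeliveryTags.add(deliveryTag);
  }

  int pendingCount() {
    return pendingDeliveryTags.size();
  }

  // Acknowledges everything read so far. If the runner never calls this,
  // the tags stay pending and the broker never sees an ack.
  void finalizeCheckpoint() {
    for (long tag : pendingDeliveryTags) {
      acker.accept(tag);
    }
    pendingDeliveryTags.clear();
  }

  public static void main(String[] args) {
    AckOnFinalizeSketch mark =
        new AckOnFinalizeSketch(tag -> System.out.println("ack " + tag));
    mark.recordDelivery(1L);
    mark.recordDelivery(2L);
    mark.finalizeCheckpoint();
  }
}
```

In this pattern the acknowledgement depends entirely on the runner finalizing the checkpoint, which matches the behaviour described above.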

> Create RabbitMqIO
> -
>
> Key: BEAM-1240
> URL: https://issues.apache.org/jira/browse/BEAM-1240
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>Priority: Major
> Fix For: 2.6.0
>
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-1240) Create RabbitMqIO

2018-07-26 Thread Jeroen Steggink (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558231#comment-16558231
 ] 

Jeroen Steggink commented on BEAM-1240:
---

Is there a specific reason for using version 4.6.0 instead of the newer 5.3.0? 
I'm asking because the QueueingConsumer has been deprecated since version 3 and 
was removed in version 5.

> Create RabbitMqIO
> -
>
> Key: BEAM-1240
> URL: https://issues.apache.org/jira/browse/BEAM-1240
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>Priority: Major
> Fix For: 2.6.0
>
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>






[jira] [Comment Edited] (BEAM-1240) Create RabbitMqIO

2018-07-26 Thread Jeroen Steggink (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558231#comment-16558231
 ] 

Jeroen Steggink edited comment on BEAM-1240 at 7/26/18 12:06 PM:
-

Is there a specific reason for using version 4.6.0 of the RabbitMq amqp-client 
instead of the newer 5.3.0? I'm asking because the QueueingConsumer has been 
deprecated since version 3 and was removed in version 5.


was (Author: jeroens):
Is there a specific reason for using version 4.6.0 of the RabbitMq ampq-client 
instead of the newer 5.3.0? I'm asking, because the QueingConsumer is 
deprecated since version 3 and is removed in version 5.

> Create RabbitMqIO
> -
>
> Key: BEAM-1240
> URL: https://issues.apache.org/jira/browse/BEAM-1240
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>Priority: Major
> Fix For: 2.6.0
>
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>






[jira] [Comment Edited] (BEAM-1240) Create RabbitMqIO

2018-07-26 Thread Jeroen Steggink (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558231#comment-16558231
 ] 

Jeroen Steggink edited comment on BEAM-1240 at 7/26/18 12:06 PM:
-

Is there a specific reason for using version 4.6.0 of the RabbitMq ampq-client 
instead of the newer 5.3.0? I'm asking because the QueueingConsumer has been 
deprecated since version 3 and was removed in version 5.


was (Author: jeroens):
Is there a specific reason for using version 4.6.0 instead of the newer 5.3.0? 
I'm asking, because the QueingConsumer is deprecated since version 3 and is 
removed in version 5.

> Create RabbitMqIO
> -
>
> Key: BEAM-1240
> URL: https://issues.apache.org/jira/browse/BEAM-1240
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>Priority: Major
> Fix For: 2.6.0
>
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-4398) Change ElasticsearchIOITs to write-then-read Performance Tests

2018-06-07 Thread Jeroen Steggink (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504634#comment-16504634
 ] 

Jeroen Steggink commented on BEAM-4398:
---

This issue is assigned to me; however, I'm only working on the new version for 
Elasticsearch v6, not the old v2 and v5.

> Change ElasticsearchIOITs to write-then-read Performance Tests
> --
>
> Key: BEAM-4398
> URL: https://issues.apache.org/jira/browse/BEAM-4398
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Łukasz Gajowy
>Assignee: Jeroen Steggink
>Priority: Minor
>
> Elasticsearch IOITs are different from other IOITs (such as JdbcIOIT or 
> MongodbIOIT) and do not fulfil the rules described in the documentation: 
> [https://beam.apache.org/documentation/io/testing/#i-o-transform-integration-tests].
>  
> We should make it coherent with other tests, more specifically: 
>  - write them in writeThenReadAll style
>  - enable running them with Perfkit
>  - provide Jenkins jobs to run them periodically





[jira] [Commented] (BEAM-3199) Upgrade to Elasticsearch 6.x

2018-05-29 Thread Jeroen Steggink (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494201#comment-16494201
 ] 

Jeroen Steggink commented on BEAM-3199:
---

Thanks for all the info! 

[~timrobertson100], I have made type optional, with the default set to _doc. 
Also, id and index can be set for each insert, update and delete request.

I'll update this issue when I have significant changes or any questions. Cheers 
guys!
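
As a rough illustration of the behaviour described above (optional type defaulting to _doc, with index and id supplied per request), here is a minimal sketch. It is not taken from the BEAM-3199 branch: the class name, the factory method, and the accessors are all hypothetical.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical value class; not part of ElasticsearchIO.
final class WriteTargetSketch {
  // Default type used when the caller does not supply one.
  static final String DEFAULT_TYPE = "_doc";

  private final String index;
  private final String type;
  private final Optional<String> id;

  private WriteTargetSketch(String index, String type, Optional<String> id) {
    this.index = index;
    this.type = type;
    this.id = id;
  }

  // index is required; type falls back to "_doc"; id may be absent,
  // in which case Elasticsearch would generate one on insert.
  static WriteTargetSketch of(String index, String type, String id) {
    return new WriteTargetSketch(
        Objects.requireNonNull(index, "index"),
        type == null ? DEFAULT_TYPE : type,
        Optional.ofNullable(id));
  }

  String index() {
    return index;
  }

  String type() {
    return type;
  }

  Optional<String> id() {
    return id;
  }

  public static void main(String[] args) {
    WriteTargetSketch target = of("logs-2018", null, "42");
    System.out.println(target.index() + "/" + target.type() + "/" + target.id().orElse("<auto>"));
  }
}
```

A per-element function could build one such target for each insert, update or delete request.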

> Upgrade to Elasticsearch 6.x
> 
>
> Key: BEAM-3199
> URL: https://issues.apache.org/jira/browse/BEAM-3199
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Jean-Baptiste Onofré
>Assignee: Jeroen Steggink
>Priority: Major
>
> Elasticsearch 6.x is now GA. As it's fully compatible with Elasticsearch 5.x, 
> it makes sense to upgrade.





[jira] [Comment Edited] (BEAM-3199) Upgrade to Elasticsearch 6.x

2018-05-29 Thread Jeroen Steggink (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493494#comment-16493494
 ] 

Jeroen Steggink edited comment on BEAM-3199 at 5/29/18 12:45 PM:
-

I have been working on a specific IO for Elasticsearch 6.x. It can be found 
here:
 [https://github.com/jsteggink/beam/tree/BEAM-3199]

Since people would still use the old ES versions (2.x and 5.x), it's a separate 
Maven module (elasticsearch-6). Furthermore, it tries to use the 
RestHighLevelClient where it can. This means a lot can be abstracted and 
optimizations can be done by ES. There are no more strings containing JSON; 
ES objects are used for both Read and Write.

I'm still working on the Read parts and the integration tests. The integration 
tests require a lot of refactoring, since the use of ESIntegTestCase with the 
new RestHighLevelClient is not ideal. I would rather just do my own integration 
tests based on ElasticsearchIO and a live ES cluster using Docker or with 
Kubernetes.

Any help and review is welcome!

P.S. Thanks to my colleagues Fokko and Vincent for the first review of the 
Write part! 


was (Author: jeroens):
I have been working on a specific IO for Elasticsearch 6.x. It can be found 
here:
[https://github.com/jsteggink/beam/tree/BEAM-3199]

Since people would still use the old ES versions (2.x and 5.x), it's a separate 
Maven module (elasticsearch-6). Furthermore, it tries to use the 
RestHighLevelClient where it can. This means a lot can be abstracted and 
optimizations can be done by ES. There are no more strings containing json, but 
uses ES objects for both Read and Write.

I'm still working on the Read parts and the integration tests. The integration 
tests require a lot of refactoring, since the use of ESIntegTestCase with the 
new RestHighLevelClient is not ideal. I would rather just do my own integration 
tests based on ElasticsearchIO and a live ES cluster using Docker or with 
Kubernetes.

Any help and review is welcome!

> Upgrade to Elasticsearch 6.x
> 
>
> Key: BEAM-3199
> URL: https://issues.apache.org/jira/browse/BEAM-3199
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>
> Elasticsearch 6.x is now GA. As it's fully compatible with Elasticsearch 5.x, 
> it makes sense to upgrade.





[jira] [Commented] (BEAM-3199) Upgrade to Elasticsearch 6.x

2018-05-29 Thread Jeroen Steggink (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493494#comment-16493494
 ] 

Jeroen Steggink commented on BEAM-3199:
---

I have been working on a specific IO for Elasticsearch 6.x. It can be found 
here:
[https://github.com/jsteggink/beam/tree/BEAM-3199]

Since people would still use the old ES versions (2.x and 5.x), it's a separate 
Maven module (elasticsearch-6). Furthermore, it tries to use the 
RestHighLevelClient where it can. This means a lot can be abstracted and 
optimizations can be done by ES. There are no more strings containing JSON; 
ES objects are used for both Read and Write.

I'm still working on the Read parts and the integration tests. The integration 
tests require a lot of refactoring, since the use of ESIntegTestCase with the 
new RestHighLevelClient is not ideal. I would rather just do my own integration 
tests based on ElasticsearchIO and a live ES cluster using Docker or with 
Kubernetes.

Any help and review is welcome!

> Upgrade to Elasticsearch 6.x
> 
>
> Key: BEAM-3199
> URL: https://issues.apache.org/jira/browse/BEAM-3199
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>
> Elasticsearch 6.x is now GA. As it's fully compatible with Elasticsearch 5.x, 
> it makes sense to upgrade.





[jira] [Commented] (BEAM-3201) ElasticsearchIO should allow the user to optionally pass id, type and index per document

2018-02-21 Thread Jeroen Steggink (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371179#comment-16371179
 ] 

Jeroen Steggink commented on BEAM-3201:
---

Sorry to push this again. I really need this :) [~chet.aldrich], do you have 
an ETA? Otherwise I might build this myself.

> ElasticsearchIO should allow the user to optionally pass id, type and index 
> per document
> 
>
> Key: BEAM-3201
> URL: https://issues.apache.org/jira/browse/BEAM-3201
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Etienne Chauchot
>Assignee: Chet Aldrich
>Priority: Major
>
> *Dynamic documents id*: Today the ESIO only inserts the payload of the ES 
> documents. Elasticsearch generates a document id for each record inserted. So 
> each new insertion is considered as a new document. Users want to be able to 
> update documents using the IO. So, for the write part of the IO, users should 
> be able to provide a document id so that they could update already stored 
> documents. Providing an id for the documents could also help the user with 
> idempotency.
> *Dynamic ES type and ES index*: In some cases (streaming pipelines with high 
> throughput), partitioning the PCollection so that it can be plugged into 
> different ESIO instances (pointing to different index/type) is not very 
> practical; users would like to be able to set the ES index/type per document.





[jira] [Commented] (BEAM-3201) ElasticsearchIO should allow the user to optionally pass id, type and index per document

2018-01-16 Thread Jeroen Steggink (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327160#comment-16327160
 ] 

Jeroen Steggink commented on BEAM-3201:
---

Hi, what's the status of this issue? It's quite an important feature to be 
able to overwrite existing documents in the index. Can I be of any help? Code 
review?

> ElasticsearchIO should allow the user to optionally pass id, type and index 
> per document
> 
>
> Key: BEAM-3201
> URL: https://issues.apache.org/jira/browse/BEAM-3201
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Etienne Chauchot
>Assignee: Chet Aldrich
>Priority: Major
>
> *Dynamic documents id*: Today the ESIO only inserts the payload of the ES 
> documents. Elasticsearch generates a document id for each record inserted. So 
> each new insertion is considered as a new document. Users want to be able to 
> update documents using the IO. So, for the write part of the IO, users should 
> be able to provide a document id so that they could update already stored 
> documents. Providing an id for the documents could also help the user with 
> idempotency.
> *Dynamic ES type and ES index*: In some cases (streaming pipelines with high 
> throughput), partitioning the PCollection so that it can be plugged into 
> different ESIO instances (pointing to different index/type) is not very 
> practical; users would like to be able to set the ES index/type per document.





[jira] [Commented] (BEAM-3199) Upgrade to Elasticsearch 6.x

2017-12-11 Thread Jeroen Steggink (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286578#comment-16286578
 ] 

Jeroen Steggink commented on BEAM-3199:
---

Upgrading to Elasticsearch 6.x is not trivial. I would recommend splitting the 
Maven artifacts into separate ones for Elasticsearch 2.x - 5.x and Elasticsearch 
6.x, just like the testing modules are split now. Another caveat is the 
elasticsearch-tests-common artifact. It's not compatible with version 6.x since 
some of the queries were deprecated in version 5.x and removed in 6.x.

> Upgrade to Elasticsearch 6.x
> 
>
> Key: BEAM-3199
> URL: https://issues.apache.org/jira/browse/BEAM-3199
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>
> Elasticsearch 6.x is now GA. As it's fully compatible with Elasticsearch 5.x, 
> it makes sense to upgrade.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)