[jira] [Commented] (BEAM-3199) Upgrade to Elasticsearch 6.x
[ https://issues.apache.org/jira/browse/BEAM-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574451#comment-16574451 ] Jeroen Steggink commented on BEAM-3199: --- Hi [~timrobertson100], Yes, we are using it in production successfully. However, it's undergoing some change. We are moving away from BoundedSource, just like SolrIO did. It makes it simpler and makes more sense. I'll try and commit something soon. Jeroen > Upgrade to Elasticsearch 6.x > > > Key: BEAM-3199 > URL: https://issues.apache.org/jira/browse/BEAM-3199 > Project: Beam > Issue Type: Improvement > Components: io-java-elasticsearch >Reporter: Jean-Baptiste Onofré >Assignee: Jeroen Steggink >Priority: Major > > Elasticsearch 6.x is now GA. As it's fully compatible with Elasticsearch 5.x, > it makes sense to upgrade. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-1240) Create RabbitMqIO
[ https://issues.apache.org/jira/browse/BEAM-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566869#comment-16566869 ] Jeroen Steggink commented on BEAM-1240: --- Furthermore, after some testing, I see that the getWatermark() method can return null. When running with the direct-runner, I get an NPE because oldestTimeStamp is null the first time the method is called. Maybe you can add a null check and, if it is null, set it to Instant.now()? [https://github.com/jbonofre/beam/blob/f1665f47ff10679a2a54c1879fce6d77151fa90a/sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java#L385] Testing it with the Flink local runner, I never see it call the finalizeCheckpoint method in RabbitMQCheckpointMark, which means the RabbitMq messages are never acknowledged. Have you tried it? > Create RabbitMqIO > - > > Key: BEAM-1240 > URL: https://issues.apache.org/jira/browse/BEAM-1240 > Project: Beam > Issue Type: New Feature > Components: io-ideas >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.6.0 > > Time Spent: 11h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
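The null check suggested in the comment above could look roughly like the following. This is only a minimal sketch, not the actual RabbitMqIO code: the class WatermarkSketch and its observe method are hypothetical, and it uses java.time.Instant so that it runs without extra dependencies, whereas Beam itself uses org.joda.time.Instant.

```java
import java.time.Instant;

/** Hypothetical stand-in for the reader state; not the real RabbitMqIO reader. */
public class WatermarkSketch {
    // Oldest message timestamp seen so far; stays null until a message is observed.
    private Instant oldestTimestamp;

    /** Records a message timestamp, keeping the oldest one seen (assumed semantics). */
    public void observe(Instant messageTimestamp) {
        if (oldestTimestamp == null || messageTimestamp.isBefore(oldestTimestamp)) {
            oldestTimestamp = messageTimestamp;
        }
    }

    /** Never returns null: falls back to the supplied "now" before any message is read. */
    public Instant getWatermark(Instant now) {
        return oldestTimestamp != null ? oldestTimestamp : now;
    }

    public static void main(String[] args) {
        WatermarkSketch reader = new WatermarkSketch();
        // Before any message arrives, the fallback value is returned instead of null.
        System.out.println(reader.getWatermark(Instant.now()));
    }
}
```

Passing "now" in as a parameter is a choice made here to keep the sketch easy to test; calling Instant.now() inside the method, as the comment proposes, would behave the same way at runtime.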
[jira] [Commented] (BEAM-1240) Create RabbitMqIO
[ https://issues.apache.org/jira/browse/BEAM-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558231#comment-16558231 ] Jeroen Steggink commented on BEAM-1240: --- Is there a specific reason for using version 4.6.0 instead of the newer 5.3.0? I'm asking because QueueingConsumer has been deprecated since version 3 and was removed in version 5. > Create RabbitMqIO > - > > Key: BEAM-1240 > URL: https://issues.apache.org/jira/browse/BEAM-1240 > Project: Beam > Issue Type: New Feature > Components: io-ideas >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.6.0 > > Time Spent: 11h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-1240) Create RabbitMqIO
[ https://issues.apache.org/jira/browse/BEAM-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558231#comment-16558231 ] Jeroen Steggink edited comment on BEAM-1240 at 7/26/18 12:06 PM: - Is there a specific reason for using version 4.6.0 of the RabbitMq amqp-client instead of the newer 5.3.0? I'm asking because QueueingConsumer has been deprecated since version 3 and was removed in version 5. was (Author: jeroens): Is there a specific reason for using version 4.6.0 of the RabbitMq ampq-client instead of the newer 5.3.0? I'm asking because QueueingConsumer has been deprecated since version 3 and was removed in version 5. > Create RabbitMqIO > - > > Key: BEAM-1240 > URL: https://issues.apache.org/jira/browse/BEAM-1240 > Project: Beam > Issue Type: New Feature > Components: io-ideas >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.6.0 > > Time Spent: 11h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-1240) Create RabbitMqIO
[ https://issues.apache.org/jira/browse/BEAM-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558231#comment-16558231 ] Jeroen Steggink edited comment on BEAM-1240 at 7/26/18 12:06 PM: - Is there a specific reason for using version 4.6.0 of the RabbitMq ampq-client instead of the newer 5.3.0? I'm asking because QueueingConsumer has been deprecated since version 3 and was removed in version 5. was (Author: jeroens): Is there a specific reason for using version 4.6.0 instead of the newer 5.3.0? I'm asking because QueueingConsumer has been deprecated since version 3 and was removed in version 5. > Create RabbitMqIO > - > > Key: BEAM-1240 > URL: https://issues.apache.org/jira/browse/BEAM-1240 > Project: Beam > Issue Type: New Feature > Components: io-ideas >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.6.0 > > Time Spent: 11h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-4398) Change ElasticsearchIOITs to write-then-read Performance Tests
[ https://issues.apache.org/jira/browse/BEAM-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504634#comment-16504634 ] Jeroen Steggink commented on BEAM-4398: --- This issue is assigned to me; however, I'm only working on the new version for Elasticsearch v6, not the old v2 and v5. > Change ElasticsearchIOITs to write-then-read Performance Tests > -- > > Key: BEAM-4398 > URL: https://issues.apache.org/jira/browse/BEAM-4398 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Łukasz Gajowy >Assignee: Jeroen Steggink >Priority: Minor > > Elasticsearch IOITs are different from other IOITs (such as JdbcIOIT or > MongodbIOIT) and do not fulfil the rules described in the documentation: > [https://beam.apache.org/documentation/io/testing/#i-o-transform-integration-tests]. > > We should make them coherent with other tests, more specifically: > - write them in writeThenReadAll style > - enable running them with Perfkit > - provide Jenkins jobs to run them periodically -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3199) Upgrade to Elasticsearch 6.x
[ https://issues.apache.org/jira/browse/BEAM-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494201#comment-16494201 ] Jeroen Steggink commented on BEAM-3199: --- Thanks for all the info! [~timrobertson100], I have made type optional, with the default set to _doc. Also, id and index can be set for each insert, update and delete request. I'll update this issue when I have significant changes or any questions. Cheers guys! > Upgrade to Elasticsearch 6.x > > > Key: BEAM-3199 > URL: https://issues.apache.org/jira/browse/BEAM-3199 > Project: Beam > Issue Type: Improvement > Components: io-java-elasticsearch >Reporter: Jean-Baptiste Onofré >Assignee: Jeroen Steggink >Priority: Major > > Elasticsearch 6.x is now GA. As it's fully compatible with Elasticsearch 5.x, > it makes sense to upgrade. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
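To illustrate the behaviour described in the comment above (type optional with a default of _doc, id and index supplied per request), here is a small sketch. It is not taken from the BEAM-3199 branch: the class WriteTargetSketch, its constructor and the describe method are hypothetical names, and no Elasticsearch client classes are used so that the example stays self-contained.

```java
import java.util.Objects;
import java.util.Optional;

/** Hypothetical per-request write target; illustrates the defaulting only. */
public class WriteTargetSketch {
    public static final String DEFAULT_TYPE = "_doc";

    private final String index;
    private final String type;
    private final Optional<String> id;

    /** index is required; type and id may be null. */
    public WriteTargetSketch(String index, String type, String id) {
        this.index = Objects.requireNonNull(index, "index");
        // A missing type falls back to the default.
        this.type = (type != null) ? type : DEFAULT_TYPE;
        // A missing id means the id is left for Elasticsearch to generate.
        this.id = Optional.ofNullable(id);
    }

    /** Human-readable form of the target, with a placeholder when no id was given. */
    public String describe() {
        return index + "/" + type + "/" + id.orElse("<auto>");
    }

    public static void main(String[] args) {
        System.out.println(new WriteTargetSketch("logs-2018", null, "42").describe());
        System.out.println(new WriteTargetSketch("logs-2018", "event", null).describe());
    }
}
```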
[jira] [Comment Edited] (BEAM-3199) Upgrade to Elasticsearch 6.x
[ https://issues.apache.org/jira/browse/BEAM-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493494#comment-16493494 ] Jeroen Steggink edited comment on BEAM-3199 at 5/29/18 12:45 PM: - I have been working on a specific IO for Elasticsearch 6.x. It can be found here: [https://github.com/jsteggink/beam/tree/BEAM-3199] Since people would still use the old ES versions (2.x and 5.x), it's a separate Maven module (elasticsearch-6). Furthermore, it tries to use the RestHighLevelClient where it can. This means a lot can be abstracted and optimizations can be done by ES. There are no more strings containing JSON; it uses ES objects for both Read and Write. I'm still working on the Read parts and the integration tests. The integration tests require a lot of refactoring, since the use of ESIntegTestCase with the new RestHighLevelClient is not ideal. I would rather just do my own integration tests based on ElasticsearchIO and a live ES cluster using Docker or Kubernetes. Any help and review is welcome! P.S. Thanks to my colleagues Fokko and Vincent for the first review of the Write part! was (Author: jeroens): I have been working on a specific IO for Elasticsearch 6.x. It can be found here: [https://github.com/jsteggink/beam/tree/BEAM-3199] Since people would still use the old ES versions (2.x and 5.x), it's a separate Maven module (elasticsearch-6). Furthermore, it tries to use the RestHighLevelClient where it can. This means a lot can be abstracted and optimizations can be done by ES. There are no more strings containing JSON; it uses ES objects for both Read and Write. I'm still working on the Read parts and the integration tests. The integration tests require a lot of refactoring, since the use of ESIntegTestCase with the new RestHighLevelClient is not ideal. I would rather just do my own integration tests based on ElasticsearchIO and a live ES cluster using Docker or Kubernetes. Any help and review is welcome!
> Upgrade to Elasticsearch 6.x > > > Key: BEAM-3199 > URL: https://issues.apache.org/jira/browse/BEAM-3199 > Project: Beam > Issue Type: Improvement > Components: io-java-elasticsearch >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré >Priority: Major > > Elasticsearch 6.x is now GA. As it's fully compatible with Elasticsearch 5.x, > it makes sense to upgrade. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3199) Upgrade to Elasticsearch 6.x
[ https://issues.apache.org/jira/browse/BEAM-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493494#comment-16493494 ] Jeroen Steggink commented on BEAM-3199: --- I have been working on a specific IO for Elasticsearch 6.x. It can be found here: [https://github.com/jsteggink/beam/tree/BEAM-3199] Since people would still use the old ES versions (2.x and 5.x), it's a separate Maven module (elasticsearch-6). Furthermore, it tries to use the RestHighLevelClient where it can. This means a lot can be abstracted and optimizations can be done by ES. There are no more strings containing JSON; it uses ES objects for both Read and Write. I'm still working on the Read parts and the integration tests. The integration tests require a lot of refactoring, since the use of ESIntegTestCase with the new RestHighLevelClient is not ideal. I would rather just do my own integration tests based on ElasticsearchIO and a live ES cluster using Docker or Kubernetes. Any help and review is welcome! > Upgrade to Elasticsearch 6.x > > > Key: BEAM-3199 > URL: https://issues.apache.org/jira/browse/BEAM-3199 > Project: Beam > Issue Type: Improvement > Components: io-java-elasticsearch >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré >Priority: Major > > Elasticsearch 6.x is now GA. As it's fully compatible with Elasticsearch 5.x, > it makes sense to upgrade. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3201) ElasticsearchIO should allow the user to optionally pass id, type and index per document
[ https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371179#comment-16371179 ] Jeroen Steggink commented on BEAM-3201: --- Sorry to push this again. I really need this :) [~chet.aldrich], do you have an ETA? Otherwise I might build this myself. > ElasticsearchIO should allow the user to optionally pass id, type and index > per document > > > Key: BEAM-3201 > URL: https://issues.apache.org/jira/browse/BEAM-3201 > Project: Beam > Issue Type: Improvement > Components: io-java-elasticsearch >Reporter: Etienne Chauchot >Assignee: Chet Aldrich >Priority: Major > > *Dynamic documents id*: Today the ESIO only inserts the payload of the ES > documents. Elasticsearch generates a document id for each record inserted. So > each new insertion is considered as a new document. Users want to be able to > update documents using the IO. So, for the write part of the IO, users should > be able to provide a document id so that they could update already stored > documents. Providing an id for the documents could also help the user with > idempotency. > *Dynamic ES type and ES index*: In some cases (streaming pipeline with high > throughput) partitioning the PCollection in order to plug into different ESIO > instances (pointing to different index/type) is not very practical; the users > would like to be able to set ES index/type per document. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
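The idempotency argument in the issue description above can be shown with a toy model. This is only a sketch under stated assumptions: the class IdempotentWriteSketch is hypothetical, a HashMap stands in for an Elasticsearch index, and a random UUID stands in for a server-generated document id. It does not use ElasticsearchIO or any Elasticsearch client.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Toy in-memory "index" showing why caller-supplied ids make repeated writes idempotent. */
public class IdempotentWriteSketch {
    private final Map<String, String> index = new HashMap<>();

    /** Stores the payload under the given id, or under a freshly generated id when id is null. */
    public String write(String id, String payload) {
        String effectiveId = (id != null) ? id : UUID.randomUUID().toString();
        index.put(effectiveId, payload); // Same id: the stored document is overwritten.
        return effectiveId;
    }

    public int size() {
        return index.size();
    }

    public String get(String id) {
        return index.get(id);
    }

    public static void main(String[] args) {
        IdempotentWriteSketch es = new IdempotentWriteSketch();
        es.write(null, "{\"user\":\"a\"}");
        es.write(null, "{\"user\":\"a\"}");   // Retried without an id: a second document appears.
        es.write("user-a", "{\"user\":\"a\"}");
        es.write("user-a", "{\"user\":\"a\"}"); // Retried with an id: still one document.
        System.out.println(es.size());
    }
}
```

With generated ids, a retried write adds a new document each time; with a caller-supplied id, the retry replaces the existing document, which is also what makes updates possible.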
[jira] [Commented] (BEAM-3201) ElasticsearchIO should allow the user to optionally pass id, type and index per document
[ https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327160#comment-16327160 ] Jeroen Steggink commented on BEAM-3201: --- Hi, what's the status of this issue? It's quite an important feature to be able to overwrite existing documents in the index. Can I be of any help? Code review? > ElasticsearchIO should allow the user to optionally pass id, type and index > per document > > > Key: BEAM-3201 > URL: https://issues.apache.org/jira/browse/BEAM-3201 > Project: Beam > Issue Type: Improvement > Components: sdk-java-extensions >Reporter: Etienne Chauchot >Assignee: Chet Aldrich >Priority: Major > > *Dynamic documents id*: Today the ESIO only inserts the payload of the ES > documents. Elasticsearch generates a document id for each record inserted. So > each new insertion is considered as a new document. Users want to be able to > update documents using the IO. So, for the write part of the IO, users should > be able to provide a document id so that they could update already stored > documents. Providing an id for the documents could also help the user with > idempotency. > *Dynamic ES type and ES index*: In some cases (streaming pipeline with high > throughput) partitioning the PCollection in order to plug into different ESIO > instances (pointing to different index/type) is not very practical; the users > would like to be able to set ES index/type per document. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3199) Upgrade to Elasticsearch 6.x
[ https://issues.apache.org/jira/browse/BEAM-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286578#comment-16286578 ] Jeroen Steggink commented on BEAM-3199: --- Upgrading to Elasticsearch 6.x is not trivial. I would recommend splitting the Maven artifacts into separate ones for Elasticsearch 2.x - 5.x and Elasticsearch 6.x, just like the testing modules are split now. Another caveat is the elasticsearch-tests-common artifact. It's not compatible with version 6.x, since some of the queries were deprecated in version 5.x and removed in 6.x. > Upgrade to Elasticsearch 6.x > > > Key: BEAM-3199 > URL: https://issues.apache.org/jira/browse/BEAM-3199 > Project: Beam > Issue Type: Improvement > Components: sdk-java-extensions >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré > > Elasticsearch 6.x is now GA. As it's fully compatible with Elasticsearch 5.x, > it makes sense to upgrade. -- This message was sent by Atlassian JIRA (v6.4.14#64029)