[jira] [Commented] (BEAM-3201) ElasticsearchIO should allow the user to optionally pass id, type and index per document
[ https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415271#comment-16415271 ]

Tim Robertson commented on BEAM-3201:

Thank you [~chet.aldrich]. I'll assign this to myself and try to get it through review, starting today.

> ElasticsearchIO should allow the user to optionally pass id, type and index per document
>
> Key: BEAM-3201
> URL: https://issues.apache.org/jira/browse/BEAM-3201
> Project: Beam
> Issue Type: Improvement
> Components: io-java-elasticsearch
> Reporter: Etienne Chauchot
> Assignee: Tim Robertson
> Priority: Major
>
> *Dynamic document id*: Today the ESIO only inserts the payload of the ES documents. Elasticsearch generates a document id for each record inserted, so each new insertion is treated as a new document. Users want to be able to update documents using the IO. So, for the write part of the IO, users should be able to provide a document id so that they can update already stored documents. Providing an id for the documents could also help the user with idempotency.
> *Dynamic ES type and ES index*: In some cases (streaming pipelines with high throughput), partitioning the PCollection in order to plug into different ESIO instances (pointing to different index/type) is not very practical; users would like to be able to set the ES index/type per document.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
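To illustrate why a per-document id matters, here is a minimal sketch, using only the Java standard library, of the action line that the Elasticsearch bulk API accepts ahead of each document. The class and method names (`BulkActionSketch`, `quote`, `actionLine`, `bulkEntry`) are hypothetical and are not part of ElasticsearchIO; the sketch assumes only the bulk format of an `index` action line followed by the document source.

```java
import java.util.StringJoiner;

/** Hypothetical helper for illustration only; not part of ElasticsearchIO. */
final class BulkActionSketch {

  /** Minimal JSON string quoting: escapes backslashes and double quotes only. */
  static String quote(String value) {
    return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  /**
   * Builds the bulk "index" action line. Any argument may be null, in which
   * case that field is omitted and the request-level default (or, for the id,
   * an Elasticsearch-generated value) applies.
   */
  static String actionLine(String index, String type, String id) {
    StringJoiner meta = new StringJoiner(",", "{\"index\":{", "}}");
    if (index != null) {
      meta.add("\"_index\":" + quote(index));
    }
    if (type != null) {
      meta.add("\"_type\":" + quote(type));
    }
    if (id != null) {
      meta.add("\"_id\":" + quote(id));
    }
    return meta.toString();
  }

  /** One bulk entry: the action line, then the document source, each newline-terminated. */
  static String bulkEntry(String index, String type, String id, String documentJson) {
    return actionLine(index, type, id) + "\n" + documentJson + "\n";
  }

  public static void main(String[] args) {
    System.out.print(bulkEntry("logs", "doc", "42", "{\"user\":\"tim\"}"));
  }
}
```

With an explicit `_id`, sending the same entry again overwrites the stored document instead of creating a second one; leaving the id out lets Elasticsearch generate one, which is the behaviour the issue describes.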
[jira] [Commented] (BEAM-3201) ElasticsearchIO should allow the user to optionally pass id, type and index per document
[ https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414759#comment-16414759 ]

Chet Aldrich commented on BEAM-3201:

Oh ok, great, good to hear!

> ElasticsearchIO should allow the user to optionally pass id, type and index per document
>
> Key: BEAM-3201
> URL: https://issues.apache.org/jira/browse/BEAM-3201
> Project: Beam
> Issue Type: Improvement
> Components: io-java-elasticsearch
> Reporter: Etienne Chauchot
> Assignee: Chet Aldrich
> Priority: Major
[jira] [Commented] (BEAM-3201) ElasticsearchIO should allow the user to optionally pass id, type and index per document
[ https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414640#comment-16414640 ]

Tim Robertson commented on BEAM-3201:

[~chet.aldrich] I really don't want to step on your toes, but I needed this functionality so I have an implementation. I couldn't have done this so easily without your work (thanks!).

[https://github.com/timrobertson100/beam/commit/a6002f1a4b8388e955e512281d38001ae828cdcf]

The commit above needs a little bit of tidying as I have accidentally reformatted the whole SolrIO incorrectly - but it is late here and I'll do it tomorrow.

I made the following changes from the approach [~chet.aldrich] had started:
# Used a single Interface instead of 3 and opted for a different signature
# Used Jackson for JSON serde
## Removes the need to bring in another dependency
## I _suspect_ it is in wider use, so it might be better as it forms part of the public API
# Added the ability to route to different types, which was not yet implemented in your branch
# Added sanity checking to the index field (ES requires lower case)
# Opted for a different test strategy to avoid using the deprecated DoFnTester

Would it be ok with you [~chet.aldrich] / [~echauchot] if I put this up for a PR tomorrow after I tidy my reformatting error please? Other than formatting I think it is a complete solution to this issue with test coverage.

> ElasticsearchIO should allow the user to optionally pass id, type and index per document
>
> Key: BEAM-3201
> URL: https://issues.apache.org/jira/browse/BEAM-3201
> Project: Beam
> Issue Type: Improvement
> Components: io-java-elasticsearch
> Reporter: Etienne Chauchot
> Assignee: Chet Aldrich
> Priority: Major
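As a rough illustration of two points in the list above (a single extractor interface, and the lower-case check on the index name), here is a sketch that uses only the Java standard library. It is not taken from the linked commit: `RoutingSketch`, `FieldExtractor` and `checkedIndex` are hypothetical names, and a plain `Map` stands in for the Jackson document type so that the example has no dependencies. The check covers only empty names and upper-case characters, not the other index naming rules Elasticsearch enforces.

```java
import java.io.Serializable;
import java.util.Collections;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch for illustration only; not the code from the commit. */
final class RoutingSketch {

  /**
   * One function shape reused for id, index and type, instead of three
   * separate interfaces. Serializable so it could be shipped to workers.
   */
  @FunctionalInterface
  interface FieldExtractor extends Serializable {
    String apply(Map<String, String> document);
  }

  /** Returns the index name unchanged, or throws if it is empty or not lower case. */
  static String checkedIndex(String index) {
    if (index == null || index.isEmpty()) {
      throw new IllegalArgumentException("index name must not be empty");
    }
    if (!index.equals(index.toLowerCase(Locale.ROOT))) {
      throw new IllegalArgumentException("index name must be lower case: " + index);
    }
    return index;
  }

  public static void main(String[] args) {
    // Route each document to an index derived from one of its own fields.
    FieldExtractor indexFn = document -> "events-" + document.get("country");
    Map<String, String> sample = Collections.singletonMap("country", "dk");
    System.out.println(checkedIndex(indexFn.apply(sample))); // prints events-dk
  }
}
```

Validating the name before the request is sent means a bad routing function fails fast in the pipeline rather than surfacing as a rejected bulk request.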
[jira] [Commented] (BEAM-3201) ElasticsearchIO should allow the user to optionally pass id, type and index per document
[ https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409079#comment-16409079 ]

Chet Aldrich commented on BEAM-3201:

Hey all, sorry I kinda vanished, just been really busy. I'll get back on this. I'll open a PR as is to start and we can go from there.

> ElasticsearchIO should allow the user to optionally pass id, type and index per document
>
> Key: BEAM-3201
> URL: https://issues.apache.org/jira/browse/BEAM-3201
> Project: Beam
> Issue Type: Improvement
> Components: io-java-elasticsearch
> Reporter: Etienne Chauchot
> Assignee: Chet Aldrich
> Priority: Major
[jira] [Commented] (BEAM-3201) ElasticsearchIO should allow the user to optionally pass id, type and index per document
[ https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407630#comment-16407630 ]

Etienne Chauchot commented on BEAM-3201:

+1 to what [~timrobertson100] said. [~chet.aldrich] maybe you could open a PR with your branch as it is so that I can review it. If there are some updates to do during the review process I could do them as a PR on your PR branch if you have no time.

> ElasticsearchIO should allow the user to optionally pass id, type and index per document
>
> Key: BEAM-3201
> URL: https://issues.apache.org/jira/browse/BEAM-3201
> Project: Beam
> Issue Type: Improvement
> Components: io-java-elasticsearch
> Reporter: Etienne Chauchot
> Assignee: Chet Aldrich
> Priority: Major
[jira] [Commented] (BEAM-3201) ElasticsearchIO should allow the user to optionally pass id, type and index per document
[ https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407594#comment-16407594 ]

Tim Robertson commented on BEAM-3201:

This one has gone quiet and is pretty important (also for failure / retry scenarios resulting in duplicate docs).

[~chet.aldrich] - can you please post an update of your intentions? Your GH branch ([https://github.com/chetaldrich/beam/commits/beam-3201]) looks like it is reasonably well progressed - is there anything we can do to help?

> ElasticsearchIO should allow the user to optionally pass id, type and index per document
>
> Key: BEAM-3201
> URL: https://issues.apache.org/jira/browse/BEAM-3201
> Project: Beam
> Issue Type: Improvement
> Components: io-java-elasticsearch
> Reporter: Etienne Chauchot
> Assignee: Chet Aldrich
> Priority: Major
[jira] [Commented] (BEAM-3201) ElasticsearchIO should allow the user to optionally pass id, type and index per document
[ https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371179#comment-16371179 ]

Jeroen Steggink commented on BEAM-3201:

Sorry to push this again. I really need this :)

[~chet.aldrich], do you have an ETA? Otherwise I might build this myself.

> ElasticsearchIO should allow the user to optionally pass id, type and index per document
>
> Key: BEAM-3201
> URL: https://issues.apache.org/jira/browse/BEAM-3201
> Project: Beam
> Issue Type: Improvement
> Components: io-java-elasticsearch
> Reporter: Etienne Chauchot
> Assignee: Chet Aldrich
> Priority: Major
[jira] [Commented] (BEAM-3201) ElasticsearchIO should allow the user to optionally pass id, type and index per document
[ https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340712#comment-16340712 ]

Etienne Chauchot commented on BEAM-3201:

[~chet.aldrich], tell me when you are ready for the review

> ElasticsearchIO should allow the user to optionally pass id, type and index per document
>
> Key: BEAM-3201
> URL: https://issues.apache.org/jira/browse/BEAM-3201
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-extensions
> Reporter: Etienne Chauchot
> Assignee: Chet Aldrich
> Priority: Major
[jira] [Commented] (BEAM-3201) ElasticsearchIO should allow the user to optionally pass id, type and index per document
[ https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327588#comment-16327588 ]

Chet Aldrich commented on BEAM-3201:

Hey [~jeroens], I'm in progress on a PR for this, just need to actually get the rest of the code out the door. I'm gonna spend some time this week to knock this out, and then we can start code review.

> ElasticsearchIO should allow the user to optionally pass id, type and index per document
>
> Key: BEAM-3201
> URL: https://issues.apache.org/jira/browse/BEAM-3201
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-extensions
> Reporter: Etienne Chauchot
> Assignee: Chet Aldrich
> Priority: Major
[jira] [Commented] (BEAM-3201) ElasticsearchIO should allow the user to optionally pass id, type and index per document
[ https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327160#comment-16327160 ]

Jeroen Steggink commented on BEAM-3201:

Hi, what's the status for this issue? It's quite an important feature to be able to overwrite existing documents in the index. Can I be of any help? Code review?

> ElasticsearchIO should allow the user to optionally pass id, type and index per document
>
> Key: BEAM-3201
> URL: https://issues.apache.org/jira/browse/BEAM-3201
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-extensions
> Reporter: Etienne Chauchot
> Assignee: Chet Aldrich
> Priority: Major