[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=336750=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336750 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 31/Oct/19 12:15 Start Date: 31/Oct/19 12:15 Worklog Time Spent: 10m Work Description: timrobertson100 commented on pull request #9940: [BEAM-8306] estimate byte size by product count URL: https://github.com/apache/beam/pull/9940 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336750) Time Spent: 4h 10m (was: 4h) > improve estimation of data byte size reading from source in ElasticsearchIO > --- > > Key: BEAM-8306 > URL: https://issues.apache.org/jira/browse/BEAM-8306 > Project: Beam > Issue Type: Improvement > Components: io-java-elasticsearch > Affects Versions: 2.14.0 > Reporter: Derek He > Assignee: Derek He > Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > ElasticsearchIO splits BoundedSource based on the Elasticsearch index size. > We expect that splitting it based on the query result size would be more accurate. > Currently, we have a big Elasticsearch index, but the query result contains > only a few of the documents in the index. ElasticsearchIO splits it into up > to 1024 BoundedSources in Google Dataflow. It takes a long time to finish > processing the small number of Elasticsearch documents in Google Dataflow. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
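The idea in the issue description, sizing the read by the query result rather than by the whole index, can be sketched roughly as follows. This is a minimal illustration and not the code from the pull request: the class `QuerySizeEstimator`, its method names, and the sample numbers are hypothetical. It assumes the index size in bytes, the total document count, and the number of documents matching the query have already been fetched from Elasticsearch. The cap of 1024 is simply the figure quoted in the issue.

```java
public class QuerySizeEstimator {

    /**
     * Hypothetical estimate: average document size in the index
     * multiplied by the number of documents the query matches.
     */
    static long estimateQueryBytes(long indexSizeBytes, long totalDocs, long matchingDocs) {
        if (totalDocs <= 0 || matchingDocs <= 0) {
            return 0L;
        }
        double avgDocBytes = (double) indexSizeBytes / totalDocs;
        return Math.round(avgDocBytes * matchingDocs);
    }

    /**
     * Number of sources a runner might request for a given estimated size:
     * ceil(estimatedBytes / desiredBundleBytes), kept within [1, maxSplits].
     */
    static long estimateSplits(long estimatedBytes, long desiredBundleBytes, long maxSplits) {
        if (desiredBundleBytes <= 0) {
            throw new IllegalArgumentException("desiredBundleBytes must be positive");
        }
        long splits = (estimatedBytes + desiredBundleBytes - 1) / desiredBundleBytes;
        return Math.max(1L, Math.min(splits, maxSplits));
    }

    public static void main(String[] args) {
        // Assumed sample values, chosen only for illustration.
        long indexSizeBytes = 100L * 1024 * 1024 * 1024; // 100 GiB index
        long totalDocs = 100_000_000L;                   // documents in the index
        long matchingDocs = 500L;                        // documents matched by the query
        long desiredBundleBytes = 64L * 1024 * 1024;     // 64 MiB per bundle
        long maxSplits = 1024L;                          // cap quoted in the issue

        long queryBytes = estimateQueryBytes(indexSizeBytes, totalDocs, matchingDocs);
        System.out.println("splits by index size: "
            + estimateSplits(indexSizeBytes, desiredBundleBytes, maxSplits));
        System.out.println("estimated query bytes: " + queryBytes);
        System.out.println("splits by query size: "
            + estimateSplits(queryBytes, desiredBundleBytes, maxSplits));
    }
}
```

With these assumed numbers, sizing by the index hits the 1024 cap, while sizing by the query result yields a single source, which is the behaviour the issue asks for.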
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=336729=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336729 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 31/Oct/19 11:28 Start Date: 31/Oct/19 11:28 Worklog Time Spent: 10m Work Description: timrobertson100 commented on issue #9940: [BEAM-8306] estimate byte size by product count URL: https://github.com/apache/beam/pull/9940#issuecomment-548326429 Thanks @derekunimarket for tidying up and moving to this new PR. Waiting for green to merge. As per the previous PR, which we closed without merging, the approach LGTM. Issue Time Tracking --- Worklog Id: (was: 336729) Time Spent: 4h (was: 3h 50m)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=336720=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336720 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 31/Oct/19 10:44 Start Date: 31/Oct/19 10:44 Worklog Time Spent: 10m Work Description: timrobertson100 commented on issue #9940: [BEAM-8306] estimate byte size by product count URL: https://github.com/apache/beam/pull/9940#issuecomment-548310650 Run Java PreCommit Issue Time Tracking --- Worklog Id: (was: 336720) Time Spent: 3h 50m (was: 3h 40m)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=336719=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336719 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 31/Oct/19 10:43 Start Date: 31/Oct/19 10:43 Worklog Time Spent: 10m Work Description: timrobertson100 commented on pull request #9660: [BEAM-8306] improve estimation datasize elasticsearch io URL: https://github.com/apache/beam/pull/9660 Issue Time Tracking --- Worklog Id: (was: 336719) Time Spent: 3h 40m (was: 3.5h)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=336718=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336718 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 31/Oct/19 10:43 Start Date: 31/Oct/19 10:43 Worklog Time Spent: 10m Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] improve estimation datasize elasticsearch io URL: https://github.com/apache/beam/pull/9660#issuecomment-548310564 Closing this unmerged and moving to #9940 Issue Time Tracking --- Worklog Id: (was: 336718) Time Spent: 3.5h (was: 3h 20m)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=336485=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336485 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 30/Oct/19 21:31 Start Date: 30/Oct/19 21:31 Worklog Time Spent: 10m Work Description: derekunimarket commented on issue #9660: [BEAM-8306] improve estimation datasize elasticsearch io URL: https://github.com/apache/beam/pull/9660#issuecomment-548121359 @timrobertson100 new PR https://github.com/apache/beam/pull/9940 to replace this. Thanks. Issue Time Tracking --- Worklog Id: (was: 336485) Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=336483=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336483 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 30/Oct/19 21:29 Start Date: 30/Oct/19 21:29 Worklog Time Spent: 10m Work Description: derekunimarket commented on issue #9940: [BEAM-8306] estimate byte size by product count URL: https://github.com/apache/beam/pull/9940#issuecomment-548120580 R: @echauchot @timrobertson100 This PR is used to replace https://github.com/apache/beam/pull/9660. Thanks. Issue Time Tracking --- Worklog Id: (was: 336483) Time Spent: 3h 10m (was: 3h)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=336480=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336480 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 30/Oct/19 21:25 Start Date: 30/Oct/19 21:25 Worklog Time Spent: 10m Work Description: derekunimarket commented on pull request #9940: [BEAM-8306] estimate byte size by product count URL: https://github.com/apache/beam/pull/9940
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=336455=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336455 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 30/Oct/19 20:51 Start Date: 30/Oct/19 20:51 Worklog Time Spent: 10m Work Description: derekunimarket commented on issue #9660: [BEAM-8306] improve estimation datasize elasticsearch io URL: https://github.com/apache/beam/pull/9660#issuecomment-548107025 @timrobertson100 Sorry, my bad. The problem is that I created my feature branch based on our own patched 2.14.0. I can create a feature branch based on your master branch and cherry-pick this fix into it. Then I'll make a new PR and it will all be clear. Is that OK? Issue Time Tracking --- Worklog Id: (was: 336455) Time Spent: 2h 50m (was: 2h 40m)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=335762=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335762 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 29/Oct/19 21:25 Start Date: 29/Oct/19 21:25 Worklog Time Spent: 10m Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] improve estimation datasize elasticsearch io URL: https://github.com/apache/beam/pull/9660#issuecomment-547635273 Sorry, I could be missing something as it's very late here but isn't it merged on master with [this PR](https://github.com/apache/beam/pull/9314) that was cherry-picked after accidentally going onto the 2.14.0 release? ETA: your branch is 1026 commits behind master. I think if you pull master and rebase you'll find you can reduce this to only the changes necessary for 8306. Issue Time Tracking --- Worklog Id: (was: 335762) Time Spent: 2h 40m (was: 2.5h)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=335761=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335761 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 29/Oct/19 21:22 Start Date: 29/Oct/19 21:22 Worklog Time Spent: 10m Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] improve estimation datasize elasticsearch io URL: https://github.com/apache/beam/pull/9660#issuecomment-547635273 Sorry, I could be missing something as it's very late here but isn't it merged on master with [this PR](https://github.com/apache/beam/pull/9314) that was cherry-picked after accidentally going onto the 2.14.0 release? Issue Time Tracking --- Worklog Id: (was: 335761) Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=335744=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335744 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 29/Oct/19 20:44 Start Date: 29/Oct/19 20:44 Worklog Time Spent: 10m Work Description: derekunimarket commented on issue #9660: [BEAM-8306] improve estimation datasize elasticsearch io URL: https://github.com/apache/beam/pull/9660#issuecomment-547621183 > BEAM-7916 @timrobertson100 I checked the master branch and confirmed that "BEAM-7916" is not in master yet. That's the reason we still see "BEAM-7916" in the PR. The fix for BEAM-7916 is currently merged only into apache:release-2.14.0. Issue Time Tracking --- Worklog Id: (was: 335744) Time Spent: 2h 20m (was: 2h 10m)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=335493=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335493 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 29/Oct/19 12:12 Start Date: 29/Oct/19 12:12 Worklog Time Spent: 10m Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] improve estimation datasize elasticsearch io URL: https://github.com/apache/beam/pull/9660#issuecomment-547390388 @derekunimarket can you please rebase against the master to avoid the two commits as `BEAM-7916` is already merged? I'll then merge this and open an issue to handle alias indexes in ES. Issue Time Tracking --- Worklog Id: (was: 335493) Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=335492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335492 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 29/Oct/19 12:12 Start Date: 29/Oct/19 12:12 Worklog Time Spent: 10m Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] improve estimation datasize elasticsearch io URL: https://github.com/apache/beam/pull/9660#issuecomment-547390388 @derekunimarket can you please rebase against the master to avoid the two commits as `BEAM-7916` is already merged. I'll then merge this and open an issue to handle alias indexes in ES. Issue Time Tracking --- Worklog Id: (was: 335492) Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=335431=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335431 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 29/Oct/19 09:13 Start Date: 29/Oct/19 09:13 Worklog Time Spent: 10m Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] improve estimation datasize elasticsearch io URL: https://github.com/apache/beam/pull/9660#issuecomment-547325482 This LGTM - waiting to see green to merge. I am not entirely sure this will work with aliased indexes. I don't believe you can get an estimate of size like this, but as far as I can see the same is true in the current implementation too. I think we would need to determine the indices that form part of the alias and then accumulate the sizes. Surely out of the scope of this PR, though. Issue Time Tracking --- Worklog Id: (was: 335431) Time Spent: 1h 50m (was: 1h 40m)
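The alias concern in the 29/Oct/19 09:13 comment (determine the indices behind an alias, then accumulate their sizes) and the issue's goal of sizing by query result rather than by whole index can be illustrated with a small sketch. This is not the code from the PR: the class `AliasSizeEstimate`, its method names, the index names, and the numbers are all hypothetical, and the per-index byte sizes and document counts are assumed to have been fetched from Elasticsearch already.

```java
import java.math.BigInteger;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the ElasticsearchIO implementation.
public class AliasSizeEstimate {

  /** Accumulates the stored byte size of every index behind an alias. */
  static long sumIndexBytes(Map<String, Long> bytesPerIndex) {
    long total = 0L;
    for (long bytes : bytesPerIndex.values()) {
      total = Math.addExact(total, bytes); // fail loudly on overflow
    }
    return total;
  }

  /**
   * Scales the total byte size by the fraction of documents the query matches.
   * Assumes documents are roughly uniform in size, which may not hold in practice.
   */
  static long estimateQueryBytes(long totalBytes, long totalDocs, long matchingDocs) {
    if (totalBytes <= 0 || totalDocs <= 0 || matchingDocs <= 0) {
      return 0L;
    }
    long clamped = Math.min(matchingDocs, totalDocs); // never exceed the total size
    return BigInteger.valueOf(totalBytes)
        .multiply(BigInteger.valueOf(clamped))
        .divide(BigInteger.valueOf(totalDocs))
        .longValueExact();
  }

  public static void main(String[] args) {
    // Assumed inputs: two indices behind one alias, sizes in bytes.
    Map<String, Long> bytesPerIndex = new LinkedHashMap<>();
    bytesPerIndex.put("logs-2019-09", 4_000_000L);
    bytesPerIndex.put("logs-2019-10", 6_000_000L);

    long totalBytes = sumIndexBytes(bytesPerIndex);
    // 500 matching documents out of 1,000,000 in total.
    System.out.println(estimateQueryBytes(totalBytes, 1_000_000L, 500L)); // prints 5000
  }
}
```

With an estimate this small, a runner would have little reason to split the source into many bundles, which is the behaviour the issue description asks for.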
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=335418=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335418 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 29/Oct/19 08:32 Start Date: 29/Oct/19 08:32 Worklog Time Spent: 10m Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] improve estimation datasize elasticsearch io URL: https://github.com/apache/beam/pull/9660#issuecomment-547311091 Run Java PreCommit Issue Time Tracking --- Worklog Id: (was: 335418) Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=335166=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335166 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 28/Oct/19 20:40 Start Date: 28/Oct/19 20:40 Worklog Time Spent: 10m Work Description: derekunimarket commented on issue #9660: [BEAM-8306] improve estimation datasize elasticsearch io URL: https://github.com/apache/beam/pull/9660#issuecomment-547136331 > I'll review this tomorrow. > @derekunimarket can you please reformat that first commit message to follow the project style? It's missing `[ ]` around the Jira number. Done. Thanks. Issue Time Tracking --- Worklog Id: (was: 335166) Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=334929=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334929 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 28/Oct/19 13:12 Start Date: 28/Oct/19 13:12 Worklog Time Spent: 10m Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] improve estimation datasize elasticsearch io URL: https://github.com/apache/beam/pull/9660#issuecomment-546938979 I'll review this tomorrow. @derekunimarket can you please reformat that first commit message to follow the project style? It's missing `[ ]` around the Jira number. Issue Time Tracking --- Worklog Id: (was: 334929) Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=334918=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334918 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 28/Oct/19 12:15 Start Date: 28/Oct/19 12:15 Worklog Time Spent: 10m Work Description: wscheep commented on issue #9660: [BEAM-8306] improve estimation datasize elasticsearch io URL: https://github.com/apache/beam/pull/9660#issuecomment-546919774 Sorry, I don't think I have enough experience to review this. Also, I changed jobs recently and am not working directly with BEAM and ES any more. Issue Time Tracking --- Worklog Id: (was: 334918) Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=330326=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330326 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 18/Oct/19 06:46 Start Date: 18/Oct/19 06:46 Worklog Time Spent: 10m Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] improve estimation datasize elasticsearch io URL: https://github.com/apache/beam/pull/9660#issuecomment-543547692 Urff - sorry we forgot this one, @derekunimarket. I am heading into a hectic week of funding board meetings/conference from tomorrow, so I can't look until the week after next. I've set a reminder to do so, but hope someone can beforehand. Issue Time Tracking --- Worklog Id: (was: 330326) Time Spent: 1h (was: 50m)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=330218=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330218 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 18/Oct/19 00:41 Start Date: 18/Oct/19 00:41 Worklog Time Spent: 10m Work Description: derekunimarket commented on issue #9660: [BEAM-8306] improve estimation datasize elasticsearch io URL: https://github.com/apache/beam/pull/9660#issuecomment-543428248 Do you guys have time to review it now? Thanks a lot. Issue Time Tracking --- Worklog Id: (was: 330218) Time Spent: 50m (was: 40m)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=325119=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325119 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 08/Oct/19 14:47 Start Date: 08/Oct/19 14:47 Worklog Time Spent: 10m Work Description: echauchot commented on issue #9660: [BEAM-8306] improve estimation datasize elasticsearch io URL: https://github.com/apache/beam/pull/9660#issuecomment-539549366 Sorry, I don't have time to review this. @wscheep maybe? Issue Time Tracking --- Worklog Id: (was: 325119) Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=319182=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319182 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 26/Sep/19 19:43 Start Date: 26/Sep/19 19:43 Worklog Time Spent: 10m Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] improve estimation datasize elasticsearch io URL: https://github.com/apache/beam/pull/9660#issuecomment-535657412 I don't have time to do a thorough review until next week, but I looked over this for 10 mins and my general impression is that it looks like a good addition. (Commit message is not formatted correctly.) Issue Time Tracking --- Worklog Id: (was: 319182) Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=319181=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319181 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 26/Sep/19 19:43 Start Date: 26/Sep/19 19:43 Worklog Time Spent: 10m Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] improve estimation datasize elasticsearch io URL: https://github.com/apache/beam/pull/9660#issuecomment-535657412 I don't have time to do a thorough review until next week, but I looked over this and my general impression is that it looks like a good addition. (Commit message is not formatted correctly.) Issue Time Tracking --- Worklog Id: (was: 319181) Time Spent: 20m (was: 10m)
[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=318943=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-318943 ] ASF GitHub Bot logged work on BEAM-8306: Author: ASF GitHub Bot Created on: 26/Sep/19 12:59 Start Date: 26/Sep/19 12:59 Worklog Time Spent: 10m Work Description: iemejia commented on issue #9660: [BEAM-8306] improve estimation datasize elasticsearch io URL: https://github.com/apache/beam/pull/9660#issuecomment-535490565 R: @echauchot @timrobertson100 Can one of you PTAL at this one? It looks like an interesting improvement. Issue Time Tracking --- Worklog Id: (was: 318943) Remaining Estimate: 0h Time Spent: 10m