[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=336750=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336750
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 31/Oct/19 12:15
Start Date: 31/Oct/19 12:15
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on pull request #9940: 
[BEAM-8306] estimate byte size by product count
URL: https://github.com/apache/beam/pull/9940
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 336750)
Time Spent: 4h 10m  (was: 4h)

> improve estimation of data byte size reading from source in ElasticsearchIO
> ---
>
> Key: BEAM-8306
> URL: https://issues.apache.org/jira/browse/BEAM-8306
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Affects Versions: 2.14.0
>Reporter: Derek He
>Assignee: Derek He
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> ElasticsearchIO splits a BoundedSource based on the Elasticsearch index size. 
> We expect it would be more accurate to split it based on the query result size.
> Currently, we have a big Elasticsearch index, but the query result contains 
> only a few of the documents in the index. ElasticsearchIO splits it into up 
> to 1024 BoundedSources in Google Dataflow, so it takes a long time to finish 
> processing the small number of Elasticsearch documents in Google Dataflow.
>  
>  
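
The idea in the description above can be sketched roughly as follows. This is only an illustration, assuming the estimate is the average document size multiplied by the number of documents matching the query; the class `QuerySizeEstimate`, its methods, and the sample numbers are hypothetical and are not taken from ElasticsearchIO or from the PR. The three inputs are assumed to have been fetched from Elasticsearch beforehand.

```java
public class QuerySizeEstimate {

    /**
     * Scales the index size by the share of documents the query matches.
     * Hypothetical helper: not part of ElasticsearchIO.
     */
    static long estimateQueryBytes(long indexSizeBytes, long totalDocs, long matchingDocs) {
        if (totalDocs <= 0 || matchingDocs <= 0) {
            return 0L; // nothing to read, or no basis for an average
        }
        // Average bytes per document, multiplied by the matching document count.
        double avgDocBytes = (double) indexSizeBytes / totalDocs;
        return (long) Math.ceil(avgDocBytes * matchingDocs);
    }

    /** Number of bundles needed for the estimate at a desired bundle size (at least 1). */
    static long splitCount(long estimatedBytes, long desiredBundleBytes) {
        if (estimatedBytes <= 0) {
            return 1L;
        }
        // Ceiling division without floating point.
        return Math.max(1L, (estimatedBytes + desiredBundleBytes - 1) / desiredBundleBytes);
    }

    public static void main(String[] args) {
        // Made-up numbers: a 1,000,000-byte index with 1,000 documents, 10 of which match.
        long estimate = estimateQueryBytes(1_000_000L, 1_000L, 10L);
        System.out.println("estimated bytes: " + estimate);
        System.out.println("splits at 4096 bytes per bundle: " + splitCount(estimate, 4096L));
    }
}
```

With an estimate derived from the matching document count, a query that selects only a few documents yields a small byte size and therefore few splits, instead of a split count driven by the size of the whole index.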



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=336729=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336729
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 31/Oct/19 11:28
Start Date: 31/Oct/19 11:28
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #9940: [BEAM-8306] 
estimate byte size by product count
URL: https://github.com/apache/beam/pull/9940#issuecomment-548326429
 
 
   Thanks @derekunimarket for tidying up and moving to this new PR.
   
   Waiting for green to merge. As with the previous PR, which we closed without 
merging, the approach LGTM.
 



Issue Time Tracking
---

Worklog Id: (was: 336729)
Time Spent: 4h  (was: 3h 50m)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=336720=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336720
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 31/Oct/19 10:44
Start Date: 31/Oct/19 10:44
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #9940: [BEAM-8306] 
estimate byte size by product count
URL: https://github.com/apache/beam/pull/9940#issuecomment-548310650
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 336720)
Time Spent: 3h 50m  (was: 3h 40m)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=336719=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336719
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 31/Oct/19 10:43
Start Date: 31/Oct/19 10:43
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on pull request #9660: 
[BEAM-8306] improve estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 336719)
Time Spent: 3h 40m  (was: 3.5h)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=336718=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336718
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 31/Oct/19 10:43
Start Date: 31/Oct/19 10:43
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] 
improve estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660#issuecomment-548310564
 
 
   Closing this unmerged and moving to #9940 
 



Issue Time Tracking
---

Worklog Id: (was: 336718)
Time Spent: 3.5h  (was: 3h 20m)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=336485=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336485
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 30/Oct/19 21:31
Start Date: 30/Oct/19 21:31
Worklog Time Spent: 10m 
  Work Description: derekunimarket commented on issue #9660: [BEAM-8306] 
improve estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660#issuecomment-548121359
 
 
   @timrobertson100
   new PR https://github.com/apache/beam/pull/9940 to replace this. Thanks.
 



Issue Time Tracking
---

Worklog Id: (was: 336485)
Time Spent: 3h 20m  (was: 3h 10m)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=336483=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336483
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 30/Oct/19 21:29
Start Date: 30/Oct/19 21:29
Worklog Time Spent: 10m 
  Work Description: derekunimarket commented on issue #9940: [BEAM-8306] 
estimate byte size by product count
URL: https://github.com/apache/beam/pull/9940#issuecomment-548120580
 
 
   R: @echauchot @timrobertson100
   
   This PR replaces https://github.com/apache/beam/pull/9660. Thanks.
 



Issue Time Tracking
---

Worklog Id: (was: 336483)
Time Spent: 3h 10m  (was: 3h)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=336480=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336480
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 30/Oct/19 21:25
Start Date: 30/Oct/19 21:25
Worklog Time Spent: 10m 
  Work Description: derekunimarket commented on pull request #9940: 
[BEAM-8306] estimate byte size by product count
URL: https://github.com/apache/beam/pull/9940
 
 

[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=336455=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336455
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 30/Oct/19 20:51
Start Date: 30/Oct/19 20:51
Worklog Time Spent: 10m 
  Work Description: derekunimarket commented on issue #9660: [BEAM-8306] 
improve estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660#issuecomment-548107025
 
 
   @timrobertson100 
   Sorry, my bad. The problem is that I created my feature branch based on 
our own patched 2.14.0.
   I can create a feature branch based on your master branch and cherry-pick 
this fix into it.
   Then I'll make a new PR and it will all be clean. Is that OK? 
 



Issue Time Tracking
---

Worklog Id: (was: 336455)
Time Spent: 2h 50m  (was: 2h 40m)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=335762=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335762
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 29/Oct/19 21:25
Start Date: 29/Oct/19 21:25
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] 
improve estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660#issuecomment-547635273
 
 
   Sorry, I could be missing something as it's very late here but isn't it 
merged on master with [this PR](https://github.com/apache/beam/pull/9314) that 
was cherry-picked after accidentally going onto the 2.14.0 release? 
   
   ETA: your branch is 1026 commits behind master. I think if you pull master 
and rebase you'll find you can reduce this to only the changes necessary for 
8306.
 



Issue Time Tracking
---

Worklog Id: (was: 335762)
Time Spent: 2h 40m  (was: 2.5h)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=335761=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335761
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 29/Oct/19 21:22
Start Date: 29/Oct/19 21:22
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] 
improve estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660#issuecomment-547635273
 
 
   Sorry, I could be missing something as it's very late here but isn't it 
merged on master with [this PR](https://github.com/apache/beam/pull/9314) that 
was cherry-picked after accidentally going onto the 2.14.0 release? 
 



Issue Time Tracking
---

Worklog Id: (was: 335761)
Time Spent: 2.5h  (was: 2h 20m)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=335744=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335744
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 29/Oct/19 20:44
Start Date: 29/Oct/19 20:44
Worklog Time Spent: 10m 
  Work Description: derekunimarket commented on issue #9660: [BEAM-8306] 
improve estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660#issuecomment-547621183
 
 
   > BEAM-7916
   
   @timrobertson100 I checked the master branch and confirmed that "BEAM-7916" is 
not in master yet. That's the reason we still see "BEAM-7916" in the PR. The 
fix for BEAM-7916 is currently only merged into apache:release-2.14.0.
 



Issue Time Tracking
---

Worklog Id: (was: 335744)
Time Spent: 2h 20m  (was: 2h 10m)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=335493=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335493
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 29/Oct/19 12:12
Start Date: 29/Oct/19 12:12
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] 
improve estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660#issuecomment-547390388
 
 
   @derekunimarket can you please rebase against the master to avoid the two 
commits as `BEAM-7916` is already merged? I'll then merge this and open an 
issue to handle alias indexes in ES.
   
 



Issue Time Tracking
---

Worklog Id: (was: 335493)
Time Spent: 2h 10m  (was: 2h)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=335492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335492
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 29/Oct/19 12:12
Start Date: 29/Oct/19 12:12
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] 
improve estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660#issuecomment-547390388
 
 
   @derekunimarket can you please rebase against the master to avoid the two 
commits as `BEAM-7916` is already merged. I'll then merge this and open an 
issue to handle alias indexes in ES.
   
 



Issue Time Tracking
---

Worklog Id: (was: 335492)
Time Spent: 2h  (was: 1h 50m)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=335431=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335431
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 29/Oct/19 09:13
Start Date: 29/Oct/19 09:13
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] 
improve estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660#issuecomment-547325482
 
 
   This LGTM - waiting to see green to merge
   
   I am not entirely sure this will work when aliasing indexes. I don't believe 
you can get an estimate of size like this but as far as I can see the same is 
true in the current implementation too. I think we would need to determine the 
indices that form part of the alias and then accumulate the sizes. Surely out 
of the scope of this PR though.
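
A minimal sketch of that idea, in Python for brevity: resolve the alias to the indices behind it, then sum their sizes. The two dictionaries are hypothetical stand-ins for whatever the cluster reports about aliases and index sizes; none of this is ElasticsearchIO code.

```python
from typing import Dict, List


def resolve_indices(name: str, aliases: Dict[str, List[str]]) -> List[str]:
    """Return the concrete indices behind `name`; a plain index resolves to itself."""
    return aliases.get(name, [name])


def total_size_bytes(
    name: str, aliases: Dict[str, List[str]], index_sizes: Dict[str, int]
) -> int:
    """Accumulate the byte sizes of every index that `name` refers to."""
    return sum(index_sizes[index] for index in resolve_indices(name, aliases))


if __name__ == "__main__":
    # Hypothetical cluster state: one alias spanning two monthly indices.
    aliases = {"logs": ["logs-2019-09", "logs-2019-10"]}
    index_sizes = {"logs-2019-09": 4_000, "logs-2019-10": 6_000, "users": 500}

    print(total_size_bytes("logs", aliases, index_sizes))   # alias: sum of both
    print(total_size_bytes("users", aliases, index_sizes))  # plain index
```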
 



Issue Time Tracking
---

Worklog Id: (was: 335431)
Time Spent: 1h 50m  (was: 1h 40m)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=335418&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335418
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 29/Oct/19 08:32
Start Date: 29/Oct/19 08:32
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] 
improve estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660#issuecomment-547311091
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 335418)
Time Spent: 1h 40m  (was: 1.5h)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=335166&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335166
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 28/Oct/19 20:40
Start Date: 28/Oct/19 20:40
Worklog Time Spent: 10m 
  Work Description: derekunimarket commented on issue #9660: [BEAM-8306] 
improve estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660#issuecomment-547136331
 
 
   > I'll review this tomorrow.
   > @derekunimarket can you please reformat that first commit message to 
follow the project style? It's missing `[ ]` around the Jira number.
   
   Done. Thanks.
 



Issue Time Tracking
---

Worklog Id: (was: 335166)
Time Spent: 1.5h  (was: 1h 20m)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=334929&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334929
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 28/Oct/19 13:12
Start Date: 28/Oct/19 13:12
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] 
improve estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660#issuecomment-546938979
 
 
   I'll review this tomorrow. 
   @derekunimarket can you please reformat that first commit message to follow 
the project style? It's missing `[ ]` around the Jira number.
 



Issue Time Tracking
---

Worklog Id: (was: 334929)
Time Spent: 1h 20m  (was: 1h 10m)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=334918&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334918
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 28/Oct/19 12:15
Start Date: 28/Oct/19 12:15
Worklog Time Spent: 10m 
  Work Description: wscheep commented on issue #9660: [BEAM-8306] improve 
estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660#issuecomment-546919774
 
 
   Sorry, I don't think I have enough experience to review this. Also, I 
changed jobs recently and am not working directly with BEAM and ES any more.
 



Issue Time Tracking
---

Worklog Id: (was: 334918)
Time Spent: 1h 10m  (was: 1h)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=330326&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330326
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 18/Oct/19 06:46
Start Date: 18/Oct/19 06:46
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] 
improve estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660#issuecomment-543547692
 
 
   Urff - sorry we forgot this one @derekunimarket 
   
   I am heading into a hectic week of funding board meetings/conference from 
tomorrow, so can't look until the week after next. I've set a reminder to do 
so, but hope someone can beforehand.
 



Issue Time Tracking
---

Worklog Id: (was: 330326)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=330218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330218
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 18/Oct/19 00:41
Start Date: 18/Oct/19 00:41
Worklog Time Spent: 10m 
  Work Description: derekunimarket commented on issue #9660: [BEAM-8306] 
improve estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660#issuecomment-543428248
 
 
   Do you guys have time to review it now? Thanks a lot.
 



Issue Time Tracking
---

Worklog Id: (was: 330218)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=325119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325119
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 08/Oct/19 14:47
Start Date: 08/Oct/19 14:47
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #9660: [BEAM-8306] improve 
estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660#issuecomment-539549366
 
 
   Sorry, I don't have time to review this. @wscheep maybe?
 



Issue Time Tracking
---

Worklog Id: (was: 325119)
Time Spent: 40m  (was: 0.5h)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=319182&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319182
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 26/Sep/19 19:43
Start Date: 26/Sep/19 19:43
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] 
improve estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660#issuecomment-535657412
 
 
   I don't have time to do a thorough review until next week but I looked over 
this for 10 mins and my general impression is it looks like a good addition. 
   
   (Commit message is not formatted correctly)
 



Issue Time Tracking
---

Worklog Id: (was: 319182)
Time Spent: 0.5h  (was: 20m)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=319181&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319181
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 26/Sep/19 19:43
Start Date: 26/Sep/19 19:43
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #9660: [BEAM-8306] 
improve estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660#issuecomment-535657412
 
 
   I don't have time to do a thorough review until next week but I looked over 
this and my general impression is it looks like a good addition. 
   
   (Commit message is not formatted correctly)
 



Issue Time Tracking
---

Worklog Id: (was: 319181)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=318943&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-318943
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 26/Sep/19 12:59
Start Date: 26/Sep/19 12:59
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #9660: [BEAM-8306] improve 
estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660#issuecomment-535490565
 
 
   R: @echauchot @timrobertson100 
   Can one of you PTAL at this one? It looks like an interesting improvement.
 



Issue Time Tracking
---

Worklog Id: (was: 318943)
Remaining Estimate: 0h
Time Spent: 10m
