[jira] [Work logged] (BEAM-3372) Duplicated 'zone' PipelineOption has inconsistent documentation

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3372?focusedWorklogId=318085&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-318085
 ]

ASF GitHub Bot logged work on BEAM-3372:


Author: ASF GitHub Bot
Created on: 25/Sep/19 05:12
Start Date: 25/Sep/19 05:12
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9578: 
[BEAM-3372] remove duplicated zone
URL: https://github.com/apache/beam/pull/9578
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 318085)
Time Spent: 1.5h  (was: 1h 20m)

> Duplicated 'zone' PipelineOption has inconsistent documentation
> ---
>
> Key: BEAM-3372
> URL: https://issues.apache.org/jira/browse/BEAM-3372
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, runner-dataflow
>Reporter: Scott Wegner
>Priority: Minor
>  Labels: ccoss2019, newbie, starter, test
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Two different PipelineOptions interfaces define a 'zone' option: GcpOptions 
> [1] and DataflowPipelineWorkerPoolOptions [2]. It's not an error for an option 
> to be redefined, and internally Beam checks that the definitions are compatible.
> In this case the two 'zone' definitions are compatible but they have 
> different descriptions. This can be confusing as setting one will also impact 
> the other.
> We should make improvements around duplicate PipelineOptions definitions for 
> a given runner. In this case, I propose we:
> a) Update the @Description's so that they match.
> b) Mark one of them as @Deprecated with a link to the other. Migrate code 
> references and plan to remove it on the next major version.
> c) Add a test which checks all PipelineOptions on the DataflowRunner 
> classpath and verifies that any duplicates have the properties above 
> (equivalent definitions including @Description, and only one non-@Deprecated 
> version)
> [1] 
> https://github.com/apache/beam/blob/670941961845593d9a7e09b17c1bd117f27bf579/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L95
> [2] 
> https://github.com/apache/beam/blob/670941961845593d9a7e09b17c1bd117f27bf579/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java#L175
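
The check proposed in (c) can be sketched in a language-neutral way. The Python below is only an illustration under stated assumptions: the real options are Java interfaces, and `OptionDef`, `find_inconsistent_duplicates`, and the placeholder description strings are hypothetical names invented for this sketch, not Beam APIs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OptionDef:
    """Hypothetical record of one option definition (not a Beam type)."""
    interface: str
    name: str
    description: str
    deprecated: bool = False


def find_inconsistent_duplicates(defs):
    """Return {option name: [problems]} for options defined more than once."""
    by_name = defaultdict(list)
    for d in defs:
        by_name[d.name].append(d)

    problems = {}
    for name, group in by_name.items():
        if len(group) < 2:
            continue  # not a duplicate, nothing to check
        issues = []
        # (a) all duplicate definitions should share one description
        if len({d.description for d in group}) > 1:
            issues.append("descriptions differ")
        # (b) exactly one definition should remain non-deprecated
        if sum(not d.deprecated for d in group) != 1:
            issues.append("expected exactly one non-deprecated definition")
        if issues:
            problems[name] = issues
    return problems


if __name__ == "__main__":
    # Placeholder descriptions; the real texts live in the linked sources.
    current = [
        OptionDef("GcpOptions", "zone", "placeholder description A"),
        OptionDef("DataflowPipelineWorkerPoolOptions", "zone",
                  "placeholder description B"),
    ]
    print(find_inconsistent_duplicates(current))
```

A real test would presumably collect the definitions by reflection over the classpath rather than from a hand-written list.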



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=318083&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-318083
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 25/Sep/19 05:03
Start Date: 25/Sep/19 05:03
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #9661: 
[BEAM-7389] Use includes for buttons
URL: https://github.com/apache/beam/pull/9661
 
 
   There are *no changes* to the actual content, just simplified the docs 
through includes. This also required updating the `md2ipynb` script to support 
Jekyll/Liquid syntax for some constructs.
   
   R: @aaltay , @rosetn 
   
   
   
   

[jira] [Work logged] (BEAM-1296) Providing a small dataset for "Apache Beam Mobile Gaming Pipeline Examples"

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1296?focusedWorklogId=318082&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-318082
 ]

ASF GitHub Bot logged work on BEAM-1296:


Author: ASF GitHub Bot
Created on: 25/Sep/19 05:02
Start Date: 25/Sep/19 05:02
Worklog Time Spent: 10m 
  Work Description: angulartist commented on issue #9633: [BEAM-1296] 
Providing a small dataset for "Apache Beam Mobile Gaming …
URL: https://github.com/apache/beam/pull/9633#issuecomment-534853172
 
 
   Yeah that sounds like a good idea, file is kinda small so friends will be 
able to download it :fire:
 



Issue Time Tracking
---

Worklog Id: (was: 318082)
Time Spent: 50m  (was: 40m)

> Providing a small dataset for "Apache Beam Mobile Gaming Pipeline Examples"
> ---
>
> Key: BEAM-1296
> URL: https://issues.apache.org/jira/browse/BEAM-1296
> Project: Beam
>  Issue Type: Wish
>  Components: examples-java
>Reporter: Keiji Yoshida
>Assignee: John Patoch
>Priority: Trivial
>  Labels: ccoss2019, newbie, starter
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> A dataset "gs://apache-beam-samples/game/gaming_data*.csv" for "Apache Beam 
> Mobile Gaming Pipeline Examples" is huge (about 12 GB), and it takes a long 
> time to download. This might pose difficulties for Apache Beam 
> beginners who want to try "Apache Beam Mobile Gaming Pipeline Examples" 
> quickly.
> How about providing a small dataset (say, less than 1 GB) for these examples?





[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=318076&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-318076
 ]

ASF GitHub Bot logged work on BEAM-8213:


Author: ASF GitHub Bot
Created on: 25/Sep/19 04:53
Start Date: 25/Sep/19 04:53
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9642: [BEAM-8213] Split up 
monolithic python preCommit tests on jenkins
URL: https://github.com/apache/beam/pull/9642#issuecomment-534848347
 
 
   > I don't think we will have 1/5 reduction, since currently Python 2.7, 3.5, 
3.6, 3.7 precommits are already executing in parallel (by way of gradle 
parallelism) on the same Jenkins worker taking up one slot, so we will still 
have a parallel execution but will require 4x more slots.
   
   I believe he was saying that each of the 5 jobs will take on average 1/5th 
the time of the current monolithic job.  The jobs will continue to run in 
parallel, so I think this is basically true, except that there are some 
differences between the jobs (lint is fast, some python versions run IT, some 
don't). I agree in principle.
   
   > Increasing slots per worker may help, but there are some potentially 
heavy-weight tests, such as portable python precommit tests that bring up 
Flink, that may cause jenkins VMs to OOM if we run a lot of them in parallel on 
the same VM.
   
   This PR does not split up the Python_Portable PreCommit tests that start up 
flink.
   
 



Issue Time Tracking
---

Worklog Id: (was: 318076)
Time Spent: 3h  (was: 2h 50m)

> Run and report python tox tasks separately within Jenkins
> -
>
> Key: BEAM-8213
> URL: https://issues.apache.org/jira/browse/BEAM-8213
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> As a python developer, the speed and comprehensibility of the jenkins 
> PreCommit job could be greatly improved.
> Here are some of the problems
> - when a lint job fails, it's not reported in the test results summary, so 
> even though the job is marked as failed, I see "Test Result (no failures)" 
> which is quite confusing
> - I have to wait for over an hour to discover the lint failed, which takes 
> about a minute to run on its own
> - The logs are a jumbled mess of all the different tasks running on top of 
> each other
> - The test results give no indication of which version of python they use.  I 
> click on Test results, then the test module, then the test class, then I see 
> 4 tests named the same thing.  I assume that the first is python 2.7, the 
> second is 3.5 and so on.   It takes 5 clicks and then reading the log output 
> to know which version of python a single error pertains to, then I need to 
> repeat for each failure.  This makes it very difficult to discover problems, 
> and deduce that they may have something to do with python version mismatches.
> I believe the solution to this is to split up the single monolithic python 
> PreCommit job into sub-jobs (possibly using a pipeline with steps).  This 
> would give us the following benefits:
> - sub job results should become available as they finish, so for example, 
> lint results should be available very early on
> - sub job results will be reported separately, and there will be a job for 
> each py2, py35, py36 and so on, so it will be clear when an error is related 
> to a particular python version
> - sub jobs without reports, like docs and lint, will have their own failure 
> status and logs, so when they fail it will be more obvious what went wrong.
> I'm happy to help out once I get some feedback on the desired way forward.





[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=318070&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-318070
 ]

ASF GitHub Bot logged work on BEAM-8213:


Author: ASF GitHub Bot
Created on: 25/Sep/19 04:40
Start Date: 25/Sep/19 04:40
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9642: [BEAM-8213] Split up 
monolithic python preCommit tests on jenkins
URL: https://github.com/apache/beam/pull/9642#issuecomment-534848347
 
 
   > I don't think we will have 1/5 reduction, since currently Python 2.7, 3.5, 
3.6, 3.7 precommits are already executing in parallel (by way of gradle 
parallelism) on the same Jenkins worker taking up one slot, so we will still 
have a parallel execution but will require 4x more slots.
   
   I believe he was saying that each of the 5 jobs will take on average 1/5th 
the time it took before.  The jobs will continue to run in parallel, so I 
think this is basically true, except that there are some differences between 
the jobs (lint is fast, some python versions run IT, some don't). I agree in 
principle.
   
   > Increasing slots per worker may help, but there are some potentially 
heavy-weight tests, such as portable python precommit tests that bring up 
Flink, that may cause jenkins VMs to OOM if we run a lot of them in parallel on 
the same VM.
   
   This PR does not split up the Python_Portable PreCommit tests that start up 
flink.
   
 



Issue Time Tracking
---

Worklog Id: (was: 318070)
Time Spent: 2h 50m  (was: 2h 40m)

> Run and report python tox tasks separately within Jenkins
> -
>
> Key: BEAM-8213
> URL: https://issues.apache.org/jira/browse/BEAM-8213
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-5820) Vendor Calcite

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=318038&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-318038
 ]

ASF GitHub Bot logged work on BEAM-5820:


Author: ASF GitHub Bot
Created on: 25/Sep/19 03:41
Start Date: 25/Sep/19 03:41
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9189: [BEAM-5820] Use 
vendored calcite
URL: https://github.com/apache/beam/pull/9189#issuecomment-534837694
 
 
   Nice work! This is huge.
 



Issue Time Tracking
---

Worklog Id: (was: 318038)
Time Spent: 16h 10m  (was: 16h)

> Vendor Calcite
> --
>
> Key: BEAM-5820
> URL: https://issues.apache.org/jira/browse/BEAM-5820
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kai Jiang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 16h 10m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-6684) BigQueryIO: Unable to create dataset "Location unknown is not yet publicly available

2019-09-24 Thread Innocent (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937337#comment-16937337
 ] 

Innocent commented on BEAM-6684:


[~pabloem] are there any steps to reproduce this issue?

> BigQueryIO: Unable to create dataset "Location unknown is not yet publicly 
> available
> 
>
> Key: BEAM-6684
> URL: https://issues.apache.org/jira/browse/BEAM-6684
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.10.0
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>
> My understanding is that BigQueryIO runs the query, writes the output to a 
> temp dataset, and then extracts the temp dataset to GCS. This means the 
> location of the temp dataset (if not manually set) is determined by the 
> tables referenced in the query. This is confirmed in the source code for 
> BigQueryIO: 
> https://github.com/apache/beam/blob/v2.6.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQuerySource.java#L111
> So I would expect that the temp dataset should also be created in the US 
> location, or default to the US. Instead, it appears to be defaulting to 
> "unknown" (at least some of the time), therefore causing the whole Dataflow 
> job to fail.





[jira] [Work logged] (BEAM-8164) Correct document for building the python SDK harness container

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8164?focusedWorklogId=317986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317986
 ]

ASF GitHub Bot logged work on BEAM-8164:


Author: ASF GitHub Bot
Created on: 25/Sep/19 01:57
Start Date: 25/Sep/19 01:57
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #9536:  
[BEAM-8164][website] Correct document for building the python SDK ha…
URL: https://github.com/apache/beam/pull/9536#issuecomment-534817098
 
 
   Thanks for the reminder, @ibzib. I have updated the PR and rebased the code.
 



Issue Time Tracking
---

Worklog Id: (was: 317986)
Time Spent: 1h 10m  (was: 1h)

> Correct document for building the python SDK harness container
> --
>
> Key: BEAM-8164
> URL: https://issues.apache.org/jira/browse/BEAM-8164
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In the runner document, it is described that we can use the command 
> `./gradlew :sdks:python:container:docker` 
> to build the SDK harness container (see 
> https://beam.apache.org/documentation/runners/flink/).
> However, the docker config was removed in the latest python3 docker-related 
> commit [1], so the command fails with the following error message.
> {code:java}
>  > Task :sdks:python:container:docker FAILED
>  FAILURE: Build failed with an exception.
>  * What went wrong:
>  Execution failed for task ':sdks:python:container:docker'.
>  > name is a required docker configuration item.{code}
> I think we should update the document to use the command `./gradlew 
> :sdks:python:container:py2:docker`. Or should we add the config so that running 
> `:sdks:python:container:docker` automatically builds docker for all python versions?
>  
> What do you think?
>  
> [1] 
> [https://github.com/apache/beam/commit/47feeafb21023e2a60ae51737cc4000a2033719c#diff-1bc5883bcfcc9e883ab7df09e4dcddb0L63]





[jira] [Work logged] (BEAM-8301) Argument inference breaks on incomparable types as defaults.

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8301?focusedWorklogId=317955&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317955
 ]

ASF GitHub Bot logged work on BEAM-8301:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:57
Start Date: 25/Sep/19 00:57
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #9657: 
[BEAM-8301] Cherry-pick default argument comparison fixes.
URL: https://github.com/apache/beam/pull/9657
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 317955)
Time Spent: 1.5h  (was: 1h 20m)

> Argument inference breaks on incomparable types as defaults.
> 
>
> Key: BEAM-8301
> URL: https://issues.apache.org/jira/browse/BEAM-8301
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0
>Reporter: Robert Bradshaw
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> A common culprit is numpy arrays, e.g.
> {code:python}
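> import apache_beam as beam
> import numpy as np
> class MyDoFn(beam.DoFn):
>   # The array default is what trips up argument inference.
>   def process(self, element, arg=np.ndarray(...)):
>     ...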
> {code}
> This bug was introduced as part of [BEAM-7060].
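
Why an array default can break argument inference is easiest to see in isolation. The sketch below is an assumption-laden illustration, not the Beam code or its actual fix: `ArrayLike` is a hypothetical stand-in that mimics numpy's elementwise `==` and ambiguous truth value, and both helper functions are invented for this example.

```python
import inspect


class ArrayLike:
    """Hypothetical stand-in for a numpy array (no numpy needed to run this)."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Like numpy, compare elementwise and return an array, not a bool.
        return ArrayLike(v == other for v in self.values)

    def __bool__(self):
        # Like numpy, refuse to collapse an array to a single truth value.
        raise ValueError("truth value of an array is ambiguous")


def process(element, arg=ArrayLike([1, 2, 3])):
    return element


def has_default_naive(fn, name):
    """Equality-based check: raises ValueError for array-like defaults."""
    default = inspect.signature(fn).parameters[name].default
    if default == inspect.Parameter.empty:
        return False
    return True


def has_default_safe(fn, name):
    """Identity-based check: never invokes the default's own __eq__."""
    default = inspect.signature(fn).parameters[name].default
    return default is not inspect.Parameter.empty
```

Comparing with `is` sidesteps whatever comparison semantics the default value defines, which is one general way to avoid this class of failure.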





[jira] [Work logged] (BEAM-5820) Vendor Calcite

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=317956&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317956
 ]

ASF GitHub Bot logged work on BEAM-5820:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:57
Start Date: 25/Sep/19 00:57
Worklog Time Spent: 10m 
  Work Description: vectorijk commented on issue #9189: [BEAM-5820] Use 
vendored calcite
URL: https://github.com/apache/beam/pull/9189#issuecomment-534804951
 
 
   Thanks Luke!
 



Issue Time Tracking
---

Worklog Id: (was: 317956)
Time Spent: 16h  (was: 15h 50m)

> Vendor Calcite
> --
>
> Key: BEAM-5820
> URL: https://issues.apache.org/jira/browse/BEAM-5820
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kai Jiang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 16h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8160) Add instructions about how to set FnApi multi-threads/processes

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8160?focusedWorklogId=317953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317953
 ]

ASF GitHub Bot logged work on BEAM-8160:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:45
Start Date: 25/Sep/19 00:45
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on issue #9628: [BEAM-8160] Add 
FnApi execution mode instruction
URL: https://github.com/apache/beam/pull/9628#issuecomment-534802621
 
 
   Thanks @aaltay for fix and merge. 
 



Issue Time Tracking
---

Worklog Id: (was: 317953)
Time Spent: 50m  (was: 40m)

> Add instructions about how to set FnApi multi-threads/processes
> ---
>
> Key: BEAM-8160
> URL: https://issues.apache.org/jira/browse/BEAM-8160
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add instructions to Beam site or Beam wiki for easy discovery.
>  





[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317939&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317939
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:43
Start Date: 25/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327884859
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
+  - /documentation/execution-model/
+---
+
+
+# Runtime environments
+
+Any execution engine can run the Beam SDK because the SDK runtime environment 
is [containerized](https://s.apache.org/beam-fn-api-container-contract) with 
[Docker](https://www.docker.com/) and isolated from other runtime systems. This 
page describes how to build, customize, and push Beam SDK container images.
+
+## Building container images
+
+Before building Beam SDK container images:
+* Register a [Bintray](https://bintray.com/) account with a Docker repository 
named `apache`.
+* Install [Docker](https://www.docker.com/) on your workstation.
+
+To build Beam SDK container images:
+
+<ol>
+  <li>
+    Navigate to your local copy of the <a href="https://github.com/apache/beam">beam</a>
+  </li>
+  <li>
+    Run Gradle with the docker target: <code>./gradlew docker</code>
+  </li>
+</ol>
+
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
+>
+> ```
+> ./gradlew -p sdks/java/container docker
+> ./gradlew -p sdks/python/container docker
+> ./gradlew -p sdks/go/container docker
+> ```
+
+Run `docker images` to examine the containers. For example, if you 
successfully built the container images, the command prompt displays a response 
like:
+
+```
+REPOSITORY                                   TAG      IMAGE ID       CREATED          SIZE
+$USER-docker-apache.bintray.io/beam/python   latest   4ea515403a1a   3 minutes ago    1.27GB
+$USER-docker-apache.bintray.io/beam/java     latest   0103512f1d8f   34 minutes ago   780MB
+$USER-docker-apache.bintray.io/beam/go       latest   ce055985808a   35 minutes ago   121MB
+```
+
+Although the repository names look like URLs, the container images are 
stored locally on your workstation. After building the container images 
locally, you can [push](#pushing-container-images) them to an eponymous 
repository online.
+
+### Overriding default Docker targets
+
+The default SDK version is `latest` and the default Docker repository is the 
following Bintray location:
+
+```
+$USER-docker-apache.bintray.io/beam
+```
+
+When you [build SDK container images](#building-container-images), you can 
override the default version and location.
+
+To specify an older Python SDK version, like 2.3.0, build the container with 
the `docker-tag` option:
+
+```
+./gradlew docker -Pdocker-tag=2.3.0
 
 Review comment:
   Let's use one of the language-specific commands here; we don't encourage 
users to use this command.
   Also, change the tag to a different one; a date would be a good example.
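
One way to read this suggestion is sketched below in Python, purely as an illustration. The `-p sdks/<sdk>/container docker` form and the `-Pdocker-tag=` property both come from the quoted docs; combining them in one invocation, the `YYYYMMDD` tag format, and the helper name `gradle_docker_command` are assumptions of this sketch.

```python
import datetime


def gradle_docker_command(sdk, tag_date):
    """Build the argument list for a language-specific docker build.

    `sdk` is one of the directory names used in the quoted docs
    ("java", "python", "go"); the date-based tag format is an assumption.
    """
    tag = tag_date.strftime("%Y%m%d")
    return [
        "./gradlew",
        "-p", f"sdks/{sdk}/container",
        "docker",
        f"-Pdocker-tag={tag}",
    ]


if __name__ == "__main__":
    # Only prints the command; it does not run Gradle.
    print(" ".join(gradle_docker_command("python", datetime.date.today())))
```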
 



Issue Time Tracking
---

Worklog Id: (was: 317939)
Time Spent: 3h  (was: 2h 50m)

> Document custom docker containers
> -
>
> Key: BEAM-8209
> URL: https://issues.apache.org/jira/browse/BEAM-8209
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317952&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317952
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:43
Start Date: 25/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327886607
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
+  - /documentation/execution-model/
+---
+
+
+# Runtime environments
+
+Any execution engine can run the Beam SDK because the SDK runtime environment 
is [containerized](https://s.apache.org/beam-fn-api-container-contract) with 
[Docker](https://www.docker.com/) and isolated from other runtime systems. This 
page describes how to build, customize, and push Beam SDK container images.
+
+## Building container images
+
+Before building Beam SDK container images:
+* Register a [Bintray](https://bintray.com/) account with a Docker repository 
named `apache`.
+* Install [Docker](https://www.docker.com/) on your workstation.
+
+To build Beam SDK container images:
+
+<ol>
+  <li>
+    Navigate to your local copy of the <a href="https://github.com/apache/beam">beam</a>
+  </li>
+  <li>
+    Run Gradle with the docker target: <code>./gradlew docker</code>
+  </li>
+</ol>
+
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
+>
+> ```
+> ./gradlew -p sdks/java/container docker
+> ./gradlew -p sdks/python/container docker
+> ./gradlew -p sdks/go/container docker
+> ```
+
+Run `docker images` to examine the containers. For example, if you 
successfully built the container images, the command prompt displays a response 
like:
+
+```
+REPOSITORY                                   TAG      IMAGE ID       CREATED          SIZE
+$USER-docker-apache.bintray.io/beam/python   latest   4ea515403a1a   3 minutes ago    1.27GB
+$USER-docker-apache.bintray.io/beam/java     latest   0103512f1d8f   34 minutes ago   780MB
+$USER-docker-apache.bintray.io/beam/go       latest   ce055985808a   35 minutes ago   121MB
+```
+
+Although the repository names look like URLs, the container images are 
stored locally on your workstation. After building the container images 
locally, you can [push](#pushing-container-images) them to an eponymous 
repository online.
+
+### Overriding default Docker targets
+
+The default SDK version is `latest` and the default Docker repository is the 
following Bintray location:
+
+```
+$USER-docker-apache.bintray.io/beam
+```
+
+When you [build SDK container images](#building-container-images), you can 
override the default version and location.
+
+To specify an older Python SDK version, like 2.3.0, build the container with 
the `docker-tag` option:
+
+```
+./gradlew docker -Pdocker-tag=2.3.0
+```
+
+To change the `docker` target, build the container with the 
`docker-repository-root` option:
+
+```
+./gradlew docker -Pdocker-repository-root=$LOCATION
+```
+
+## Customizing container images
+
+You can add extra dependencies or serialization files to container images so 
the execution engine doesn't need them.
+
+To customize a container image, either:
+* [Write a new](#writing-new-dockerfiles) 
[Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the 
original
+* [Modify](#modifying-dockerfiles) the [original 
Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile)
 and reimage the container
+
+It's often easier to write a new Dockerfile, but you can customize anything, 
including the base OS, by modifying the original.
+
+### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles}
+
+<ol>
+  <li>
+    Pull a <a href="https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release">prebuilt SDK container image</a> for your target language and version.
+  </li>
+  <li>
+    <a href="https://docs.docker.com/develop/develop-images/dockerfile_best-practices/">Write a new Dockerfile</a> that <a href="https://docs.docker.com/engine/reference/builder/#from">designates</a> the original as its <a href="https://docs.docker.com/glossary/?term=parent%20image">parent</a>
+  </li>
+  <li>
+    Build a child image: <code>docker build -f /path/to/new/Dockerfile</code>
+  </li>
+</ol>
+
+### Modifying the original Dockerfile {#modifying-dockerfiles}
+
+1. Pull the [prebuilt SDK container 
image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release)
 for your target language and version
 
 Review comment:
   We don't need to pull 

[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317935=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317935
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:43
Start Date: 25/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327883788
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+### Overriding default Docker targets
+
+The default SDK version is `latest` and the default Docker repository is the 
following Bintray location:
+
+```
+$USER-docker-apache.bintray.io/beam
+```
+
+When you [build SDK container images](#building-container-images), you can 
override the default version and location.
 
 Review comment:
   default version and location -> default repository and tag.
 



Issue Time Tracking
---

Worklog Id: (was: 317935)
Time Spent: 2.5h  (was: 2h 20m)



[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317949=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317949
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:43
Start Date: 25/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327886782
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles}
+
+
+
+Pull a <a href="https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release">prebuilt SDK container image</a> for your target language and version.
 
 Review comment:
   The link should be `https://cloud.docker.com/u/apachebeam/repository/list`.
 



Issue Time Tracking
---

Worklog Id: (was: 317949)
Time Spent: 3h 40m  (was: 3.5h)

> Document custom docker containers
> -
>
> Key: BEAM-8209
>  

[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317946=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317946
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:43
Start Date: 25/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327885141
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+To specify an older Python SDK version, like 2.3.0, build the container with 
the `docker-tag` option:
+
+```
+./gradlew docker -Pdocker-tag=2.3.0
+```
+
+To change the `docker` target, build the container with the 
`docker-repository-root` option:
 
 Review comment:
   target -> repository
 



Issue Time Tracking
---

Worklog Id: (was: 317946)
Time Spent: 3h 20m  (was: 3h 10m)



[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317940=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317940
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:43
Start Date: 25/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327875371
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
+>
+> ```
+> ./gradlew -p sdks/java/container docker
 
 Review comment:
   To make it consistent with the Python command, it'd be better to change 
this to 
   `./gradlew :sdks:java:container:docker`
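
   As an illustration of the mapping between the two forms, here is a small sketch that turns a project directory plus a task name into a fully qualified Gradle task path. It assumes the Gradle project paths mirror the directory layout under `sdks/`, which may not hold for every subproject; `gradle_task_path` is a hypothetical helper name, and the commands are only printed, not run.

```shell
#!/bin/sh
# Hypothetical helper: convert a project directory and a task name into a
# fully qualified Gradle task path.
# Assumption: project paths mirror the directory layout (a/b/c -> :a:b:c).
gradle_task_path() {
  dir="$1"
  task="$2"
  # Replace each path separator with Gradle's ':' project separator.
  printf ':%s:%s\n' "$(printf '%s' "$dir" | tr '/' ':')" "$task"
}

# Print the task-path form of the three per-SDK commands from the note.
for sdk in java python go; do
  echo "./gradlew $(gradle_task_path "sdks/$sdk/container" docker)"
done
```

   The task-path form names one task explicitly and can be run from the repository root without changing the project directory.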
 



Issue Time Tracking
---

Worklog Id: (was: 317940)
Time Spent: 3h 10m  (was: 3h)



[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317944=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317944
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:43
Start Date: 25/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327885258
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+To change the `docker` target, build the container with the 
`docker-repository-root` option:
+
+```
+./gradlew docker -Pdocker-repository-root=$LOCATION
 
 Review comment:
   Same here: let's use one of the language-specific commands.
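
   A rough sketch of a language-specific variant for the repository override, assuming the per-SDK task path pattern `:sdks:<lang>:container:docker` and the `docker-repository-root` property shown in the diff. The helper name `beam_docker_repo_cmd` and the repository value `example.com/beam` are placeholders, not real locations; the command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: print a per-language build command that overrides
# the Docker repository root.
# $1 = SDK language directory under sdks/ (for example: python, java, go)
# $2 = repository root to use in place of the default location
beam_docker_repo_cmd() {
  lang="$1"
  root="$2"
  if [ -z "$root" ]; then
    echo "usage: beam_docker_repo_cmd <lang> <repository-root>" >&2
    return 1
  fi
  # Assumption: the task path follows the :sdks:<lang>:container:docker pattern.
  printf './gradlew :sdks:%s:container:docker -Pdocker-repository-root=%s\n' "$lang" "$root"
}

# Example with a placeholder repository root.
beam_docker_repo_cmd java example.com/beam
```

   Requiring the repository root as an explicit argument avoids silently building against the default location.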
 



Issue Time Tracking
---

Worklog Id: (was: 317944)

> Document custom docker containers
> -
>
> Key: BEAM-8209
> URL: https://issues.apache.org/jira/browse/BEAM-8209
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317937=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317937
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:43
Start Date: 25/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327887183
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+### Modifying the original Dockerfile {#modifying-dockerfiles}
+
+1. Pull the [prebuilt SDK container 
image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release)
 for your target language and version
+2. Customize the 

[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317951=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317951
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:43
Start Date: 25/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327889107
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+### Modifying the original Dockerfile {#modifying-dockerfiles}
+
+1. Pull the [prebuilt SDK container 
image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release)
 for your target language and version
+2. Customize the 

[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317950=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317950
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:43
Start Date: 25/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327889215
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
+  - /documentation/execution-model/
+---
+
+
+# Runtime environments
+
+Any execution engine can run the Beam SDK because the SDK runtime environment 
is [containerized](https://s.apache.org/beam-fn-api-container-contract) with 
[Docker](https://www.docker.com/) and isolated from other runtime systems. This 
page describes how to build, customize, and push Beam SDK container images.
+
+## Building container images
+
+Before building Beam SDK container images:
+* Register a [Bintray](https://bintray.com/) account with a Docker repository 
named `apache`.
+* Install [Docker](https://www.docker.com/) on your workstation.
+
+To build Beam SDK container images:
+
+1. Navigate to your local copy of the [beam](https://github.com/apache/beam)
+2. Run Gradle with the `docker` target: `./gradlew docker`
+
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
+>
+> ```
+> ./gradlew -p sdks/java/container docker
+> ./gradlew -p sdks/python/container docker
+> ./gradlew -p sdks/go/container docker
+> ```
+
+Run `docker images` to examine the containers. For example, if you 
successfully built the container images, the command prompt displays a response 
like:
+
+```
+REPOSITORY                                   TAG     IMAGE ID      CREATED         SIZE
+$USER-docker-apache.bintray.io/beam/python   latest  4ea515403a1a  3 minutes ago   1.27GB
+$USER-docker-apache.bintray.io/beam/java     latest  0103512f1d8f  34 minutes ago  780MB
+$USER-docker-apache.bintray.io/beam/go       latest  ce055985808a  35 minutes ago  121MB
+```
+
+Although the repository names look like URLs, the container images are 
stored locally on your workstation. After building the container images 
locally, you can [push](#pushing-container-images) them to an eponymous 
repository online.
+
+### Overriding default Docker targets
+
+The default SDK version is `latest` and the default Docker repository is the 
following Bintray location:
+
+```
+$USER-docker-apache.bintray.io/beam
+```
+
+When you [build SDK container images](#building-container-images), you can 
override the default version and location.
+
+To specify an older Python SDK version, like 2.3.0, build the container with 
the `docker-tag` option:
+
+```
+./gradlew docker -Pdocker-tag=2.3.0
+```
+
+To change the `docker` target, build the container with the 
`docker-repository-root` option:
+
+```
+./gradlew docker -Pdocker-repository-root=$LOCATION
+```
+
+## Customizing container images
+
+You can add extra dependencies or serialization files to container images so 
the execution engine doesn't need them.
+
+To customize a container image, either:
+* [Write a new](#writing-new-dockerfiles) 
[Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the 
original
+* [Modify](#modifying-dockerfiles) the [original 
Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile)
 and reimage the container
+
+It's often easier to write a new Dockerfile, but you can customize anything, 
including the base OS, by modifying the original.
+
+### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles}
+
+1. Pull a [prebuilt SDK container image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release) for your target language and version.
+2. [Write a new Dockerfile](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/) that [designates](https://docs.docker.com/engine/reference/builder/#from) the original as its [parent](https://docs.docker.com/glossary/?term=parent%20image)
+3. Build a child image: `docker build -f /path/to/new/Dockerfile`
+
+### Modifying the original Dockerfile {#modifying-dockerfiles}
+
+1. Pull the [prebuilt SDK container 
image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release)
 for your target language and version
+2. Customize the 

[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317947=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317947
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:43
Start Date: 25/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327878719
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
+  - /documentation/execution-model/
+---
+
+
+# Runtime environments
+
+Any execution engine can run the Beam SDK because the SDK runtime environment 
is [containerized](https://s.apache.org/beam-fn-api-container-contract) with 
[Docker](https://www.docker.com/) and isolated from other runtime systems. This 
page describes how to build, customize, and push Beam SDK container images.
+
+## Building container images
+
+Before building Beam SDK container images:
+* Register a [Bintray](https://bintray.com/) account with a Docker repository 
named `apache`.
+* Install [Docker](https://www.docker.com/) on your workstation.
+
+To build Beam SDK container images:
+
+1. Navigate to your local copy of the [beam](https://github.com/apache/beam)
+2. Run Gradle with the `docker` target: `./gradlew docker`
+
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
+>
+> ```
+> ./gradlew -p sdks/java/container docker
+> ./gradlew -p sdks/python/container docker
+> ./gradlew -p sdks/go/container docker
+> ```
+
+Run `docker images` to examine the containers. For example, if you 
successfully built the container images, the command prompt displays a response 
like:
+
+```
+REPOSITORY                                   TAG     IMAGE ID      CREATED         SIZE
+$USER-docker-apache.bintray.io/beam/python   latest  4ea515403a1a  3 minutes ago   1.27GB
+$USER-docker-apache.bintray.io/beam/java     latest  0103512f1d8f  34 minutes ago  780MB
+$USER-docker-apache.bintray.io/beam/go       latest  ce055985808a  35 minutes ago  121MB
+```
+
+Although the repository names look like URLs, the container images are 
stored locally on your workstation. After building the container images 
locally, you can [push](#pushing-container-images) them to an eponymous 
repository online.
+
+### Overriding default Docker targets
+
+The default SDK version is `latest` and the default Docker repository is the 
following Bintray location:
 
 Review comment:
   The default Docker registry is Docker Hub.
   Also, "version" should be "tag".
 



Issue Time Tracking
---

Worklog Id: (was: 317947)
Time Spent: 3.5h  (was: 3h 20m)

> Document custom docker containers
> -
>
> Key: BEAM-8209
> URL: https://issues.apache.org/jira/browse/BEAM-8209
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317934=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317934
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:43
Start Date: 25/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327873509
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
+  - /documentation/execution-model/
+---
+
+
+# Runtime environments
+
+Any execution engine can run the Beam SDK because the SDK runtime environment 
is [containerized](https://s.apache.org/beam-fn-api-container-contract) with 
[Docker](https://www.docker.com/) and isolated from other runtime systems. This 
page describes how to build, customize, and push Beam SDK container images.
+
+## Building container images
+
+Before building Beam SDK container images:
+* Register a [Bintray](https://bintray.com/) account with a Docker repository 
named `apache`.
 
 Review comment:
   We now push images to the `apachebeam` repository on Docker Hub, not to the 
`apache` repository on Bintray. 
   Users don't need to create accounts to pull images.
 



Issue Time Tracking
---

Worklog Id: (was: 317934)
Time Spent: 2h 20m  (was: 2h 10m)

> Document custom docker containers
> -
>
> Key: BEAM-8209
> URL: https://issues.apache.org/jira/browse/BEAM-8209
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317942=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317942
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:43
Start Date: 25/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327878351
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
+  - /documentation/execution-model/
+---
+
+
+# Runtime environments
+
+Any execution engine can run the Beam SDK because the SDK runtime environment 
is [containerized](https://s.apache.org/beam-fn-api-container-contract) with 
[Docker](https://www.docker.com/) and isolated from other runtime systems. This 
page describes how to build, customize, and push Beam SDK container images.
+
+## Building container images
+
+Before building Beam SDK container images:
+* Register a [Bintray](https://bintray.com/) account with a Docker repository 
named `apache`.
+* Install [Docker](https://www.docker.com/) on your workstation.
+
+To build Beam SDK container images:
+
+1. Navigate to your local copy of the [beam](https://github.com/apache/beam)
+2. Run Gradle with the `docker` target: `./gradlew docker`
+
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
+>
+> ```
+> ./gradlew -p sdks/java/container docker
+> ./gradlew -p sdks/python/container docker
+> ./gradlew -p sdks/go/container docker
+> ```
+
+Run `docker images` to examine the containers. For example, if you 
successfully built the container images, the command prompt displays a response 
like:
+
+```
+REPOSITORY                                   TAG     IMAGE ID      CREATED         SIZE
+$USER-docker-apache.bintray.io/beam/python   latest  4ea515403a1a  3 minutes ago   1.27GB
+$USER-docker-apache.bintray.io/beam/java     latest  0103512f1d8f  34 minutes ago  780MB
+$USER-docker-apache.bintray.io/beam/go       latest  ce055985808a  35 minutes ago  121MB
+```
 
 Review comment:
   Let's use these examples.
   ```
   REPOSITORY                TAG     IMAGE ID      CREATED      SIZE
   apachebeam/java_sdk       latest  16ca619d489e  2 weeks ago  550MB
   apachebeam/python2.7_sdk  latest  b6fb40539c29  2 weeks ago  1.78GB
   apachebeam/python3.5_sdk  latest  bae309000d09  2 weeks ago  1.85GB
   apachebeam/python3.6_sdk  latest  42faad307d1a  2 weeks ago  1.86GB
   apachebeam/python3.7_sdk  latest  18267df54139  2 weeks ago  1.86GB
   apachebeam/go_sdk         latest  30cf602e9763  2 weeks ago  124MB
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 317942)
Time Spent: 3h 10m  (was: 3h)

> Document custom docker containers
> -
>
> Key: BEAM-8209
> URL: https://issues.apache.org/jira/browse/BEAM-8209
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317941=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317941
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:43
Start Date: 25/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327874238
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
+  - /documentation/execution-model/
+---
+
+
+# Runtime environments
+
+Any execution engine can run the Beam SDK because the SDK runtime environment 
is [containerized](https://s.apache.org/beam-fn-api-container-contract) with 
[Docker](https://www.docker.com/) and isolated from other runtime systems. This 
page describes how to build, customize, and push Beam SDK container images.
+
+## Building container images
+
+Before building Beam SDK container images:
+* Register a [Bintray](https://bintray.com/) account with a Docker repository 
named `apache`.
+* Install [Docker](https://www.docker.com/) on your workstation.
+
+To build Beam SDK container images:
+
+1. Navigate to your local copy of the [beam](https://github.com/apache/beam)
+2. Run Gradle with the `docker` target: `./gradlew docker`
 
 Review comment:
   I think it's better to remove this section because it may confuse users. 
Guiding users directly to the following section is more practical.
 



Issue Time Tracking
---

Worklog Id: (was: 317941)
Time Spent: 3h 10m  (was: 3h)

> Document custom docker containers
> -
>
> Key: BEAM-8209
> URL: https://issues.apache.org/jira/browse/BEAM-8209
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317945=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317945
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:43
Start Date: 25/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327875577
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
+  - /documentation/execution-model/
+---
+
+
+# Runtime environments
+
+Any execution engine can run the Beam SDK because the SDK runtime environment 
is [containerized](https://s.apache.org/beam-fn-api-container-contract) with 
[Docker](https://www.docker.com/) and isolated from other runtime systems. This 
page describes how to build, customize, and push Beam SDK container images.
+
+## Building container images
+
+Before building Beam SDK container images:
+* Register a [Bintray](https://bintray.com/) account with a Docker repository 
named `apache`.
+* Install [Docker](https://www.docker.com/) on your workstation.
+
+To build Beam SDK container images:
+
+1. Navigate to your local copy of the [beam](https://github.com/apache/beam)
+2. Run Gradle with the `docker` target: `./gradlew docker`
+
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
+>
+> ```
+> ./gradlew -p sdks/java/container docker
+> ./gradlew -p sdks/python/container docker
+> ./gradlew -p sdks/go/container docker
 
 Review comment:
   To be consistent with the Python command, it would be better to change 
this to
   `./gradlew :sdks:go:container:docker`
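
The `-p <dir> docker` form and the `:path:docker` form can name the same task. A minimal sketch of the mapping, assuming Gradle project paths mirror the directory layout (which may not hold for every project); `gradle_task_path` is a hypothetical helper, not part of Beam or Gradle:

```shell
# Hypothetical helper: turn a project directory such as sdks/go/container
# into a fully qualified Gradle task path.
# Assumption: the Gradle project path mirrors the directory layout.
gradle_task_path() {
  # $1 = project directory relative to the repository root, $2 = task name
  # Replace each "/" with ":" and prefix the root project ":".
  printf ':%s:%s\n' "$(printf '%s' "$1" | tr '/' ':')" "$2"
}

gradle_task_path sdks/go/container docker
```

The result can be passed to `./gradlew` directly in place of the `-p` form.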
 



Issue Time Tracking
---

Worklog Id: (was: 317945)
Time Spent: 3h 20m  (was: 3h 10m)

> Document custom docker containers
> -
>
> Key: BEAM-8209
> URL: https://issues.apache.org/jira/browse/BEAM-8209
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317936=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317936
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:43
Start Date: 25/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327888412
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
+  - /documentation/execution-model/
+---
+
+
+# Runtime environments
+
+Any execution engine can run the Beam SDK because the SDK runtime environment 
is [containerized](https://s.apache.org/beam-fn-api-container-contract) with 
[Docker](https://www.docker.com/) and isolated from other runtime systems. This 
page describes how to build, customize, and push Beam SDK container images.
+
+## Building container images
+
+Before building Beam SDK container images:
+* Register a [Bintray](https://bintray.com/) account with a Docker repository 
named `apache`.
+* Install [Docker](https://www.docker.com/) on your workstation.
+
+To build Beam SDK container images:
+
+1. Navigate to your local copy of the [beam](https://github.com/apache/beam)
+2. Run Gradle with the `docker` target: `./gradlew docker`
+
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
+>
+> ```
+> ./gradlew -p sdks/java/container docker
+> ./gradlew -p sdks/python/container docker
+> ./gradlew -p sdks/go/container docker
+> ```
+
+Run `docker images` to examine the containers. For example, if you 
successfully built the container images, the command prompt displays a response 
like:
+
+```
+REPOSITORY                                   TAG     IMAGE ID      CREATED         SIZE
+$USER-docker-apache.bintray.io/beam/python   latest  4ea515403a1a  3 minutes ago   1.27GB
+$USER-docker-apache.bintray.io/beam/java     latest  0103512f1d8f  34 minutes ago  780MB
+$USER-docker-apache.bintray.io/beam/go       latest  ce055985808a  35 minutes ago  121MB
+```
+
+Although the repository names look like URLs, the container images are 
stored locally on your workstation. After building the container images 
locally, you can [push](#pushing-container-images) them to an eponymous 
repository online.
+
+### Overriding default Docker targets
+
+The default SDK version is `latest` and the default Docker repository is the 
following Bintray location:
+
+```
+$USER-docker-apache.bintray.io/beam
+```
+
+When you [build SDK container images](#building-container-images), you can 
override the default version and location.
+
+To specify an older Python SDK version, like 2.3.0, build the container with 
the `docker-tag` option:
+
+```
+./gradlew docker -Pdocker-tag=2.3.0
+```
+
+To change the `docker` target, build the container with the 
`docker-repository-root` option:
+
+```
+./gradlew docker -Pdocker-repository-root=$LOCATION
+```
+
+## Customizing container images
+
+You can add extra dependencies or serialization files to container images so 
the execution engine doesn't need them.
+
+To customize a container image, either:
+* [Write a new](#writing-new-dockerfiles) 
[Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the 
original
+* [Modify](#modifying-dockerfiles) the [original 
Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile)
 and reimage the container
+
+It's often easier to write a new Dockerfile, but you can customize anything, 
including the base OS, by modifying the original.
+
+### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles}
+
+1. Pull a [prebuilt SDK container image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release) for your target language and version.
+2. [Write a new Dockerfile](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/) that [designates](https://docs.docker.com/engine/reference/builder/#from) the original as its [parent](https://docs.docker.com/glossary/?term=parent%20image)
+3. Build a child image: `docker build -f /path/to/new/Dockerfile`
+
+### Modifying the original Dockerfile {#modifying-dockerfiles}
+
+1. Pull the [prebuilt SDK container 
image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release)
 for your target language and version
+2. Customize the 

[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317938=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317938
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:43
Start Date: 25/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327883557
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
+  - /documentation/execution-model/
+---
+
+
+# Runtime environments
+
+Any execution engine can run the Beam SDK because the SDK runtime environment 
is [containerized](https://s.apache.org/beam-fn-api-container-contract) with 
[Docker](https://www.docker.com/) and isolated from other runtime systems. This 
page describes how to build, customize, and push Beam SDK container images.
+
+## Building container images
+
+Before building Beam SDK container images:
+* Register a [Bintray](https://bintray.com/) account with a Docker repository 
named `apache`.
+* Install [Docker](https://www.docker.com/) on your workstation.
+
+To build Beam SDK container images:
+
+1. Navigate to your local copy of the [beam](https://github.com/apache/beam)
+2. Run Gradle with the `docker` target: `./gradlew docker`
+
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
+>
+> ```
+> ./gradlew -p sdks/java/container docker
+> ./gradlew -p sdks/python/container docker
+> ./gradlew -p sdks/go/container docker
+> ```
+
+Run `docker images` to examine the containers. For example, if you 
successfully built the container images, the command prompt displays a response 
like:
+
+```
+REPOSITORY                                   TAG     IMAGE ID      CREATED         SIZE
+$USER-docker-apache.bintray.io/beam/python   latest  4ea515403a1a  3 minutes ago   1.27GB
+$USER-docker-apache.bintray.io/beam/java     latest  0103512f1d8f  34 minutes ago  780MB
+$USER-docker-apache.bintray.io/beam/go       latest  ce055985808a  35 minutes ago  121MB
+```
+
+Although the repository names look like URLs, the container images are 
stored locally on your workstation. After building the container images 
locally, you can [push](#pushing-container-images) them to an eponymous 
repository online.
+
+### Overriding default Docker targets
+
+The default SDK version is `latest` and the default Docker repository is the 
following Bintray location:
+
+```
+$USER-docker-apache.bintray.io/beam
 
 Review comment:
   This should be `index.docker.io/apachebeam`.
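
For illustration, the repository root, image name, and tag are assumed here to join into a full image reference as `root/name:tag`. `image_ref` is a hypothetical helper and `go_sdk` is only an example image name; this is a sketch of the naming, not Beam's actual build logic:

```shell
# Hypothetical helper: compose a Docker image reference.
# Assumption: the reference has the form <repository root>/<image name>:<tag>.
image_ref() {
  # $1 = repository root, $2 = image name, $3 = tag
  printf '%s/%s:%s\n' "$1" "$2" "$3"
}

image_ref index.docker.io/apachebeam go_sdk latest
```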
 



Issue Time Tracking
---

Worklog Id: (was: 317938)
Time Spent: 3h  (was: 2h 50m)

> Document custom docker containers
> -
>
> Key: BEAM-8209
> URL: https://issues.apache.org/jira/browse/BEAM-8209
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317943=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317943
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:43
Start Date: 25/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327888664
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
+  - /documentation/execution-model/
+---
+
+
+# Runtime environments
+
+Any execution engine can run the Beam SDK because the SDK runtime environment 
is [containerized](https://s.apache.org/beam-fn-api-container-contract) with 
[Docker](https://www.docker.com/) and isolated from other runtime systems. This 
page describes how to build, customize, and push Beam SDK container images.
+
+## Building container images
+
+Before building Beam SDK container images:
+* Register a [Bintray](https://bintray.com/) account with a Docker repository 
named `apache`.
+* Install [Docker](https://www.docker.com/) on your workstation.
+
+To build Beam SDK container images:
+
+1. Navigate to your local copy of the [beam](https://github.com/apache/beam)
+2. Run Gradle with the `docker` target: `./gradlew docker`
+
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
+>
+> ```
+> ./gradlew -p sdks/java/container docker
+> ./gradlew -p sdks/python/container docker
+> ./gradlew -p sdks/go/container docker
+> ```
+
+Run `docker images` to examine the containers. For example, if you 
successfully built the container images, the command prompt displays a response 
like:
+
+```
+REPOSITORY                                   TAG     IMAGE ID      CREATED         SIZE
+$USER-docker-apache.bintray.io/beam/python   latest  4ea515403a1a  3 minutes ago   1.27GB
+$USER-docker-apache.bintray.io/beam/java     latest  0103512f1d8f  34 minutes ago  780MB
+$USER-docker-apache.bintray.io/beam/go       latest  ce055985808a  35 minutes ago  121MB
+```
+
+Although the repository names look like URLs, the container images are 
stored locally on your workstation. After building the container images 
locally, you can [push](#pushing-container-images) them to an eponymous 
repository online.
+
+### Overriding default Docker targets
+
+The default SDK version is `latest` and the default Docker repository is the 
following Bintray location:
+
+```
+$USER-docker-apache.bintray.io/beam
+```
+
+When you [build SDK container images](#building-container-images), you can 
override the default version and location.
+
+To specify an older Python SDK version, like 2.3.0, build the container with 
the `docker-tag` option:
+
+```
+./gradlew docker -Pdocker-tag=2.3.0
+```
+
+To change the `docker` target, build the container with the 
`docker-repository-root` option:
+
+```
+./gradlew docker -Pdocker-repository-root=$LOCATION
+```
+
+## Customizing container images
+
+You can add extra dependencies or serialization files to container images so 
the execution engine doesn't need them.
+
+To customize a container image, either:
+* [Write a new](#writing-new-dockerfiles) 
[Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the 
original
+* [Modify](#modifying-dockerfiles) the [original 
Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile)
 and reimage the container
+
+It's often easier to write a new Dockerfile, but you can customize anything, 
including the base OS, by modifying the original.
+
+### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles}
+
+1. Pull a [prebuilt SDK container image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release) for your target language and version.
+2. [Write a new Dockerfile](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/) that [designates](https://docs.docker.com/engine/reference/builder/#from) the original as its [parent](https://docs.docker.com/glossary/?term=parent%20image).
+3. Build a child image: `docker build -f /path/to/new/Dockerfile`
+
+### Modifying the original Dockerfile {#modifying-dockerfiles}
+
+1. Pull the [prebuilt SDK container 
image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release)
 for your target language and version
+2. Customize the 

[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317948=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317948
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:43
Start Date: 25/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327887723
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
+  - /documentation/execution-model/
+---
+
+
+# Runtime environments
+
+Any execution engine can run the Beam SDK because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/) and isolated from other runtime systems. This page describes how to build, customize, and push Beam SDK container images.
+
+## Building container images
+
+Before building Beam SDK container images:
+* Register a [Bintray](https://bintray.com/) account with a Docker repository 
named `apache`.
+* Install [Docker](https://www.docker.com/) on your workstation.
+
+To build Beam SDK container images:
+
+1. Navigate to your local copy of the [beam](https://github.com/apache/beam) repository.
+2. Run Gradle with the `docker` target: `./gradlew docker`
+
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
+>
+> ```
+> ./gradlew -p sdks/java/container docker
+> ./gradlew -p sdks/python/container docker
+> ./gradlew -p sdks/go/container docker
+> ```
+
+Run `docker images` to examine the containers. For example, if you 
successfully built the container images, the command prompt displays a response 
like:
+
+```
+REPOSITORY                                   TAG      IMAGE ID       CREATED          SIZE
+$USER-docker-apache.bintray.io/beam/python   latest   4ea515403a1a   3 minutes ago    1.27GB
+$USER-docker-apache.bintray.io/beam/java     latest   0103512f1d8f   34 minutes ago   780MB
+$USER-docker-apache.bintray.io/beam/go       latest   ce055985808a   35 minutes ago   121MB
+```
+
+Although the repository names look like URLs, the container images are stored locally on your workstation. After building the container images locally, you can [push](#pushing-container-images) them to an online repository of the same name.
+
+### Overriding default Docker targets
+
+The default SDK version is `latest` and the default Docker repository is the 
following Bintray location:
+
+```
+$USER-docker-apache.bintray.io/beam
+```
+
+When you [build SDK container images](#building-container-images), you can 
override the default version and location.
+
+To specify an older Python SDK version, like 2.3.0, build the container with 
the `docker-tag` option:
+
+```
+./gradlew docker -Pdocker-tag=2.3.0
+```
+
+To change the `docker` target, build the container with the 
`docker-repository-root` option:
+
+```
+./gradlew docker -Pdocker-repository-root=$LOCATION
+```
+
+## Customizing container images
+
+You can add extra dependencies or serialization files to container images so 
the execution engine doesn't need them.
+
+To customize a container image, either:
+* [Write a new](#writing-new-dockerfiles) 
[Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the 
original
+* [Modify](#modifying-dockerfiles) the [original 
Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile)
 and reimage the container
+
+It's often easier to write a new Dockerfile, but you can customize anything, 
including the base OS, by modifying the original.
+
+### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles}
+
+1. Pull a [prebuilt SDK container image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release) for your target language and version.
+2. [Write a new Dockerfile](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/) that [designates](https://docs.docker.com/engine/reference/builder/#from) the original as its [parent](https://docs.docker.com/glossary/?term=parent%20image).
+3. Build a child image: `docker build -f /path/to/new/Dockerfile`
+
+### Modifying the original Dockerfile {#modifying-dockerfiles}
+
+1. Pull the [prebuilt SDK container 
image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release)
 for your target language and version
+2. Customize the 

[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317932=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317932
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:42
Start Date: 25/Sep/19 00:42
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327874981
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
+  - /documentation/execution-model/
+---
+
+
+# Runtime environments
+
+Any execution engine can run the Beam SDK because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/) and isolated from other runtime systems. This page describes how to build, customize, and push Beam SDK container images.
+
+## Building container images
+
+Before building Beam SDK container images:
+* Register a [Bintray](https://bintray.com/) account with a Docker repository 
named `apache`.
+* Install [Docker](https://www.docker.com/) on your workstation.
+
+To build Beam SDK container images:
+
+1. Navigate to your local copy of the [beam](https://github.com/apache/beam) repository.
+2. Run Gradle with the `docker` target: `./gradlew docker`
+
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
+>
+> ```
+> ./gradlew -p sdks/java/container docker
+> ./gradlew -p sdks/python/container docker
 
 Review comment:
   This command no longer works. We should use the following commands instead.
   To create images for all versions of Python: `./gradlew :sdks:python:container buildAll`
   To create an image for a specific Python version: `./gradlew :sdks:python:container:{version}:docker`, where `{version}` is one of [py2, py35, py36, py37].
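
   To make the `{version}` placeholder concrete, here is a minimal Python sketch that expands it for each version listed above. The helper names `docker_task` and `gradle_command` are hypothetical and exist only for illustration; the task path and version names are taken from the comment.

```python
# Version names exactly as listed in the review comment.
PYTHON_VERSIONS = ["py2", "py35", "py36", "py37"]


def docker_task(version):
    """Return the Gradle task path that builds one Python SDK container image."""
    if version not in PYTHON_VERSIONS:
        raise ValueError("unknown Python version: %s" % version)
    return ":sdks:python:container:%s:docker" % version


def gradle_command(version):
    """Return the full command as an argument list (hypothetical helper)."""
    return ["./gradlew", docker_task(version)]


# Print one command line per supported version.
for version in PYTHON_VERSIONS:
    print(" ".join(gradle_command(version)))
```

   The script only prints the command lines; running them still requires a local checkout of the Beam repository.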
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 317932)
Time Spent: 2h  (was: 1h 50m)

> Document custom docker containers
> -
>
> Key: BEAM-8209
> URL: https://issues.apache.org/jira/browse/BEAM-8209
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317931=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317931
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:42
Start Date: 25/Sep/19 00:42
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327888242
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
+  - /documentation/execution-model/
+---
+
+
+# Runtime environments
+
+Any execution engine can run the Beam SDK because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/) and isolated from other runtime systems. This page describes how to build, customize, and push Beam SDK container images.
+
+## Building container images
+
+Before building Beam SDK container images:
+* Register a [Bintray](https://bintray.com/) account with a Docker repository 
named `apache`.
+* Install [Docker](https://www.docker.com/) on your workstation.
+
+To build Beam SDK container images:
+
+1. Navigate to your local copy of the [beam](https://github.com/apache/beam) repository.
+2. Run Gradle with the `docker` target: `./gradlew docker`
+
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
+>
+> ```
+> ./gradlew -p sdks/java/container docker
+> ./gradlew -p sdks/python/container docker
+> ./gradlew -p sdks/go/container docker
+> ```
+
+Run `docker images` to examine the containers. For example, if you 
successfully built the container images, the command prompt displays a response 
like:
+
+```
+REPOSITORY                                   TAG      IMAGE ID       CREATED          SIZE
+$USER-docker-apache.bintray.io/beam/python   latest   4ea515403a1a   3 minutes ago    1.27GB
+$USER-docker-apache.bintray.io/beam/java     latest   0103512f1d8f   34 minutes ago   780MB
+$USER-docker-apache.bintray.io/beam/go       latest   ce055985808a   35 minutes ago   121MB
+```
+
+Although the repository names look like URLs, the container images are stored locally on your workstation. After building the container images locally, you can [push](#pushing-container-images) them to an online repository of the same name.
+
+### Overriding default Docker targets
+
+The default SDK version is `latest` and the default Docker repository is the 
following Bintray location:
+
+```
+$USER-docker-apache.bintray.io/beam
+```
+
+When you [build SDK container images](#building-container-images), you can 
override the default version and location.
+
+To specify an older Python SDK version, like 2.3.0, build the container with 
the `docker-tag` option:
+
+```
+./gradlew docker -Pdocker-tag=2.3.0
+```
+
+To change the `docker` target, build the container with the 
`docker-repository-root` option:
+
+```
+./gradlew docker -Pdocker-repository-root=$LOCATION
+```
+
+## Customizing container images
+
+You can add extra dependencies or serialization files to container images so 
the execution engine doesn't need them.
+
+To customize a container image, either:
+* [Write a new](#writing-new-dockerfiles) 
[Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the 
original
+* [Modify](#modifying-dockerfiles) the [original 
Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile)
 and reimage the container
+
+It's often easier to write a new Dockerfile, but you can customize anything, 
including the base OS, by modifying the original.
+
+### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles}
+
+1. Pull a [prebuilt SDK container image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release) for your target language and version.
+2. [Write a new Dockerfile](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/) that [designates](https://docs.docker.com/engine/reference/builder/#from) the original as its [parent](https://docs.docker.com/glossary/?term=parent%20image).
+3. Build a child image: `docker build -f /path/to/new/Dockerfile`
+
+### Modifying the original Dockerfile {#modifying-dockerfiles}
+
+1. Pull the [prebuilt SDK container 
image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release)
 for your target language and version
+2. Customize the 

[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317933=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317933
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:43
Start Date: 25/Sep/19 00:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327883934
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
+  - /documentation/execution-model/
+---
+
+
+# Runtime environments
+
+Any execution engine can run the Beam SDK because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/) and isolated from other runtime systems. This page describes how to build, customize, and push Beam SDK container images.
+
+## Building container images
+
+Before building Beam SDK container images:
+* Register a [Bintray](https://bintray.com/) account with a Docker repository 
named `apache`.
+* Install [Docker](https://www.docker.com/) on your workstation.
+
+To build Beam SDK container images:
+
+1. Navigate to your local copy of the [beam](https://github.com/apache/beam) repository.
+2. Run Gradle with the `docker` target: `./gradlew docker`
+
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
+>
+> ```
+> ./gradlew -p sdks/java/container docker
+> ./gradlew -p sdks/python/container docker
+> ./gradlew -p sdks/go/container docker
+> ```
+
+Run `docker images` to examine the containers. For example, if you 
successfully built the container images, the command prompt displays a response 
like:
+
+```
+REPOSITORY                                   TAG      IMAGE ID       CREATED          SIZE
+$USER-docker-apache.bintray.io/beam/python   latest   4ea515403a1a   3 minutes ago    1.27GB
+$USER-docker-apache.bintray.io/beam/java     latest   0103512f1d8f   34 minutes ago   780MB
+$USER-docker-apache.bintray.io/beam/go       latest   ce055985808a   35 minutes ago   121MB
+```
+
+Although the repository names look like URLs, the container images are stored locally on your workstation. After building the container images locally, you can [push](#pushing-container-images) them to an online repository of the same name.
+
+### Overriding default Docker targets
+
+The default SDK version is `latest` and the default Docker repository is the 
following Bintray location:
+
+```
+$USER-docker-apache.bintray.io/beam
+```
+
+When you [build SDK container images](#building-container-images), you can 
override the default version and location.
+
+To specify an older Python SDK version, like 2.3.0, build the container with 
the `docker-tag` option:
 
 Review comment:
   The purpose of `docker-tag` is to override the default image tag, not to specify an older Python SDK version. The Python version should be specified in the gradlew command.
   This can be changed to something like:
   ```
   To override the default image tag, use the `docker-tag` option to pass a new tag.
   ```
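
   As a rough illustration of how the two override options from the quoted documentation combine, the following Python sketch assembles the command line. The helper `docker_build_command` and the example values are hypothetical; only the `-Pdocker-tag` and `-Pdocker-repository-root` property names come from the diff above.

```python
def docker_build_command(tag=None, repository_root=None):
    """Build the `./gradlew docker` argument list with optional overrides.

    Hypothetical helper: it only formats the command and does not run Gradle.
    """
    args = ["./gradlew", "docker"]
    if tag is not None:
        # Overrides the default image tag (`latest`).
        args.append("-Pdocker-tag=%s" % tag)
    if repository_root is not None:
        # Overrides the default Docker repository location.
        args.append("-Pdocker-repository-root=%s" % repository_root)
    return args


# Example values are placeholders, not real tags or registries.
print(" ".join(docker_build_command(tag="my-tag")))
print(" ".join(docker_build_command(tag="my-tag",
                                    repository_root="example.com/my-repo")))
```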
 



Issue Time Tracking
---

Worklog Id: (was: 317933)
Time Spent: 2h 10m  (was: 2h)

> Document custom docker containers
> -
>
> Key: BEAM-8209
> URL: https://issues.apache.org/jira/browse/BEAM-8209
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-1296) Providing a small dataset for "Apache Beam Mobile Gaming Pipeline Examples"

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1296?focusedWorklogId=317930=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317930
 ]

ASF GitHub Bot logged work on BEAM-1296:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:38
Start Date: 25/Sep/19 00:38
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9633: [BEAM-1296] Providing 
a small dataset for "Apache Beam Mobile Gaming …
URL: https://github.com/apache/beam/pull/9633#issuecomment-534801347
 
 
   I can push your file to the GCS bucket. I am just wondering if you think 
that's a good idea? If so, we'd have to amend the docstring.
 



Issue Time Tracking
---

Worklog Id: (was: 317930)
Time Spent: 40m  (was: 0.5h)

> Providing a small dataset for "Apache Beam Mobile Gaming Pipeline Examples"
> ---
>
> Key: BEAM-1296
> URL: https://issues.apache.org/jira/browse/BEAM-1296
> Project: Beam
>  Issue Type: Wish
>  Components: examples-java
>Reporter: Keiji Yoshida
>Assignee: John Patoch
>Priority: Trivial
>  Labels: ccoss2019, newbie, starter
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The dataset "gs://apache-beam-samples/game/gaming_data*.csv" for "Apache Beam 
> Mobile Gaming Pipeline Examples" is very large (about 12 GB) and takes a long 
> time to download. This may be an obstacle for Apache Beam beginners who want 
> to try "Apache Beam Mobile Gaming Pipeline Examples" quickly.
> How about providing a small dataset (say, less than 1 GB) for these examples?





[jira] [Work logged] (BEAM-1296) Providing a small dataset for "Apache Beam Mobile Gaming Pipeline Examples"

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1296?focusedWorklogId=317929=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317929
 ]

ASF GitHub Bot logged work on BEAM-1296:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:37
Start Date: 25/Sep/19 00:37
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9633: [BEAM-1296] Providing 
a small dataset for "Apache Beam Mobile Gaming …
URL: https://github.com/apache/beam/pull/9633#issuecomment-534801219
 
 
   Thanks for helping to generate the data. Maybe we should push it to a GCS 
bucket instead of keeping it in the Github repo?
 



Issue Time Tracking
---

Worklog Id: (was: 317929)
Time Spent: 0.5h  (was: 20m)

> Providing a small dataset for "Apache Beam Mobile Gaming Pipeline Examples"
> ---
>
> Key: BEAM-1296
> URL: https://issues.apache.org/jira/browse/BEAM-1296
> Project: Beam
>  Issue Type: Wish
>  Components: examples-java
>Reporter: Keiji Yoshida
>Assignee: John Patoch
>Priority: Trivial
>  Labels: ccoss2019, newbie, starter
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The dataset "gs://apache-beam-samples/game/gaming_data*.csv" for "Apache Beam 
> Mobile Gaming Pipeline Examples" is very large (about 12 GB) and takes a long 
> time to download. This may be an obstacle for Apache Beam beginners who want 
> to try "Apache Beam Mobile Gaming Pipeline Examples" quickly.
> How about providing a small dataset (say, less than 1 GB) for these examples?





[jira] [Work logged] (BEAM-8311) Fix python mongodbio display data type

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8311?focusedWorklogId=317928=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317928
 ]

ASF GitHub Bot logged work on BEAM-8311:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:35
Start Date: 25/Sep/19 00:35
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9656: [BEAM-8311] 
Fix py mongodbio display data
URL: https://github.com/apache/beam/pull/9656
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 317928)
Time Spent: 40m  (was: 0.5h)

> Fix python mongodbio display data type
> --
>
> Key: BEAM-8311
> URL: https://issues.apache.org/jira/browse/BEAM-8311
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When I try to write a document to MongoDB through
>  | "Write User Doc to Mongo" >> beam.io.WriteToMongoDB(uri=MONGO_URI,
>                                                        db="dbname",
>                                                        coll="col_name"
>                                                        ))
> the following error occurs: {{ValueError: Invalid DisplayDataItem. Value {} is 
> of an unsupported type.}}
>  ERROR:root:Error while visiting Write User Doc to Mongo/ParDo(_WriteMongoFn)
> Traceback (most recent call last):
>   File "beam_home.py", line 317, in <module>
>     run()
>   File "beam_home.py", line 312, in run
>     p.run().wait_until_finish()
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 406, in run
>     self._options).run(False)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 419, in run
>     return self.runner.run_pipeline(self, self._options)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 469, in run_pipeline
>     super(DataflowRunner, self).run_pipeline(pipeline, options)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 158, in run_pipeline
>     pipeline.visit(RunVisitor(self))
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 447, in visit
>     self._root_transform().visit(visitor, self, visited)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 824, in visit
>     part.visit(visitor, pipeline, visited)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 824, in visit
>     part.visit(visitor, pipeline, visited)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 827, in visit
>     visitor.visit_transform(self)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 153, in visit_transform
>     self.runner.run_transform(transform_node, options)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 196, in run_transform
>     return m(transform_node, options)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 807, in run_ParDo
>     transform_node.transform.output_tags)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 590, in _add_step
>     DisplayData.create_from(transform_node.transform).items])
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/transforms/display.py", line 274, in get_dict
>     self.is_valid()
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/transforms/display.py", line 246, in is_valid
>     .format(self.value))
> ValueError: Invalid DisplayDataItem. Value {} is of an unsupported type.
>  
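
The traceback ends in a type check on a display-data value, and the `{}` in the message is presumably an empty dict. The standalone sketch below imitates that failure mode and shows one possible workaround, serializing container values to a string before they are reported. It does not use Beam's code: `SUPPORTED_TYPES`, `check_display_value`, and `to_display_value` are hypothetical names, and the set of supported types is an assumption made for illustration.

```python
import json

# Assumption for this sketch: only simple scalar types are valid display values.
SUPPORTED_TYPES = (str, int, float, bool)


def check_display_value(value):
    """Raise ValueError for values of an unsupported type, else return the value."""
    if not isinstance(value, SUPPORTED_TYPES):
        raise ValueError(
            "Invalid DisplayDataItem. Value {} is of an unsupported type."
            .format(value))
    return value


def to_display_value(value):
    """Serialize dicts and lists to a JSON string so that the check passes."""
    if isinstance(value, (dict, list)):
        return json.dumps(value, sort_keys=True)
    return value


spec = {}  # an empty dict, matching the `{}` in the error message

try:
    check_display_value(spec)
except ValueError as err:
    print(err)

# After conversion the value is a plain string and passes the check.
print(check_display_value(to_display_value(spec)))
```

Whether the actual fix converts the value this way is not stated in this thread; the sketch only shows why a dict-valued item is rejected and how a string form avoids the error.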





[jira] [Updated] (BEAM-8312) Flink portable pipeline jars do not need to stage artifacts remotely

2019-09-24 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8312:
--
Description: 
Currently, Flink job jars re-stage all artifacts at runtime (on the JobManager) 
by using the usual BeamFileSystemArtifactRetrievalService [1]. However, since 
the manifest and all the artifacts live on the classpath of the jar, and 
everything from the classpath is copied to the Flink workers anyway [2], it 
should not be necessary to do additional artifact staging. We could replace 
BeamFileSystemArtifactRetrievalService in this case with a simple 
ArtifactRetrievalService that just pulls the artifacts from the classpath.

 

 [1] 
[https://github.com/apache/beam/blob/340c3202b1e5824b959f5f9f626e4c7c7842a3cb/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactRetrievalService.java]

[2] 
[https://github.com/apache/beam/blob/2f1b56ccc506054e40afe4793a8b556e872e1865/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java#L93]

  was:
Currently, Flink job jars stage all artifacts by using the usual 
BeamFileSystemArtifactRetrievalService [1]. However, since the manifest and all 
the artifacts live on the classpath of the jar, and everything from the 
classpath is copied to the Flink workers anyway, it should not be necessary to 
do additional artifact staging. We could replace 
BeamFileSystemArtifactRetrievalService in this case with a simple 
ArtifactRetrievalService that just pulls the artifacts from the classpath.

 

 [1] 
[https://github.com/apache/beam/blob/340c3202b1e5824b959f5f9f626e4c7c7842a3cb/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactRetrievalService.java]


> Flink portable pipeline jars do not need to stage artifacts remotely
> 
>
> Key: BEAM-8312
> URL: https://issues.apache.org/jira/browse/BEAM-8312
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>
> Currently, Flink job jars re-stage all artifacts at runtime (on the 
> JobManager) by using the usual BeamFileSystemArtifactRetrievalService [1]. 
> However, since the manifest and all the artifacts live on the classpath of 
> the jar, and everything from the classpath is copied to the Flink workers 
> anyway [2], it should not be necessary to do additional artifact staging. We 
> could replace BeamFileSystemArtifactRetrievalService in this case with a 
> simple ArtifactRetrievalService that just pulls the artifacts from the 
> classpath.
>  
>  [1] 
> [https://github.com/apache/beam/blob/340c3202b1e5824b959f5f9f626e4c7c7842a3cb/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactRetrievalService.java]
> [2] 
> [https://github.com/apache/beam/blob/2f1b56ccc506054e40afe4793a8b556e872e1865/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java#L93]





[jira] [Commented] (BEAM-8312) Flink portable pipeline jars do not need to stage artifacts remotely

2019-09-24 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937279#comment-16937279
 ] 

Kyle Weaver commented on BEAM-8312:
---

cc [~angoenka]

> Flink portable pipeline jars do not need to stage artifacts remotely
> 
>
> Key: BEAM-8312
> URL: https://issues.apache.org/jira/browse/BEAM-8312
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>
> Currently, Flink job jars stage all artifacts by using the usual 
> BeamFileSystemArtifactRetrievalService [1]. However, since the manifest and 
> all the artifacts live on the classpath of the jar, and everything from the 
> classpath is copied to the Flink workers anyway, it should not be necessary 
> to do additional artifact staging. We could replace 
> BeamFileSystemArtifactRetrievalService in this case with a simple 
> ArtifactRetrievalService that just pulls the artifacts from the classpath.
>  
>  [1] 
> [https://github.com/apache/beam/blob/340c3202b1e5824b959f5f9f626e4c7c7842a3cb/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactRetrievalService.java]





[jira] [Created] (BEAM-8312) Flink portable pipeline jars do not need to stage artifacts remotely

2019-09-24 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-8312:
-

 Summary: Flink portable pipeline jars do not need to stage 
artifacts remotely
 Key: BEAM-8312
 URL: https://issues.apache.org/jira/browse/BEAM-8312
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Kyle Weaver
Assignee: Kyle Weaver


Currently, Flink job jars stage all artifacts by using the usual 
BeamFileSystemArtifactRetrievalService [1]. However, since the manifest and all 
the artifacts live on the classpath of the jar, and everything from the 
classpath is copied to the Flink workers anyway, it should not be necessary to 
do additional artifact staging. We could replace 
BeamFileSystemArtifactRetrievalService in this case with a simple 
ArtifactRetrievalService that just pulls the artifacts from the classpath.

 

 [1] 
[https://github.com/apache/beam/blob/340c3202b1e5824b959f5f9f626e4c7c7842a3cb/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactRetrievalService.java]





[jira] [Work logged] (BEAM-8301) Argument inference breaks on incomparable types as defaults.

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8301?focusedWorklogId=317921=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317921
 ]

ASF GitHub Bot logged work on BEAM-8301:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:09
Start Date: 25/Sep/19 00:09
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #9657: [BEAM-8301] 
Cherry-pick default argument comparison fixes.
URL: https://github.com/apache/beam/pull/9657#issuecomment-534795175
 
 
   SGTM
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 317921)
Time Spent: 1h 20m  (was: 1h 10m)

> Argument inference breaks on incomparable types as defaults.
> 
>
> Key: BEAM-8301
> URL: https://issues.apache.org/jira/browse/BEAM-8301
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0
>Reporter: Robert Bradshaw
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> A common culprit is numpy arrays, e.g.
> {code:python}
> class MyDoFn(beam.DoFn):
>   def process(self, element, arg=np.ndarray(...)):
>     ...
> {code}
> This bug was introduced as part of [BEAM-7060].
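
To illustrate the failure mode described above, here is a minimal sketch. It
is not Beam's code and not the fix from the pull request; ElementwiseEq,
defaults_by_equality and defaults_by_identity are hypothetical names, and
ElementwiseEq is a stdlib-only stand-in for a type such as a numpy array whose
== returns an elementwise result instead of a single bool. It shows why
inspecting a default value with == can raise, while an identity check cannot.

```python
import inspect


class ElementwiseEq:
    """Hypothetical stand-in for an array type with elementwise ==."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Like an array, return one result per element, not a single bool.
        return ElementwiseEq(v == other for v in self.values)

    def __bool__(self):
        # Mirrors the "ambiguous truth value" behaviour of array results.
        raise ValueError("the truth value of an elementwise result is ambiguous")


def process(element, arg=ElementwiseEq([1, 2, 3])):
    return element


def defaults_by_equality(fn):
    # Naive check: == invokes the default's own __eq__, then bool() on the result.
    params = inspect.signature(fn).parameters
    return [name for name, p in params.items()
            if not (p.default == inspect.Parameter.empty)]


def defaults_by_identity(fn):
    # Identity check: never calls user-defined comparison code.
    params = inspect.signature(fn).parameters
    return [name for name, p in params.items()
            if p.default is not inspect.Parameter.empty]
```

With these definitions, defaults_by_identity(process) reports the parameter
that has a default, whereas defaults_by_equality(process) raises ValueError
as soon as it reaches arg.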





[jira] [Work logged] (BEAM-7739) Add ValueState in Python sdk

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7739?focusedWorklogId=317920=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317920
 ]

ASF GitHub Bot logged work on BEAM-7739:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:08
Start Date: 25/Sep/19 00:08
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9067: [BEAM-7739] 
Implement ReadModifyWriteState in Python SDK
URL: https://github.com/apache/beam/pull/9067#issuecomment-534795078
 
 
   Sounds good to me. It'd be less work to swap steps (1) and (2). 
 



Issue Time Tracking
---

Worklog Id: (was: 317920)
Time Spent: 4h  (was: 3h 50m)

> Add ValueState in Python sdk
> 
>
> Key: BEAM-7739
> URL: https://issues.apache.org/jira/browse/BEAM-7739
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Rakesh Kumar
>Assignee: Rakesh Kumar
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Currently, ValueState is missing from the Python SDK, but it exists in the 
> Java SDK. 





[jira] [Work logged] (BEAM-8301) Argument inference breaks on incomparable types as defaults.

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8301?focusedWorklogId=317919=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317919
 ]

ASF GitHub Bot logged work on BEAM-8301:


Author: ASF GitHub Bot
Created on: 25/Sep/19 00:06
Start Date: 25/Sep/19 00:06
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9657: [BEAM-8301] 
Cherry-pick default argument comparison fixes.
URL: https://github.com/apache/beam/pull/9657#issuecomment-534794615
 
 
   I'll let you merge when you're ready, as you're managing the release. 
 



Issue Time Tracking
---

Worklog Id: (was: 317919)
Time Spent: 1h 10m  (was: 1h)

> Argument inference breaks on incomparable types as defaults.
> 
>
> Key: BEAM-8301
> URL: https://issues.apache.org/jira/browse/BEAM-8301
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0
>Reporter: Robert Bradshaw
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> A common culprit is numpy arrays, e.g.
> {code:python}
> class MyDoFn(beam.DoFn):
>   def process(self, element, arg=np.ndarray(...)):
>     ...
> {code}
> This bug was introduced as part of [BEAM-7060].





[jira] [Resolved] (BEAM-8299) Upgrade Jackson to version 2.9.10

2019-09-24 Thread Mark Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu resolved BEAM-8299.

Resolution: Fixed

> Upgrade Jackson to version 2.9.10
> -
>
> Key: BEAM-8299
> URL: https://issues.apache.org/jira/browse/BEAM-8299
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> [Jackson 2.9.10 addresses multiple CVE 
> issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.10] from 
> previous Jackson versions, so we need to upgrade it.





[jira] [Commented] (BEAM-8311) Fix python mongodbio display data type

2019-09-24 Thread Yichi Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937272#comment-16937272
 ] 

Yichi Zhang commented on BEAM-8311:
---

[https://github.com/apache/beam/pull/9656] needs to be cherry-picked as well.

> Fix python mongodbio display data type
> --
>
> Key: BEAM-8311
> URL: https://issues.apache.org/jira/browse/BEAM-8311
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When I try to write a document to MongoDB through
>  | "Write User Doc to Mongo" >> beam.io.WriteToMongoDB(uri=MONGO_URI,
>                                                        db="dbname",
>                                                        coll="col_name"
>                                                        ))
> the pipeline fails with {{ValueError: Invalid DisplayDataItem. Value {} is of an 
> unsupported type.}}
>  ERROR:root:Error while visiting Write User Doc to Mongo/ParDo(_WriteMongoFn)
> Traceback (most recent call last):
>   File "beam_home.py", line 317, in <module>
>     run()
>   File "beam_home.py", line 312, in run
>     p.run().wait_until_finish()
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 406, in run
>     self._options).run(False)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 419, in run
>     return self.runner.run_pipeline(self, self._options)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 469, in run_pipeline
>     super(DataflowRunner, self).run_pipeline(pipeline, options)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 158, in run_pipeline
>     pipeline.visit(RunVisitor(self))
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 447, in visit
>     self._root_transform().visit(visitor, self, visited)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 824, in visit
>     part.visit(visitor, pipeline, visited)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 824, in visit
>     part.visit(visitor, pipeline, visited)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 827, in visit
>     visitor.visit_transform(self)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 153, in visit_transform
>     self.runner.run_transform(transform_node, options)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 196, in run_transform
>     return m(transform_node, options)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 807, in run_ParDo
>     transform_node.transform.output_tags)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 590, in _add_step
>     DisplayData.create_from(transform_node.transform).items])
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/transforms/display.py", line 274, in get_dict
>     self.is_valid()
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/transforms/display.py", line 246, in is_valid
>     .format(self.value))
> ValueError: Invalid DisplayDataItem. Value {} is of an unsupported type.
>  
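
The error above says that an empty dict was offered as a display-data value.
As a rough sketch of one way to avoid that class of error (this is not taken
from the fix referenced in this thread), a transform could coerce unsupported
values to strings before reporting them. SUPPORTED_TYPES, to_display_value and
sanitize_display_data are hypothetical names, and the list of accepted types
is an assumption rather than a statement of what Beam accepts.

```python
from datetime import datetime, timedelta

# Assumption: value types that a display-data item is willing to accept.
SUPPORTED_TYPES = (str, int, float, bool, datetime, timedelta)


def to_display_value(value):
    """Return value unchanged if its type is supported, else its string form."""
    if isinstance(value, SUPPORTED_TYPES):
        return value
    return str(value)


def sanitize_display_data(items):
    """Coerce every value in a display-data mapping to a supported type."""
    return {key: to_display_value(value) for key, value in items.items()}
```

A dict-valued entry such as a connection spec would then be shown as its
string representation instead of triggering the ValueError.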





[jira] [Work logged] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6923?focusedWorklogId=317917=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317917
 ]

ASF GitHub Bot logged work on BEAM-6923:


Author: ASF GitHub Bot
Created on: 24/Sep/19 23:52
Start Date: 24/Sep/19 23:52
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9647: [BEAM-6923] 
limit number of concurrent artifact write to 8
URL: https://github.com/apache/beam/pull/9647#discussion_r327880376
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java
 ##
 @@ -227,9 +235,11 @@ public void onNext(PutArtifactRequest putArtifactRequest) {
   encodedFileName(metadata.getMetadata()), StandardResolveOptions.RESOLVE_FILE);
   LOG.debug(
       "Going to stage artifact {} to {}.", metadata.getMetadata().getName(), artifactId);
-  artifactWritableByteChannel = FileSystems.create(artifactId, MimeTypes.BINARY);
   hasher = Hashing.sha256().newHasher();
+  permittedConcurrentWrite.acquire();
 
 Review comment:
   Because gRPC multiplexes multiple RPCs on a single channel and does a lot 
of work on a single thread, blocking here may also block other RPCs.
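
   A rough illustration of the concern, sketched in Python rather than the
   Java of the pull request and not taken from it: instead of acquiring a
   permit on the thread that dispatches requests, the work can be handed to a
   bounded pool, so the dispatching thread never waits. BoundedWriter and
   MAX_CONCURRENT_WRITES are hypothetical names; the limit of 8 comes from the
   pull request title, and the write itself is a placeholder.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_WRITES = 8  # limit taken from the pull request title


class BoundedWriter:
    """Runs at most `limit` writes at once without blocking the caller."""

    def __init__(self, limit=MAX_CONCURRENT_WRITES):
        self._pool = ThreadPoolExecutor(max_workers=limit)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest number of writes observed running together

    def _write(self, payload):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        try:
            return len(payload)  # placeholder for the real write
        finally:
            with self._lock:
                self._active -= 1

    def submit(self, payload):
        # Returns a future immediately; excess work waits in the pool's queue.
        return self._pool.submit(self._write, payload)

    def close(self):
        self._pool.shutdown(wait=True)
```

   Note that queued payloads still occupy memory while they wait, so a real
   implementation would also need some form of back-pressure.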
 



Issue Time Tracking
---

Worklog Id: (was: 317917)
Time Spent: 1h 50m  (was: 1h 40m)

> OOM errors in jobServer when using GCS artifactDir
> --
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Lukasz Gajowy
>Assignee: Ankur Goenka
>Priority: Major
> Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump 
> size-sorted.png
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When starting jobServer with artifactDir pointing to a GCS bucket: 
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code}
> and running a Java portable pipeline with the following portability-related 
> pipeline options: 
> {code:java}
> --runner=PortableRunner --jobEndpoint=localhost:8099 
> --defaultEnvironmentType=DOCKER 
> --defaultEnvironmentConfig=gcr.io//java:latest'{code}
>  
> I'm facing a series of OOM errors, like this: 
> {code:java}
> Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: 
> Java heap space
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
> at 
> com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
>  
> This does not happen when I'm using a local filesystem for the artifact 
> staging location. 
>  





[jira] [Work logged] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6923?focusedWorklogId=317916=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317916
 ]

ASF GitHub Bot logged work on BEAM-6923:


Author: ASF GitHub Bot
Created on: 24/Sep/19 23:52
Start Date: 24/Sep/19 23:52
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9647: [BEAM-6923] 
limit number of concurrent artifact write to 8
URL: https://github.com/apache/beam/pull/9647#discussion_r327877831
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java
 ##
 @@ -77,6 +78,13 @@
   public static final String MANIFEST = "MANIFEST";
   public static final String ARTIFACTS = "artifacts";
 
+  private final Semaphore permittedConcurrentWrite;
+
+  public BeamFileSystemArtifactStagingService() {
+    super();
 
 Review comment:
   ```suggestion
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 317916)
Time Spent: 1h 40m  (was: 1.5h)

> OOM errors in jobServer when using GCS artifactDir
> --
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Lukasz Gajowy
>Assignee: Ankur Goenka
>Priority: Major
> Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump 
> size-sorted.png
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When starting jobServer with artifactDir pointing to a GCS bucket: 
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code}
> and running a Java portable pipeline with the following portability-related 
> pipeline options: 
> {code:java}
> --runner=PortableRunner --jobEndpoint=localhost:8099 
> --defaultEnvironmentType=DOCKER 
> --defaultEnvironmentConfig=gcr.io//java:latest'{code}
>  
> I'm facing a series of OOM errors, like this: 
> {code:java}
> Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: 
> Java heap space
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
> at 
> com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
>  
> This does not happen when I'm using a local filesystem for the artifact 
> staging location. 
>  





[jira] [Work logged] (BEAM-8299) Upgrade Jackson to version 2.9.10

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8299?focusedWorklogId=317906=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317906
 ]

ASF GitHub Bot logged work on BEAM-8299:


Author: ASF GitHub Bot
Created on: 24/Sep/19 23:44
Start Date: 24/Sep/19 23:44
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #9637: 
[release-2.16.0][BEAM-8299] Upgrade Jackson to version 2.9.10
URL: https://github.com/apache/beam/pull/9637
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 317906)
Time Spent: 2.5h  (was: 2h 20m)

> Upgrade Jackson to version 2.9.10
> -
>
> Key: BEAM-8299
> URL: https://issues.apache.org/jira/browse/BEAM-8299
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> [Jackson 2.9.10 addresses multiple CVE 
> issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.10] from 
> previous Jackson versions, so we need to upgrade it.





[jira] [Work logged] (BEAM-8301) Argument inference breaks on incomparable types as defaults.

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8301?focusedWorklogId=317900=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317900
 ]

ASF GitHub Bot logged work on BEAM-8301:


Author: ASF GitHub Bot
Created on: 24/Sep/19 23:23
Start Date: 24/Sep/19 23:23
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9657: [BEAM-8301] 
Cherry-pick default argument comparison fixes.
URL: https://github.com/apache/beam/pull/9657#issuecomment-534785562
 
 
   Execution failed for task 
':sdks:python:test-suites:tox:py37:setupVirtualenv'.
   > Process 'command 'sh'' finished with non-zero exit value 1
 



Issue Time Tracking
---

Worklog Id: (was: 317900)
Time Spent: 50m  (was: 40m)

> Argument inference breaks on incomparable types as defaults.
> 
>
> Key: BEAM-8301
> URL: https://issues.apache.org/jira/browse/BEAM-8301
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0
>Reporter: Robert Bradshaw
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> A common culprit is numpy arrays, e.g.
> {code:python}
> class MyDoFn(beam.DoFn):
>   def process(self, element, arg=np.ndarray(...)):
>     ...
> {code}
> This bug was introduced as part of [BEAM-7060].





[jira] [Work logged] (BEAM-8301) Argument inference breaks on incomparable types as defaults.

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8301?focusedWorklogId=317901=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317901
 ]

ASF GitHub Bot logged work on BEAM-8301:


Author: ASF GitHub Bot
Created on: 24/Sep/19 23:23
Start Date: 24/Sep/19 23:23
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9657: [BEAM-8301] 
Cherry-pick default argument comparison fixes.
URL: https://github.com/apache/beam/pull/9657#issuecomment-534785590
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 317901)
Time Spent: 1h  (was: 50m)

> Argument inference breaks on incomparable types as defaults.
> 
>
> Key: BEAM-8301
> URL: https://issues.apache.org/jira/browse/BEAM-8301
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0
>Reporter: Robert Bradshaw
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> A common culprit is numpy arrays, e.g.
> {code:python}
> class MyDoFn(beam.DoFn):
>   def process(self, element, arg=np.ndarray(...)):
>     ...
> {code}
> This bug was introduced as part of [BEAM-7060].





[jira] [Work logged] (BEAM-8299) Upgrade Jackson to version 2.9.10

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8299?focusedWorklogId=317899=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317899
 ]

ASF GitHub Bot logged work on BEAM-8299:


Author: ASF GitHub Bot
Created on: 24/Sep/19 23:21
Start Date: 24/Sep/19 23:21
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #9637: 
[release-2.16.0][BEAM-8299] Upgrade Jackson to version 2.9.10
URL: https://github.com/apache/beam/pull/9637#issuecomment-534785104
 
 
   Run Java_Examples_Dataflow PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 317899)
Time Spent: 2h 20m  (was: 2h 10m)

> Upgrade Jackson to version 2.9.10
> -
>
> Key: BEAM-8299
> URL: https://issues.apache.org/jira/browse/BEAM-8299
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> [Jackson 2.9.10 addresses multiple CVE 
> issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.10] from 
> previous Jackson versions, so we need to upgrade it.





[jira] [Work logged] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6923?focusedWorklogId=317893=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317893
 ]

ASF GitHub Bot logged work on BEAM-6923:


Author: ASF GitHub Bot
Created on: 24/Sep/19 23:06
Start Date: 24/Sep/19 23:06
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9647: [BEAM-6923] limit 
number of concurrent artifact write to 8
URL: https://github.com/apache/beam/pull/9647#issuecomment-534781826
 
 
   Fair enough : )
   
   On Tue, Sep 24, 2019 at 3:29 PM Ankur wrote:
   
   > *@angoenka* commented on this pull request.
   > --
   >
   > In runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java:
   >
   > > @@ -77,6 +78,13 @@
   >    public static final String MANIFEST = "MANIFEST";
   >    public static final String ARTIFACTS = "artifacts";
   >
   > +  private final Semaphore permittedConcurrentWrite;
   > +
   > +  public BeamFileSystemArtifactStagingService() {
   > +    super();
   > +    permittedConcurrentWrite = new Semaphore(8);
   >
   > The variable name permittedConcurrentWrite seems descriptive enough, so
   > I avoided creating a new constant.
   > Let me know if you think a constant would be better.
   
 



Issue Time Tracking
---

Worklog Id: (was: 317893)
Time Spent: 1.5h  (was: 1h 20m)

> OOM errors in jobServer when using GCS artifactDir
> --
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Lukasz Gajowy
>Assignee: Ankur Goenka
>Priority: Major
> Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump 
> size-sorted.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When starting jobServer with artifactDir pointing to a GCS bucket: 
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code}
> and running a Java portable pipeline with the following portability-related 
> pipeline options: 
> {code:java}
> --runner=PortableRunner --jobEndpoint=localhost:8099 
> --defaultEnvironmentType=DOCKER 
> --defaultEnvironmentConfig=gcr.io//java:latest'{code}
>  
> I'm facing a series of OOM errors, like this: 
> {code:java}
> Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: 
> Java heap space
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
> at 
> com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
>  
> This does not happen when I'm using a local filesystem for the artifact 
> staging location. 
>  




[jira] [Comment Edited] (BEAM-644) Primitive to shift the watermark while assigning timestamps

2019-09-24 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937240#comment-16937240
 ] 

Luke Cwik edited comment on BEAM-644 at 9/24/19 10:54 PM:
--

SplittableDoFn will allow one to control the watermark, but you will always be 
limited by how far the watermark of the upstream transforms has advanced. This 
will allow scenarios like Read(PubsubA) --> Read(PubsubB), where PubsubA 
outputs topic names and PubsubB reads messages from the topics produced by 
PubsubA. The watermark of PubsubA will be the upper bound for how far 
PubsubB's watermark can advance. This is still somewhat up for discussion, as 
there could be scenarios where you want the downstream transform to be able to 
advance the watermark further than the upstream transform's watermark, but 
this leads to usability questions around what should be dropped and where (is 
this now considered late data?).


was (Author: lcwik):
SplittableDoFn will allow one to control the watermark, but you will always be 
limited by how far the watermark of the upstream transforms has advanced. This 
will allow scenarios like Read(PubsubA) --> Read(PubsubB), where PubsubA 
outputs topic names and PubsubB reads messages from the topics produced by 
PubsubA. The watermark of PubsubA will be the upper bound for how far 
PubsubB's watermark can advance. This is still somewhat up for discussion, as 
there could be scenarios where you want the downstream transform to be able to 
advance the watermark further than the upstream transform's watermark, but 
this leads to usability questions around what should be dropped and where.

> Primitive to shift the watermark while assigning timestamps
> ---
>
> Key: BEAM-644
> URL: https://issues.apache.org/jira/browse/BEAM-644
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Kenneth Knowles
>Priority: Major
>
> There is a general need, especially important in the presence of 
> SplittableDoFn, to be able to assign new timestamps to elements without 
> making them late or droppable.
>  - DoFn.withAllowedTimestampSkew is inadequate, because it simply allows one 
> to produce late data, but does not allow one to shift the watermark so the 
> new data is on-time.
>  - For a SplittableDoFn, one may receive an element such as the name of a log 
> file that contains elements for the day preceding the log file. The timestamp 
> on the filename must currently be the beginning of the log. If such elements 
> are constantly flowing, it may be OK, but since we don't know that an element 
> is coming, the watermark may advance in the absence of data. We need a way to 
> keep it far enough back even when no data is holding it back.
> One idea is a new primitive ShiftWatermark / AdjustTimestamps with the 
> following pieces:
>  - A constant duration (positive or negative) D by which to shift the 
> watermark.
>  - A function from TimestampedElement to new timestamp that is >= t + D
> So, for example, AdjustTimestamps(<-60 minutes>, f) would allow f to make 
> timestamps up to 60 minutes earlier.
> With this primitive added, outputWithTimestamp and withAllowedTimestampSkew 
> could be removed, simplifying DoFn.
> Alternatively, all of this functionality could be bolted on to DoFn.
> This ticket is not a proposal, but a record of the issue and ideas that were 
> mentioned.
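
The AdjustTimestamps idea in the description can be sketched as follows. This
is only an illustration of the stated constraint, not a Beam API:
adjust_timestamps and shifted_watermark are hypothetical names, elements are
modelled as (value, timestamp) pairs, and t is assumed to be the element's
original timestamp.

```python
from datetime import datetime, timedelta


def adjust_timestamps(elements, shift, fn):
    """Apply fn to each (value, timestamp) pair, enforcing new >= t + shift.

    shift is the constant duration D (positive or negative) from the
    description; fn maps (value, timestamp) to the new timestamp.
    """
    adjusted = []
    for value, timestamp in elements:
        new_timestamp = fn(value, timestamp)
        if new_timestamp < timestamp + shift:
            raise ValueError("new timestamp is earlier than t + D")
        adjusted.append((value, new_timestamp))
    return adjusted


def shifted_watermark(input_watermark, shift):
    # The output watermark is the input watermark moved by the same D, so
    # elements that satisfy the constraint above remain on time.
    return input_watermark + shift
```

With D = -60 minutes, a function that moves an element 30 minutes earlier is
accepted, while one that moves it 90 minutes earlier is rejected.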





[jira] [Commented] (BEAM-644) Primitive to shift the watermark while assigning timestamps

2019-09-24 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937240#comment-16937240
 ] 

Luke Cwik commented on BEAM-644:


SplittableDoFn will allow one to control the watermark, but you will always be 
limited by how far the watermark of the upstream transforms has advanced. This 
will allow scenarios like Read(PubsubA) --> Read(PubsubB), where PubsubA 
outputs topic names and PubsubB reads messages from the topics produced by 
PubsubA. The watermark of PubsubA will be the upper bound for how far 
PubsubB's watermark can advance. This is still somewhat up for discussion, as 
there could be scenarios where you want the downstream transform to be able to 
advance the watermark further than the upstream transform's watermark, but 
this leads to usability questions around what should be dropped and where.
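
The bound described in this comment can be written down in a few lines. This
is a sketch of the stated rule only, not runner code; downstream_watermark is
a hypothetical helper, and both arguments are assumed to be comparable
timestamps.

```python
from datetime import datetime


def downstream_watermark(upstream_watermark, own_estimate):
    """Hypothetical helper: the downstream watermark never passes upstream."""
    # Whatever the downstream transform (PubsubB) would like to report, it is
    # capped by the watermark of the transform feeding it (PubsubA).
    return min(upstream_watermark, own_estimate)
```

If PubsubB estimates a watermark later than PubsubA's, the effective
watermark is PubsubA's; if its estimate is earlier, the estimate is used.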

> Primitive to shift the watermark while assigning timestamps
> ---
>
> Key: BEAM-644
> URL: https://issues.apache.org/jira/browse/BEAM-644
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Kenneth Knowles
>Priority: Major
>
> There is a general need, especially important in the presence of 
> SplittableDoFn, to be able to assign new timestamps to elements without 
> making them late or droppable.
>  - DoFn.withAllowedTimestampSkew is inadequate, because it simply allows one 
> to produce late data, but does not allow one to shift the watermark so the 
> new data is on-time.
>  - For a SplittableDoFn, one may receive an element such as the name of a log 
> file that contains elements for the day preceding the log file. The timestamp 
> on the filename must currently be the beginning of the log. If such elements 
> are constantly flowing, it may be OK, but since we don't know that element is 
> coming, in the absence of data, the watermark may advance. We need a way to 
> keep it far enough back even in the absence of data holding it back.
> One idea is a new primitive ShiftWatermark / AdjustTimestamps with the 
> following pieces:
>  - A constant duration (positive or negative) D by which to shift the 
> watermark.
>  - A function from TimestampedElement to new timestamp that is >= t + D
> So, for example, AdjustTimestamps(<-60 minutes>, f) would allow f to make 
> timestamps up to 60 minutes earlier.
> With this primitive added, outputWithTimestamp and withAllowedTimestampSkew 
> could be removed, simplifying DoFn.
> Alternatively, all of this functionality could be bolted on to DoFn.
> This ticket is not a proposal, but a record of the issue and ideas that were 
> mentioned.
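
The two pieces of the proposed ShiftWatermark / AdjustTimestamps idea can be sketched roughly as follows. Nothing here is a Beam API: `adjust_timestamps`, `shifted_watermark`, and the (value, timestamp) tuple layout are hypothetical names chosen for illustration, and the sketch only checks the constraint "new timestamp >= t + D" from the description.

```python
from datetime import datetime, timedelta, timezone


def adjust_timestamps(elements, shift, fn):
    """Apply fn to each (value, timestamp) pair, enforcing new_ts >= ts + shift.

    elements: iterable of (value, timestamp) pairs (stand-in for TimestampedElement)
    shift:    the constant duration D, positive or negative
    fn:       function (value, timestamp) -> new timestamp
    """
    adjusted = []
    for value, ts in elements:
        new_ts = fn(value, ts)
        if new_ts < ts + shift:
            raise ValueError(
                "new timestamp %s is earlier than allowed bound %s" % (new_ts, ts + shift)
            )
        adjusted.append((value, new_ts))
    return adjusted


def shifted_watermark(input_watermark, shift):
    """Output watermark held back (or moved forward) by the same constant D."""
    return input_watermark + shift


noon = datetime(2019, 9, 24, 12, 0, tzinfo=timezone.utc)
minus_60 = timedelta(minutes=-60)

# Moving an element 30 minutes earlier is within the 60-minute allowance.
ok = adjust_timestamps([("log-file", noon)], minus_60, lambda v, ts: ts - timedelta(minutes=30))
```

Because the watermark is shifted by the same D that bounds the function, elements moved earlier by at most 60 minutes would still be on time downstream.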





[jira] [Work logged] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6923?focusedWorklogId=317890=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317890
 ]

ASF GitHub Bot logged work on BEAM-6923:


Author: ASF GitHub Bot
Created on: 24/Sep/19 22:28
Start Date: 24/Sep/19 22:28
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #9647: [BEAM-6923] limit 
number of concurrent artifact write to 8
URL: https://github.com/apache/beam/pull/9647#issuecomment-534773332
 
 
   I searched a bit but couldn't find good documentation or usage examples.
   There is also an open issue on grpc about limiting concurrent 
connections: https://github.com/grpc/grpc-java/issues/1886
   Do you have any pointers we can use?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 317890)
Time Spent: 1h 20m  (was: 1h 10m)

> OOM errors in jobServer when using GCS artifactDir
> --
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Lukasz Gajowy
>Assignee: Ankur Goenka
>Priority: Major
> Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump 
> size-sorted.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When starting jobServer with artifactDir pointing to a GCS bucket: 
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code}
> and running a Java portable pipeline with the following, portability related 
> pipeline options: 
> {code:java}
> --runner=PortableRunner --jobEndpoint=localhost:8099 
> --defaultEnvironmentType=DOCKER 
> --defaultEnvironmentConfig=gcr.io//java:latest'{code}
>  
> I'm facing a series of OOM errors, like this: 
> {code:java}
> Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: 
> Java heap space
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
> at 
> com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
>  
> This does not happen when I'm using a local filesystem for the artifact 
> staging location. 
>  





[jira] [Work logged] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6923?focusedWorklogId=317889=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317889
 ]

ASF GitHub Bot logged work on BEAM-6923:


Author: ASF GitHub Bot
Created on: 24/Sep/19 22:28
Start Date: 24/Sep/19 22:28
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #9647: [BEAM-6923] 
limit number of concurrent artifact write to 8
URL: https://github.com/apache/beam/pull/9647#discussion_r327860963
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java
 ##
 @@ -77,6 +78,13 @@
   public static final String MANIFEST = "MANIFEST";
   public static final String ARTIFACTS = "artifacts";
 
+  private final Semaphore permittedConcurrentWrite;
+
+  public BeamFileSystemArtifactStagingService() {
+super();
+permittedConcurrentWrite = new Semaphore(8);
 
 Review comment:
   The variable name `permittedConcurrentWrite` seems descriptive enough, 
so I avoided creating a new constant.
   Let me know if you think a constant would be better.
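
For readers unfamiliar with the pattern, the idea of capping concurrent writes with a semaphore can be sketched as below. This is a Python analogue, not the code in the PR: `run_writes`, the sleep that stands in for an upload, and the pool size of 32 are all assumptions made for the illustration. Only the limit of 8 comes from the PR title.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_WRITES = 8  # the limit named in the PR title


def run_writes(num_writes, limit=MAX_CONCURRENT_WRITES):
    """Run num_writes fake artifact writes; return the peak number in flight."""
    permits = threading.Semaphore(limit)
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def write_artifact(_):
        with permits:  # blocks while `limit` writes are already running
            with lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(0.01)  # stand-in for buffering and uploading a chunk
            with lock:
                state["active"] -= 1

    # More worker threads than permits, so the semaphore is what limits us.
    with ThreadPoolExecutor(max_workers=32) as pool:
        list(pool.map(write_artifact, range(num_writes)))
    return state["peak"]


if __name__ == "__main__":
    print("peak concurrent writes:", run_writes(40))
```

Since each in-flight write holds its own upload buffer, bounding the number of writes also bounds the heap those buffers can occupy at once.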
 



Issue Time Tracking
---

Worklog Id: (was: 317889)
Time Spent: 1h 10m  (was: 1h)

> OOM errors in jobServer when using GCS artifactDir
> --
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Lukasz Gajowy
>Assignee: Ankur Goenka
>Priority: Major
> Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump 
> size-sorted.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When starting jobServer with artifactDir pointing to a GCS bucket: 
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code}
> and running a Java portable pipeline with the following, portability related 
> pipeline options: 
> {code:java}
> --runner=PortableRunner --jobEndpoint=localhost:8099 
> --defaultEnvironmentType=DOCKER 
> --defaultEnvironmentConfig=gcr.io//java:latest'{code}
>  
> I'm facing a series of OOM errors, like this: 
> {code:java}
> Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: 
> Java heap space
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
> at 
> com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
>  
> This does not happen when I'm using a local filesystem for the artifact 
> staging location. 
>  





[jira] [Commented] (BEAM-8212) StatefulParDoFn creates GC timers for every record

2019-09-24 Thread Akshay Iyangar (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937228#comment-16937228
 ] 

Akshay Iyangar commented on BEAM-8212:
--

 
{code:java}
// Imports are inferred from the class names used below. BaseEncoding is shown
// from Guava; the original may have used Beam's vendored copy instead.
import com.google.common.io.BaseEncoding;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.beam.runners.core.StateNamespace;
import org.apache.beam.runners.core.StateNamespaces;
import org.apache.beam.sdk.coders.InstantCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
import org.joda.time.Instant;
import org.junit.Test;

public class TestDecodeTimer {
  @Test
  public void gctimerValue() throws IOException, ClassNotFoundException {

    StateNamespace stateNamespace =
        StateNamespaces.window(GlobalWindow.Coder.INSTANCE, GlobalWindow.INSTANCE);

    String GC_TIMER_ID = "__StatefulParDoGcTimerId";
    // The runner sets the timer like this:
    // timerInternals.setTimer(
    //     StateNamespaces.window(windowCoder, window), GC_TIMER_ID, gcTime, TimeDomain.EVENT_TIME);

    // Encode the timer id followed by the namespace key, as they appear in the RocksDB key.
    ByteArrayOutputStream outStream = new ByteArrayOutputStream();
    StringUtf8Coder.of().encode(GC_TIMER_ID, outStream);
    StringUtf8Coder.of().encode(stateNamespace.stringKey(), outStream);

    // Prints: __StatefulParDoGcTimerId//
    System.out.println("The output stream is :" + outStream.toString());

    // We need to find the hex representation of this prefix.
    String encode = BaseEncoding.base16().encode(outStream.toByteArray());
    // Prints: 185F5F537461746566756C506172446F476354696D65724964022F2F
    System.out.println("The encoded string is " + encode);

    // Everything after this prefix is the GC timer value, once the trailing
    // time domain is removed as well.
    ByteArrayOutputStream outStream1 = new ByteArrayOutputStream();
    StringUtf8Coder.of().encode(TimeDomain.EVENT_TIME.toString(), outStream1);
    String encode1 = BaseEncoding.base16().encode(outStream1.toByteArray());
    // Prints: 0A4556454E545F54494D45
    System.out.println("The encoded1 string is " + encode1);
    System.out.println("Total Length of the encode key: " + outStream.size());

    // Example key taken from RocksDB.
    String decode =
        "008020C49BA0BCF7F901006A6176612E6E696F2E4865617042797465427565F20100010C0107313831303639000C0100185F5F537461746566756C506172446F476354696D65724964022F2F8020C49BA0BCF7F80A4556454E545F54494D45";

    // The timer is whatever sits between
    // 185F5F537461746566756C506172446F476354696D65724964022F2F and
    // 0A4556454E545F54494D45, viz. 8020C49BA0BCF7F8.
    Instant timeDecode =
        InstantCoder.of()
            .decode(new ByteArrayInputStream(BaseEncoding.base16().decode("8020C49BA0BCF7F8")));

    // Observed output: 294247-01-10T04:00:54.775Z
    System.out.println("GC timer for Global Window is" + timeDecode);
    // This is nothing but +infinity, so these timers would never be cleaned
    // up because the window never closes.

    // Just to cross-verify.
    System.out.println("MAX value" + BoundedWindow.TIMESTAMP_MAX_VALUE);
    System.out.println("MAX: " + GlobalWindow.TIMESTAMP_MAX_VALUE);
  }
}
{code}
I wrote the test above to check which timer values are being generated 
for each event. I took one key from RocksDB to analyze, and its timer 
is +Infinity, i.e. GlobalWindow.TIMESTAMP_MAX_VALUE, which makes sense for a 
global window.
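
The same byte-level check can be reproduced without Beam. The sketch below rests on two assumptions that should be verified against the Beam source: that InstantCoder writes the millisecond timestamp as an 8-byte big-endian value offset by 2^63, and that short strings are written with a single length byte followed by UTF-8 data. The helper names are made up for the example.

```python
def encode_short_utf8(text):
    """Length-prefixed UTF-8, valid only while the length fits in one byte (< 128)."""
    data = text.encode("utf-8")
    if len(data) >= 128:
        raise ValueError("sketch only handles single-byte length prefixes")
    return bytes([len(data)]) + data


def decode_instant_millis(hex_string):
    """Decode 8 big-endian bytes as milliseconds, assuming an offset of 2**63."""
    raw = bytes.fromhex(hex_string)
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    return int.from_bytes(raw, "big") - 2**63


# Markers that surround the timer value inside the RocksDB key.
prefix = encode_short_utf8("__StatefulParDoGcTimerId") + encode_short_utf8("//")
suffix = encode_short_utf8("EVENT_TIME")

# The bytes found between the two markers in the example key.
gc_millis = decode_instant_millis("8020C49BA0BCF7F8")

# Assumed maximum timestamp: Long.MAX_VALUE microseconds expressed in milliseconds.
max_millis = (2**63 - 1) // 1000
gap_millis = max_millis - gc_millis
```

Under these assumptions the decoded value lands less than one day below the assumed maximum, which matches the conclusion that the timer sits at the far end of the global window and never fires in practice.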

 

I also disabled the timers for global windows to do some benchmarking, and 
found that RocksDB no longer generates any state for WindowDoFnOperator, 
which it previously did at a path like the one below.
{code:java}
/rocksdb/job__op_WindowDoFnOperator_e2c1f521beded61187c1d16f3c146358__3_3__uuid_4e0e102b-ffcd-4111-80f7-b9a8f318d04a/db
{code}
 

Also, I didn't see any keys associated with timers in the StatefulParDoFn: 
{code:java}
rocksdb_ldb --db=db --column_family=_timer_state/event_beam-timer scan 
--max_keys=100 --key_hex
{code}
returned zero keys. 

 

I'm running the pipeline to get exact benchmarks and will keep you updated. 
One thing is clear right off the bat: we see fewer state operators, which means 
RocksDB has more memory to play with, since it holds (operators * parallelism) 
fewer states than in the previous run. I will report on the speed and 
throughput of the pipeline shortly.

 

 

 

 

> StatefulParDoFn creates GC timers for every record 
> ---
>
> Key: BEAM-8212
> URL: https://issues.apache.org/jira/browse/BEAM-8212
> Project: Beam
>  Issue Type: Bug
>  Components: beam-community
>Reporter: Akshay Iyangar
>Assignee: Aizhamal Nurmamat kyzy
>Priority: Major
>
> Hi 
> Currently the StatefulParDoFn creates timers for all the records.
> [https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/StatefulDoFnRunner.java#L211]
> This becomes a problem if you are using GlobalWindows for streaming, where 
> these timers get created and are never cleaned up since the window never 
> closes.
> This is a problem especially if you are memory-bound in RocksDB, where these 
> timers take up space and slow the pipelines considerably.
> I was wondering whether, if the pipeline runs in global windows, we should 
> avoid adding timers to it at all?
>  
>  
>  




[jira] [Work logged] (BEAM-8311) Fix python mongodbio display data type

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8311?focusedWorklogId=317887=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317887
 ]

ASF GitHub Bot logged work on BEAM-8311:


Author: ASF GitHub Bot
Created on: 24/Sep/19 22:17
Start Date: 24/Sep/19 22:17
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9656: [BEAM-8311] Fix py 
mongodbio display data
URL: https://github.com/apache/beam/pull/9656#issuecomment-534770589
 
 
   LGTM. I'll merge after precommits are passing.
 



Issue Time Tracking
---

Worklog Id: (was: 317887)
Time Spent: 0.5h  (was: 20m)

> Fix python mongodbio display data type
> --
>
> Key: BEAM-8311
> URL: https://issues.apache.org/jira/browse/BEAM-8311
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When I try to write document to mongodb through
>  | "Write User Doc to Mongo" >> beam.io.WriteToMongoDB(uri=MONGO_URI,
>db="dbname",
>coll="col_name"
>))
> Error {{ValueError: Invalid DisplayDataItem. Value {} is of an unsupported 
> type.}}
>  ERROR:root:Error while visiting Write User Doc to Mongo/ParDo(_WriteMongoFn)
> Traceback (most recent call last):
>   File "beam_home.py", line 317, in <module>
> run()
>   File "beam_home.py", line 312, in run
> p.run().wait_until_finish()
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 406, in run
> self._options).run(False)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 419, in run
> return self.runner.run_pipeline(self, self._options)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 469, in run_pipeline
> super(DataflowRunner, self).run_pipeline(pipeline, options)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 
> 158, in run_pipeline
> pipeline.visit(RunVisitor(self))
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 447, in visit
> self._root_transform().visit(visitor, self, visited)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 824, in visit
> part.visit(visitor, pipeline, visited)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 824, in visit
> part.visit(visitor, pipeline, visited)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 827, in visit
> visitor.visit_transform(self)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 
> 153, in visit_transform
> self.runner.run_transform(transform_node, options)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 
> 196, in run_transform
> return m(transform_node, options)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 807, in run_ParDo
> transform_node.transform.output_tags)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 590, in _add_step
> DisplayData.create_from(transform_node.transform).items])
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/transforms/display.py", 
> line 274, in get_dict
> self.is_valid()
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/transforms/display.py", 
> line 246, in is_valid
> .format(self.value))
> ValueError: Invalid DisplayDataItem. Value {} is of an unsupported type.
>  





[jira] [Work logged] (BEAM-8301) Argument inference breaks on incomparable types as defaults.

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8301?focusedWorklogId=317874=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317874
 ]

ASF GitHub Bot logged work on BEAM-8301:


Author: ASF GitHub Bot
Created on: 24/Sep/19 21:52
Start Date: 24/Sep/19 21:52
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9657: [BEAM-8301] 
Cherry-pick default argument comparison fixes.
URL: https://github.com/apache/beam/pull/9657
 
 
   This is https://github.com/apache/beam/pull/9641/commits and 
https://github.com/apache/beam/pull/9627/commits
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-8301) Argument inference breaks on incomparable types as defaults.

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8301?focusedWorklogId=317862=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317862
 ]

ASF GitHub Bot logged work on BEAM-8301:


Author: ASF GitHub Bot
Created on: 24/Sep/19 21:31
Start Date: 24/Sep/19 21:31
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9641: [BEAM-8301] 
Fix incomparable defaults.
URL: https://github.com/apache/beam/pull/9641
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 317862)
Time Spent: 0.5h  (was: 20m)

> Argument inference breaks on incomparable types as defaults.
> 
>
> Key: BEAM-8301
> URL: https://issues.apache.org/jira/browse/BEAM-8301
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0
>Reporter: Robert Bradshaw
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> A common culprit is numpy arrays, e.g.
> {code:python}
> class MyDoFn(beam.DoFn):
>   def process(self, element, arg=np.ndarray(...)):
>     ...
> {code}
> This bug was introduced as part of [BEAM-7060].
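
Why such defaults break a comparison can be shown without numpy. In the sketch below, `ArrayLike` is a stand-in that mimics the relevant behavior (its `==` returns an object with no usable truth value), and `NO_DEFAULT`, `is_unset_naive`, and `is_unset_safe` are hypothetical names. The identity check is one possible way to avoid the failure and is not necessarily what the linked PRs do.

```python
class _AmbiguousTruth:
    """Result of an elementwise comparison; refuses to collapse to True/False."""

    def __bool__(self):
        raise ValueError("The truth value of an array-like result is ambiguous")


class ArrayLike:
    """Stand-in for an array type whose == compares element by element."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        return _AmbiguousTruth()


NO_DEFAULT = object()  # hypothetical sentinel meaning "no default was given"


def is_unset_naive(default):
    # `if` calls bool() on the comparison result, which raises for ArrayLike.
    if default == NO_DEFAULT:
        return True
    return False


def is_unset_safe(default):
    # Identity never invokes the default's own __eq__.
    return default is NO_DEFAULT


arr = ArrayLike([1, 2, 3])
safe_result = is_unset_safe(arr)
```

Calling `is_unset_naive(arr)` raises ValueError, while `is_unset_safe(arr)` simply returns False.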





[jira] [Work logged] (BEAM-8311) Fix python mongodbio display data type

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8311?focusedWorklogId=317860=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317860
 ]

ASF GitHub Bot logged work on BEAM-8311:


Author: ASF GitHub Bot
Created on: 24/Sep/19 21:30
Start Date: 24/Sep/19 21:30
Worklog Time Spent: 10m 
  Work Description: y1chi commented on pull request #9656: [BEAM-8311] Fix 
py mongodbio display data
URL: https://github.com/apache/beam/pull/9656
 
 
 

[jira] [Work logged] (BEAM-8311) Fix python mongodbio display data type

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8311?focusedWorklogId=317861=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317861
 ]

ASF GitHub Bot logged work on BEAM-8311:


Author: ASF GitHub Bot
Created on: 24/Sep/19 21:30
Start Date: 24/Sep/19 21:30
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #9656: [BEAM-8311] Fix py 
mongodbio display data
URL: https://github.com/apache/beam/pull/9656#issuecomment-534756715
 
 
   R: @pabloem 
 



Issue Time Tracking
---

Worklog Id: (was: 317861)
Time Spent: 20m  (was: 10m)

> Fix python mongodbio display data type
> --
>
> Key: BEAM-8311
> URL: https://issues.apache.org/jira/browse/BEAM-8311
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When I try to write document to mongodb through
>  | "Write User Doc to Mongo" >> beam.io.WriteToMongoDB(uri=MONGO_URI,
>db="dbname",
>coll="col_name"
>))
> Error {{ValueError: Invalid DisplayDataItem. Value {} is of an unsupported 
> type.}}
>  ERROR:root:Error while visiting Write User Doc to Mongo/ParDo(_WriteMongoFn)
> Traceback (most recent call last):
>   File "beam_home.py", line 317, in <module>
> run()
>   File "beam_home.py", line 312, in run
> p.run().wait_until_finish()
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 406, in run
> self._options).run(False)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 419, in run
> return self.runner.run_pipeline(self, self._options)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 469, in run_pipeline
> super(DataflowRunner, self).run_pipeline(pipeline, options)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 
> 158, in run_pipeline
> pipeline.visit(RunVisitor(self))
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 447, in visit
> self._root_transform().visit(visitor, self, visited)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 824, in visit
> part.visit(visitor, pipeline, visited)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 824, in visit
> part.visit(visitor, pipeline, visited)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 827, in visit
> visitor.visit_transform(self)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 
> 153, in visit_transform
> self.runner.run_transform(transform_node, options)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 
> 196, in run_transform
> return m(transform_node, options)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 807, in run_ParDo
> transform_node.transform.output_tags)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 590, in _add_step
> DisplayData.create_from(transform_node.transform).items])
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/transforms/display.py", 
> line 274, in get_dict
> self.is_valid()
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/transforms/display.py", 
> line 246, in is_valid
> .format(self.value))
> ValueError: Invalid DisplayDataItem. Value {} is of an unsupported type.
>  
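
The traceback above ends in a type check on a display data value. As a rough illustration only, and not Beam's actual implementation, the sketch below mimics such a check and shows one possible way to make a dict-valued setting displayable by serializing it to a string first. The names `SUPPORTED_TYPES`, `validate_display_value`, and `to_display_value` are hypothetical, and the set of accepted types is an assumption.

```python
import json

# Hypothetical stand-in for the scalar types a display-data item accepts.
SUPPORTED_TYPES = (str, int, float, bool)


def validate_display_value(value):
    """Raise ValueError for values that are not plain scalars."""
    if not isinstance(value, SUPPORTED_TYPES):
        raise ValueError(
            'Invalid DisplayDataItem. Value {} is of an unsupported type.'
            .format(value))
    return value


def to_display_value(value):
    """Serialize dicts and lists to a JSON string so they pass the scalar check."""
    if isinstance(value, (dict, list)):
        return json.dumps(value, sort_keys=True)
    return value
```

Under these assumptions, passing an empty dict straight to the check reproduces the error message from the report, while `validate_display_value(to_display_value({}))` succeeds because the dict has been turned into a string.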



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8277) Make docker build quicker

2019-09-24 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8277:
--
Description: 
Building the Python SDK harness container takes minutes on my machine.

 

Possible lead: "We spend mins pulling cmd/beamctl deps."

[https://github.com/apache/beam/blob/47feeafb21023e2a60ae51737cc4000a2033719c/sdks/python/container/build.gradle#L38]

  was:
Building the Python SDK harness container takes minutes on my machine.

Possible lead: "We spend mins pulling cmd/beamctl deps."

[https://github.com/apache/beam/blob/47feeafb21023e2a60ae51737cc4000a2033719c/sdks/python/container/build.gradle#L38]


> Make docker build quicker
> -
>
> Key: BEAM-8277
> URL: https://issues.apache.org/jira/browse/BEAM-8277
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>
> Building the Python SDK harness container takes minutes on my machine.
>  
> Possible lead: "We spend mins pulling cmd/beamctl deps."
> [https://github.com/apache/beam/blob/47feeafb21023e2a60ae51737cc4000a2033719c/sdks/python/container/build.gradle#L38]





[jira] [Updated] (BEAM-8277) Make docker build quicker

2019-09-24 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8277:
--
Description: 
Building the Python SDK harness container takes minutes on my machine.

```

./gradlew :sdks:python:container:buildAll

BUILD SUCCESSFUL in 9m 33s

```

Possible lead: "We spend mins pulling cmd/beamctl deps."

[https://github.com/apache/beam/blob/47feeafb21023e2a60ae51737cc4000a2033719c/sdks/python/container/build.gradle#L38]

  was:
Building the Python SDK harness container takes minutes on my machine.

 

Possible lead: "We spend mins pulling cmd/beamctl deps."

[https://github.com/apache/beam/blob/47feeafb21023e2a60ae51737cc4000a2033719c/sdks/python/container/build.gradle#L38]


> Make docker build quicker
> -
>
> Key: BEAM-8277
> URL: https://issues.apache.org/jira/browse/BEAM-8277
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>
> Building the Python SDK harness container takes minutes on my machine.
> ```
> ./gradlew :sdks:python:container:buildAll
> BUILD SUCCESSFUL in 9m 33s
> ```
> Possible lead: "We spend mins pulling cmd/beamctl deps."
> [https://github.com/apache/beam/blob/47feeafb21023e2a60ae51737cc4000a2033719c/sdks/python/container/build.gradle#L38]





[jira] [Work logged] (BEAM-8300) KinesisIO.write causes NPE as the producer is null

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8300?focusedWorklogId=317856&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317856
 ]

ASF GitHub Bot logged work on BEAM-8300:


Author: ASF GitHub Bot
Created on: 24/Sep/19 21:20
Start Date: 24/Sep/19 21:20
Worklog Time Spent: 10m 
  Work Description: jhalaria commented on pull request #9640: [BEAM-8300]: 
KinesisIO.write throws NPE because producer is null
URL: https://github.com/apache/beam/pull/9640#discussion_r327839729
 
 

 ##
 File path: 
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
 ##
 @@ -633,6 +634,13 @@ private synchronized void initKinesisProducer() {
 producer = spec.getAWSClientsProvider().createKinesisProducer(config);
   }
 
+  private void readObject(ObjectInputStream is) throws IOException, 
ClassNotFoundException {
+is.defaultReadObject();
+if (producer == null) {
 
 Review comment:
   @aromanenko-dev - Please look at the changes one more time. Thank you.
 



Issue Time Tracking
---

Worklog Id: (was: 317856)
Time Spent: 1h 40m  (was: 1.5h)

> KinesisIO.write causes NPE as the producer is null
> --
>
> Key: BEAM-8300
> URL: https://issues.apache.org/jira/browse/BEAM-8300
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.15.0
>Reporter: Ankit Jhalaria
>Assignee: Ankit Jhalaria
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> While using KinesisIO.write(), we encountered an NPE with the following stack 
> trace:
> {code:java}
> org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper.run(UnboundedSourceWrapper.java:297)
>   at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
>   at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
>   at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
>   at org.apache.flink.streaming.runtime.tasks.StoppableSourceStreamTask.run(StoppableSourceStreamTask.java:45)
>   at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException: null
>   at org.apache.beam.sdk.io.kinesis.KinesisIO$Write$KinesisWriterFn.flushBundle(KinesisIO.java:685)
>   at org.apache.beam.sdk.io.kinesis.KinesisIO$Write$KinesisWriterFn.finishBundle(KinesisIO.java:669)
> {code}





[jira] [Work logged] (BEAM-5820) Vendor Calcite

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=317854&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317854
 ]

ASF GitHub Bot logged work on BEAM-5820:


Author: ASF GitHub Bot
Created on: 24/Sep/19 21:17
Start Date: 24/Sep/19 21:17
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #9189: [BEAM-5820] Use 
vendored calcite
URL: https://github.com/apache/beam/pull/9189#issuecomment-534752397
 
 
   Thanks Luke!
   
   
   LGTM to the content of changes.
 



Issue Time Tracking
---

Worklog Id: (was: 317854)
Time Spent: 15h 50m  (was: 15h 40m)

> Vendor Calcite
> --
>
> Key: BEAM-5820
> URL: https://issues.apache.org/jira/browse/BEAM-5820
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kai Jiang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 15h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-6668) use add experiment methods (Java and Python)

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6668?focusedWorklogId=317846&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317846
 ]

ASF GitHub Bot logged work on BEAM-6668:


Author: ASF GitHub Bot
Created on: 24/Sep/19 21:11
Start Date: 24/Sep/19 21:11
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9445: [BEAM-6668] 
Use add experiment methods
URL: https://github.com/apache/beam/pull/9445#discussion_r327836431
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java
 ##
 @@ -327,18 +328,15 @@ public Job translate(List packages) {
   // not enabled.
   if (options.isEnableStreamingEngine()) {
 List<String> experiments = options.getExperiments();
-if (experiments == null) {
-  experiments = new ArrayList();
-} else {
-  experiments = new ArrayList(experiments);
-}
 if (!experiments.contains(GcpOptions.STREAMING_ENGINE_EXPERIMENT)) {
   experiments.add(GcpOptions.STREAMING_ENGINE_EXPERIMENT);
 }
 if (!experiments.contains(GcpOptions.WINDMILL_SERVICE_EXPERIMENT)) {
   experiments.add(GcpOptions.WINDMILL_SERVICE_EXPERIMENT);
 }
-options.setExperiments(experiments);
+experiments
+.parallelStream()
 
 Review comment:
   Parallel stream? I don't think we need to use multiple threads to do this.
   
   Also, streams are [significantly slower](https://medium.com/@milan.mimica/slow-like-a-stream-fast-like-a-loop-524f70391182)
   than for loops even in JDK 12. You need 100k+ elements for a parallel 
   stream to beat a sequential stream, and millions for a parallel stream to 
   beat a for loop.
 



Issue Time Tracking
---

Worklog Id: (was: 317846)
Time Spent: 2h 20m  (was: 2h 10m)

> use add experiment methods (Java and Python)
> 
>
> Key: BEAM-6668
> URL: https://issues.apache.org/jira/browse/BEAM-6668
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp, sdk-py-core
>Reporter: Udi Meiri
>Priority: Minor
>  Labels: beginner, easyfix, newbie
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Python:
> Convert instances of experiments.append(...)
> to debug_options.add_experiment(...)
> Java:
> Use ExperimentalOptions.addExperiment(...)
> instead of getExperiments(), modify, setExperiments() pattern.
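
The Python half of the request above can be sketched as follows. `DebugOptionsSketch` is a hypothetical stand-in written for this note, not Beam's own options class, and the experiment names are placeholders; it only illustrates why a dedicated add method is preferable to appending to the list directly: it handles the unset (None) case and avoids duplicate entries in one place.

```python
class DebugOptionsSketch(object):
    """Hypothetical stand-in for an options object holding an experiments list."""

    def __init__(self, experiments=None):
        # The list may be unset, which is why a bare .append(...) is fragile.
        self.experiments = experiments

    def add_experiment(self, experiment):
        """Add an experiment once, creating the list if it does not exist yet."""
        if self.experiments is None:
            self.experiments = []
        if experiment not in self.experiments:
            self.experiments.append(experiment)


opts = DebugOptionsSketch()
opts.add_experiment('experiment_a')  # placeholder name
opts.add_experiment('experiment_a')  # second call is a no-op
opts.add_experiment('experiment_b')  # placeholder name
```

With this shape, callers no longer need the get, copy, modify, set sequence that the Java diff above removes.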





[jira] [Resolved] (BEAM-8021) Add Automatic-Module-Name headers for Beam Java modules

2019-09-24 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-8021.
-
Fix Version/s: 2.17.0
   Resolution: Fixed

> Add Automatic-Module-Name headers for Beam Java modules 
> 
>
> Key: BEAM-8021
> URL: https://issues.apache.org/jira/browse/BEAM-8021
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Lukasz Gajowy
>Priority: Minor
> Fix For: 2.17.0
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> For compatibility with the Java Platform Module System (JPMS) in Java 9 and 
> later, every JAR should have a module name, even if the library does not 
> itself use modules. As [suggested in the mailing 
> list|https://lists.apache.org/thread.html/956065580ce049481e756482dc3ccfdc994fef3b8cdb37cab3e2d9b1@%3Cdev.beam.apache.org%3E],
>  this is a simple change that we can do and still be backwards compatible.





[jira] [Work logged] (BEAM-8021) Add Automatic-Module-Name headers for Beam Java modules

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8021?focusedWorklogId=317841&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317841
 ]

ASF GitHub Bot logged work on BEAM-8021:


Author: ASF GitHub Bot
Created on: 24/Sep/19 21:04
Start Date: 24/Sep/19 21:04
Worklog Time Spent: 10m 
  Work Description: asfgit commented on pull request #9417: [BEAM-8021] Add 
Automatic-Module-Name headers to beam's artifacts.
URL: https://github.com/apache/beam/pull/9417
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 317841)
Time Spent: 7h 50m  (was: 7h 40m)

> Add Automatic-Module-Name headers for Beam Java modules 
> 
>
> Key: BEAM-8021
> URL: https://issues.apache.org/jira/browse/BEAM-8021
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Lukasz Gajowy
>Priority: Minor
> Fix For: 2.17.0
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> For compatibility with the Java Platform Module System (JPMS) in Java 9 and 
> later, every JAR should have a module name, even if the library does not 
> itself use modules. As [suggested in the mailing 
> list|https://lists.apache.org/thread.html/956065580ce049481e756482dc3ccfdc994fef3b8cdb37cab3e2d9b1@%3Cdev.beam.apache.org%3E],
>  this is a simple change that we can do and still be backwards compatible.





[jira] [Work logged] (BEAM-6668) use add experiment methods (Java and Python)

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6668?focusedWorklogId=317837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317837
 ]

ASF GitHub Bot logged work on BEAM-6668:


Author: ASF GitHub Bot
Created on: 24/Sep/19 21:02
Start Date: 24/Sep/19 21:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9445: [BEAM-6668] Use add 
experiment methods
URL: https://github.com/apache/beam/pull/9445#issuecomment-534746882
 
 
   Thanks for the cleanup. Looks like there are some merge conflicts.
 



Issue Time Tracking
---

Worklog Id: (was: 317837)
Time Spent: 2h 10m  (was: 2h)

> use add experiment methods (Java and Python)
> 
>
> Key: BEAM-6668
> URL: https://issues.apache.org/jira/browse/BEAM-6668
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp, sdk-py-core
>Reporter: Udi Meiri
>Priority: Minor
>  Labels: beginner, easyfix, newbie
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Python:
> Convert instances of experiments.append(...)
> to debug_options.add_experiment(...)
> Java:
> Use ExperimentalOptions.addExperiment(...)
> instead of getExperiments(), modify, setExperiments() pattern.





[jira] [Work logged] (BEAM-6668) use add experiment methods (Java and Python)

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6668?focusedWorklogId=317834&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317834
 ]

ASF GitHub Bot logged work on BEAM-6668:


Author: ASF GitHub Bot
Created on: 24/Sep/19 21:01
Start Date: 24/Sep/19 21:01
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9445: [BEAM-6668] Use add 
experiment methods
URL: https://github.com/apache/beam/pull/9445#issuecomment-534746497
 
 
   Run Java_Examples_Dataflow PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 317834)
Time Spent: 2h  (was: 1h 50m)

> use add experiment methods (Java and Python)
> 
>
> Key: BEAM-6668
> URL: https://issues.apache.org/jira/browse/BEAM-6668
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp, sdk-py-core
>Reporter: Udi Meiri
>Priority: Minor
>  Labels: beginner, easyfix, newbie
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Python:
> Convert instances of experiments.append(...)
> to debug_options.add_experiment(...)
> Java:
> Use ExperimentalOptions.addExperiment(...)
> instead of getExperiments(), modify, setExperiments() pattern.





[jira] [Commented] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-09-24 Thread Derek He (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937189#comment-16937189
 ] 

Derek He commented on BEAM-8306:


[~iemejia] We already have a fix and are testing it. Once testing is finished, I 
will open a pull request for you to review. Thanks.

> improve estimation of data byte size reading from source in ElasticsearchIO
> ---
>
> Key: BEAM-8306
> URL: https://issues.apache.org/jira/browse/BEAM-8306
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Affects Versions: 2.14.0
>Reporter: Derek He
>Priority: Major
>
> ElasticsearchIO splits a BoundedSource based on the Elasticsearch index size. 
> We expect that splitting based on the query result size would be more accurate.
> Currently, we have a big Elasticsearch index, but the query result contains 
> only a few documents from the index. ElasticsearchIO splits it into up to 
> 1024 BoundedSources in Google Dataflow, so it takes a long time to finish 
> processing the small number of Elasticsearch documents in Google Dataflow.
>  
>  
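
To make the reported effect concrete, here is a small arithmetic sketch. It assumes, for illustration only, that the number of splits is an estimated byte size divided by a desired bundle size, capped at the 1024 mentioned in the report; `num_splits` and all sizes below are hypothetical and are not taken from ElasticsearchIO.

```python
import math

MAX_SPLITS = 1024  # upper bound mentioned in the report


def num_splits(estimated_bytes, desired_bundle_bytes, max_splits=MAX_SPLITS):
    """Number of sources for a given size estimate, between 1 and max_splits."""
    if desired_bundle_bytes <= 0:
        raise ValueError('desired_bundle_bytes must be positive')
    splits = int(math.ceil(estimated_bytes / float(desired_bundle_bytes)))
    return max(1, min(splits, max_splits))


# Hypothetical sizes: a large index, a tiny query result, 64 MiB bundles.
index_bytes = 500 * 1024 ** 3
query_result_bytes = 2 * 1024 ** 2
bundle_bytes = 64 * 1024 ** 2

splits_from_index = num_splits(index_bytes, bundle_bytes)
splits_from_query = num_splits(query_result_bytes, bundle_bytes)
```

Under these assumed numbers, estimating from the index size hits the 1024 cap, while estimating from the query result size yields a single source, which is the gap the issue describes.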





[jira] [Work logged] (BEAM-8021) Add Automatic-Module-Name headers for Beam Java modules

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8021?focusedWorklogId=317829&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317829
 ]

ASF GitHub Bot logged work on BEAM-8021:


Author: ASF GitHub Bot
Created on: 24/Sep/19 21:00
Start Date: 24/Sep/19 21:00
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9417: [BEAM-8021] 
Add Automatic-Module-Name headers to beam's artifacts.
URL: https://github.com/apache/beam/pull/9417#discussion_r327830244
 
 

 ##
 File path: model/job-management/build.gradle
 ##
 @@ -17,10 +17,12 @@
  */
 
 plugins { id 'org.apache.beam.module' }
-applyPortabilityNature(shadowJarValidationExcludes:[
-"org/apache/beam/model/expansion/v1/**",
-"org/apache/beam/model/jobmanagement/v1/**",
-])
+applyPortabilityNature(
+automaticModuleName: 'org.apache.beam.model.job.management',
+shadowJarValidationExcludes: [
+"org/apache/beam/model/expansion/v1/**",
 
 Review comment:
   ```suggestion
   // TODO: Migrate expansion service to separate module.
   "org/apache/beam/model/expansion/v1/**",
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 317829)
Time Spent: 7h 40m  (was: 7.5h)

> Add Automatic-Module-Name headers for Beam Java modules 
> 
>
> Key: BEAM-8021
> URL: https://issues.apache.org/jira/browse/BEAM-8021
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Lukasz Gajowy
>Priority: Minor
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> For compatibility with the Java Platform Module System (JPMS) in Java 9 and 
> later, every JAR should have a module name, even if the library does not 
> itself use modules. As [suggested in the mailing 
> list|https://lists.apache.org/thread.html/956065580ce049481e756482dc3ccfdc994fef3b8cdb37cab3e2d9b1@%3Cdev.beam.apache.org%3E],
>  this is a simple change that we can do and still be backwards compatible.





[jira] [Work logged] (BEAM-8021) Add Automatic-Module-Name headers for Beam Java modules

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8021?focusedWorklogId=317830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317830
 ]

ASF GitHub Bot logged work on BEAM-8021:


Author: ASF GitHub Bot
Created on: 24/Sep/19 21:00
Start Date: 24/Sep/19 21:00
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9417: [BEAM-8021] 
Add Automatic-Module-Name headers to beam's artifacts.
URL: https://github.com/apache/beam/pull/9417#discussion_r327831395
 
 

 ##
 File path: sdks/java/io/common/build.gradle
 ##
 @@ -17,7 +17,7 @@
  */
 
 plugins { id 'org.apache.beam.module' }
-applyJavaNature(exportJavadoc: false)
+applyJavaNature(exportJavadoc: false, automaticModuleName: 
'org.apache.beam.sdk.io.common')
 
 Review comment:
   I agree with you but let us revisit this once this PR is in so you don't 
have to deal with so many merge conflicts/fix-ups.
 



Issue Time Tracking
---

Worklog Id: (was: 317830)
Time Spent: 7h 40m  (was: 7.5h)

> Add Automatic-Module-Name headers for Beam Java modules 
> 
>
> Key: BEAM-8021
> URL: https://issues.apache.org/jira/browse/BEAM-8021
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Lukasz Gajowy
>Priority: Minor
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> For compatibility with the Java Platform Module System (JPMS) in Java 9 and 
> later, every JAR should have a module name, even if the library does not 
> itself use modules. As [suggested in the mailing 
> list|https://lists.apache.org/thread.html/956065580ce049481e756482dc3ccfdc994fef3b8cdb37cab3e2d9b1@%3Cdev.beam.apache.org%3E],
>  this is a simple change that we can do and still be backwards compatible.





[jira] [Commented] (BEAM-8311) Fix python mongodbio display data type

2019-09-24 Thread Mark Liu (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937188#comment-16937188
 ] 

Mark Liu commented on BEAM-8311:


https://github.com/apache/beam/pull/9601 is the original fix to master

> Fix python mongodbio display data type
> --
>
> Key: BEAM-8311
> URL: https://issues.apache.org/jira/browse/BEAM-8311
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Blocker
> Fix For: 2.16.0
>
>
> When I try to write a document to MongoDB through
>  | "Write User Doc to Mongo" >> beam.io.WriteToMongoDB(uri=MONGO_URI,
>      db="dbname",
>      coll="col_name"
>      ))
> I get the error {{ValueError: Invalid DisplayDataItem. Value {} is of an unsupported 
> type.}}
>  ERROR:root:Error while visiting Write User Doc to Mongo/ParDo(_WriteMongoFn)
> Traceback (most recent call last):
>   File "beam_home.py", line 317, in <module>
> run()
>   File "beam_home.py", line 312, in run
> p.run().wait_until_finish()
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 406, in run
> self._options).run(False)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 419, in run
> return self.runner.run_pipeline(self, self._options)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 469, in run_pipeline
> super(DataflowRunner, self).run_pipeline(pipeline, options)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 
> 158, in run_pipeline
> pipeline.visit(RunVisitor(self))
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 447, in visit
> self._root_transform().visit(visitor, self, visited)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 824, in visit
> part.visit(visitor, pipeline, visited)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 824, in visit
> part.visit(visitor, pipeline, visited)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 827, in visit
> visitor.visit_transform(self)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 
> 153, in visit_transform
> self.runner.run_transform(transform_node, options)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 
> 196, in run_transform
> return m(transform_node, options)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 807, in run_ParDo
> transform_node.transform.output_tags)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 590, in _add_step
> DisplayData.create_from(transform_node.transform).items])
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/transforms/display.py", 
> line 274, in get_dict
> self.is_valid()
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/transforms/display.py", 
> line 246, in is_valid
> .format(self.value))
> ValueError: Invalid DisplayDataItem. Value {} is of an unsupported type.
>  





[jira] [Commented] (BEAM-8311) Fix python mongodbio display data type

2019-09-24 Thread Mark Liu (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937187#comment-16937187
 ] 

Mark Liu commented on BEAM-8311:


https://github.com/apache/beam/pull/9655 is the cherry-pick to the release branch

> Fix python mongodbio display data type
> --
>
> Key: BEAM-8311
> URL: https://issues.apache.org/jira/browse/BEAM-8311
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Blocker
> Fix For: 2.16.0
>
>
> When I try to write a document to MongoDB through
>  | "Write User Doc to Mongo" >> beam.io.WriteToMongoDB(uri=MONGO_URI,
>      db="dbname",
>      coll="col_name"
>      ))
> I get the error {{ValueError: Invalid DisplayDataItem. Value {} is of an unsupported 
> type.}}
>  ERROR:root:Error while visiting Write User Doc to Mongo/ParDo(_WriteMongoFn)
> Traceback (most recent call last):
>   File "beam_home.py", line 317, in <module>
> run()
>   File "beam_home.py", line 312, in run
> p.run().wait_until_finish()
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 406, in run
> self._options).run(False)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 419, in run
> return self.runner.run_pipeline(self, self._options)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 469, in run_pipeline
> super(DataflowRunner, self).run_pipeline(pipeline, options)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 
> 158, in run_pipeline
> pipeline.visit(RunVisitor(self))
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 447, in visit
> self._root_transform().visit(visitor, self, visited)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 824, in visit
> part.visit(visitor, pipeline, visited)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 824, in visit
> part.visit(visitor, pipeline, visited)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 827, in visit
> visitor.visit_transform(self)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 
> 153, in visit_transform
> self.runner.run_transform(transform_node, options)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 
> 196, in run_transform
> return m(transform_node, options)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 807, in run_ParDo
> transform_node.transform.output_tags)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 590, in _add_step
> DisplayData.create_from(transform_node.transform).items])
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/transforms/display.py", 
> line 274, in get_dict
> self.is_valid()
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/transforms/display.py", 
> line 246, in is_valid
> .format(self.value))
> ValueError: Invalid DisplayDataItem. Value {} is of an unsupported type.
>  





[jira] [Updated] (BEAM-8311) Fix python mongodbio display data type

2019-09-24 Thread Yichi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yichi Zhang updated BEAM-8311:
--
Description: 
When I try to write a document to MongoDB through
 | "Write User Doc to Mongo" >> beam.io.WriteToMongoDB(uri=MONGO_URI,
      db="dbname",
      coll="col_name"
      ))
I get the error {{ValueError: Invalid DisplayDataItem. Value {} is of an unsupported 
type.}}
 ERROR:root:Error while visiting Write User Doc to Mongo/ParDo(_WriteMongoFn)
Traceback (most recent call last):
  File "beam_home.py", line 317, in <module>
run()
  File "beam_home.py", line 312, in run
p.run().wait_until_finish()
  File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
406, in run
self._options).run(False)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
419, in run
return self.runner.run_pipeline(self, self._options)
  File 
"/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
 line 469, in run_pipeline
super(DataflowRunner, self).run_pipeline(pipeline, options)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", 
line 158, in run_pipeline
pipeline.visit(RunVisitor(self))
  File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
447, in visit
self._root_transform().visit(visitor, self, visited)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
824, in visit
part.visit(visitor, pipeline, visited)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
824, in visit
part.visit(visitor, pipeline, visited)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
827, in visit
visitor.visit_transform(self)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", 
line 153, in visit_transform
self.runner.run_transform(transform_node, options)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", 
line 196, in run_transform
return m(transform_node, options)
  File 
"/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
 line 807, in run_ParDo
transform_node.transform.output_tags)
  File 
"/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
 line 590, in _add_step
DisplayData.create_from(transform_node.transform).items])
  File 
"/usr/local/lib/python2.7/site-packages/apache_beam/transforms/display.py", 
line 274, in get_dict
self.is_valid()
  File 
"/usr/local/lib/python2.7/site-packages/apache_beam/transforms/display.py", 
line 246, in is_valid
.format(self.value))
ValueError: Invalid DisplayDataItem. Value {} is of an unsupported type.
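
The traceback ends in the display-data validation step, which rejects the empty dict `{}` as a value. The following is a minimal sketch of one way a transform could guard against this. It assumes (this is not taken from the Beam source) that display data accepts only a small set of scalar types and that anything else may be shown as a JSON string. `SUPPORTED_TYPES` and `to_display_value` are hypothetical names, not Beam APIs.

```python
import json
from datetime import datetime, timedelta

# Assumption: the scalar types a display-data value is allowed to have.
SUPPORTED_TYPES = (str, int, float, bool, datetime, timedelta)


def to_display_value(value):
    """Return `value` unchanged if it is a supported scalar, else a JSON string."""
    if isinstance(value, SUPPORTED_TYPES):
        return value
    # Dicts, lists and other containers are rendered as text so that
    # validation sees a plain string instead of an unsupported type.
    return json.dumps(value, sort_keys=True, default=str)
```

With such a guard, an empty options dict would be displayed as the string `{}` instead of raising while the pipeline is being translated.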
 

> Fix python mongodbio display data type
> --
>
> Key: BEAM-8311
> URL: https://issues.apache.org/jira/browse/BEAM-8311
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Blocker
> Fix For: 2.16.0
>
>

[jira] [Commented] (BEAM-8311) Fix python mongodbio display data type

2019-09-24 Thread Yichi Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937185#comment-16937185
 ] 

Yichi Zhang commented on BEAM-8311:
---

CC: [~markflyhigh]

> Fix python mongodbio display data type
> --
>
> Key: BEAM-8311
> URL: https://issues.apache.org/jira/browse/BEAM-8311
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Blocker
> Fix For: 2.16.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8311) Fix python mongodbio display data type

2019-09-24 Thread Yichi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yichi Zhang updated BEAM-8311:
--
Priority: Blocker  (was: Major)

> Fix python mongodbio display data type
> --
>
> Key: BEAM-8311
> URL: https://issues.apache.org/jira/browse/BEAM-8311
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Blocker
> Fix For: 2.16.0
>
>






[jira] [Work logged] (BEAM-8301) Argument inference breaks on incomparable types as defaults.

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8301?focusedWorklogId=317826=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317826
 ]

ASF GitHub Bot logged work on BEAM-8301:


Author: ASF GitHub Bot
Created on: 24/Sep/19 20:50
Start Date: 24/Sep/19 20:50
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #9641: [BEAM-8301] Fix 
incomparable defaults.
URL: https://github.com/apache/beam/pull/9641#issuecomment-534742533
 
 
   LGTM
 



Issue Time Tracking
---

Worklog Id: (was: 317826)
Time Spent: 20m  (was: 10m)

> Argument inference breaks on incomparable types as defaults.
> 
>
> Key: BEAM-8301
> URL: https://issues.apache.org/jira/browse/BEAM-8301
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0
>Reporter: Robert Bradshaw
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> A common culprit is numpy arrays, e.g.
> {code:python}
> class MyDoFn(beam.DoFn):
>   def process(self, element, arg=np.ndarray(...)):
>     ...
> {code}
> This bug was introduced as part of [BEAM-7060].
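
To illustrate why such defaults are a problem, here is a minimal sketch. It assumes that the argument inference compares each default value with a sentinel using `==`; the exact Beam code path is not shown in this thread. `ArrayLike` is a hypothetical stand-in for a numpy array, whose comparison result has no single truth value, so the example runs without numpy.

```python
import inspect


class _NoTruthValue:
    """Result of an element-wise comparison; it has no single truth value."""

    def __bool__(self):
        raise ValueError("The truth value of this comparison is ambiguous")


class ArrayLike:
    """Hypothetical stand-in for a numpy array: `==` does not return a bool."""

    def __eq__(self, other):
        return _NoTruthValue()


def process(element, arg=ArrayLike()):
    return element


def params_with_defaults_by_equality(fn):
    """Fragile: `==` dispatches to the default value's own __eq__."""
    names = []
    for param in inspect.signature(fn).parameters.values():
        if param.default == inspect.Parameter.empty:
            continue
        names.append(param.name)
    return names


def params_with_defaults_by_identity(fn):
    """Robust: `is` never calls user-defined comparison code."""
    return [
        param.name
        for param in inspect.signature(fn).parameters.values()
        if param.default is not inspect.Parameter.empty
    ]
```

Calling `params_with_defaults_by_equality(process)` raises `ValueError`, while the identity-based variant returns the parameter names without ever evaluating the default's comparison operator.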





[jira] [Created] (BEAM-8311) Fix python mongodbio display data type

2019-09-24 Thread Yichi Zhang (Jira)
Yichi Zhang created BEAM-8311:
-

 Summary: Fix python mongodbio display data type
 Key: BEAM-8311
 URL: https://issues.apache.org/jira/browse/BEAM-8311
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Yichi Zhang
Assignee: Yichi Zhang
 Fix For: 2.16.0








[jira] [Assigned] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-09-24 Thread Mark Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu reassigned BEAM-7060:
--

Assignee: Mark Liu  (was: Udi Meiri)

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Mark Liu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 18.5h
>  Remaining Estimate: 0h
>
> The existing [typehints implementation in 
> Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/] 
> relies heavily on internal details of the CPython implementation, and some 
> of its assumptions broke as of Python 3.6 (see, for example, 
> https://issues.apache.org/jira/browse/BEAM-6877), which currently makes 
> typehints support unusable on Python 3.6. The [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail] 
> lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate the in-house typehints implementation.
> - Continue to support the in-house implementation, which at this point is 
> stale code and has other known issues.
> - Attempt to use off-the-shelf libraries for supporting type annotations, 
> such as Pytype, Mypy, or PyAnnotate.
> With respect to this decision, we also need to plan immediate next steps to 
> unblock adoption of Beam for Python 3.6+ users. One potential option may be 
> to have the Beam SDK ignore any typehint annotations on Python 3.6+.
> cc: [~udim], [~altay], [~robertwb].





[jira] [Resolved] (BEAM-8279) temporarily disable IOTypeHints.from_callable

2019-09-24 Thread Mark Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu resolved BEAM-8279.

Resolution: Fixed

> temporarily disable IOTypeHints.from_callable
> -
>
> Key: BEAM-8279
> URL: https://issues.apache.org/jira/browse/BEAM-8279
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Py3 annotations support is too buggy to be put into the 2.16 release, and 
> there is no easy way to disable/enable via flags.
> PRs made to fix bugs discovered while internally testing the upcoming release:
> https://github.com/apache/beam/pull/9563 - converts python type hints to Beam 
> internal
> https://github.com/apache/beam/pull/9602 - fixes 4-5 bugs
> Re-enable from_callable once it has been more thoroughly tested.





[jira] [Work logged] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6923?focusedWorklogId=317811=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317811
 ]

ASF GitHub Bot logged work on BEAM-6923:


Author: ASF GitHub Bot
Created on: 24/Sep/19 20:19
Start Date: 24/Sep/19 20:19
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #9647: [BEAM-6923] limit 
number of concurrent artifact write to 8
URL: https://github.com/apache/beam/pull/9647#issuecomment-534730524
 
 
   Yes since you can keep track of the number of calls and on a per call basis
   provide pushback.
   
   On Tue, Sep 24, 2019 at 12:43 PM Robert Bradshaw 
   wrote:
   
   > Note that gRPC does support pushback and signalling to the client via the
   > HTTP/2 protocol it is built on top of, should we be using that?
   >
   > If that works, it'd be good to use, but would this apply to multiple
   > concurrent connections, which I think is the issue here?
   
 



Issue Time Tracking
---

Worklog Id: (was: 317811)
Time Spent: 1h  (was: 50m)

> OOM errors in jobServer when using GCS artifactDir
> --
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Lukasz Gajowy
>Assignee: Ankur Goenka
>Priority: Major
> Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump 
> size-sorted.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When starting jobServer with artifactDir pointing to a GCS bucket: 
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code}
> and running a Java portable pipeline with the following, portability related 
> pipeline options: 
> {code:java}
> --runner=PortableRunner --jobEndpoint=localhost:8099 
> --defaultEnvironmentType=DOCKER 
> --defaultEnvironmentConfig=gcr.io//java:latest'{code}
>  
> I'm facing a series of OOM errors, like this: 
> {code:java}
> Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: 
> Java heap space
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
> at 
> com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
>  
> This does not happen when I'm using a local filesystem for the artifact 
> staging location. 
>  





[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=317808=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317808
 ]

ASF GitHub Bot logged work on BEAM-8213:


Author: ASF GitHub Bot
Created on: 24/Sep/19 20:10
Start Date: 24/Sep/19 20:10
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9642: [BEAM-8213] Split 
up monolithic python preCommit tests on jenkins
URL: https://github.com/apache/beam/pull/9642#issuecomment-534727425
 
 
   > the overall work is equivalent they should take about 1/5 as long (or 
slightly longer if you include setup time).
   
   I don't think we will get a 1/5 reduction, since the Python 2.7, 3.5, 3.6, 
and 3.7 precommits already execute in parallel (by way of Gradle 
parallelism) on the same Jenkins worker, taking up one slot. We would still 
have parallel execution but would require 4x more slots. 
   
   Increasing slots per worker may help, but there are some potentially 
heavyweight tests, such as the portable Python precommit tests that bring up 
Flink, that may cause Jenkins VMs to OOM if we run a lot of them in parallel on 
the same VM. I have heard a second-hand account that parallelizing portable 
precommit tests 4x on the same Jenkins worker caused OOMs, but I did not verify 
this myself. It may not be an issue, but we need a reliable way to monitor 
Jenkins worker health and utilization to be confident.
 



Issue Time Tracking
---

Worklog Id: (was: 317808)
Time Spent: 2.5h  (was: 2h 20m)

> Run and report python tox tasks separately within Jenkins
> -
>
> Key: BEAM-8213
> URL: https://issues.apache.org/jira/browse/BEAM-8213
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> As a python developer, the speed and comprehensibility of the jenkins 
> PreCommit job could be greatly improved.
> Here are some of the problems
> - when a lint job fails, it's not reported in the test results summary, so 
> even though the job is marked as failed, I see "Test Result (no failures)" 
> which is quite confusing
> - I have to wait for over an hour to discover the lint failed, which takes 
> about a minute to run on its own
> - The logs are a jumbled mess of all the different tasks running on top of 
> each other
> - The test results give no indication of which version of python they use.  I 
> click on Test results, then the test module, then the test class, then I see 
> 4 tests named the same thing.  I assume that the first is python 2.7, the 
> second is 3.5 and so on.   It takes 5 clicks and then reading the log output 
> to know which version of python a single error pertains to, then I need to 
> repeat for each failure.  This makes it very difficult to discover problems, 
> and deduce that they may have something to do with python version mismatches.
> I believe the solution to this is to split up the single monolithic python 
> PreCommit job into sub-jobs (possibly using a pipeline with steps).  This 
> would give us the following benefits:
> - sub job results should become available as they finish, so for example, 
> lint results should be available very early on
> - sub job results will be reported separately, and there will be a job for 
> each py2, py35, py36 and so on, so it will be clear when an error is related 
> to a particular python version
> - sub jobs without reports, like docs and lint, will have their own failure 
> status and logs, so when they fail it will be more obvious what went wrong.
> I'm happy to help out once I get some feedback on the desired way forward.





[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=317809=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317809
 ]

ASF GitHub Bot logged work on BEAM-8213:


Author: ASF GitHub Bot
Created on: 24/Sep/19 20:10
Start Date: 24/Sep/19 20:10
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9642: [BEAM-8213] Split 
up monolithic python preCommit tests on jenkins
URL: https://github.com/apache/beam/pull/9642#issuecomment-534727425
 
 
   > the overall work is equivalent they should take about 1/5 as long (or 
slightly longer if you include setup time).
   
   I don't think we will get a 1/5 reduction, since the Python 2.7, 3.5, 3.6, 
and 3.7 precommits already execute in parallel (by way of Gradle 
parallelism) on the same Jenkins worker, taking up one slot. We would still 
have parallel execution but would require 4x more slots. 
   
   Increasing slots per worker may help, but there are some potentially 
heavyweight tests, such as the portable Python precommit tests that bring up 
Flink, that may cause Jenkins VMs to OOM if we run a lot of them in parallel on 
the same VM. I have heard a second-hand account that parallelizing portable 
precommit tests 4x on the same Jenkins worker caused OOMs, but I did not verify 
this myself. It may not be an issue, but we need a reliable way to monitor 
Jenkins worker health and utilization to be confident.
 



Issue Time Tracking
---

Worklog Id: (was: 317809)
Time Spent: 2h 40m  (was: 2.5h)

> Run and report python tox tasks separately within Jenkins
> -
>
> Key: BEAM-8213
> URL: https://issues.apache.org/jira/browse/BEAM-8213
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-8111) SchemaCoder broken on DataflowRunner

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8111?focusedWorklogId=317803=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317803
 ]

ASF GitHub Bot logged work on BEAM-8111:


Author: ASF GitHub Bot
Created on: 24/Sep/19 20:05
Start Date: 24/Sep/19 20:05
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9550: Revert "[BEAM-8111] 
Add ValidatesRunner test to AvroSchemaTest"
URL: https://github.com/apache/beam/pull/9550#issuecomment-534725691
 
 
   So should this be closed? 
 



Issue Time Tracking
---

Worklog Id: (was: 317803)
Time Spent: 4h 10m  (was: 4h)

> SchemaCoder broken on DataflowRunner
> 
>
> Key: BEAM-8111
> URL: https://issues.apache.org/jira/browse/BEAM-8111
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/commit/e65c176a9f34e45d408281e1101a2ae54cef0f6c
>  broke SchemaCoder on Dataflow. When translating a schema that uses logical 
> types from a cloud object, Dataflow encounters a runtime error.
> This means any pipelines that use SqlTransform or schema transforms will fail 
> on Dataflow in 2.15.0.





[jira] [Work logged] (BEAM-3372) Duplicated 'zone' PipelineOption has inconsistent documentation

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3372?focusedWorklogId=317799=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317799
 ]

ASF GitHub Bot logged work on BEAM-3372:


Author: ASF GitHub Bot
Created on: 24/Sep/19 20:03
Start Date: 24/Sep/19 20:03
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9578: [BEAM-3372] remove 
duplicated zone
URL: https://github.com/apache/beam/pull/9578#issuecomment-534724952
 
 
   Good to merge?
 



Issue Time Tracking
---

Worklog Id: (was: 317799)
Time Spent: 1h 20m  (was: 1h 10m)

> Duplicated 'zone' PipelineOption has inconsistent documentation
> ---
>
> Key: BEAM-3372
> URL: https://issues.apache.org/jira/browse/BEAM-3372
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, runner-dataflow
>Reporter: Scott Wegner
>Priority: Minor
>  Labels: ccoss2019, newbie, starter, test
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Two different PipelineOptions interfaces defined a 'zone' option: GcpOptions 
> [1] and DataflowWorkerPoolOptions [2]. It's not an error for an option to be 
> redefined, and internally Beam checks that the definitions are compatible.
> In this case the two 'zone' definitions are compatible but they have 
> different descriptions. This can be confusing as setting one will also impact 
> the other.
> We should make improvements around duplicate PipelineOptions definitions for 
> a given runner. In this case, I propose we:
> a) Update the @Description's so that they match.
> b) Mark one of them as @Deprecated with a link to the other. Migrate code 
> references and plan to remove it on the next major version.
> c) Add a test which checks all PipelineOptions on the DataflowRunner 
> classpath and verify that any duplicates have the properties above 
> (equivalent definitions including @Description, and only one non-@Deprecated 
> version)
> [1] 
> https://github.com/apache/beam/blob/670941961845593d9a7e09b17c1bd117f27bf579/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L95
> [2] 
> https://github.com/apache/beam/blob/670941961845593d9a7e09b17c1bd117f27bf579/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java#L175





[jira] [Work logged] (BEAM-7560) Python local filesystem match does not work without directory separator.

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7560?focusedWorklogId=317795=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317795
 ]

ASF GitHub Bot logged work on BEAM-7560:


Author: ASF GitHub Bot
Created on: 24/Sep/19 20:02
Start Date: 24/Sep/19 20:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9579: [BEAM-7560] 
verifying file match
URL: https://github.com/apache/beam/pull/9579#discussion_r327806663
 
 

 ##
 File path: sdks/python/apache_beam/io/textio_test.py
 ##
 @@ -168,6 +168,28 @@ def 
test_read_single_file_larger_than_default_buffer(self):
 self._run_read_test(file_name, expected_data,
 buffer_size=TextSource.DEFAULT_READ_BUFFER_SIZE)
 
+  def test_create_file_then_read(self):
+with open('.txt', 'w') as file:
 
 Review comment:
   These should be placed in a temp directory that gets cleaned up (see other 
tests in this class).
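
A minimal sketch of the temp-directory pattern the comment asks for, using only the standard library. The helpers actually used by the other tests in `textio_test.py` are not shown in this thread, so `ReadFromTempDirTest` is a hypothetical name, and plain `open()` stands in for `ReadFromText` so that the example runs without Beam installed.

```python
import os
import shutil
import tempfile
import unittest


class ReadFromTempDirTest(unittest.TestCase):

    def setUp(self):
        # Every test gets its own directory, removed even if the test fails.
        self.temp_dir = tempfile.mkdtemp()
        self.addCleanup(shutil.rmtree, self.temp_dir)

    def test_read_file_created_in_temp_dir(self):
        path = os.path.join(self.temp_dir, 'test_file.txt')
        with open(path, 'w') as f:
            f.write('Hello\n')
        # Stand-in for the pipeline read: check what was written.
        with open(path) as f:
            self.assertEqual(['Hello\n'], f.readlines())
```

Registering the cleanup in `setUp` keeps the working directory free of stray files such as `.txt` or `test_file.txt` between test runs.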
 



Issue Time Tracking
---

Worklog Id: (was: 317795)
Time Spent: 40m  (was: 0.5h)

> Python local filesystem match does not work without directory separator.
> 
>
> Key: BEAM-7560
> URL: https://issues.apache.org/jira/browse/BEAM-7560
> Project: Beam
>  Issue Type: Test
>  Components: io-py-files
>Reporter: Robert Bradshaw
>Priority: Major
>  Labels: ccoss2019, starter
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> E.g. {{beam.io.ReadFromText('./*.txt')}} works but 
> {{beam.io.ReadFromText('*.txt')}} throws a "No files found based on the 
> filepattern..." error.
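
One plausible cause, stated here as an assumption rather than a confirmed diagnosis, is that the directory part of a bare pattern such as `*.txt` is the empty string. The sketch below shows a standalone matcher that normalizes that case. `match_local_files` is a hypothetical helper, not Beam's `LocalFileSystem` API.

```python
import fnmatch
import os


def match_local_files(pattern):
    """Return paths in the pattern's directory whose names match the pattern."""
    directory, name_pattern = os.path.split(pattern)
    # os.path.split('*.txt') returns ('', '*.txt'); treat '' as the
    # current directory so both spellings list the same place.
    directory = directory or os.curdir
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if fnmatch.fnmatch(name, name_pattern))
```

With this normalization, `match_local_files('*.txt')` and `match_local_files('./*.txt')` return the same list.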





[jira] [Work logged] (BEAM-7560) Python local filesystem match does not work without directory separator.

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7560?focusedWorklogId=317796=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317796
 ]

ASF GitHub Bot logged work on BEAM-7560:


Author: ASF GitHub Bot
Created on: 24/Sep/19 20:02
Start Date: 24/Sep/19 20:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9579: [BEAM-7560] 
verifying file match
URL: https://github.com/apache/beam/pull/9579#discussion_r327807655
 
 

 ##
 File path: sdks/python/apache_beam/io/textio_test.py
 ##
 @@ -168,6 +168,28 @@ def 
test_read_single_file_larger_than_default_buffer(self):
 self._run_read_test(file_name, expected_data,
 buffer_size=TextSource.DEFAULT_READ_BUFFER_SIZE)
 
+  def test_create_file_then_read(self):
+with open('.txt', 'w') as file:
+  file.write('Hello\n')
+
+p = beam.Pipeline()
+output1 = p | "Read1" >> ReadFromText("./.txt")
+
+with open('test_file.txt', 'w') as file:
+  file.write('Hello\n')
+
 
 Review comment:
   Nit: following convention in this file, don't have newlines between each 
individual statement. 
 



Issue Time Tracking
---

Worklog Id: (was: 317796)
Time Spent: 40m  (was: 0.5h)






[jira] [Work logged] (BEAM-7560) Python local filesystem match does not work without directory separator.

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7560?focusedWorklogId=317797=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317797
 ]

ASF GitHub Bot logged work on BEAM-7560:


Author: ASF GitHub Bot
Created on: 24/Sep/19 20:02
Start Date: 24/Sep/19 20:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9579: [BEAM-7560] 
verifying file match
URL: https://github.com/apache/beam/pull/9579#discussion_r327806392
 
 

 ##
 File path: sdks/python/apache_beam/io/textio_test.py
 ##
 @@ -168,6 +168,28 @@ def 
test_read_single_file_larger_than_default_buffer(self):
 self._run_read_test(file_name, expected_data,
 buffer_size=TextSource.DEFAULT_READ_BUFFER_SIZE)
 
+  def test_create_file_then_read(self):
 
 Review comment:
   Could you clarify what this is testing (and maybe rename the test)? Is it 
testing matching against dot files?
 



Issue Time Tracking
---

Worklog Id: (was: 317797)
Time Spent: 40m  (was: 0.5h)






[jira] [Work logged] (BEAM-7560) Python local filesystem match does not work without directory separator.

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7560?focusedWorklogId=317798=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317798
 ]

ASF GitHub Bot logged work on BEAM-7560:


Author: ASF GitHub Bot
Created on: 24/Sep/19 20:02
Start Date: 24/Sep/19 20:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9579: [BEAM-7560] 
verifying file match
URL: https://github.com/apache/beam/pull/9579#discussion_r327807495
 
 

 ##
 File path: sdks/python/apache_beam/io/textio_test.py
 ##
 @@ -168,6 +168,28 @@ def test_read_single_file_larger_than_default_buffer(self):
      self._run_read_test(file_name, expected_data,
                          buffer_size=TextSource.DEFAULT_READ_BUFFER_SIZE)
 
+  def test_create_file_then_read(self):
+    with open('.txt', 'w') as file:
+      file.write('Hello\n')
+
+    p = beam.Pipeline()
+    output1 = p | "Read1" >> ReadFromText("./.txt")
 
 Review comment:
   I'd give the outputs more meaningful names. Either that or just write
   
   ```
   assert_that(
       p | ReadFromText("./.txt"),
       equal_to(['Hello']),
       label="meaningful label")
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 317798)
Time Spent: 40m  (was: 0.5h)

> Python local filesystem match does not work without directory separator.
> 
>
> Key: BEAM-7560
> URL: https://issues.apache.org/jira/browse/BEAM-7560
> Project: Beam
>  Issue Type: Test
>  Components: io-py-files
>Reporter: Robert Bradshaw
>Priority: Major
>  Labels: ccoss2019, starter
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> E.g. {{beam.io.ReadFromText('./*.txt')}} works but 
> {{beam.io.ReadFromText('*.txt')}} throws a "No files found based on the 
> filepattern..." error.





[jira] [Work logged] (BEAM-8003) Remove all mentions of PKB on Confluence / website docs

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8003?focusedWorklogId=317789=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317789
 ]

ASF GitHub Bot logged work on BEAM-8003:


Author: ASF GitHub Bot
Created on: 24/Sep/19 19:50
Start Date: 24/Sep/19 19:50
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9626: [BEAM-8003] pyjobs 
init commit
URL: https://github.com/apache/beam/pull/9626#issuecomment-534720251
 
 
   This doesn't seem related to BEAM-8003, is this PR against the right repo? 
If so, has there been discussion on the list about this? 
 



Issue Time Tracking
---

Worklog Id: (was: 317789)
Time Spent: 4h 20m  (was: 4h 10m)

> Remove all mentions of PKB on Confluence / website docs
> ---
>
> Key: BEAM-8003
> URL: https://issues.apache.org/jira/browse/BEAM-8003
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing, website
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8293) Document or log file system issues with docker

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8293?focusedWorklogId=317787=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317787
 ]

ASF GitHub Bot logged work on BEAM-8293:


Author: ASF GitHub Bot
Created on: 24/Sep/19 19:44
Start Date: 24/Sep/19 19:44
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9646: [BEAM-8293] 
prescriptive log message for artifact retrieval failure
URL: https://github.com/apache/beam/pull/9646
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 317787)
Time Spent: 20m  (was: 10m)

> Document or log file system issues with docker
> --
>
> Key: BEAM-8293
> URL: https://issues.apache.org/jira/browse/BEAM-8293
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> A frequently asked question about portability in the mailing list is, "Why am 
> I getting IOExceptions in my job?" where the answer is often, because the SDK 
> harness is using docker, which does not have access to the local filesystem 
> by default, so when users try to read/write via transforms or don't set an 
> artifact_staging_location, they get errors. We should at least document this 
> on the website. Even better would be to log something, especially for 
> artifact_staging_location, which is implicit and users might not be aware of.





[jira] [Work logged] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6923?focusedWorklogId=317786=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317786
 ]

ASF GitHub Bot logged work on BEAM-6923:


Author: ASF GitHub Bot
Created on: 24/Sep/19 19:43
Start Date: 24/Sep/19 19:43
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9647: [BEAM-6923] limit 
number of concurrent artifact write to 8
URL: https://github.com/apache/beam/pull/9647#issuecomment-534717907
 
 
   > Note that gRPC does support pushback and signalling to the client via the 
HTTP/2 protocol it is built on top of, should we be using that?
   
   If that works, it'd be good to use, but would this apply to multiple 
concurrent connections, which I think is the issue here? 
 



Issue Time Tracking
---

Worklog Id: (was: 317786)
Time Spent: 50m  (was: 40m)

> OOM errors in jobServer when using GCS artifactDir
> --
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Lukasz Gajowy
>Assignee: Ankur Goenka
>Priority: Major
> Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump 
> size-sorted.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When starting jobServer with artifactDir pointing to a GCS bucket: 
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code}
> and running a Java portable pipeline with the following, portability related 
> pipeline options: 
> {code:java}
> --runner=PortableRunner --jobEndpoint=localhost:8099 
> --defaultEnvironmentType=DOCKER 
> --defaultEnvironmentConfig=gcr.io//java:latest'{code}
>  
> I'm facing a series of OOM errors, like this: 
> {code:java}
> Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: 
> Java heap space
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
> at 
> com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
>  
> This does not happen when I'm using a local filesystem for the artifact 
> staging location. 
>  





[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=317770=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317770
 ]

ASF GitHub Bot logged work on BEAM-8213:


Author: ASF GitHub Bot
Created on: 24/Sep/19 19:20
Start Date: 24/Sep/19 19:20
Worklog Time Spent: 10m 
  Work Description: youngoli commented on issue #9642: [BEAM-8213] Split up 
monolithic python preCommit tests on jenkins
URL: https://github.com/apache/beam/pull/9642#issuecomment-534709534
 
 
   > Can you describe how other jobs will be affected so that I can send out a 
proper warning, please?
   
   If you run the seed job, it will update Jenkins and the GitHub setup here in 
total, not just for this PR. So while your change is live, all PRs will run 
the split precommits. If those precommits are failing for whatever reason, 
you'll be blocking those PRs. So ideally you'll want to warn devs first and, 
if it causes problems, then do it again with a clean environment to undo your 
changes (this also occurs automatically at 8-hour intervals, but that's too 
long if the precommits are failing).
 



Issue Time Tracking
---

Worklog Id: (was: 317770)
Time Spent: 2h 20m  (was: 2h 10m)

> Run and report python tox tasks separately within Jenkins
> -
>
> Key: BEAM-8213
> URL: https://issues.apache.org/jira/browse/BEAM-8213
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> As a python developer, the speed and comprehensibility of the jenkins 
> PreCommit job could be greatly improved.
> Here are some of the problems
> - when a lint job fails, it's not reported in the test results summary, so 
> even though the job is marked as failed, I see "Test Result (no failures)" 
> which is quite confusing
> - I have to wait for over an hour to discover the lint failed, which takes 
> about a minute to run on its own
> - The logs are a jumbled mess of all the different tasks running on top of 
> each other
> - The test results give no indication of which version of python they use.  I 
> click on Test results, then the test module, then the test class, then I see 
> 4 tests named the same thing.  I assume that the first is python 2.7, the 
> second is 3.5 and so on.   It takes 5 clicks and then reading the log output 
> to know which version of python a single error pertains to, then I need to 
> repeat for each failure.  This makes it very difficult to discover problems, 
> and deduce that they may have something to do with python version mismatches.
> I believe the solution to this is to split up the single monolithic python 
> PreCommit job into sub-jobs (possibly using a pipeline with steps).  This 
> would give us the following benefits:
> - sub job results should become available as they finish, so for example, 
> lint results should be available very early on
> - sub job results will be reported separately, and there will be a job for 
> each py2, py35, py36 and so on, so it will be clear when an error is related 
> to a particular python version
> - sub jobs without reports, like docs and lint, will have their own failure 
> status and logs, so when they fail it will be more obvious what went wrong.
> I'm happy to help out once I get some feedback on the desired way forward.





[jira] [Work logged] (BEAM-8286) Python precommit (:sdks:python:test-suites:tox:py2:docs) failing

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8286?focusedWorklogId=317766=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317766
 ]

ASF GitHub Bot logged work on BEAM-8286:


Author: ASF GitHub Bot
Created on: 24/Sep/19 19:10
Start Date: 24/Sep/19 19:10
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #9654: 
[BEAM-8286] replace dead intersphinx link for google-cloud-python
URL: https://github.com/apache/beam/pull/9654
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 317766)
Time Spent: 1h 10m  (was: 1h)

> Python precommit (:sdks:python:test-suites:tox:py2:docs) failing
> 
>
> Key: BEAM-8286
> URL: https://issues.apache.org/jira/browse/BEAM-8286
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Example failure: 
> [https://builds.apache.org/job/beam_PreCommit_Python_Commit/8638/console]
> 17:29:13 * What went wrong:
> 17:29:13 Execution failed for task ':sdks:python:test-suites:tox:py2:docs'.
> 17:29:13 > Process 'command 'sh'' finished with non-zero exit value 1
> Fails on my local machine (on head) as well. Can't determine exact cause.
> ERROR: InvocationError for command /usr/bin/time scripts/generate_pydoc.sh 
> (exited with code 1) 
>  





[jira] [Work logged] (BEAM-3845) Avoid calling Class#newInstance

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3845?focusedWorklogId=317764=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317764
 ]

ASF GitHub Bot logged work on BEAM-3845:


Author: ASF GitHub Bot
Created on: 24/Sep/19 19:08
Start Date: 24/Sep/19 19:08
Worklog Time Spent: 10m 
  Work Description: alanmyrvold commented on issue #9613: [BEAM-3845] 
Remove deprecated Class.newInstance() method usage
URL: https://github.com/apache/beam/pull/9613#issuecomment-534705212
 
 
   The PostRelease_NightlySnapshot runs :release:runJavaExamplesValidationTask,
   which I think exercises those scripts?
   
   On Tue, Sep 24, 2019 at 8:33 AM Łukasz Gajowy 
   wrote:
   
   > @alanmyrvold and @yifanzou, we noticed together with @iemejia
   > that this class (TestScripts) is used only
   > by the quickstart-java-*.groovy, starter-generation.groovy and
   > mobilegaming-java-*.groovy scripts. The scripts (quickstart,
   > mobilegaming...) in turn seem not to be used by any job or gradle task in
   > Beam's repo, so I have a question: are the groovy scripts still used
   > anywhere, or will they be used? Or is it dead code that we should delete?
   
 



Issue Time Tracking
---

Worklog Id: (was: 317764)
Time Spent: 50m  (was: 40m)

> Avoid calling Class#newInstance
> ---
>
> Key: BEAM-3845
> URL: https://issues.apache.org/jira/browse/BEAM-3845
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Ted Yu
>Assignee: Lukasz Gajowy
>Priority: Minor
>  Labels: triaged
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Class#newInstance is deprecated starting in Java 9 - 
> https://bugs.openjdk.java.net/browse/JDK-6850612 - because it may throw 
> undeclared checked exceptions.
> The suggested replacement is getDeclaredConstructor().newInstance(), which 
> wraps the checked exceptions in InvocationTargetException.





[jira] [Commented] (BEAM-8029) Using BigQueryIO.read with DIRECT_READ causes Illegal Mutation

2019-09-24 Thread Jason Bowman (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937108#comment-16937108
 ] 

Jason Bowman commented on BEAM-8029:


Serializing the generic record to binary Avro, using the ByteArray coder, and 
deserializing it again in the next pipeline stage prevents the mutation.

The initial report shows fields being overwritten. We see the same, and we 
also see byte array fields getting partially overwritten: for example, a JSON 
field will turn into {"a": 1}b": 2}, with the previous value being left in 
the array. This seems to point to a reader corruption/reuse issue.

Digging a bit deeper I find:

[https://github.com/apache/beam/blob/ac45af909923e6d5e43f83087943ad71513b37e8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageStreamSource.java#L251]

It seems the GenericRecord is reused explicitly and is a member variable of the 
stream source, not the row result. The maintainers would seem to assume that 
you would never use the GenericRecord result.
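
The reuse hazard described above can be illustrated with a small, language-agnostic analogy in Python. It is not the Beam or Avro code: `ReusingReader` is a hypothetical class, and the deep copy stands in for the serialize/deserialize round trip mentioned earlier.

```python
import copy


class ReusingReader:
    """Hypothetical reader that decodes every row into one shared dict."""

    def __init__(self, rows):
        self._rows = rows
        self._record = {}  # reused across rows, like a member-variable record

    def __iter__(self):
        for row in self._rows:
            self._record.clear()
            self._record.update(row)
            yield self._record  # the same object every time


rows = [{"humidity": 74.3}, {"humidity": 74.7}]

# Holding on to the yielded objects: both entries alias the last row read.
aliased = list(ReusingReader(rows))

# Copying each record as it is read keeps the values independent.
copied = [copy.deepcopy(r) for r in ReusingReader(rows)]
```

Any consumer that keeps a yielded record past the next read sees it change underneath, which matches the "mutated after it was output" symptom in the report.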

> Using BigQueryIO.read with DIRECT_READ causes Illegal Mutation 
> ---
>
> Key: BEAM-8029
> URL: https://issues.apache.org/jira/browse/BEAM-8029
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.14.0
>Reporter: Chris Larsen
>Priority: Major
>
>  
> Code to read from BigQuery that is causing the issue:
> {code:java}
> pipeline
>     .apply(BigQueryIO
>     .read(SchemaAndRecord::getRecord)
>     .from(options.getTableRef())
>     .withMethod(Method.DIRECT_READ)
>     .withCoder(AvroCoder.of(schema)))
> {code}
> If we remove .withMethod(Method.DIRECT_READ) then there is no issue.
>  
> The error is:
> {code:java}
> org.apache.beam.sdk.util.IllegalMutationException: PTransform 
> BigQueryIO.TypedRead/Read(BigQueryStorageTableSource) mutated value 
> {"device_id": "rpi-rpi0-thermostat", "temperature_c": 20.0, "temperature_f": 
> 52.0, "sample_time": 1564412307969368, "humidity": 74.3} after it was output 
> (new value was {"device_id": "rpi-rpi0-thermostat", "temperature_c": 20.0, 
> "temperature_f": 52.0, "sample_time": 1564412360458615, "humidity": 74.7}). 
> Values must not be mutated in any way after being output.
> at 
> org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.commit
>  (ImmutabilityCheckingBundleFactory.java:134)
> at org.apache.beam.runners.direct.EvaluationContext.commitBundles 
> (EvaluationContext.java:210)
> at org.apache.beam.runners.direct.EvaluationContext.handleResult 
> (EvaluationContext.java:151)
> at 
> org.apache.beam.runners.direct.QuiescenceDriver$TimerIterableCompletionCallback.handleResult
>  (QuiescenceDriver.java:262)
> at org.apache.beam.runners.direct.DirectTransformExecutor.finishBundle 
> (DirectTransformExecutor.java:189)
> at org.apache.beam.runners.direct.DirectTransformExecutor.run 
> (DirectTransformExecutor.java:126)
> at java.util.concurrent.Executors$RunnableAdapter.call 
> (Executors.java:511)
> at java.util.concurrent.FutureTask.run (FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker 
> (ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run 
> (ThreadPoolExecutor.java:624)
> at java.lang.Thread.run (Thread.java:748)
> Caused by: org.apache.beam.sdk.util.IllegalMutationException: Value 
> {"device_id": "rpi-rpi0-thermostat", "temperature_c": 20.0, "temperature_f": 
> 52.0, "sample_time": 1564412307969368, "humidity": 74.3} mutated illegally, 
> new value was {"device_id": "rpi-rpi0-thermostat", "temperature_c": 20.0, 
> "temperature_f": 52.0, "sample_time": 1564412360458615, "humidity": 74.7}. 
> Encoding was 
> AiZycGktcnBpMC10aGVybW9zdGF0AgAAADRAAgAAAEpAArDVsP7jtMcFAjMzMzMzk1JA, 
> now 
> AiZycGktcnBpMC10aGVybW9zdGF0AgAAADRAAgAAAEpAAu6FuLDktMcFAs3MzMzMrFJA.
> at 
> org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.illegalMutation
>  (MutationDetectors.java:153)
> at 
> org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.verifyUnmodifiedThrowingCheckedExceptions
>  (MutationDetectors.java:148)
> at 
> org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.verifyUnmodified
>  (MutationDetectors.java:123)
> at 
> org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.commit
>  (ImmutabilityCheckingBundleFactory.java:124)
> at org.apache.beam.runners.direct.EvaluationContext.commitBundles 
> (EvaluationContext.java:210)
> at org.apache.beam.runners.direct.EvaluationContext.handleResult 
> (EvaluationContext.java:151)
> at 
> 

[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=317761=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317761
 ]

ASF GitHub Bot logged work on BEAM-8213:


Author: ASF GitHub Bot
Created on: 24/Sep/19 19:02
Start Date: 24/Sep/19 19:02
Worklog Time Spent: 10m 
  Work Description: youngoli commented on issue #9642: [BEAM-8213] Split up 
monolithic python preCommit tests on jenkins
URL: https://github.com/apache/beam/pull/9642#issuecomment-534702315
 
 
   As far as queue times go, I think the impact of this should be minimal. As 
Chad mentioned, although it will take up 5 Jenkins slots, the overall work is 
equivalent, so each job should take about 1/5 as long (or slightly longer if 
you include setup time).
   
   I think the main mitigation here is that these are precommits and not 
postcommits. Since it's not set to run on any kind of regular schedule I'm less 
worried about sudden spikes in Jenkins usage. And even if the slots do all get 
full, the shorter time for these precommits means that slots would be available 
sooner.
   
   So I don't think this is worth blocking on for that reason. But I'd measure 
how long the tests take to check that splitting the test didn't drastically 
increase the total amount of work, and I'd keep an eye on overall latency via 
these two pages:
   
   http://104.154.241.245/d/_TNndF2iz/pre-commit-test-latency?orgId=1
   https://builds.apache.org/label/beam/load-statistics?type=min
 



Issue Time Tracking
---

Worklog Id: (was: 317761)
Time Spent: 2h 10m  (was: 2h)

> Run and report python tox tasks separately within Jenkins
> -
>
> Key: BEAM-8213
> URL: https://issues.apache.org/jira/browse/BEAM-8213
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> As a python developer, the speed and comprehensibility of the jenkins 
> PreCommit job could be greatly improved.
> Here are some of the problems
> - when a lint job fails, it's not reported in the test results summary, so 
> even though the job is marked as failed, I see "Test Result (no failures)" 
> which is quite confusing
> - I have to wait for over an hour to discover the lint failed, which takes 
> about a minute to run on its own
> - The logs are a jumbled mess of all the different tasks running on top of 
> each other
> - The test results give no indication of which version of python they use.  I 
> click on Test results, then the test module, then the test class, then I see 
> 4 tests named the same thing.  I assume that the first is python 2.7, the 
> second is 3.5 and so on.   It takes 5 clicks and then reading the log output 
> to know which version of python a single error pertains to, then I need to 
> repeat for each failure.  This makes it very difficult to discover problems, 
> and deduce that they may have something to do with python version mismatches.
> I believe the solution to this is to split up the single monolithic python 
> PreCommit job into sub-jobs (possibly using a pipeline with steps).  This 
> would give us the following benefits:
> - sub job results should become available as they finish, so for example, 
> lint results should be available very early on
> - sub job results will be reported separately, and there will be a job for 
> each py2, py35, py36 and so on, so it will be clear when an error is related 
> to a particular python version
> - sub jobs without reports, like docs and lint, will have their own failure 
> status and logs, so when they fail it will be more obvious what went wrong.
> I'm happy to help out once I get some feedback on the desired way forward.





[jira] [Work logged] (BEAM-8302) beam_PostCommit_XVR_Flink failing

2019-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8302?focusedWorklogId=317758=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317758
 ]

ASF GitHub Bot logged work on BEAM-8302:


Author: ASF GitHub Bot
Created on: 24/Sep/19 18:55
Start Date: 24/Sep/19 18:55
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9644: [BEAM-8302] 
Fix PostCommit_XVR_Flink
URL: https://github.com/apache/beam/pull/9644
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 317758)
Time Spent: 1h 10m  (was: 1h)

> beam_PostCommit_XVR_Flink failing
> -
>
> Key: BEAM-8302
> URL: https://issues.apache.org/jira/browse/BEAM-8302
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> E.g. see https://builds.apache.org/job/beam_PostCommit_XVR_Flink/432/console




