[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=385700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385700
 ]

ASF GitHub Bot logged work on BEAM-9281:


Author: ASF GitHub Bot
Created on: 12/Feb/20 06:46
Start Date: 12/Feb/20 06:46
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10818: [BEAM-9281] Update 
commons-csv to version 1.8
URL: https://github.com/apache/beam/pull/10818#issuecomment-585055981
 
 
   Run SQL PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385700)
Time Spent: 40m  (was: 0.5h)

> Update commons-csv to version 1.8
> -
>
> Key: BEAM-9281
> URL: https://issues.apache.org/jira/browse/BEAM-9281
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
> Fix For: 2.20.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=385696&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385696
 ]

ASF GitHub Bot logged work on BEAM-6522:


Author: ASF GitHub Bot
Created on: 12/Feb/20 06:20
Start Date: 12/Feb/20 06:20
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10838: [BEAM-6522] 
[BEAM-7455] Unskip Avro IO tests that are now passing.
URL: https://github.com/apache/beam/pull/10838#issuecomment-585049296
 
 
   Run Python 3.7 PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 385696)
Time Spent: 8h 10m  (was: 8h)

> Dill fails to pickle avro.RecordSchema classes on Python 3.
> 
>
> Key: BEAM-6522
> URL: https://issues.apache.org/jira/browse/BEAM-6522
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> The avroio module still has 4 failing tests. These are actually the same 
> 2 tests, run once for Avro and once for Fastavro.
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
>  *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 432, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 
> 528, in expand
> return pcoll | beam.io.iobase.Write(self._sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 960, in expand
> return pcoll | WriteImpl(self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 979, in expand
> lambda _, sink: sink.initialize_write(), self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1103, in Map
> pardo = FlatMap(wrapper, *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1054, in FlatMap
> pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 864, in __init__
> super(ParDo, self).__init__(fn, *args, **kwargs)
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 646, in __init__
> self.args = pickler.loads(pickler.dumps(self.args))
> File 
> 
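The failure above boils down to pickling a class created at runtime. A minimal sketch of that limitation using only the standard library (no avro or dill involved; dill extends pickle, but before the fix it hit the same by-reference lookup for avro's generated schema classes):

```python
import pickle

def make_record_class(name, field_names):
    # Build a class at runtime, similar to how avro materializes
    # RecordSchema-backed classes that are not importable by name.
    return type(name, (object,), {"field_names": field_names})

cls = make_record_class("HiddenRecord", ["name", "favorite_number"])

# Stdlib pickle serializes classes by reference (module + qualname).
# Since "HiddenRecord" is not reachable as a module-level attribute,
# the lookup fails and pickling raises PicklingError.
try:
    pickle.dumps(cls)
    picklable = True
except pickle.PicklingError:
    picklable = False

print("picklable:", picklable)  # picklable: False
```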

[jira] [Work logged] (BEAM-9301) Check in beam-linkage-check.sh

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9301?focusedWorklogId=385674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385674
 ]

ASF GitHub Bot logged work on BEAM-9301:


Author: ASF GitHub Bot
Created on: 12/Feb/20 04:39
Start Date: 12/Feb/20 04:39
Worklog Time Spent: 10m 
  Work Description: suztomo commented on pull request #10841: [BEAM-9301] 
Check in beam-linkage-check.sh
URL: https://github.com/apache/beam/pull/10841#discussion_r378037592
 
 

 ##
 File path: sdks/java/build-tools/beam-linkage-check.sh
 ##
 @@ -0,0 +1,99 @@
+#!/bin/bash
 
 Review comment:
  @iemejia I tried writing a zsh-compatible shell script, but it's not trivial. 
Therefore this relies on bash.
 



Issue Time Tracking
---

Worklog Id: (was: 385674)
Time Spent: 0.5h  (was: 20m)

> Check in beam-linkage-check.sh
> --
>
> Key: BEAM-9301
> URL: https://issues.apache.org/jira/browse/BEAM-9301
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/pull/10769#issuecomment-584571787
> bq. @suztomo can you contribute this script maybe into Beam's build-tools 
> directory so we can improve it a bit for further use?
> This is a temporary solution before exclusion rules in Linkage Checker 
> (BEAM-9206) are implemented.





[jira] [Work logged] (BEAM-9301) Check in beam-linkage-check.sh

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9301?focusedWorklogId=385671&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385671
 ]

ASF GitHub Bot logged work on BEAM-9301:


Author: ASF GitHub Bot
Created on: 12/Feb/20 04:37
Start Date: 12/Feb/20 04:37
Worklog Time Spent: 10m 
  Work Description: suztomo commented on pull request #10841: [BEAM-9301] 
Check in beam-linkage-check.sh
URL: https://github.com/apache/beam/pull/10841
 
 
   Script to compare linkage errors (checkJavaLinkage task in the root gradle 
project) between PR's branch and master branch.
   
   This is a temporary solution before Linkage Checker implements exclusion 
rules (BEAM-9206).
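   The core of such a comparison can be sketched in a few lines: capture the linkage-error report on each branch, then report only the errors the PR introduces. A hypothetical illustration (the error strings and helper name are made up; the actual script drives the checkJavaLinkage Gradle task on each branch and diffs the captured output):

```python
def new_linkage_errors(master_errors, pr_errors):
    # Errors already present on master are pre-existing; only report
    # the ones that first appear on the PR branch.
    baseline = set(master_errors)
    return [e for e in pr_errors if e not in baseline]

master_report = [
    "Class com.google.common.base.Verify is not found",
]
pr_report = master_report + [
    "Class io.grpc.Internal is not found",
]

print(new_linkage_errors(master_report, pr_report))
# ['Class io.grpc.Internal is not found']
```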
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 

[jira] [Work logged] (BEAM-9301) Check in beam-linkage-check.sh

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9301?focusedWorklogId=385672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385672
 ]

ASF GitHub Bot logged work on BEAM-9301:


Author: ASF GitHub Bot
Created on: 12/Feb/20 04:38
Start Date: 12/Feb/20 04:38
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10841: [BEAM-9301] Check in 
beam-linkage-check.sh
URL: https://github.com/apache/beam/pull/10841#issuecomment-585024938
 
 
   R: @iemejia 
   CC: @lukecwik 
 



Issue Time Tracking
---

Worklog Id: (was: 385672)
Time Spent: 20m  (was: 10m)

> Check in beam-linkage-check.sh
> --
>
> Key: BEAM-9301
> URL: https://issues.apache.org/jira/browse/BEAM-9301
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/pull/10769#issuecomment-584571787
> bq. @suztomo can you contribute this script maybe into Beam's build-tools 
> directory so we can improve it a bit for further use?
> This is a temporary solution before exclusion rules in Linkage Checker 
> (BEAM-9206) are implemented.





[jira] [Work logged] (BEAM-8979) protoc-gen-mypy: program not found or is not executable

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8979?focusedWorklogId=385661&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385661
 ]

ASF GitHub Bot logged work on BEAM-8979:


Author: ASF GitHub Bot
Created on: 12/Feb/20 03:54
Start Date: 12/Feb/20 03:54
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10734: [BEAM-8979] 
reintroduce mypy-protobuf stub generation
URL: https://github.com/apache/beam/pull/10734#discussion_r378028956
 
 

 ##
 File path: sdks/python/gen_protos.py
 ##
 @@ -216,6 +216,23 @@ def _import(m):
 sys.path.pop(0)
 
 
+def _find_protoc_gen_mypy():
 
 Review comment:
   Yeah, might as well keep it.  It includes a fallback when PATH is not set 
and it’s easier to debug. 
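 For context, a hypothetical sketch of what such a PATH-with-fallback lookup can look like (the helper and error message echo this Jira issue; the actual code lives in sdks/python/gen_protos.py and may differ):

```python
import os
import shutil
import sys

def find_plugin(name):
    # First try the normal PATH lookup.
    found = shutil.which(name)
    if found:
        return found
    # Fallback: look next to the current interpreter (e.g. a virtualenv's
    # bin/ directory), which helps when PATH is unset or incomplete.
    candidate = os.path.join(os.path.dirname(sys.executable), name)
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    raise RuntimeError("%s: program not found or is not executable" % name)

# The real caller would ask for "protoc-gen-mypy"; "sh" is used here only
# so the sketch runs anywhere with a POSIX shell installed.
print(find_plugin("sh"))
```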
 



Issue Time Tracking
---

Worklog Id: (was: 385661)
Time Spent: 9h  (was: 8h 50m)

> protoc-gen-mypy: program not found or is not executable
> ---
>
> Key: BEAM-8979
> URL: https://issues.apache.org/jira/browse/BEAM-8979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Kamil Wasilewski
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> In some tests, `:sdks:python:sdist:` task fails due to problems in finding 
> protoc-gen-mypy. The following tests are affected (there might be more):
>  * 
> [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/]
>  * 
> [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/
>  
> |https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/]
> Relevant logs:
> {code:java}
> 10:46:32 > Task :sdks:python:sdist FAILED
> 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in 
> /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages
>  (1.12)
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto 
> but not used.
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto 
> but not used.
> 10:46:32 protoc-gen-mypy: program not found or is not executable
> 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
> 10:46:32 
> /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476:
>  UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0'
> 10:46:32   normalized_version,
> 10:46:32 Traceback (most recent call last):
> 10:46:32   File "setup.py", line 295, in 
> 10:46:32 'mypy': generate_protos_first(mypy),
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py",
>  line 145, in setup
> 10:46:32 return distutils.core.setup(**attrs)
> 10:46:32   File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
> 10:46:32 dist.run_commands()
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 966, in 
> run_commands
> 10:46:32 self.run_command(cmd)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in 
> run_command
> 10:46:32 cmd_obj.run()
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py",
>  line 44, in run
> 10:46:32 self.run_command('egg_info')
> 10:46:32   File "/usr/lib/python3.7/distutils/cmd.py", line 313, in 
> run_command
> 10:46:32 self.distribution.run_command(command)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in 
> run_command
> 10:46:32 cmd_obj.run()
> 10:46:32   File "setup.py", line 220, in run
> 10:46:32 gen_protos.generate_proto_files(log=log)
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py",
>  line 144, in generate_proto_files
> 10:46:32 '%s' % ret_code)
> 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for 
> details): 1
> {code}
>  
> This is what I have tried so far to resolve this (without being successful):
>  * Including _--plugin=protoc-gen-mypy=\{abs_path_to_executable}_ parameter 
> to the _protoc_ call 

[jira] [Work logged] (BEAM-8979) protoc-gen-mypy: program not found or is not executable

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8979?focusedWorklogId=385662&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385662
 ]

ASF GitHub Bot logged work on BEAM-8979:


Author: ASF GitHub Bot
Created on: 12/Feb/20 03:54
Start Date: 12/Feb/20 03:54
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10734: [BEAM-8979] 
reintroduce mypy-protobuf stub generation
URL: https://github.com/apache/beam/pull/10734#discussion_r378028966
 
 

 ##
 File path: sdks/python/gen_protos.py
 ##
 @@ -216,6 +216,23 @@ def _import(m):
 sys.path.pop(0)
 
 
+def _find_protoc_gen_mypy():
 
 Review comment:
   Yeah, might as well keep it.  It includes a fallback when PATH is not set 
and it’s easier to debug. 
 



Issue Time Tracking
---

Worklog Id: (was: 385662)
Time Spent: 9h 10m  (was: 9h)

> protoc-gen-mypy: program not found or is not executable
> ---
>
> Key: BEAM-8979
> URL: https://issues.apache.org/jira/browse/BEAM-8979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Kamil Wasilewski
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> In some tests, `:sdks:python:sdist:` task fails due to problems in finding 
> protoc-gen-mypy. The following tests are affected (there might be more):
>  * 
> [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/]
>  * 
> [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/
>  
> |https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/]
> Relevant logs:
> {code:java}
> 10:46:32 > Task :sdks:python:sdist FAILED
> 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in 
> /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages
>  (1.12)
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto 
> but not used.
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto 
> but not used.
> 10:46:32 protoc-gen-mypy: program not found or is not executable
> 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
> 10:46:32 
> /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476:
>  UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0'
> 10:46:32   normalized_version,
> 10:46:32 Traceback (most recent call last):
> 10:46:32   File "setup.py", line 295, in 
> 10:46:32 'mypy': generate_protos_first(mypy),
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py",
>  line 145, in setup
> 10:46:32 return distutils.core.setup(**attrs)
> 10:46:32   File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
> 10:46:32 dist.run_commands()
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 966, in 
> run_commands
> 10:46:32 self.run_command(cmd)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in 
> run_command
> 10:46:32 cmd_obj.run()
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py",
>  line 44, in run
> 10:46:32 self.run_command('egg_info')
> 10:46:32   File "/usr/lib/python3.7/distutils/cmd.py", line 313, in 
> run_command
> 10:46:32 self.distribution.run_command(command)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in 
> run_command
> 10:46:32 cmd_obj.run()
> 10:46:32   File "setup.py", line 220, in run
> 10:46:32 gen_protos.generate_proto_files(log=log)
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py",
>  line 144, in generate_proto_files
> 10:46:32 '%s' % ret_code)
> 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for 
> details): 1
> {code}
>  
> This is what I have tried so far to resolve this (without being successful):
>  * Including _--plugin=protoc-gen-mypy=\{abs_path_to_executable}_ parameter 
> to the _protoc_ call 

[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=385656&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385656
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 12/Feb/20 03:33
Start Date: 12/Feb/20 03:33
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10835: 
[BEAM-8575] Removed MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835#discussion_r378019506
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util_test.py
 ##
 @@ -551,9 +551,6 @@ def test_reshuffle_preserves_timestamps(self):
   {
   'name': 'bar', 'timestamp': 33
   },
 
 Review comment:
   I'm afraid not.
   It seems b/146457921 and BEAM-9003 are reported separately.
   b/146457921 is for Unified Worker and BEAM-9003 is for Dataflow. 
   
   b/146457921 causes this test to be sickbayed in validates_runner_tests.bzl.
   BEAM-9003 causes this test to be sickbayed on Jenkins streaming VR test 
suite.
   
   It's possible that BEAM-9003 could be solved at the same time. 
 



Issue Time Tracking
---

Worklog Id: (was: 385656)
Time Spent: 50h  (was: 49h 50m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 50h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Work logged] (BEAM-8979) protoc-gen-mypy: program not found or is not executable

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8979?focusedWorklogId=385652&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385652
 ]

ASF GitHub Bot logged work on BEAM-8979:


Author: ASF GitHub Bot
Created on: 12/Feb/20 03:15
Start Date: 12/Feb/20 03:15
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10734: [BEAM-8979] 
reintroduce mypy-protobuf stub generation
URL: https://github.com/apache/beam/pull/10734#discussion_r378020687
 
 

 ##
 File path: sdks/python/gen_protos.py
 ##
 @@ -216,6 +216,23 @@ def _import(m):
 sys.path.pop(0)
 
 
+def _find_protoc_gen_mypy():
 
 Review comment:
   Do you want to keep this code in?
 



Issue Time Tracking
---

Worklog Id: (was: 385652)
Time Spent: 8h 50m  (was: 8h 40m)

> protoc-gen-mypy: program not found or is not executable
> ---
>
> Key: BEAM-8979
> URL: https://issues.apache.org/jira/browse/BEAM-8979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Kamil Wasilewski
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> In some tests, `:sdks:python:sdist:` task fails due to problems in finding 
> protoc-gen-mypy. The following tests are affected (there might be more):
>  * 
> [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/]
>  * 
> [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/
>  
> |https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/]
> Relevant logs:
> {code:java}
> 10:46:32 > Task :sdks:python:sdist FAILED
> 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in 
> /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages
>  (1.12)
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto 
> but not used.
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto 
> but not used.
> 10:46:32 protoc-gen-mypy: program not found or is not executable
> 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
> 10:46:32 
> /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476:
>  UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0'
> 10:46:32   normalized_version,
> 10:46:32 Traceback (most recent call last):
> 10:46:32   File "setup.py", line 295, in 
> 10:46:32 'mypy': generate_protos_first(mypy),
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py",
>  line 145, in setup
> 10:46:32 return distutils.core.setup(**attrs)
> 10:46:32   File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
> 10:46:32 dist.run_commands()
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 966, in 
> run_commands
> 10:46:32 self.run_command(cmd)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in 
> run_command
> 10:46:32 cmd_obj.run()
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py",
>  line 44, in run
> 10:46:32 self.run_command('egg_info')
> 10:46:32   File "/usr/lib/python3.7/distutils/cmd.py", line 313, in 
> run_command
> 10:46:32 self.distribution.run_command(command)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in 
> run_command
> 10:46:32 cmd_obj.run()
> 10:46:32   File "setup.py", line 220, in run
> 10:46:32 gen_protos.generate_proto_files(log=log)
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py",
>  line 144, in generate_proto_files
> 10:46:32 '%s' % ret_code)
> 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for 
> details): 1
> {code}
>  
> This is what I have tried so far to resolve this (without being successful):
>  * Including _--plugin=protoc-gen-mypy=\{abs_path_to_executable}_ parameter 
> to the _protoc_ call in gen_protos.py:131
>  * Appending protoc-gen-mypy's directory to the 

[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=385648&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385648
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 12/Feb/20 03:09
Start Date: 12/Feb/20 03:09
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10835: 
[BEAM-8575] Removed MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835#discussion_r378019506
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util_test.py
 ##
 @@ -551,9 +551,6 @@ def test_reshuffle_preserves_timestamps(self):
   {
   'name': 'bar', 'timestamp': 33
   },
 
 Review comment:
   I'm afraid not.
   It seems b/146457921 and BEAM-9003 are two different issues.
   b/146457921 is for Unified Worker and BEAM-9003 is for Dataflow. 
 



Issue Time Tracking
---

Worklog Id: (was: 385648)
Time Spent: 49h 50m  (was: 49h 40m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 49h 50m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8095) pytest Py3.7 crashes on test_remote_runner_display_data

2020-02-11 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri resolved BEAM-8095.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> pytest Py3.7 crashes on test_remote_runner_display_data
> ---
>
> Key: BEAM-8095
> URL: https://issues.apache.org/jira/browse/BEAM-8095
> Project: Beam
>  Issue Type: Improvement
>  Components: test-failures, testing
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Adding certain flags such as "-n 1" or "--debug" causes Python to abort.
> The --debug flag logs some information to pytestdebug.log, but I'm not 
> familiar with it or sure what, if anything, it means.
> {code}
> $ pytest  
> apache_beam/runners/dataflow/dataflow_runner_test.py::DataflowRunnerTest::test_remote_runner_display_data
>  
> ========================= test session starts =========================
> platform linux -- Python 3.7.3rc1, pytest-5.1.1, py-1.8.0, pluggy-0.12.0
> rootdir: /usr/local/google/home/ehudm/src/beam/sdks/python, inifile: 
> pytest.ini
> plugins: forked-1.0.2, xdist-1.29.0
> collected 1 item
> apache_beam/runners/dataflow/dataflow_runner_test.py . [100%]
> 
> ========================= 1 passed in 0.11s =========================
> $ pytest  
> apache_beam/runners/dataflow/dataflow_runner_test.py::DataflowRunnerTest::test_remote_runner_display_data
>   --debug
> writing pytestdebug information to 
> /usr/local/google/home/ehudm/src/beam/sdks/python/pytestdebug.log
> ========================= test session starts =========================
> platform linux -- Python 3.7.3rc1, pytest-5.1.1, py-1.8.0, pluggy-0.12.0 -- 
> /usr/local/google/home/ehudm/virtualenvs/beam-py37/bin/python3.7
> using: pytest-5.1.1 pylib-1.8.0
> setuptools registered plugins:
>   pytest-forked-1.0.2 at 
> /usr/local/google/home/ehudm/virtualenvs/beam-py37/lib/python3.7/site-packages/pytest_forked/__init__.py
>   pytest-xdist-1.29.0 at 
> /usr/local/google/home/ehudm/virtualenvs/beam-py37/lib/python3.7/site-packages/xdist/plugin.py
>   pytest-xdist-1.29.0 at 
> /usr/local/google/home/ehudm/virtualenvs/beam-py37/lib/python3.7/site-packages/xdist/looponfail.py
> rootdir: /usr/local/google/home/ehudm/src/beam/sdks/python, inifile: 
> pytest.ini
> plugins: forked-1.0.2, xdist-1.29.0
> collected 1 item
> apache_beam/runners/dataflow/dataflow_runner_test.py Aborted (core dumped)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=385638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385638
 ]

ASF GitHub Bot logged work on BEAM-6522:


Author: ASF GitHub Bot
Created on: 12/Feb/20 02:45
Start Date: 12/Feb/20 02:45
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10838: [BEAM-6522] 
[BEAM-7455] Unskip Avro IO tests that are now passing.
URL: https://github.com/apache/beam/pull/10838#issuecomment-584992554
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385638)
Time Spent: 8h  (was: 7h 50m)

> Dill fails to pickle  avro.RecordSchema classes on Python 3.
> 
>
> Key: BEAM-6522
> URL: https://issues.apache.org/jira/browse/BEAM-6522
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> The avroio module still has 4 failing tests. This is actually 2 times the 
> same 2 tests, both for Avro and Fastavro.
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
>  *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 432, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 
> 528, in expand
> return pcoll | beam.io.iobase.Write(self._sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 960, in expand
> return pcoll | WriteImpl(self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 979, in expand
> lambda _, sink: sink.initialize_write(), self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1103, in Map
> pardo = FlatMap(wrapper, *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1054, in FlatMap
> pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 864, in __init__
> super(ParDo, self).__init__(fn, *args, **kwargs)
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 646, in __init__
> self.args = pickler.loads(pickler.dumps(self.args))
> File 
> 
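The truncated traceback above ends in Beam's pickler round-trip of the transform arguments (`pickler.loads(pickler.dumps(self.args))` in ptransform.py). A minimal stand-in for that contract using the stdlib `pickle` module (Beam's pickler wraps dill, which exposes the same `dumps`/`loads` interface; this sketch illustrates the round-trip pattern only and does not reproduce the avro.RecordSchema failure):

```python
import pickle

def roundtrip(obj):
    # Serialize then deserialize, mirroring Beam's
    # pickler.loads(pickler.dumps(...)) round-trip of transform args.
    return pickle.loads(pickle.dumps(obj))

# Plain data survives the round-trip; avro.RecordSchema instances did not,
# which is what the traceback above surfaces during pipeline construction.
args = ("gs://bucket/output", {"name": "record", "fields": []})
assert roundtrip(args) == args
```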

[jira] [Commented] (BEAM-7463) BigQueryQueryToTableIT is flaky on Direct runner in PostCommit suites: incorrect checksum

2020-02-11 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17034991#comment-17034991
 ] 

Valentyn Tymofieiev commented on BEAM-7463:
---

Looks like this is still failing, or failing again:

https://builds.apache.org/job/beam_PostCommit_Python37_PR/77/testReport/junit/apache_beam.io.gcp.big_query_query_to_table_it_test/BigQueryQueryToTableIT/test_big_query_new_types_native/

Expected: (Test pipeline expected terminated in state: DONE and Expected 
checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214)
 but: Expected checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214 Actual 
checksum is da39a3ee5e6b4b0d3255bfef95601890afd80709
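The actual checksum in the failure above is recognizable: da39a3ee5e6b4b0d3255bfef95601890afd80709 is the SHA-1 of the empty byte string, so the mismatch means the verified output was empty rather than merely different. A quick stdlib check (illustrative, not from the original thread):

```python
import hashlib

# SHA-1 of zero bytes -- the "actual checksum" reported by the failing test,
# which suggests the pipeline produced no output rows to hash.
EMPTY_SHA1 = hashlib.sha1(b"").hexdigest()
print(EMPTY_SHA1)  # da39a3ee5e6b4b0d3255bfef95601890afd80709
```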


> BigQueryQueryToTableIT is flaky on Direct runner in PostCommit suites: 
> incorrect checksum 
> --
>
> Key: BEAM-7463
> URL: https://issues.apache.org/jira/browse/BEAM-7463
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
>  Labels: currently-failing
> Fix For: Not applicable
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> {noformat}
> 15:03:38 FAIL: test_big_query_new_types 
> (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT)
> 15:03:38 ----------------------------------------------------------------------
> 15:03:38 Traceback (most recent call last):
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py",
>  line 211, in test_big_query_new_types
> 15:03:38 big_query_query_to_table_pipeline.run_bq_pipeline(options)
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py",
>  line 82, in run_bq_pipeline
> 15:03:38 result = p.run()
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 15:03:38 else test_runner_api))
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 15:03:38 self._options).run(False)
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 15:03:38 return self.runner.run_pipeline(self, self._options)
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
>  line 51, in run_pipeline
> 15:03:38 hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 15:03:38 AssertionError: 
> 15:03:38 Expected: (Test pipeline expected terminated in state: DONE and 
> Expected checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214)
> 15:03:38  but: Expected checksum is 
> 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214 Actual checksum is 
> da39a3ee5e6b4b0d3255bfef95601890afd80709
> {noformat}
> [~Juta] could this be caused by changes to Bigquery matcher? 
> https://github.com/apache/beam/pull/8621/files#diff-f1ec7e3a3e7e2e5082ddb7043954c108R134
>  
> cc: [~pabloem] [~chamikara] [~apilloud]
> A recent postcommit run has BQ failures in other tests as well: 
> https://builds.apache.org/job/beam_PostCommit_Python3_Verify/1000/consoleFull



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7284) Support Py3 Dataclasses

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7284?focusedWorklogId=385635&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385635
 ]

ASF GitHub Bot logged work on BEAM-7284:


Author: ASF GitHub Bot
Created on: 12/Feb/20 02:31
Start Date: 12/Feb/20 02:31
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10837: [BEAM-7284] 
Cleanup MappingProxy reducer since dill supports it now.
URL: https://github.com/apache/beam/pull/10837#issuecomment-584988523
 
 
   Thanks for a quick review, @lazylynx !
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385635)
Time Spent: 1h 40m  (was: 1.5h)

> Support Py3 Dataclasses 
> 
>
> Key: BEAM-7284
> URL: https://issues.apache.org/jira/browse/BEAM-7284
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> It looks like dill does not support Dataclasses yet, 
> https://github.com/uqfoundation/dill/issues/312, which very likely means that 
> Beam does not support them either.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8095) pytest Py3.7 crashes on test_remote_runner_display_data

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8095?focusedWorklogId=385634&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385634
 ]

ASF GitHub Bot logged work on BEAM-8095:


Author: ASF GitHub Bot
Created on: 12/Feb/20 02:26
Start Date: 12/Feb/20 02:26
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10759: [BEAM-8095] 
Remove no_xdist for test
URL: https://github.com/apache/beam/pull/10759
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385634)
Time Spent: 0.5h  (was: 20m)

> pytest Py3.7 crashes on test_remote_runner_display_data
> ---
>
> Key: BEAM-8095
> URL: https://issues.apache.org/jira/browse/BEAM-8095
> Project: Beam
>  Issue Type: Improvement
>  Components: test-failures, testing
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Adding certain flags such as "-n 1" or "--debug" causes Python to abort.
> The --debug flag logs some information to pytestdebug.log, but I'm not 
> familiar with it or sure what, if anything, it means.
> {code}
> $ pytest  
> apache_beam/runners/dataflow/dataflow_runner_test.py::DataflowRunnerTest::test_remote_runner_display_data
>  
> ========================= test session starts =========================
> platform linux -- Python 3.7.3rc1, pytest-5.1.1, py-1.8.0, pluggy-0.12.0
> rootdir: /usr/local/google/home/ehudm/src/beam/sdks/python, inifile: 
> pytest.ini
> plugins: forked-1.0.2, xdist-1.29.0
> collected 1 item
> apache_beam/runners/dataflow/dataflow_runner_test.py . [100%]
> 
> ========================= 1 passed in 0.11s =========================
> $ pytest  
> apache_beam/runners/dataflow/dataflow_runner_test.py::DataflowRunnerTest::test_remote_runner_display_data
>   --debug
> writing pytestdebug information to 
> /usr/local/google/home/ehudm/src/beam/sdks/python/pytestdebug.log
> ========================= test session starts =========================
> platform linux -- Python 3.7.3rc1, pytest-5.1.1, py-1.8.0, pluggy-0.12.0 -- 
> /usr/local/google/home/ehudm/virtualenvs/beam-py37/bin/python3.7
> using: pytest-5.1.1 pylib-1.8.0
> setuptools registered plugins:
>   pytest-forked-1.0.2 at 
> /usr/local/google/home/ehudm/virtualenvs/beam-py37/lib/python3.7/site-packages/pytest_forked/__init__.py
>   pytest-xdist-1.29.0 at 
> /usr/local/google/home/ehudm/virtualenvs/beam-py37/lib/python3.7/site-packages/xdist/plugin.py
>   pytest-xdist-1.29.0 at 
> /usr/local/google/home/ehudm/virtualenvs/beam-py37/lib/python3.7/site-packages/xdist/looponfail.py
> rootdir: /usr/local/google/home/ehudm/src/beam/sdks/python, inifile: 
> pytest.ini
> plugins: forked-1.0.2, xdist-1.29.0
> collected 1 item

[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=385633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385633
 ]

ASF GitHub Bot logged work on BEAM-9291:


Author: ASF GitHub Bot
Created on: 12/Feb/20 02:17
Start Date: 12/Feb/20 02:17
Worklog Time Spent: 10m 
  Work Description: stankiewicz commented on issue #10829: [BEAM-9291] 
Upload graph option in dataflow's python sdk
URL: https://github.com/apache/beam/pull/10829#issuecomment-584984570
 
 
   I have one challenge with the full e2e test: the Python version stages the 
full apache-beam package, and pip download of 2.20dev0 fails. 
   I can try to locally patch the 2.19 tag with just this PR, or maybe there is 
another way to test the runner?
   When I change the versions to 2.19 I eventually hit some version-clash 
errors (I probably haven't changed it everywhere), but I can see that 
pipeline.json is uploaded, it contains the steps, and the graph is rendered 
properly in the console. 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385633)
Time Spent: 50m  (was: 40m)

> upload_graph support in Dataflow Python SDK
> ---
>
> Key: BEAM-9291
> URL: https://issues.apache.org/jira/browse/BEAM-9291
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Radosław Stankiewicz
>Assignee: Radosław Stankiewicz
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> upload_graph option is not supported in Dataflow's Python SDK so there is no 
> workaround for large graphs. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7284) Support Py3 Dataclasses

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7284?focusedWorklogId=385628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385628
 ]

ASF GitHub Bot logged work on BEAM-7284:


Author: ASF GitHub Bot
Created on: 12/Feb/20 02:15
Start Date: 12/Feb/20 02:15
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10837: [BEAM-7284] 
Cleanup MappingProxy reducer since dill supports it now.
URL: https://github.com/apache/beam/pull/10837
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385628)
Time Spent: 1.5h  (was: 1h 20m)

> Support Py3 Dataclasses 
> 
>
> Key: BEAM-7284
> URL: https://issues.apache.org/jira/browse/BEAM-7284
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> It looks like dill does not support Dataclasses yet, 
> https://github.com/uqfoundation/dill/issues/312, which very likely means that 
> Beam does not support them either.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=385627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385627
 ]

ASF GitHub Bot logged work on BEAM-9291:


Author: ASF GitHub Bot
Created on: 12/Feb/20 02:14
Start Date: 12/Feb/20 02:14
Worklog Time Spent: 10m 
  Work Description: stankiewicz commented on pull request #10829: 
[BEAM-9291] Upload graph option in dataflow's python sdk
URL: https://github.com/apache/beam/pull/10829#discussion_r378006760
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -539,6 +541,15 @@ def run_pipeline(self, pipeline, options):
 # Get a Dataflow API client and set its options
 self.dataflow_client = apiclient.DataflowApplicationClient(options)
 
+if self.job.options.view_as(GoogleCloudOptions).upload_graph:
 
 Review comment:
   Re hiding it in apiclient: in the first commit I implemented it that way, 
then noticed that the Java implementation does this in the runner rather than 
the apiclient, and decided to move it. Will move it again and rewrite the tests. 
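For context, the upload_graph flow being discussed serializes the job's step graph to a separate file for upload and drops it from the job request itself. A schematic, stdlib-only sketch of that shape (all names hypothetical; a local file stands in for the GCS upload, and this is not Beam's actual runner or apiclient code):

```python
import json
import os
import tempfile

def extract_graph(job):
    """Write the job's steps graph to a local file (a stand-in for the GCS
    upload Dataflow performs) and return the job with its steps cleared,
    plus the graph file's location."""
    graph = {"steps": job.get("steps", [])}
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(graph, f)
    # Copy the job with an empty steps list; the original dict is untouched.
    stripped = dict(job, steps=[])
    return stripped, path

job = {"name": "wordcount", "steps": [{"kind": "ParallelRead"}]}
stripped, graph_path = extract_graph(job)
with open(graph_path) as f:
    restored = json.load(f)
```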
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385627)
Time Spent: 40m  (was: 0.5h)

> upload_graph support in Dataflow Python SDK
> ---
>
> Key: BEAM-9291
> URL: https://issues.apache.org/jira/browse/BEAM-9291
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Radosław Stankiewicz
>Assignee: Radosław Stankiewicz
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> upload_graph option is not supported in Dataflow's Python SDK so there is no 
> workaround for large graphs. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=385626&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385626
 ]

ASF GitHub Bot logged work on BEAM-9291:


Author: ASF GitHub Bot
Created on: 12/Feb/20 02:13
Start Date: 12/Feb/20 02:13
Worklog Time Spent: 10m 
  Work Description: stankiewicz commented on pull request #10829: 
[BEAM-9291] Upload graph option in dataflow's python sdk
URL: https://github.com/apache/beam/pull/10829#discussion_r378006495
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -562,6 +562,12 @@ def _add_argparse_args(cls, parser):
 default=None,
 choices=['COST_OPTIMIZED', 'SPEED_OPTIMIZED'],
 help='Set the Flexible Resource Scheduling mode')
+parser.add_argument(
 
 Review comment:
   Re option, ok, I will handle it as experiment.
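The alternative agreed on above is to route the flag through the repeatable `--experiments` option rather than adding a dedicated pipeline option. A minimal argparse sketch of that pattern (illustrative only, not Beam's actual option parsing):

```python
import argparse

# Experiments arrive as repeatable opaque strings rather than dedicated
# flags, so new behavior can ship without widening the public option surface.
parser = argparse.ArgumentParser()
parser.add_argument("--experiments", action="append", default=[])

opts = parser.parse_args(["--experiments", "upload_graph"])
use_upload_graph = "upload_graph" in opts.experiments
```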
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385626)
Time Spent: 0.5h  (was: 20m)

> upload_graph support in Dataflow Python SDK
> ---
>
> Key: BEAM-9291
> URL: https://issues.apache.org/jira/browse/BEAM-9291
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Radosław Stankiewicz
>Assignee: Radosław Stankiewicz
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> upload_graph option is not supported in Dataflow's Python SDK so there is no 
> workaround for large graphs. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=385624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385624
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 12/Feb/20 02:11
Start Date: 12/Feb/20 02:11
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-584983018
 
 
   @EDjur - This LGTM. I would like to merge it, but I cannot seem to trigger 
the tests.
   
   @markflyhigh is there a manual way to run the tests?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385624)
Time Spent: 6h 40m  (was: 6.5h)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take both video GCS location or video data 
> bytes as an input.
> [1] https://cloud.google.com/video-intelligence/
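The dual-input requirement above (a GCS location or raw video bytes) can be sketched as simple type dispatch when building the API request. An illustrative sketch only, not the actual transform; the `input_uri`/`input_content` field names follow the Video Intelligence AnnotateVideoRequest, but treat them as assumptions here:

```python
def build_request(element):
    """Dispatch on element type: a str is treated as a GCS URI,
    bytes as raw video content."""
    if isinstance(element, str):
        return {"input_uri": element}
    if isinstance(element, (bytes, bytearray)):
        return {"input_content": bytes(element)}
    raise TypeError(f"expected str or bytes, got {type(element).__name__}")

req = build_request("gs://bucket/video.mp4")
```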



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=385623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385623
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 12/Feb/20 02:10
Start Date: 12/Feb/20 02:10
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-584982808
 
 
   tests, will you trigger?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385623)
Time Spent: 6.5h  (was: 6h 20m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take both video GCS location or video data 
> bytes as an input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7284) Support Py3 Dataclasses

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7284?focusedWorklogId=385620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385620
 ]

ASF GitHub Bot logged work on BEAM-7284:


Author: ASF GitHub Bot
Created on: 12/Feb/20 01:58
Start Date: 12/Feb/20 01:58
Worklog Time Spent: 10m 
  Work Description: lazylynx commented on issue #10837: [BEAM-7284] Cleanup 
MappingProxy reducer since dill supports it now.
URL: https://github.com/apache/beam/pull/10837#issuecomment-584979637
 
 
   @tvalentyn LGTM, thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385620)
Time Spent: 1h 20m  (was: 1h 10m)

> Support Py3 Dataclasses 
> 
>
> Key: BEAM-7284
> URL: https://issues.apache.org/jira/browse/BEAM-7284
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> It looks like dill does not support Dataclasses yet, 
> https://github.com/uqfoundation/dill/issues/312, which very likely means that 
> Beam does not support them either.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9301) Check in beam-linkage-check.sh

2020-02-11 Thread Tomo Suzuki (Jira)
Tomo Suzuki created BEAM-9301:
-

 Summary: Check in beam-linkage-check.sh
 Key: BEAM-9301
 URL: https://issues.apache.org/jira/browse/BEAM-9301
 Project: Beam
  Issue Type: Task
  Components: build-system
Reporter: Tomo Suzuki
Assignee: Tomo Suzuki


https://github.com/apache/beam/pull/10769#issuecomment-584571787

bq. @suztomo can you contribute this script maybe into Beam's build-tools 
directory so we can improve it a bit for further use?


This is a temporary solution before exclusion rules in Linkage Checker 
(BEAM-9206) are implemented.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8201) clean up the current container API

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8201?focusedWorklogId=385610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385610
 ]

ASF GitHub Bot logged work on BEAM-8201:


Author: ASF GitHub Bot
Created on: 12/Feb/20 01:37
Start Date: 12/Feb/20 01:37
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10839: [BEAM-8201] 
Add other endpoint fields to provision API.
URL: https://github.com/apache/beam/pull/10839
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=385595&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385595
 ]

ASF GitHub Bot logged work on BEAM-6522:


Author: ASF GitHub Bot
Created on: 12/Feb/20 00:59
Start Date: 12/Feb/20 00:59
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10838: [BEAM-6522] 
[BEAM-7455] Unskip Avro IO tests that are now passing.
URL: https://github.com/apache/beam/pull/10838#issuecomment-584957126
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385595)
Time Spent: 7h 50m  (was: 7h 40m)

> Dill fails to pickle avro.RecordSchema classes on Python 3.
> 
>
> Key: BEAM-6522
> URL: https://issues.apache.org/jira/browse/BEAM-6522
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> The avroio module still has 4 failing tests. This is actually 2 times the 
> same 2 tests, both for Avro and Fastavro.
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
>  *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 432, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 
> 528, in expand
> return pcoll | beam.io.iobase.Write(self._sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 960, in expand
> return pcoll | WriteImpl(self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 979, in expand
> lambda _, sink: sink.initialize_write(), self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1103, in Map
> pardo = FlatMap(wrapper, *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1054, in FlatMap
> pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 864, in __init__
> super(ParDo, self).__init__(fn, *args, **kwargs)
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 646, in __init__
> self.args = pickler.loads(pickler.dumps(self.args))
> File 
> 
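The root cause is general to pickling: classes created at runtime (as avro builds RecordSchema classes from a JSON schema) cannot be pickled by reference, because the pickler looks the class up by name in its defining module. A stdlib-only sketch of that failure mode, using plain pickle rather than dill or avro (the names below are stand-ins, not Beam or Avro code):

```python
import pickle

# A class built at runtime, the way avro materializes record schemas.
# The variable name deliberately differs from the class name, so pickle's
# lookup-by-reference in the defining module cannot find it.
Rec = type("DynamicRecord", (), {"fields": ("name", "favorite_number")})

def picklable(obj):
    """Return True if pickle.dumps accepts obj, False if it rejects it."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
```

dill works around many such cases by pickling classes by value; the tests unskipped here tracked cases where that previously still failed for avro.RecordSchema on Python 3.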

[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=385594&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385594
 ]

ASF GitHub Bot logged work on BEAM-6522:


Author: ASF GitHub Bot
Created on: 12/Feb/20 00:59
Start Date: 12/Feb/20 00:59
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10838: [BEAM-6522] 
[BEAM-7455] Unskip Avro IO tests that are now passing.
URL: https://github.com/apache/beam/pull/10838
 
 
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
[jira] [Work logged] (BEAM-7284) Support Py3 Dataclasses

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7284?focusedWorklogId=385585&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385585
 ]

ASF GitHub Bot logged work on BEAM-7284:


Author: ASF GitHub Bot
Created on: 12/Feb/20 00:51
Start Date: 12/Feb/20 00:51
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10837: [BEAM-7284] 
Cleanup MappingProxy reducer since dill supports it now.
URL: https://github.com/apache/beam/pull/10837#issuecomment-584952973
 
 
   R: @lazylynx 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385585)
Time Spent: 1h 10m  (was: 1h)

> Support Py3 Dataclasses 
> 
>
> Key: BEAM-7284
> URL: https://issues.apache.org/jira/browse/BEAM-7284
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> It looks like dill does not support Dataclasses yet, 
> https://github.com/uqfoundation/dill/issues/312, which very likely means that 
> Beam does not support them either.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7198) Rename ToStringCoder into ToBytesCoder

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7198?focusedWorklogId=385578&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385578
 ]

ASF GitHub Bot logged work on BEAM-7198:


Author: ASF GitHub Bot
Created on: 12/Feb/20 00:37
Start Date: 12/Feb/20 00:37
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10828: [BEAM-7198] rename 
ToStringCoder to ToBytesCoder for proper representation of its role
URL: https://github.com/apache/beam/pull/10828#issuecomment-584945917
 
 
   Changes LGTM, we can merge once tests pass. Thanks a lot, @lazylynx !
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385578)
Time Spent: 50m  (was: 40m)

> Rename ToStringCoder into ToBytesCoder
> --
>
> Key: BEAM-7198
> URL: https://issues.apache.org/jira/browse/BEAM-7198
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
>  Labels: easy-fix, starter
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The name of ToStringCoder class [1] is confusing, since the output of 
> encode() on Python3 will be bytes. On Python 2 the output is also bytes, 
> since bytes and string are synonyms on Py2.
> ToBytesCoder would be a better name for this class. 
> Note that this class is not listed in coders that constitute Public APIs [2], 
> so we can treat this as an internal change. As a courtesy to users who happened 
> to reference a non-public coder in their pipelines we can keep the old class 
> name as an alias, e.g. ToStringCoder = ToBytesCoder to avoid friction, but 
> clean up Beam codebase to use the new name.
> [1] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344
> [2] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20
> cc: [~yoshiki.obata] [~chamikara]
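The courtesy-alias strategy described above is a one-line assignment at module scope. A sketch with a minimal hypothetical coder, not Beam's actual class:

```python
class ToBytesCoder:
    """Coder whose encode() always yields bytes, matching its name."""

    def encode(self, value):
        if isinstance(value, bytes):
            return value
        return str(value).encode("utf-8")

# Courtesy alias so pipelines that referenced the old (non-public) name
# keep working; the codebase itself migrates to ToBytesCoder.
ToStringCoder = ToBytesCoder
```

New code refers only to ToBytesCoder, while pipelines that imported ToStringCoder keep working unchanged.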



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7198) Rename ToStringCoder into ToBytesCoder

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7198?focusedWorklogId=385577&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385577
 ]

ASF GitHub Bot logged work on BEAM-7198:


Author: ASF GitHub Bot
Created on: 12/Feb/20 00:35
Start Date: 12/Feb/20 00:35
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10828: [BEAM-7198] rename 
ToStringCoder to ToBytesCoder for proper representation of its role
URL: https://github.com/apache/beam/pull/10828#issuecomment-584945241
 
 
   test test test
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385577)
Time Spent: 40m  (was: 0.5h)

> Rename ToStringCoder into ToBytesCoder
> --
>
> Key: BEAM-7198
> URL: https://issues.apache.org/jira/browse/BEAM-7198
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
>  Labels: easy-fix, starter
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The name of ToStringCoder class [1] is confusing, since the output of 
> encode() on Python3 will be bytes. On Python 2 the output is also bytes, 
> since bytes and string are synonyms on Py2.
> ToBytesCoder would be a better name for this class. 
> Note that this class is not listed in coders that constitute Public APIs [2], 
> so we can treat this as an internal change. As a courtesy to users who happened 
> to reference a non-public coder in their pipelines we can keep the old class 
> name as an alias, e.g. ToStringCoder = ToBytesCoder to avoid friction, but 
> clean up Beam codebase to use the new name.
> [1] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344
> [2] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20
> cc: [~yoshiki.obata] [~chamikara]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9290) runner_harness_container_image experiment is not honored in python released sdks.

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9290?focusedWorklogId=385573&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385573
 ]

ASF GitHub Bot logged work on BEAM-9290:


Author: ASF GitHub Bot
Created on: 12/Feb/20 00:33
Start Date: 12/Feb/20 00:33
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10827: [BEAM-9290] 
Support runner_harness_container_image in released python…
URL: https://github.com/apache/beam/pull/10827#issuecomment-584943995
 
 
   Thanks for fixing this. It's unfortunate it wasn't done correctly and tests 
did not catch it.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385573)
Time Spent: 50m  (was: 40m)

> runner_harness_container_image experiment is not honored in python released 
> sdks.
> -
>
> Key: BEAM-9290
> URL: https://issues.apache.org/jira/browse/BEAM-9290
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
>  
> {code:java}
> --experiments=runner_harness_container_image=foo_image{code}
> does not have any effect on the job.
>  
>  
> cc: [~tvalentyn]
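The flag in question is a key=value entry inside the repeated --experiments option. A hedged sketch of extracting such a value (plain Python, not the SDK's actual option-parsing code):

```python
def get_experiment_value(experiments, name):
    """Return the value of a 'name=value' experiment entry, or None."""
    prefix = name + "="
    for entry in experiments:
        if entry.startswith(prefix):
            return entry[len(prefix):]
    return None
```

The bug class here is exactly the case the fix targets: an experiment that carries a value must be matched on its `name=` prefix, not on string equality with the whole entry.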



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9160) Update AWS SDK to support Kubernetes Pod Level Identity

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9160?focusedWorklogId=385564&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385564
 ]

ASF GitHub Bot logged work on BEAM-9160:


Author: ASF GitHub Bot
Created on: 12/Feb/20 00:16
Start Date: 12/Feb/20 00:16
Worklog Time Spent: 10m 
  Work Description: ecapoccia commented on pull request #10825: [BEAM-9160] 
Update AWS SDK to support Pod Level Identity
URL: https://github.com/apache/beam/pull/10825#discussion_r377976039
 
 

 ##
 File path: 
sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/options/AwsModule.java
 ##
 @@ -122,6 +123,8 @@ public AWSCredentialsProvider deserializeWithType(
 return new SystemPropertiesCredentialsProvider();
   } else if 
(typeName.equals(ProfileCredentialsProvider.class.getSimpleName())) {
 return new ProfileCredentialsProvider();
+  } else if 
(typeName.equals(WebIdentityTokenCredentialsProvider.class.getSimpleName())) {
+return WebIdentityTokenCredentialsProvider.create();
 
 Review comment:
   @iemejia @andeb opened PR https://github.com/apache/beam/pull/10836 as 
discussed.
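The diff above adds one branch to a dispatch keyed on the provider's simple class name: serialization stores only the name, and deserializeWithType maps it back to a constructed instance. A stdlib Python sketch of that dispatch shape, with hypothetical stand-in classes rather than the AWS SDK:

```python
class ProfileCredentialsProvider:
    """Hypothetical stand-in for the AWS SDK class of the same name."""

class WebIdentityTokenCredentialsProvider:
    """Hypothetical stand-in; the real class is built via a create() factory."""

# Map each supported provider's simple name to its class.
_PROVIDERS = {
    cls.__name__: cls
    for cls in (ProfileCredentialsProvider, WebIdentityTokenCredentialsProvider)
}

def deserialize_provider(type_name):
    """Map a serialized simple class name back to a provider instance."""
    try:
        return _PROVIDERS[type_name]()
    except KeyError:
        raise ValueError("unsupported credentials provider: " + type_name)
```

Adding a provider type is then a one-entry extension of the mapping, mirroring the one-branch change in the Java deserializer.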
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385564)
Time Spent: 2h 40m  (was: 2.5h)

> Update AWS SDK to support Kubernetes Pod Level Identity
> ---
>
> Key: BEAM-9160
> URL: https://issues.apache.org/jira/browse/BEAM-9160
> Project: Beam
>  Issue Type: Improvement
>  Components: dependencies
>Affects Versions: 2.17.0
>Reporter: Mohamed Noah
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Many organizations have started leveraging pod level identity in Kubernetes. 
> The current version of the AWS SDK packaged with Beam 2.17.0 is out of date 
> and doesn't provide native support for pod level identity access management.
>  
> It is recommended that we introduce support to access AWS resources such as 
> S3 using pod level identity. 
> Current Version of the AWS Java SDK in Beam:
> def aws_java_sdk_version = "1.11.519"
> Proposed AWS Java SDK Version:
> <dependency>
>   <groupId>com.amazonaws</groupId>
>   <artifactId>aws-java-sdk</artifactId>
>   <version>1.11.710</version>
> </dependency>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9160) Update AWS SDK to support Kubernetes Pod Level Identity

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9160?focusedWorklogId=385559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385559
 ]

ASF GitHub Bot logged work on BEAM-9160:


Author: ASF GitHub Bot
Created on: 12/Feb/20 00:08
Start Date: 12/Feb/20 00:08
Worklog Time Spent: 10m 
  Work Description: ecapoccia commented on pull request #10836: BEAM-9160
URL: https://github.com/apache/beam/pull/10836
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385559)
Time Spent: 2.5h  (was: 2h 20m)

> Update AWS SDK to support Kubernetes Pod Level Identity
> ---
>
> Key: BEAM-9160
> URL: https://issues.apache.org/jira/browse/BEAM-9160
> Project: Beam
>  Issue Type: Improvement
>  Components: dependencies
>Affects Versions: 2.17.0
>Reporter: Mohamed Noah
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Many organizations have started leveraging pod level identity in Kubernetes. 
> The current version of the AWS SDK packaged with Beam 2.17.0 is out of date 
> and doesn't provide native support for pod level identity access management.
>  
> It is recommended that we introduce support to access AWS resources such as 
> S3 using pod level identity. 
> Current Version of the AWS Java SDK in Beam:
> def aws_java_sdk_version = "1.11.519"
> Proposed AWS Java SDK Version:
> <dependency>
>   <groupId>com.amazonaws</groupId>
>   <artifactId>aws-java-sdk</artifactId>
>   <version>1.11.710</version>
> </dependency>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=385556&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385556
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 12/Feb/20 00:05
Start Date: 12/Feb/20 00:05
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on issue #10835: [BEAM-8575] 
Removed MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835#issuecomment-584930246
 
 
   R: @angoenka 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385556)
Time Spent: 49h 40m  (was: 49.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 49h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=38&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-38
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 12/Feb/20 00:05
Start Date: 12/Feb/20 00:05
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #10835: [BEAM-8575] 
Removed MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835#discussion_r377972727
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util_test.py
 ##
 @@ -551,9 +551,6 @@ def test_reshuffle_preserves_timestamps(self):
   {
   'name': 'bar', 'timestamp': 33
   },
 
 Review comment:
   Can we un-sickbay this test after this fix?
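The property test_reshuffle_preserves_timestamps asserts can be stated without Beam: a reshuffle may repartition and reorder elements, but must leave each element's timestamp untouched. A plain-Python stand-in, not the Beam transform:

```python
import random

def reshuffle(timestamped_records):
    """Stand-in for a reshuffle: reorders records, never edits them."""
    shuffled = list(timestamped_records)
    random.shuffle(shuffled)
    return shuffled

records = [{"name": "foo", "timestamp": 0}, {"name": "bar", "timestamp": 33}]
result = reshuffle(records)
```

The removed MAX_TIMESTAMP element sat at the edge of the timestamp domain, where the preserved-timestamp semantics are still undecided, so the test data keeps to ordinary values.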
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 38)
Time Spent: 49.5h  (was: 49h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 49.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=385553&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385553
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 12/Feb/20 00:05
Start Date: 12/Feb/20 00:05
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #10835: [BEAM-8575] Removed 
MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835#issuecomment-584929984
 
 
   Retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385553)
Time Spent: 49h 20m  (was: 49h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 49h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9219) Streamline creation of Python and Java dependencies pages

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9219?focusedWorklogId=385551&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385551
 ]

ASF GitHub Bot logged work on BEAM-9219:


Author: ASF GitHub Bot
Created on: 12/Feb/20 00:04
Start Date: 12/Feb/20 00:04
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10745: [BEAM-9219] 
Streamline creation of Python and Java dependencies pages
URL: https://github.com/apache/beam/pull/10745
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385551)
Time Spent: 11.5h  (was: 11h 20m)

> Streamline creation of Python and Java dependencies pages
> -
>
> Key: BEAM-9219
> URL: https://issues.apache.org/jira/browse/BEAM-9219
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: David Wrede
>Priority: Minor
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> This issue is about the need to address keeping both Python and Java SDK 
> dependency pages more relevant and up-to-date while reducing the amount of 
> time it takes to provide that information. The current method of scraping and 
> copying dependencies into a table for every release is a non-trivial task 
> because of the semi-automated workflows done by the tech writers on the 
> website.
> In an effort to provide accurate dependency listings that are always in sync 
> with SDK releases, referring people to the appropriate places in the source 
> code (or through CLI commands) should provide people the information they are 
> looking for and not require the creation and maintenance of an automated 
> tooling solution to generate the dependency tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=385528&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385528
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 11/Feb/20 23:14
Start Date: 11/Feb/20 23:14
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10835: 
[BEAM-8575] Removed MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835
 
 
   Removed MAX_TIMESTAMP from testing data, because the semantics of these 
extremum elements are not decided yet.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch): build-badge matrix for the Go, 
Java, and Python SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and 
Spark runners (badge table omitted).

[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=385506=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385506
 ]

ASF GitHub Bot logged work on BEAM-5605:


Author: ASF GitHub Bot
Created on: 11/Feb/20 22:47
Start Date: 11/Feb/20 22:47
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #10576: [BEAM-5605] 
Convert all BoundedSources to SplittableDoFns when using beam_fn_api experiment.
URL: https://github.com/apache/beam/pull/10576#discussion_r377945680
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/JavaReadViaImpulse.java
 ##
 @@ -1,176 +0,0 @@
-/*
 
 Review comment:
   Gotcha.
   
   I already approved this, but consider it double-approved.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385506)
Time Spent: 12h 40m  (was: 12.5h)

> Support Portable SplittableDoFn for batch
> -
>
> Key: BEAM-5605
> URL: https://issues.apache.org/jira/browse/BEAM-5605
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>
> Roll-up item tracking work towards supporting portable SplittableDoFn for 
> batch



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385490=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385490
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 22:00
Start Date: 11/Feb/20 22:00
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377925605
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded data.
+ *
+ * <h3>Reading Thrift Files</h3>
+ *
+ * <p>For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from.
+ *
+ * <p>For example:
+ *
+ * <pre>{@code
+ * PCollection<ExampleType> examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }</pre>
+ *
+ * <p>For more advanced use cases, like reading each file in a {@link PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * <p>For example:
+ *
+ * <pre>{@code
+ * PCollection<FileIO.ReadableFile> files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * PCollection<ExampleType> examples = files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto));
+ * }</pre>
+ *
+ * <h3>Writing Thrift Files</h3>
+ *
+ * <p>{@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} to be written to
+ * Thrift files. It can be used with the general-purpose {@link FileIO} transforms with
+ * FileIO.write/writeDynamic specifically.
+ *
+ * <p>For example:
+ *
+ * <pre>{@code
+ * pipeline
+ *   .apply(...) // PCollection<ExampleType>
+ *   .apply(FileIO
+ *     .write()
+ *     .via(ThriftIO.sink(thriftProto))
+ *     .to("destination/path"));
+ * }</pre>
+ *
+ * <p>This IO API is considered experimental and may break or receive backwards-incompatible
+ * changes in future versions of the Apache Beam SDK.
+ */
+@Experimental(Experimental.Kind.SOURCE_SINK)
+public class ThriftIO {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class);
+
+  /** Disable construction of utility class. */
+  private ThriftIO() {}
+
+  /**
+   * Reads each file in a {@link PCollection} of {@link org.apache.beam.sdk.io.FileIO.ReadableFile},
+   * which allows more flexible usage.
+   */
+  public static <T> ReadFiles<T> readFiles(Class<T> recordClass) {
+    return new AutoValue_ThriftIO_ReadFiles.Builder<T>().setRecordClass(recordClass).build();
+  }
+
+  ////////////////////////////////////////////////////////////////////////////////
+
+  /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link FileIO#writeDynamic}. */
+  public static <T extends TBase<?, ?>> Sink<T>

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385488=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385488
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:59
Start Date: 11/Feb/20 21:59
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377925182
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded data.
+ *
+ * <h3>Reading Thrift Files</h3>
+ *
+ * <p>For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from.
+ *
+ * <p>For example:
+ *
+ * <pre>{@code
+ * PCollection<ExampleType> examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }</pre>
+ *
+ * <p>For more advanced use cases, like reading each file in a {@link PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * <p>For example:
+ *
+ * <pre>{@code
+ * PCollection<FileIO.ReadableFile> files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * PCollection<ExampleType> examples = files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto));
+ * }</pre>
+ *
+ * <h3>Writing Thrift Files</h3>
+ *
+ * <p>{@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} to be written to
+ * Thrift files. It can be used with the general-purpose {@link FileIO} transforms with
+ * FileIO.write/writeDynamic specifically.
+ *
+ * <p>For example:
+ *
+ * <pre>{@code
+ * pipeline
+ *   .apply(...) // PCollection<ExampleType>
+ *   .apply(FileIO
+ *     .write()
+ *     .via(ThriftIO.sink(thriftProto))
+ *     .to("destination/path"));
+ * }</pre>
+ *
+ * <p>This IO API is considered experimental and may break or receive backwards-incompatible
+ * changes in future versions of the Apache Beam SDK.
+ */
+@Experimental(Experimental.Kind.SOURCE_SINK)
+public class ThriftIO {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class);
+
+  /** Disable construction of utility class. */
+  private ThriftIO() {}
+
+  /**
+   * Reads each file in a {@link PCollection} of {@link org.apache.beam.sdk.io.FileIO.ReadableFile},
+   * which allows more flexible usage.
+   */
+  public static <T> ReadFiles<T> readFiles(Class<T> recordClass) {
+    return new AutoValue_ThriftIO_ReadFiles.Builder<T>().setRecordClass(recordClass).build();
+  }
+
+  ////////////////////////////////////////////////////////////////////////////////
+
+  /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link FileIO#writeDynamic}. */
+  public static <T extends TBase<?, ?>> Sink<T>

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385489=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385489
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:59
Start Date: 11/Feb/20 21:59
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377925387
 
 

 ##
 File path: 
sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftIOTest.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import 
org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomStringUtils;
+import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomUtils;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.values.PCollection;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Resources;
+import org.apache.thrift.protocol.TBinaryProtocol;
+import org.apache.thrift.protocol.TCompactProtocol;
+import org.apache.thrift.protocol.TJSONProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.protocol.TSimpleJSONProtocol;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link ThriftIO}. */
+@RunWith(JUnit4.class)
+public class ThriftIOTest implements Serializable {
+
+  private static final String RESOURCE_DIR = "ThriftIOTest/";
+
+  private static final String THRIFT_DIR = 
Resources.getResource(RESOURCE_DIR).getPath();
+  private static final String ALL_THRIFT_STRING =
+  Resources.getResource(RESOURCE_DIR).getPath() + "*";
+  private static final TestThriftStruct TEST_THRIFT_STRUCT = new 
TestThriftStruct();
+  private static List<TestThriftStruct> testThriftStructs;
+  private final TProtocolFactory tBinaryProtoFactory = new 
TBinaryProtocol.Factory();
+  private final TProtocolFactory tJsonProtocolFactory = new 
TJSONProtocol.Factory();
+  private final TProtocolFactory tSimpleJsonProtocolFactory = new 
TSimpleJSONProtocol.Factory();
+  private final TProtocolFactory tCompactProtocolFactory = new 
TCompactProtocol.Factory();
+  @Rule public transient TestPipeline mainPipeline = TestPipeline.create();
+  @Rule public transient TestPipeline readPipeline = TestPipeline.create();
+  @Rule public transient TestPipeline writePipeline = TestPipeline.create();
+  @Rule public transient TemporaryFolder temporaryFolder = new 
TemporaryFolder();
+
+  @Before
+  public void setUp() throws Exception {
+byte[] bytes = new byte[10];
+ByteBuffer buffer = ByteBuffer.wrap(bytes);
+
+TEST_THRIFT_STRUCT.testByte = 100;
+TEST_THRIFT_STRUCT.testShort = 200;
+TEST_THRIFT_STRUCT.testInt = 2500;
+TEST_THRIFT_STRUCT.testLong = 79303L;
+TEST_THRIFT_STRUCT.testDouble = 25.007;
+TEST_THRIFT_STRUCT.testBool = true;
+TEST_THRIFT_STRUCT.stringIntMap = new HashMap<>();
+TEST_THRIFT_STRUCT.stringIntMap.put("first", (short) 1);
+TEST_THRIFT_STRUCT.stringIntMap.put("second", (short) 2);
+TEST_THRIFT_STRUCT.testBinary = buffer;
+
+testThriftStructs = ImmutableList.copyOf(generateTestObjects(1000L));
+  }
+
+  /** Tests {@link ThriftIO#readFiles(Class)} with {@link TBinaryProtocol}. */
+  @Test
+  public void testReadFilesBinaryProtocol() {
+
+    PCollection<TestThriftStruct> testThriftDoc =
+mainPipeline
+.apply(Create.of(THRIFT_DIR + 
"data").withCoder(StringUtf8Coder.of()))
+

[jira] [Work logged] (BEAM-9269) Set shorter Commit Deadline and handle with backoff/retry

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9269?focusedWorklogId=385485=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385485
 ]

ASF GitHub Bot logged work on BEAM-9269:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:53
Start Date: 11/Feb/20 21:53
Worklog Time Spent: 10m 
  Work Description: nielm commented on pull request #10752: [BEAM-9269] Add 
commit deadline for Spanner writes.
URL: https://github.com/apache/beam/pull/10752#discussion_r377922130
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerConfig.java
 ##
 @@ -136,13 +163,51 @@ public SpannerConfig withHost(ValueProvider<String> host) {
 return toBuilder().setHost(host).build();
   }
 
+  public SpannerConfig withCommitDeadline(Duration commitDeadline) {
+return 
withCommitDeadline(ValueProvider.StaticValueProvider.of(commitDeadline));
+  }
+
+  public SpannerConfig withCommitDeadline(ValueProvider<Duration> commitDeadline) {
+return toBuilder().setCommitDeadline(commitDeadline).build();
+  }
+
+  public SpannerConfig withMaxCumulativeBackoff(Duration maxCumulativeBackoff) 
{
+return 
withMaxCumulativeBackoff(ValueProvider.StaticValueProvider.of(maxCumulativeBackoff));
+  }
+
+  public SpannerConfig withMaxCumulativeBackoff(ValueProvider<Duration> maxCumulativeBackoff) {
+return toBuilder().setMaxCumulativeBackoff(maxCumulativeBackoff).build();
+  }
+
   @VisibleForTesting
  SpannerConfig withServiceFactory(ServiceFactory<Spanner, SpannerOptions> serviceFactory) {
 return toBuilder().setServiceFactory(serviceFactory).build();
   }
 
   public SpannerAccessor connectToSpanner() {
 SpannerOptions.Builder builder = SpannerOptions.newBuilder();
+
+if (getCommitDeadline() != null && getCommitDeadline().get().getMillis() > 
0) {
+
+  // In Spanner API version 1.21 or above, we can set the deadline / total 
Timeout on an API
+  // call using the following code:
+  //
+  // UnaryCallSettings.Builder<CommitRequest, CommitResponse> commitSettings =
+  // builder.getSpannerStubSettingsBuilder().commitSettings();
+  // RetrySettings.Builder commitRetrySettings = commitSettings.getRetrySettings().toBuilder();
+  // commitSettings.setRetrySettings(
+  // commitRetrySettings.setTotalTimeout(
+  // Duration.ofMillis(getCommitDeadlineMillis().get()))
+  // .build());
+  //
+  // However, at time of this commit, the Spanner API is at only at 
v1.6.0, where the only
+  // method to set a deadline is with GRPC Interceptors, so we have to use 
that...
+  SpannerInterceptorProvider interceptorProvider =
+  SpannerInterceptorProvider.createDefault()
+  .with(new 
CommitDeadlineSettingInterceptor(getCommitDeadline().get()));
 
 Review comment:
   > Just to confirm, this deadline will not cause Dataflow workitems to fail 
but just that request will be retried by SpannerIO within the same workitem
   
   Correct, it will backoff/retry up to a configurable time limit (default 15 
mins per workitem). 
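The retry behavior described above — exponential backoff whose cumulative sleep time is capped, so a stuck commit eventually fails instead of retrying forever — can be sketched outside Beam as follows. This is a minimal illustration, not SpannerIO's actual implementation; the class and method names (`Backoff`, `delaysMillis`) are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a cumulative-capped exponential backoff policy: each retry waits
 * twice as long as the previous one, and retrying stops once the total time
 * spent sleeping would exceed a budget (e.g. 15 minutes per work item).
 */
public class Backoff {
  /** Returns the sequence of backoff delays that fit under the cumulative cap. */
  public static List<Long> delaysMillis(
      int maxAttempts, long initialMillis, long maxCumulativeMillis) {
    List<Long> delays = new ArrayList<>();
    long total = 0;
    long next = initialMillis;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      if (total + next > maxCumulativeMillis) {
        break; // cumulative budget exhausted: give up instead of sleeping again
      }
      delays.add(next);
      total += next;
      next *= 2; // exponential growth; real implementations usually add jitter
    }
    return delays;
  }

  public static void main(String[] args) {
    // Four attempts starting at 100 ms under a 15 s budget: all four fit.
    System.out.println(delaysMillis(4, 100, 15_000)); // [100, 200, 400, 800]
    // A 1 s budget only leaves room for the first three delays (100+200+400 = 700 ms).
    System.out.println(delaysMillis(10, 100, 1_000)); // [100, 200, 400]
  }
}
```

The cumulative cap is what makes the per-request deadline safe: a short deadline causes more retries, but the total retry time per work item stays bounded.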
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385485)
Time Spent: 3h 50m  (was: 3h 40m)

> Set shorter Commit Deadline and handle with backoff/retry
> -
>
> Key: BEAM-9269
> URL: https://issues.apache.org/jira/browse/BEAM-9269
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0, 2.18.0, 2.19.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: Major
>  Labels: google-cloud-spanner
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The default commit deadline in Spanner is 1 hour, which can lead to a variety 
> of issues, including database overload and session expiry.
> A shorter deadline should be set, with backoff/retry when the deadline 
> expires, so that the Spanner database does not become overloaded.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385484=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385484
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:53
Start Date: 11/Feb/20 21:53
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377922006
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.io.OutputStream;
+import org.apache.beam.sdk.coders.CoderException;
+import org.apache.beam.sdk.coders.CustomCoder;
+
+public class ThriftCoder<T> extends CustomCoder<T> {
 
 Review comment:
   Done updated
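The coder under review writes each Thrift element to an `OutputStream` and reads it back from an `InputStream`. The contract can be illustrated with a stdlib-only sketch — this is NOT Beam's `ThriftCoder` (no Thrift or Beam dependency here); it just shows the encode/decode round-trip shape a `CustomCoder<T>` implementation must satisfy, using a length prefix so elements can be concatenated in one stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Stdlib-only sketch of the stream-based encode/decode contract of a coder. */
public class StringCoderSketch {
  /** Writes one element as a length-prefixed UTF-8 payload. */
  public void encode(String value, OutputStream out) {
    try {
      byte[] payload = value.getBytes(StandardCharsets.UTF_8);
      DataOutputStream data = new DataOutputStream(out);
      data.writeInt(payload.length); // length prefix delimits elements in the stream
      data.write(payload);
      data.flush();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reads back exactly one element from the stream. */
  public String decode(InputStream in) {
    try {
      DataInputStream data = new DataInputStream(in);
      byte[] payload = new byte[data.readInt()];
      data.readFully(payload);
      return new String(payload, StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    StringCoderSketch coder = new StringCoderSketch();
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    coder.encode("hello", buffer);
    coder.encode("thrift", buffer);
    InputStream in = new ByteArrayInputStream(buffer.toByteArray());
    System.out.println(coder.decode(in)); // hello
    System.out.println(coder.decode(in)); // thrift
  }
}
```

A Thrift-backed coder replaces the UTF-8 payload with the bytes produced by serializing a `TBase` through a `TIOStreamTransport`/`TProtocol`, but the stream contract is the same.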
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385484)
Time Spent: 13h 40m  (was: 13.5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>
> Similar to AvroIO, it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385482=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385482
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:51
Start Date: 11/Feb/20 21:51
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377921210
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
 
 Review comment:
   Sounds good, I'll remove references to `read()`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385482)
Time Spent: 13.5h  (was: 13h 20m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> Similar to AvroIO, it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9160) Update AWS SDK to support Kubernetes Pod Level Identity

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9160?focusedWorklogId=385479=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385479
 ]

ASF GitHub Bot logged work on BEAM-9160:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:46
Start Date: 11/Feb/20 21:46
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10825: [BEAM-9160] Update 
AWS SDK to support Pod Level Identity
URL: https://github.com/apache/beam/pull/10825#issuecomment-584869325
 
 
   The linkage errors from svm are false positives.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385479)
Time Spent: 2h 20m  (was: 2h 10m)

> Update AWS SDK to support Kubernetes Pod Level Identity
> ---
>
> Key: BEAM-9160
> URL: https://issues.apache.org/jira/browse/BEAM-9160
> Project: Beam
>  Issue Type: Improvement
>  Components: dependencies
>Affects Versions: 2.17.0
>Reporter: Mohamed Noah
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Many organizations have started leveraging pod-level identity in Kubernetes. 
> The current version of the AWS SDK packaged with Beam 2.17.0 is out of date 
> and does not provide native support for pod-level identity access management.
>  
> It is recommended that we introduce support for accessing AWS resources such 
> as S3 using pod-level identity. 
> Current Version of the AWS Java SDK in Beam:
> def aws_java_sdk_version = "1.11.519"
> Proposed AWS Java SDK Version:
> <dependency>
>   <groupId>com.amazonaws</groupId>
>   <artifactId>aws-java-sdk</artifactId>
>   <version>1.11.710</version>
> </dependency>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385478&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385478
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:45
Start Date: 11/Feb/20 21:45
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377918220
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.io.OutputStream;
+import org.apache.beam.sdk.coders.CoderException;
+import org.apache.beam.sdk.coders.CustomCoder;
+
+public class ThriftCoder<T> extends CustomCoder<T> {
+
+  public static <T> ThriftCoder<T> of() {
 
 Review comment:
   Done, removed
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385478)
Time Spent: 13h 20m  (was: 13h 10m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9160) Update AWS SDK to support Kubernetes Pod Level Identity

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9160?focusedWorklogId=385477&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385477
 ]

ASF GitHub Bot logged work on BEAM-9160:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:43
Start Date: 11/Feb/20 21:43
Worklog Time Spent: 10m 
  Work Description: andeb commented on issue #10825: [BEAM-9160] Update AWS 
SDK to support Pod Level Identity
URL: https://github.com/apache/beam/pull/10825#issuecomment-584867846
 
 
   Thanks for reviewing and merging it, @iemejia!
   
   With regards to the linkage errors, they may be false positives, considering 
that the Linkage Checker has code to ignore classes from GraalVM: 
https://github.com/GoogleCloudPlatform/cloud-opensource-java/blob/master/dependencies/src/main/java/com/google/cloud/tools/opensource/classpath/LinkageChecker.java#L208-L213
   
   Also, I thought it made sense to expose 
`WebIdentityTokenCredentialsProvider` directly, to be consistent with the other 
exposed credential providers. Happy to do any cleanup, though it seems that 
@ecapoccia is already on it!
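As background for the provider being discussed, here is a minimal sketch of how a worker could pick up pod-level identity once the SDK is upgraded, assuming an EKS/IRSA-style setup; the class name, region, and bucket listing are illustrative assumptions, not part of the PR:

```java
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.WebIdentityTokenCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class PodIdentityS3Sketch {
  public static void main(String[] args) {
    // Reads AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_ARN from the pod
    // environment, which the Kubernetes service-account webhook injects.
    AWSCredentialsProvider credentials = WebIdentityTokenCredentialsProvider.create();

    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withCredentials(credentials)
        .withRegion("us-east-1") // placeholder region
        .build();
    s3.listBuckets().forEach(b -> System.out.println(b.getName()));
  }
}
```

`WebIdentityTokenCredentialsProvider` only exists in recent 1.11.x releases of the AWS Java SDK, which is why the dependency bump is a prerequisite.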
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385477)
Time Spent: 2h 10m  (was: 2h)

> Update AWS SDK to support Kubernetes Pod Level Identity
> ---
>
> Key: BEAM-9160
> URL: https://issues.apache.org/jira/browse/BEAM-9160
> Project: Beam
>  Issue Type: Improvement
>  Components: dependencies
>Affects Versions: 2.17.0
>Reporter: Mohamed Noah
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Many organizations have started leveraging pod level identity in Kubernetes. 
> The current version of the AWS SDK packaged with Beam 2.17.0 is out of date 
> and doesn't provide native support to pod level identity access management.
>  
> It is recommended that we introduce support to access AWS resources such as 
> S3 using pod level identity. 
> Current Version of the AWS Java SDK in Beam:
> def aws_java_sdk_version = "1.11.519"
> Proposed AWS Java SDK Version:
> <dependency>
>   <groupId>com.amazonaws</groupId>
>   <artifactId>aws-java-sdk</artifactId>
>   <version>1.11.710</version>
> </dependency>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385476&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385476
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:41
Start Date: 11/Feb/20 21:41
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377916026
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection<ExampleType> examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection<FileIO.ReadableFile> files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * PCollection<ExampleType> examples = 
files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto));
+ * }
+ *
+ * Writing Thrift Files
+ *
+ * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} 
to be written to
+ * Thrift files. It can be used with the general-purpose {@link FileIO} 
transforms with
+ * FileIO.write/writeDynamic specifically.
+ *
+ * For example:
+ *
+ * {@code
+ * pipeline
+ *   .apply(...) // PCollection<ExampleType>
+ *   .apply(FileIO
+ * .write()
+ * .via(ThriftIO.sink(thriftProto))
+ * .to("destination/path"));
+ * }
+ *
+ * This IO API is considered experimental and may break or receive 
backwards-incompatible changes
+ * in future versions of the Apache Beam SDK.
+ */
+@Experimental(Experimental.Kind.SOURCE_SINK)
 
 Review comment:
   Done, updated.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385476)
Time Spent: 13h 10m  (was: 13h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385475&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385475
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:38
Start Date: 11/Feb/20 21:38
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377914782
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection<ExampleType> examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
 
 Review comment:
   +1 to not have a `read()`: less 'useless' code to maintain. Other file-based 
IOs only have it for historical reasons, and we decided to deprecate the 
`readAll` transforms too, to make the FileIO.match + read composition more 
explicit since it covers more cases.
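For reference, the composition being preferred over a dedicated `read()` would look roughly like this in a user pipeline; `ExampleType` and the binary protocol factory are placeholders, and the exact `readFiles` signature is whatever the PR settles on:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.values.PCollection;
import org.apache.thrift.protocol.TBinaryProtocol;

public class ThriftReadSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // FileIO.match + readMatches replaces a dedicated read():
    PCollection<ExampleType> records = p
        .apply(FileIO.match().filepattern("/foo/bar/*"))
        .apply(FileIO.readMatches())
        .apply(ThriftIO.readFiles(ExampleType.class)
            .withProtocol(new TBinaryProtocol.Factory()));

    p.run().waitUntilFinish();
  }
}
```

Keeping only `readFiles` also means watch/continuation and match-configuration options come from FileIO for free instead of being duplicated on the Thrift transform.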
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385475)
Time Spent: 13h  (was: 12h 50m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385474&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385474
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:33
Start Date: 11/Feb/20 21:33
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377912388
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection<ExampleType> examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
 
 Review comment:
   Correct, `read()` is not implemented, and I will remove references to it 
unless we think it should be implemented. I think `readFiles()` will cover 
everything but the simplest use case. What are your thoughts?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385474)
Time Spent: 12h 50m  (was: 12h 40m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 12h 50m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385473&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385473
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:29
Start Date: 11/Feb/20 21:29
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377910032
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection<ExampleType> examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
 
 Review comment:
   This reference will be removed. `readFiles()` will take in the class, as the 
pipeline needs it to know what to deserialize the data into. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385473)
Time Spent: 12h 40m  (was: 12.5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=385470&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385470
 ]

ASF GitHub Bot logged work on BEAM-8537:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:28
Start Date: 11/Feb/20 21:28
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #10802: [BEAM-8537] Move 
wrappers of RestrictionTracker out of iobase
URL: https://github.com/apache/beam/pull/10802#issuecomment-584861527
 
 
   All tests passed. I'm going to merge the PR and work on integrating 
https://github.com/apache/beam/pull/10375. Thanks, everyone!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385470)
Time Spent: 14h 10m  (was: 14h)

> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> 
>
> Key: BEAM-8537
> URL: https://issues.apache.org/jira/browse/BEAM-8537
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 14h 10m
>  Remaining Estimate: 0h
>
> This is a follow up for in-progress PR:  
> https://github.com/apache/beam/pull/9794.
> Current implementation in PR9794 provides a default implementation of 
> WatermarkEstimator. For further work, we want to let WatermarkEstimator to be 
> a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to 
> create a custom WatermarkEstimator per windowed value. It should be similar 
> to how we track restriction for SDF: 
> WatermarkEstimator <---> RestrictionTracker 
> WatermarkEstimatorProvider <---> RestrictionTrackerProvider
> WatermarkEstimatorParam <---> RestrictionDoFnParam
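The correspondence above is the classic provider pattern: the provider is the factory handed to the framework, and the estimator is the per-restriction state it creates. A sketch with hypothetical names (shown in Java purely for illustration; the actual work targets the Python SDK, and Beam's real interfaces differ):

```java
// Hypothetical interfaces mirroring the analogy in the description;
// not Beam's actual API.
interface WatermarkEstimator {
  long currentWatermark();
  void observe(long elementTimestampMillis);
}

interface WatermarkEstimatorProvider {
  // Called once per restriction/windowed value, just as a
  // RestrictionTrackerProvider creates a RestrictionTracker.
  WatermarkEstimator create();
}

class MonotonicEstimatorProvider implements WatermarkEstimatorProvider {
  @Override
  public WatermarkEstimator create() {
    return new WatermarkEstimator() {
      private long watermark = Long.MIN_VALUE;

      @Override
      public long currentWatermark() {
        return watermark;
      }

      @Override
      public void observe(long ts) {
        // Monotonically advance: never move the watermark backwards.
        watermark = Math.max(watermark, ts);
      }
    };
  }
}
```

Making the estimator a pure interface plus a provider lets users swap in custom watermark policies without touching the default implementation from PR 9794.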



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=385471&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385471
 ]

ASF GitHub Bot logged work on BEAM-8537:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:28
Start Date: 11/Feb/20 21:28
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #10802: [BEAM-8537] 
Move wrappers of RestrictionTracker out of iobase
URL: https://github.com/apache/beam/pull/10802
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385471)
Time Spent: 14h 20m  (was: 14h 10m)

> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> 
>
> Key: BEAM-8537
> URL: https://issues.apache.org/jira/browse/BEAM-8537
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 14h 20m
>  Remaining Estimate: 0h
>
> This is a follow up for in-progress PR:  
> https://github.com/apache/beam/pull/9794.
> Current implementation in PR9794 provides a default implementation of 
> WatermarkEstimator. For further work, we want to let WatermarkEstimator to be 
> a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to 
> create a custom WatermarkEstimator per windowed value. It should be similar 
> to how we track restriction for SDF: 
> WatermarkEstimator <---> RestrictionTracker 
> WatermarkEstimatorProvider <---> RestrictionTrackerProvider
> WatermarkEstimatorParam <---> RestrictionDoFnParam



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385469&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385469
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:27
Start Date: 11/Feb/20 21:27
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377909310
 
 

 ##
 File path: 
sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftIOTest.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import 
org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomStringUtils;
+import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomUtils;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.values.PCollection;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Resources;
+import org.apache.thrift.protocol.TBinaryProtocol;
+import org.apache.thrift.protocol.TCompactProtocol;
+import org.apache.thrift.protocol.TJSONProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.protocol.TSimpleJSONProtocol;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link ThriftIO}. */
+@RunWith(JUnit4.class)
+public class ThriftIOTest implements Serializable {
+
+  private static final String RESOURCE_DIR = "ThriftIOTest/";
+
+  private static final String THRIFT_DIR = 
Resources.getResource(RESOURCE_DIR).getPath();
+  private static final String ALL_THRIFT_STRING =
+  Resources.getResource(RESOURCE_DIR).getPath() + "*";
+  private static final TestThriftStruct TEST_THRIFT_STRUCT = new 
TestThriftStruct();
+  private static List<TestThriftStruct> testThriftStructs;
+  private final TProtocolFactory tBinaryProtoFactory = new 
TBinaryProtocol.Factory();
+  private final TProtocolFactory tJsonProtocolFactory = new 
TJSONProtocol.Factory();
+  private final TProtocolFactory tSimpleJsonProtocolFactory = new 
TSimpleJSONProtocol.Factory();
+  private final TProtocolFactory tCompactProtocolFactory = new 
TCompactProtocol.Factory();
+  @Rule public transient TestPipeline mainPipeline = TestPipeline.create();
+  @Rule public transient TestPipeline readPipeline = TestPipeline.create();
+  @Rule public transient TestPipeline writePipeline = TestPipeline.create();
+  @Rule public transient TemporaryFolder temporaryFolder = new 
TemporaryFolder();
+
+  @Before
+  public void setUp() throws Exception {
+byte[] bytes = new byte[10];
+ByteBuffer buffer = ByteBuffer.wrap(bytes);
+
+TEST_THRIFT_STRUCT.testByte = 100;
+TEST_THRIFT_STRUCT.testShort = 200;
+TEST_THRIFT_STRUCT.testInt = 2500;
+TEST_THRIFT_STRUCT.testLong = 79303L;
+TEST_THRIFT_STRUCT.testDouble = 25.007;
+TEST_THRIFT_STRUCT.testBool = true;
+TEST_THRIFT_STRUCT.stringIntMap = new HashMap<>();
+TEST_THRIFT_STRUCT.stringIntMap.put("first", (short) 1);
+TEST_THRIFT_STRUCT.stringIntMap.put("second", (short) 2);
+TEST_THRIFT_STRUCT.testBinary = buffer;
+
+testThriftStructs = ImmutableList.copyOf(generateTestObjects(1000L));
+  }
+
+  /** Tests {@link ThriftIO#readFiles(Class)} with {@link TBinaryProtocol}. */
+  @Test
 
 Review comment:
   `read` was in the old implementation and I will remove the references to it. 
I think that `readFiles()` will cover most use cases for this IO. 
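For context, the `readFiles()` path the author refers to covers the old `read()` use case once composed with `FileIO`. A minimal sketch based on the javadoc quoted in this thread (not runnable without the Beam and Thrift dependencies; `TestThriftStruct` and the binary protocol factory are borrowed from the test code above):

```java
// Sketch: ThriftIO.readFiles() composed with FileIO match/readMatches,
// the replacement for the removed ThriftIO.read() entry point.
PCollection<TestThriftStruct> records =
    pipeline
        .apply(FileIO.match().filepattern("/foo/bar/*")) // find matching files
        .apply(FileIO.readMatches())                     // -> FileIO.ReadableFile
        .apply(
            ThriftIO.readFiles(TestThriftStruct.class)
                .withProtocol(new TBinaryProtocol.Factory()));
```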
 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385462&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385462
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377901030
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection<ExampleType> examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection<FileIO.ReadableFile> files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * PCollection<ExampleType> examples = files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto));
+ * }
+ *
+ * Writing Thrift Files
+ *
+ * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} 
to be written to
+ * Thrift files. It can be used with the general-purpose {@link FileIO} 
transforms with
+ * FileIO.write/writeDynamic specifically.
+ *
+ * For example:
+ *
+ * {@code
+ * pipeline
+ *   .apply(...) // PCollection<ExampleType>
+ *   .apply(FileIO
+ * .write()
+ * .via(ThriftIO.sink(thriftProto))
+ * .to("destination/path"));
+ * }
+ *
+ * This IO API is considered experimental and may break or receive 
backwards-incompatible changes
+ * in future versions of the Apache Beam SDK.
+ */
+@Experimental(Experimental.Kind.SOURCE_SINK)
+public class ThriftIO {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class);
+
+  /** Disable construction of utility class. */
+  private ThriftIO() {}
+
+  /**
+   * Reads each file in a {@link PCollection} of {@link 
org.apache.beam.sdk.io.FileIO.ReadableFile},
+   * which allows more flexible usage.
+   */
+  public static <T> ReadFiles<T> readFiles(Class<T> recordClass) {
+    return new AutoValue_ThriftIO_ReadFiles.Builder<T>().setRecordClass(recordClass).build();
+  }
+
+  
//
+
+  /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link 
FileIO#writeDynamic}. */
+  public static <T extends TBase<?, ?>> Sink 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385464&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385464
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377901859
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385465&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385465
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377895151
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385459&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385459
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377886065
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/package-info.java
 ##
 @@ -0,0 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/** Transforms for reading and writing to Thrift files. */
+package org.apache.beam.sdk.io.thrift;
 
 Review comment:
   Add `@Experimental(Kind.SOURCE_SINK)` at the package level too
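The package-level annotation requested here would go in `package-info.java`, roughly as follows (a sketch mirroring the class-level annotation already on `ThriftIO`):

```java
/** Transforms for reading and writing to Thrift files. */
@Experimental(Kind.SOURCE_SINK)
package org.apache.beam.sdk.io.thrift;

import org.apache.beam.sdk.annotations.Experimental;
import org.apache.beam.sdk.annotations.Experimental.Kind;
```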
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385459)
Time Spent: 11h 40m  (was: 11.5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385467&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385467
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377903969
 
 

 ##
 File path: 
sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftIOTest.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import 
org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomStringUtils;
+import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomUtils;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.values.PCollection;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Resources;
+import org.apache.thrift.protocol.TBinaryProtocol;
+import org.apache.thrift.protocol.TCompactProtocol;
+import org.apache.thrift.protocol.TJSONProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.protocol.TSimpleJSONProtocol;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link ThriftIO}. */
+@RunWith(JUnit4.class)
+public class ThriftIOTest implements Serializable {
+
+  private static final String RESOURCE_DIR = "ThriftIOTest/";
+
+  private static final String THRIFT_DIR = 
Resources.getResource(RESOURCE_DIR).getPath();
+  private static final String ALL_THRIFT_STRING =
+  Resources.getResource(RESOURCE_DIR).getPath() + "*";
+  private static final TestThriftStruct TEST_THRIFT_STRUCT = new 
TestThriftStruct();
+  private static List<TestThriftStruct> testThriftStructs;
+  private final TProtocolFactory tBinaryProtoFactory = new 
TBinaryProtocol.Factory();
+  private final TProtocolFactory tJsonProtocolFactory = new 
TJSONProtocol.Factory();
+  private final TProtocolFactory tSimpleJsonProtocolFactory = new 
TSimpleJSONProtocol.Factory();
+  private final TProtocolFactory tCompactProtocolFactory = new 
TCompactProtocol.Factory();
+  @Rule public transient TestPipeline mainPipeline = TestPipeline.create();
+  @Rule public transient TestPipeline readPipeline = TestPipeline.create();
+  @Rule public transient TestPipeline writePipeline = TestPipeline.create();
+  @Rule public transient TemporaryFolder temporaryFolder = new 
TemporaryFolder();
+
+  @Before
+  public void setUp() throws Exception {
+byte[] bytes = new byte[10];
+ByteBuffer buffer = ByteBuffer.wrap(bytes);
+
+TEST_THRIFT_STRUCT.testByte = 100;
+TEST_THRIFT_STRUCT.testShort = 200;
+TEST_THRIFT_STRUCT.testInt = 2500;
+TEST_THRIFT_STRUCT.testLong = 79303L;
+TEST_THRIFT_STRUCT.testDouble = 25.007;
+TEST_THRIFT_STRUCT.testBool = true;
+TEST_THRIFT_STRUCT.stringIntMap = new HashMap<>();
+TEST_THRIFT_STRUCT.stringIntMap.put("first", (short) 1);
+TEST_THRIFT_STRUCT.stringIntMap.put("second", (short) 2);
+TEST_THRIFT_STRUCT.testBinary = buffer;
+
+testThriftStructs = ImmutableList.copyOf(generateTestObjects(1000L));
+  }
+
+  /** Tests {@link ThriftIO#readFiles(Class)} with {@link TBinaryProtocol}. */
+  @Test
+  public void testReadFilesBinaryProtocol() {
+
+PCollection<TestThriftStruct> testThriftDoc =
+mainPipeline
+.apply(Create.of(THRIFT_DIR + 
"data").withCoder(StringUtf8Coder.of()))
+

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385455&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385455
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377905535
 
 

 ##
 File path: 
sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java
 ##
 @@ -0,0 +1,1232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
 
 Review comment:
   Hm in that case, I think it's fine to commit the generated file.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385455)
Time Spent: 11h 10m  (was: 11h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385466&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385466
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377900552
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385460&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385460
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377894156
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.io.OutputStream;
+import org.apache.beam.sdk.coders.CoderException;
+import org.apache.beam.sdk.coders.CustomCoder;
+
+public class ThriftCoder<T> extends CustomCoder<T> {
+
+  public static <T> ThriftCoder<T> of() {
 
 Review comment:
   remove public
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385460)
Time Spent: 11h 50m  (was: 11h 40m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385457&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385457
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377905535
 
 

 ##
 File path: 
sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java
 ##
 @@ -0,0 +1,1232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
 
 Review comment:
   Hm in that case, I think it's fine to commit the generated file - unless you 
feel up to adding the gradle config : )
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385457)
Time Spent: 11h 20m  (was: 11h 10m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385463&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385463
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377894855
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read()} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection<ExampleType> examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
 
 Review comment:
   Having a read() is not mandatory if the example uses FileIO.match and 
friends IMO.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385463)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385458&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385458
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377885914
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read()} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection<ExampleType> examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection<FileIO.ReadableFile> files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * PCollection<ExampleType> examples = 
files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto));
+ * }
+ *
+ * Writing Thrift Files
+ *
+ * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} 
to be written to
+ * Thrift files. It can be used with the general-purpose {@link FileIO} 
transforms with
+ * FileIO.write/writeDynamic specifically.
+ *
+ * For example:
+ *
+ * {@code
+ * pipeline
+ *   .apply(...) // PCollection<ExampleType>
+ *   .apply(FileIO
+ * .<ExampleType>write()
+ * .via(ThriftIO.sink(thriftProto))
+ * .to("destination/path"));
+ * }
+ *
+ * This IO API is considered experimental and may break or receive 
backwards-incompatible changes
+ * in future versions of the Apache Beam SDK.
+ */
+@Experimental(Experimental.Kind.SOURCE_SINK)
 
 Review comment:
   `@Experimental(Kind.SOURCE_SINK)` to make it consistent with the rest of the 
code base
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385458)
Time Spent: 11.5h  (was: 11h 20m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385461&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385461
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377885479
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.io.OutputStream;
+import org.apache.beam.sdk.coders.CoderException;
+import org.apache.beam.sdk.coders.CustomCoder;
+
+public class ThriftCoder<T> extends CustomCoder<T> {
 
 Review comment:
   This is in principle internal, so maybe make it package protected.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385461)
Time Spent: 11h 50m  (was: 11h 40m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
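The ThriftCoder quoted above delegates to plain Java serialization (ObjectInputStream/ObjectOutputStream). As a hedged, self-contained sketch of that encode/decode round trip — the class and method names here are illustrative, not Beam API:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

public class SerializationRoundTrip {

  // Write the value to the stream using standard Java serialization.
  static void encode(Serializable value, OutputStream out) throws IOException {
    ObjectOutputStream oos = new ObjectOutputStream(out);
    oos.writeObject(value);
    oos.flush();
  }

  // Read one serialized object back from the stream.
  @SuppressWarnings("unchecked")
  static <T> T decode(InputStream in) throws IOException, ClassNotFoundException {
    return (T) new ObjectInputStream(in).readObject();
  }

  public static void main(String[] args) throws Exception {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    encode("hello thrift", bytes);
    String decoded = decode(new ByteArrayInputStream(bytes.toByteArray()));
    System.out.println(decoded); // hello thrift
  }
}
```

A coder built this way accepts any Serializable payload, which is one reason the reviewers suggest keeping the class package-private: it is an internal detail rather than a public contract.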


[jira] [Updated] (BEAM-9300) parse struct literal in ZetaSQL

2020-02-11 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-9300:
--
Status: Open  (was: Triage Needed)

> parse struct literal in ZetaSQL
> ---
>
> Key: BEAM-9300
> URL: https://issues.apache.org/jira/browse/BEAM-9300
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql-zetasql
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>
> https://github.com/apache/beam/blob/b02a325409d55f1ecb7f9fb6ecc4f60a974c810d/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ExpressionConverter.java#L569



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9300) parse struct literal in ZetaSQL

2020-02-11 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-9300:
-

 Summary: parse struct literal in ZetaSQL
 Key: BEAM-9300
 URL: https://issues.apache.org/jira/browse/BEAM-9300
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql-zetasql
Reporter: Kyle Weaver
Assignee: Kyle Weaver


https://github.com/apache/beam/blob/b02a325409d55f1ecb7f9fb6ecc4f60a974c810d/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ExpressionConverter.java#L569



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385453
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:14
Start Date: 11/Feb/20 21:14
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377902208
 
 

 ##
 File path: 
sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java
 ##
 @@ -0,0 +1,1232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
 
 Review comment:
   It can be generated from the .thrift file that is included. I think we would 
need to add a thrift compiler to the build.gradle to compile it for testing, 
thoughts?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385453)
Time Spent: 11h  (was: 10h 50m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9269) Set shorter Commit Deadline and handle with backoff/retry

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9269?focusedWorklogId=385451&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385451
 ]

ASF GitHub Bot logged work on BEAM-9269:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:05
Start Date: 11/Feb/20 21:05
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10752: [BEAM-9269] Add 
commit deadline for Spanner writes.
URL: https://github.com/apache/beam/pull/10752#issuecomment-584851158
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385451)
Time Spent: 3.5h  (was: 3h 20m)

> Set shorter Commit Deadline and handle with backoff/retry
> -
>
> Key: BEAM-9269
> URL: https://issues.apache.org/jira/browse/BEAM-9269
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0, 2.18.0, 2.19.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: Major
>  Labels: google-cloud-spanner
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Default commit deadline in Spanner is 1hr, which can lead to a variety of 
> issues including database overload and session expiry.
> Shorter deadline should be set with backoff/retry when deadline expires, so 
> that the Spanner database does not become overloaded.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9269) Set shorter Commit Deadline and handle with backoff/retry

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9269?focusedWorklogId=385452&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385452
 ]

ASF GitHub Bot logged work on BEAM-9269:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:05
Start Date: 11/Feb/20 21:05
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10752: [BEAM-9269] Add 
commit deadline for Spanner writes.
URL: https://github.com/apache/beam/pull/10752#issuecomment-584851223
 
 
   Retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385452)
Time Spent: 3h 40m  (was: 3.5h)

> Set shorter Commit Deadline and handle with backoff/retry
> -
>
> Key: BEAM-9269
> URL: https://issues.apache.org/jira/browse/BEAM-9269
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0, 2.18.0, 2.19.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: Major
>  Labels: google-cloud-spanner
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Default commit deadline in Spanner is 1hr, which can lead to a variety of 
> issues including database overload and session expiry.
> Shorter deadline should be set with backoff/retry when deadline expires, so 
> that the Spanner database does not become overloaded.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
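The issue description above asks for a shorter commit deadline combined with backoff/retry when it expires. A minimal sketch of that retry pattern in plain Java — no Spanner client involved; `withBackoff`, the doubling schedule, and the exception handling are assumptions for illustration, not the PR's implementation:

```java
import java.time.Duration;
import java.util.concurrent.Callable;

public class BackoffRetry {

  // Run the operation, retrying with exponentially growing waits until it
  // succeeds or maxAttempts is reached; the last failure is rethrown.
  static <T> T withBackoff(Callable<T> op, int maxAttempts, Duration initialWait)
      throws Exception {
    Duration wait = initialWait;
    for (int attempt = 1; ; attempt++) {
      try {
        return op.call();
      } catch (Exception deadlineExceeded) { // assume: a deadline-exceeded error
        if (attempt >= maxAttempts) {
          throw deadlineExceeded;
        }
        Thread.sleep(wait.toMillis());
        wait = wait.multipliedBy(2); // double the wait before each retry
      }
    }
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Fails twice, then "commits" on the third attempt.
    String result =
        withBackoff(
            () -> {
              if (++calls[0] < 3) {
                throw new RuntimeException("DEADLINE_EXCEEDED");
              }
              return "committed";
            },
            5,
            Duration.ofMillis(1));
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

Capping maxAttempts keeps a persistently overloaded database from being hammered indefinitely, which is the failure mode the issue describes.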


[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=385444&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385444
 ]

ASF GitHub Bot logged work on BEAM-9291:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:00
Start Date: 11/Feb/20 21:00
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10829: [BEAM-9291] Upload 
graph option in dataflow's python sdk
URL: https://github.com/apache/beam/pull/10829#issuecomment-584848915
 
 
   I added a few comments. Could we also verify that this works as expected on 
Dataflow?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385444)
Time Spent: 20m  (was: 10m)

> upload_graph support in Dataflow Python SDK
> ---
>
> Key: BEAM-9291
> URL: https://issues.apache.org/jira/browse/BEAM-9291
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Radosław Stankiewicz
>Assignee: Radosław Stankiewicz
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> upload_graph option is not supported in Dataflow's Python SDK so there is no 
> workaround for large graphs. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=385442&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385442
 ]

ASF GitHub Bot logged work on BEAM-9291:


Author: ASF GitHub Bot
Created on: 11/Feb/20 20:59
Start Date: 11/Feb/20 20:59
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10829: [BEAM-9291] 
Upload graph option in dataflow's python sdk
URL: https://github.com/apache/beam/pull/10829#discussion_r377894420
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -562,6 +562,12 @@ def _add_argparse_args(cls, parser):
 default=None,
 choices=['COST_OPTIMIZED', 'SPEED_OPTIMIZED'],
 help='Set the Flexible Resource Scheduling mode')
+parser.add_argument(
 
 Review comment:
   Looking at the Java implementation 
(https://github.com/apache/beam/pull/7047), this is an experiment and not a top 
level option.
   
   I believe you can handle this similar to other experiments in 
internal/apiclient.py
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385442)
Remaining Estimate: 0h
Time Spent: 10m

> upload_graph support in Dataflow Python SDK
> ---
>
> Key: BEAM-9291
> URL: https://issues.apache.org/jira/browse/BEAM-9291
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Radosław Stankiewicz
>Assignee: Radosław Stankiewicz
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> upload_graph option is not supported in Dataflow's Python SDK so there is no 
> workaround for large graphs. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=385441&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385441
 ]

ASF GitHub Bot logged work on BEAM-9291:


Author: ASF GitHub Bot
Created on: 11/Feb/20 20:59
Start Date: 11/Feb/20 20:59
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10829: [BEAM-9291] 
Upload graph option in dataflow's python sdk
URL: https://github.com/apache/beam/pull/10829#discussion_r377894662
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -539,6 +541,15 @@ def run_pipeline(self, pipeline, options):
 # Get a Dataflow API client and set its options
 self.dataflow_client = apiclient.DataflowApplicationClient(options)
 
+if self.job.options.view_as(GoogleCloudOptions).upload_graph:
 
 Review comment:
   Similarly, this staging can happen in apiclient.py if the experiment is 
present.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385441)
Remaining Estimate: 0h
Time Spent: 10m

> upload_graph support in Dataflow Python SDK
> ---
>
> Key: BEAM-9291
> URL: https://issues.apache.org/jira/browse/BEAM-9291
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Radosław Stankiewicz
>Assignee: Radosław Stankiewicz
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> upload_graph option is not supported in Dataflow's Python SDK so there is no 
> workaround for large graphs. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=385436&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385436
 ]

ASF GitHub Bot logged work on BEAM-5605:


Author: ASF GitHub Bot
Created on: 11/Feb/20 20:55
Start Date: 11/Feb/20 20:55
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10576: [BEAM-5605] 
Convert all BoundedSources to SplittableDoFns when using beam_fn_api experiment.
URL: https://github.com/apache/beam/pull/10576#discussion_r377892829
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/JavaReadViaImpulse.java
 ##
 @@ -1,176 +0,0 @@
-/*
 
 Review comment:
   Yes
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385436)
Time Spent: 12.5h  (was: 12h 20m)

> Support Portable SplittableDoFn for batch
> -
>
> Key: BEAM-5605
> URL: https://issues.apache.org/jira/browse/BEAM-5605
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> Roll-up item tracking work towards supporting portable SplittableDoFn for 
> batch



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=385434&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385434
 ]

ASF GitHub Bot logged work on BEAM-5605:


Author: ASF GitHub Bot
Created on: 11/Feb/20 20:49
Start Date: 11/Feb/20 20:49
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10576: [BEAM-5605] 
Convert all BoundedSources to SplittableDoFns when using beam_fn_api experiment.
URL: https://github.com/apache/beam/pull/10576#discussion_r37788
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java
 ##
 @@ -177,4 +205,128 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   .include("source", source);
 }
   }
+
+  /**
+   * A splittable {@link DoFn} which executes a {@link BoundedSource}.
+   *
+   * We model the element as the original source and the restriction as the 
sub-source. This
+   * allows us to split the sub-source over and over yet still receive 
"source" objects as inputs.
+   */
+  static class BoundedSourceAsSDFWrapperFn<T> extends DoFn<BoundedSource<T>, 
T> {
+private static final long DEFAULT_DESIRED_BUNDLE_SIZE_BYTES = 64 * (1 << 
20);
+
+@GetInitialRestriction
+public BoundedSource<T> initialRestriction(@Element BoundedSource<T> 
element) {
+  return element;
+}
+
+@GetSize
+public double getSize(
+@Restriction BoundedSource<T> restriction, PipelineOptions 
pipelineOptions)
+throws Exception {
+  return restriction.getEstimatedSizeBytes(pipelineOptions);
+}
+
+@SplitRestriction
+public void splitRestriction(
+@Restriction BoundedSource<T> restriction,
+OutputReceiver<BoundedSource<T>> receiver,
+PipelineOptions pipelineOptions)
+throws Exception {
+  for (BoundedSource<T> split :
+  restriction.split(DEFAULT_DESIRED_BUNDLE_SIZE_BYTES, 
pipelineOptions)) {
+receiver.output(split);
+  }
+}
+
+@NewTracker
+public RestrictionTracker<BoundedSource<T>, Object[]> restrictionTracker(
+@Restriction BoundedSource<T> restriction, PipelineOptions 
pipelineOptions) {
+  return new BoundedSourceAsSDFRestrictionTracker<>(restriction, 
pipelineOptions);
+}
+
+@ProcessElement
+public void processElement(
+RestrictionTracker<BoundedSource<T>, Object[]> tracker, 
OutputReceiver<T> receiver)
+throws IOException {
+  Object[] out = new Object[1];
+  while (tracker.tryClaim(out)) {
+receiver.output((T) out[0]);
+  }
+}
+
+@GetRestrictionCoder
+public Coder<BoundedSource<T>> restrictionCoder() {
+  return SerializableCoder.of(new TypeDescriptor<BoundedSource<T>>() {});
+}
+
+/**
+ * A fake restriction tracker which adapts to the {@link BoundedSource} 
API. The restriction
+ * object is used to advance the underlying source and to "return" the 
current element.
+ */
+private static class BoundedSourceAsSDFRestrictionTracker
 
 Review comment:
   :clap: 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385434)
Time Spent: 12h 20m  (was: 12h 10m)

> Support Portable SplittableDoFn for batch
> -
>
> Key: BEAM-5605
> URL: https://issues.apache.org/jira/browse/BEAM-5605
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 12h 20m
>  Remaining Estimate: 0h
>
> Roll-up item tracking work towards supporting portable SplittableDoFn for 
> batch



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
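The processElement pattern in the diff above (loop on tryClaim, emitting the element the tracker "returns" through a one-slot array) can be illustrated with a plain-Python sketch. This is a toy stand-in, not the Beam API: `ListRestrictionTracker` and `process_element` are illustrative names, and a real tracker advances an underlying source rather than a list.

```python
class ListRestrictionTracker:
    """Toy restriction tracker: the restriction is a list, and each
    try_claim hands out the next element via an out-parameter, mirroring
    the Object[] trick in the Java snippet."""

    def __init__(self, elements):
        self._elements = list(elements)
        self._index = 0

    def try_claim(self, out):
        if self._index >= len(self._elements):
            return False  # restriction exhausted; processing must stop
        out[0] = self._elements[self._index]
        self._index += 1
        return True


def process_element(tracker, output):
    # Same shape as the Java processElement: loop until try_claim fails.
    out = [None]
    while tracker.try_claim(out):
        output.append(out[0])
```

The one-slot `out` list plays the role of the `Object[] out` in the Java code: the tracker both decides whether work remains and carries the current element back to the caller.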


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385431=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385431
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 20:37
Start Date: 11/Feb/20 20:37
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377883979
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.io.OutputStream;
+import org.apache.beam.sdk.coders.CoderException;
+import org.apache.beam.sdk.coders.CustomCoder;
+
+public class ThriftCoder<T> extends CustomCoder<T> {
+
+  public static <T> ThriftCoder<T> of() {
+    return new ThriftCoder<>();
+  }
+
+  /**
+   * Encodes the given value of type {@code T} onto the given output stream.
+   *
+   * @param value {@link org.apache.thrift.TBase} to encode.
+   * @param outStream stream to output encoded value to.
+   * @throws IOException if writing to the {@code OutputStream} fails for some 
reason
+   * @throws CoderException if the value could not be encoded for some reason
+   */
+  @Override
+  public void encode(T value, OutputStream outStream) throws CoderException, IOException {
+    ObjectOutputStream oos = new ObjectOutputStream(outStream);
+    oos.writeObject(value);
+    oos.flush();
+  }
 
 Review comment:
   +1 to use Thrift native serialization; this will enable sharing the data with 
cross-language pipelines in the future
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385431)
Time Spent: 10h 50m  (was: 10h 40m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
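The review thread above argues for Thrift's native serialization over Java `ObjectOutputStream` because format-native bytes can be decoded by any language with a Thrift library. A small Python sketch makes the contrast concrete; `PickleCoder` and `JsonCoder` are illustrative stand-ins (pickle for language-native serialization, JSON for a cross-language wire format), not Beam or Thrift APIs.

```python
import io
import json
import pickle


class PickleCoder:
    """Analogue of the reviewed ThriftCoder draft: language-native object
    serialization. Only a same-language pipeline can decode these bytes."""

    def encode(self, value, stream):
        pickle.dump(value, stream)

    def decode(self, stream):
        return pickle.load(stream)


class JsonCoder:
    """Format-native encoding (stand-in for Thrift's own protocols):
    any language with a JSON library can decode the bytes."""

    def encode(self, value, stream):
        stream.write(json.dumps(value).encode("utf-8"))

    def decode(self, stream):
        return json.loads(stream.read().decode("utf-8"))


def roundtrip(coder, value):
    # Encode to an in-memory buffer, then decode it back.
    buf = io.BytesIO()
    coder.encode(value, buf)
    buf.seek(0)
    return coder.decode(buf)
```

Both coders round-trip within one process; the difference only shows up when a pipeline in another language has to read the encoded bytes.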


[jira] [Work logged] (BEAM-8979) protoc-gen-mypy: program not found or is not executable

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8979?focusedWorklogId=385430=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385430
 ]

ASF GitHub Bot logged work on BEAM-8979:


Author: ASF GitHub Bot
Created on: 11/Feb/20 20:34
Start Date: 11/Feb/20 20:34
Worklog Time Spent: 10m 
  Work Description: nipunn1313 commented on issue #10734: [BEAM-8979] 
reintroduce mypy-protobuf stub generation
URL: https://github.com/apache/beam/pull/10734#issuecomment-584837962
 
 
    
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385430)
Time Spent: 8h 40m  (was: 8.5h)

> protoc-gen-mypy: program not found or is not executable
> ---
>
> Key: BEAM-8979
> URL: https://issues.apache.org/jira/browse/BEAM-8979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Kamil Wasilewski
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> In some tests, `:sdks:python:sdist:` task fails due to problems in finding 
> protoc-gen-mypy. The following tests are affected (there might be more):
>  * 
> [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/]
>  * 
> [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/
>  
> |https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/]
> Relevant logs:
> {code:java}
> 10:46:32 > Task :sdks:python:sdist FAILED
> 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in 
> /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages
>  (1.12)
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto 
> but not used.
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto 
> but not used.
> 10:46:32 protoc-gen-mypy: program not found or is not executable
> 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
> 10:46:32 
> /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476:
>  UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0'
> 10:46:32   normalized_version,
> 10:46:32 Traceback (most recent call last):
> 10:46:32   File "setup.py", line 295, in 
> 10:46:32 'mypy': generate_protos_first(mypy),
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py",
>  line 145, in setup
> 10:46:32 return distutils.core.setup(**attrs)
> 10:46:32   File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
> 10:46:32 dist.run_commands()
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 966, in 
> run_commands
> 10:46:32 self.run_command(cmd)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in 
> run_command
> 10:46:32 cmd_obj.run()
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py",
>  line 44, in run
> 10:46:32 self.run_command('egg_info')
> 10:46:32   File "/usr/lib/python3.7/distutils/cmd.py", line 313, in 
> run_command
> 10:46:32 self.distribution.run_command(command)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in 
> run_command
> 10:46:32 cmd_obj.run()
> 10:46:32   File "setup.py", line 220, in run
> 10:46:32 gen_protos.generate_proto_files(log=log)
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py",
>  line 144, in generate_proto_files
> 10:46:32 '%s' % ret_code)
> 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for 
> details): 1
> {code}
>  
> This is what I have tried so far to resolve this (without being successful):
>  * Including _--plugin=protoc-gen-mypy=\{abs_path_to_executable}_ parameter 
> to the _protoc_ call in gen_protos.py:131
>  * Appending protoc-gen-mypy's directory to the PATH variable
> I wasn't able to reproduce this error locally.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
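The "protoc-gen-mypy: program not found or is not executable" failure quoted above happens because protoc resolves plugin executables via PATH. One diagnostic the ticket mentions is pinning the plugin with an explicit `--plugin=` flag. The helper below is a hypothetical sketch of that idea (the function name and fallback behavior are illustrative, not Beam's `gen_protos.py`):

```python
import shutil


def protoc_args(proto_file, out_dir, plugin="protoc-gen-mypy"):
    """Build a protoc argument list, pinning the mypy plugin to an
    absolute path when it can be found on PATH, since protoc otherwise
    fails with 'program not found or is not executable'."""
    args = ["protoc", proto_file, "--mypy_out=" + out_dir]
    plugin_path = shutil.which(plugin)
    if plugin_path:
        # An explicit absolute path sidesteps PATH-resolution issues
        # in sandboxed CI environments.
        args.append("--plugin=%s=%s" % (plugin, plugin_path))
    return args
```

If `shutil.which` returns None, the flag is simply omitted and protoc falls back to its own PATH lookup, which reproduces the original failure mode.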


[jira] [Updated] (BEAM-9160) Update AWS SDK to support Pod Level Identity

2020-02-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9160:
---
Description: 
Many organizations have started leveraging pod level identity in Kubernetes. 
The current version of the AWS SDK packaged with Beam 2.17.0 is out of date and 
doesn't provide native support to pod level identity access management.

 

It is recommended that we introduce support to access AWS resources such as S3 
using pod level identity. 

Current Version of the AWS Java SDK in Beam:

def aws_java_sdk_version = "1.11.519"

Proposed AWS Java SDK Version:

<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk</artifactId>
  <version>1.11.710</version>
</dependency>

  was:
Many organizations have started leveraging pod level identity in Kubernetes. 
The current version of the AWS SDK packaged with Bean 2.17.0 is out of date and 
doesn't provide native support to pod level identity access management.

 

It is recommended that we introduce support to access AWS resources such as S3 
using pod level identity. 

Current Version of the AWS Java SDK in Beam:

def aws_java_sdk_version = "1.11.519"

Proposed AWS Java SDK Version:

<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk</artifactId>
  <version>1.11.710</version>
</dependency>


> Update AWS SDK to support Pod Level Identity
> 
>
> Key: BEAM-9160
> URL: https://issues.apache.org/jira/browse/BEAM-9160
> Project: Beam
>  Issue Type: Improvement
>  Components: dependencies
>Affects Versions: 2.17.0
>Reporter: Mohamed Noah
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Many organizations have started leveraging pod level identity in Kubernetes. 
> The current version of the AWS SDK packaged with Beam 2.17.0 is out of date 
> and doesn't provide native support to pod level identity access management.
>  
> It is recommended that we introduce support to access AWS resources such as 
> S3 using pod level identity. 
> Current Version of the AWS Java SDK in Beam:
> def aws_java_sdk_version = "1.11.519"
> Proposed AWS Java SDK Version:
> <dependency>
>   <groupId>com.amazonaws</groupId>
>   <artifactId>aws-java-sdk</artifactId>
>   <version>1.11.710</version>
> </dependency>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9160) Update AWS SDK to support Kubernetes Pod Level Identity

2020-02-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9160:
---
Summary: Update AWS SDK to support Kubernetes Pod Level Identity  (was: 
Update AWS SDK to support Pod Level Identity)

> Update AWS SDK to support Kubernetes Pod Level Identity
> ---
>
> Key: BEAM-9160
> URL: https://issues.apache.org/jira/browse/BEAM-9160
> Project: Beam
>  Issue Type: Improvement
>  Components: dependencies
>Affects Versions: 2.17.0
>Reporter: Mohamed Noah
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Many organizations have started leveraging pod level identity in Kubernetes. 
> The current version of the AWS SDK packaged with Beam 2.17.0 is out of date 
> and doesn't provide native support to pod level identity access management.
>  
> It is recommended that we introduce support to access AWS resources such as 
> S3 using pod level identity. 
> Current Version of the AWS Java SDK in Beam:
> def aws_java_sdk_version = "1.11.519"
> Proposed AWS Java SDK Version:
> <dependency>
>   <groupId>com.amazonaws</groupId>
>   <artifactId>aws-java-sdk</artifactId>
>   <version>1.11.710</version>
> </dependency>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=385428=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385428
 ]

ASF GitHub Bot logged work on BEAM-8537:


Author: ASF GitHub Bot
Created on: 11/Feb/20 20:23
Start Date: 11/Feb/20 20:23
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #10802: [BEAM-8537] Move 
wrappers of RestrictionTracker out of iobase
URL: https://github.com/apache/beam/pull/10802#issuecomment-584833264
 
 
   Run PythonLint PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385428)
Time Spent: 14h  (was: 13h 50m)

> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> 
>
> Key: BEAM-8537
> URL: https://issues.apache.org/jira/browse/BEAM-8537
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 14h
>  Remaining Estimate: 0h
>
> This is a follow up for in-progress PR:  
> https://github.com/apache/beam/pull/9794.
> Current implementation in PR9794 provides a default implementation of 
> WatermarkEstimator. For further work, we want to let WatermarkEstimator to be 
> a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to 
> create a custom WatermarkEstimator per windowed value. It should be similar 
> to how we track restriction for SDF: 
> WatermarkEstimator <---> RestrictionTracker 
> WatermarkEstimatorProvider <---> RestrictionTrackerProvider
> WatermarkEstimatorParam <---> RestrictionDoFnParam



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9298) Drop support for Flink 1.7

2020-02-11 Thread Thomas Weise (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034809#comment-17034809
 ] 

Thomas Weise commented on BEAM-9298:


[~iemejia] yes, this should be on the mailing list. IMO good to communicate 
intent to dev@ and user@ and also refer to 
[https://beam.apache.org/documentation/runners/flink/#version-compatibility]

 

> Drop support for Flink 1.7 
> ---
>
> Key: BEAM-9298
> URL: https://issues.apache.org/jira/browse/BEAM-9298
> Project: Beam
>  Issue Type: Task
>  Components: runner-flink
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.20.0
>
>
> With Flink 1.10 around the corner (more detail can be found in BEAM-9295), we 
> should consider dropping support for Flink 1.7. Dropping 1.7 will also 
> decrease the build time.
> What do you think?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8979) protoc-gen-mypy: program not found or is not executable

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8979?focusedWorklogId=385426=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385426
 ]

ASF GitHub Bot logged work on BEAM-8979:


Author: ASF GitHub Bot
Created on: 11/Feb/20 20:19
Start Date: 11/Feb/20 20:19
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #10734: [BEAM-8979] 
reintroduce mypy-protobuf stub generation
URL: https://github.com/apache/beam/pull/10734#issuecomment-584831758
 
 
   > I'd rather wait for an official release if you don't mind
   
   Done!  Ready to test and hopefully merge. 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385426)
Time Spent: 8.5h  (was: 8h 20m)

> protoc-gen-mypy: program not found or is not executable
> ---
>
> Key: BEAM-8979
> URL: https://issues.apache.org/jira/browse/BEAM-8979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Kamil Wasilewski
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> In some tests, `:sdks:python:sdist:` task fails due to problems in finding 
> protoc-gen-mypy. The following tests are affected (there might be more):
>  * 
> [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/]
>  * 
> [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/
>  
> |https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/]
> Relevant logs:
> {code:java}
> 10:46:32 > Task :sdks:python:sdist FAILED
> 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in 
> /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages
>  (1.12)
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto 
> but not used.
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto 
> but not used.
> 10:46:32 protoc-gen-mypy: program not found or is not executable
> 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
> 10:46:32 
> /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476:
>  UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0'
> 10:46:32   normalized_version,
> 10:46:32 Traceback (most recent call last):
> 10:46:32   File "setup.py", line 295, in 
> 10:46:32 'mypy': generate_protos_first(mypy),
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py",
>  line 145, in setup
> 10:46:32 return distutils.core.setup(**attrs)
> 10:46:32   File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
> 10:46:32 dist.run_commands()
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 966, in 
> run_commands
> 10:46:32 self.run_command(cmd)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in 
> run_command
> 10:46:32 cmd_obj.run()
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py",
>  line 44, in run
> 10:46:32 self.run_command('egg_info')
> 10:46:32   File "/usr/lib/python3.7/distutils/cmd.py", line 313, in 
> run_command
> 10:46:32 self.distribution.run_command(command)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in 
> run_command
> 10:46:32 cmd_obj.run()
> 10:46:32   File "setup.py", line 220, in run
> 10:46:32 gen_protos.generate_proto_files(log=log)
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py",
>  line 144, in generate_proto_files
> 10:46:32 '%s' % ret_code)
> 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for 
> details): 1
> {code}
>  
> This is what I have tried so far to resolve this (without being successful):
>  * Including _--plugin=protoc-gen-mypy=\{abs_path_to_executable}_ parameter 
> to the _protoc_ call in gen_protos.py:131
>  * Appending protoc-gen-mypy's directory to the PATH variable
> I wasn't able to reproduce this error locally.
>  



--
This message was sent by 

[jira] [Commented] (BEAM-9298) Drop support for Flink 1.7

2020-02-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/BEAM-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034807#comment-17034807
 ] 

Ismaël Mejía commented on BEAM-9298:


[~mxm] [~thw] agree? Worth discussing on the mailing list IMO

> Drop support for Flink 1.7 
> ---
>
> Key: BEAM-9298
> URL: https://issues.apache.org/jira/browse/BEAM-9298
> Project: Beam
>  Issue Type: Task
>  Components: runner-flink
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.20.0
>
>
> With Flink 1.10 around the corner (more detail can be found in BEAM-9295), we 
> should consider dropping support for Flink 1.7. Dropping 1.7 will also 
> decrease the build time.
> What do you think?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9298) Drop support for Flink 1.7

2020-02-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9298:
---
Status: Open  (was: Triage Needed)

> Drop support for Flink 1.7 
> ---
>
> Key: BEAM-9298
> URL: https://issues.apache.org/jira/browse/BEAM-9298
> Project: Beam
>  Issue Type: Task
>  Components: runner-flink
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.20.0
>
>
> With Flink 1.10 around the corner (more detail can be found in BEAM-9295), we 
> should consider dropping support for Flink 1.7. Dropping 1.7 will also 
> decrease the build time.
> What do you think?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9274) Support running yapf in a git pre-commit hook

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9274?focusedWorklogId=385424=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385424
 ]

ASF GitHub Bot logged work on BEAM-9274:


Author: ASF GitHub Bot
Created on: 11/Feb/20 20:15
Start Date: 11/Feb/20 20:15
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10810: [BEAM-9274] 
Support running yapf in a git pre-commit hook
URL: https://github.com/apache/beam/pull/10810#discussion_r377873324
 
 

 ##
 File path: .pre-commit-config.yaml
 ##
 @@ -0,0 +1,32 @@
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+repos:
+  - repo: https://github.com/pre-commit/mirrors-yapf
+# this rev is a release tag in the repo above and corresponds with a yapf
+# version. make sure this matches the version of yapf in tox.ini.
+rev: v0.29.0
+hooks:
+  - id: yapf
+files: ^sdks/python/apache_beam/
+# keep these in sync with sdks/python/.yapfignore
 
 Review comment:
   First, some background:
   
   pre-commit triggers its hooks based on changed files, matching based on 3 
values:
   - file type (e.g. it has a mapping from "python" to ".py", etc.)
   - files: include pattern
   - exclude: exclude pattern
   
   pre-commit always passes an explicit list of changed files to the underlying 
tool, and never relies on recursive flags. 
   
   Ok, on to your question.  Our exclude patterns cover autogenerated files.  
So, if a new autogenerated file were added to the repo which was excluded in 
.yapfignore but not in .pre-commit-config.yaml (which I think would be the most 
common "failure" scenario), then the first time that a developer with 
pre-commit enabled changed that file (say, by regenerating it) and tried to 
commit those changes, yapf would autoformat it, and then pre-commit would fail 
the commit with a message like this:
   
   ```
   
yapf.Failed
   - hook id: yapf
   - files were modified by this hook
   ```
   
   At this point hopefully the developer realizes that the failure is due to 
yapf, and finds the trail of comments leading them to .pre-commit-config.yaml.  
Worst case scenario is that they commit it with the autoformatted changes, 
which would be pretty innocuous, and would not fail any jenkins tests because 
those same autogenerated files are excluded from pylint. 
   
   Another thing to note is that this is all just a convenience for developers. 
 Jenkins remains our last line of defense.  
   
   Also note that it's possible to invoke your lint tools using pre-commit 
within tox, so that you can consolidate all of your includes and excludes 
across all tools into one file, the pre-commit-config.yaml.  This is what we do 
where I work.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385424)
Time Spent: 1h  (was: 50m)

> Support running yapf in a git pre-commit hook
> -
>
> Key: BEAM-9274
> URL: https://issues.apache.org/jira/browse/BEAM-9274
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> As a developer I want to be able to automatically run yapf before I make a 
> commit so that I don't waste time with failures on jenkins. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
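The review discussion above describes how pre-commit selects changed files for a hook: a file triggers the hook when it matches the `files` include regex and does not match the `exclude` regex. That selection logic can be sketched in a few lines; `matches_hook` is an illustrative simplification (real pre-commit also filters by file type and uses full regex semantics on repo-relative paths):

```python
import re


def matches_hook(path, files_pattern, exclude_pattern=None):
    """Mimic pre-commit's file selection: a changed file triggers a hook
    when it matches the 'files' regex and does not match 'exclude'.
    (Simplified; real pre-commit also filters by file type.)"""
    if not re.search(files_pattern, path):
        return False
    if exclude_pattern and re.search(exclude_pattern, path):
        return False
    return True
```

This is why keeping `.pre-commit-config.yaml` and `.yapfignore` in sync matters: a generated file excluded in one but not the other falls through the gap and gets autoformatted on commit.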


[jira] [Resolved] (BEAM-9284) Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10

2020-02-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-9284.

Resolution: Fixed

> Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10
> ---
>
> Key: BEAM-9284
> URL: https://issues.apache.org/jira/browse/BEAM-9284
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Ismaël Mejía
>Priority: Minor
> Fix For: Not applicable
>
>
> Apache Flink 1.10 is coming, and it's better to add a Flink 1.10 build target 
> and make the Flink Runner compatible with Flink 1.10.
> There are some incompatible changes in flink-clients as part of their support 
> for Java 11, so that is an area to be addressed on the Beam side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9284) Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10

2020-02-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-9284.

Fix Version/s: Not applicable
   Resolution: Duplicate

> Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10
> ---
>
> Key: BEAM-9284
> URL: https://issues.apache.org/jira/browse/BEAM-9284
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Ismaël Mejía
>Priority: Minor
> Fix For: Not applicable
>
>
> Apache Flink 1.10 is coming, and it's better to add a Flink 1.10 build target 
> and make the Flink Runner compatible with Flink 1.10.
> There are some incompatible changes in flink-clients as part of their support 
> for Java 11, so that is an area to be addressed on the Beam side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (BEAM-9284) Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10

2020-02-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reopened BEAM-9284:


> Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10
> ---
>
> Key: BEAM-9284
> URL: https://issues.apache.org/jira/browse/BEAM-9284
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Ismaël Mejía
>Priority: Minor
> Fix For: Not applicable
>
>
> Apache Flink 1.10 is coming, and it's better to add a Flink 1.10 build target 
> and make the Flink Runner compatible with Flink 1.10.
> There are some incompatible changes in flink-clients as part of their support 
> for Java 11, so that is an area to be addressed on the Beam side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9284) Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10

2020-02-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9284:
---
Status: Open  (was: Triage Needed)

> Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10
> ---
>
> Key: BEAM-9284
> URL: https://issues.apache.org/jira/browse/BEAM-9284
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Ismaël Mejía
>Priority: Minor
> Fix For: Not applicable
>
>
> Apache Flink 1.10 is coming, and it's better to add a Flink 1.10 build target 
> and make the Flink Runner compatible with Flink 1.10.
> There are some incompatible changes in flink-clients as part of their support 
> for Java 11, so that is an area to be addressed on the Beam side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9295) Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10

2020-02-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9295:
---
Status: Open  (was: Triage Needed)

> Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10
> ---
>
> Key: BEAM-9295
> URL: https://issues.apache.org/jira/browse/BEAM-9295
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.20.0
>
>
> Apache Flink 1.10 has completed the final release vote, see [1]. So, I would 
> like to add a Flink 1.10 build target and make the Flink Runner compatible 
> with Flink 1.10.
> And I appreciate it if you can leave your suggestions or comments!
> [1] 
> https://lists.apache.org/thread.html/r97672d4d1e47372cebf23e6643a6cc30a06bfbdf3f277b0be3695b15%40%3Cdev.flink.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9299) Upgrade Flink Runner to 1.8.3 and 1.9.2

2020-02-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9299:
---
Status: Open  (was: Triage Needed)

> Upgrade Flink Runner to 1.8.3 and 1.9.2
> ---
>
> Key: BEAM-9299
> URL: https://issues.apache.org/jira/browse/BEAM-9299
> Project: Beam
>  Issue Type: Task
>  Components: runner-flink
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.20.0
>
>
> I would like to upgrade the Flink Runner to 1.8.3 and 1.9.2, since both Apache 
> Flink 1.8.3 and Apache Flink 1.9.2 have been released [1]. 
> What do you think?
> [1] https://dist.apache.org/repos/dist/release/flink/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9299) Upgrade Flink Runner to 1.8.3 and 1.9.2

2020-02-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/BEAM-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034796#comment-17034796
 ] 

Ismaël Mejía commented on BEAM-9299:


For reference, Flink 1.9.2 introduced a backwards-incompatible change (FLINK-15844), 
so we need a workaround, or should just avoid the hassle.

> Upgrade Flink Runner to 1.8.3 and 1.9.2
> ---
>
> Key: BEAM-9299
> URL: https://issues.apache.org/jira/browse/BEAM-9299
> Project: Beam
>  Issue Type: Task
>  Components: runner-flink
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.20.0
>
>
> I would like to upgrade the Flink Runner to 1.8.3 and 1.9.2, since both Apache 
> Flink 1.8.3 and Apache Flink 1.9.2 have been released [1]. 
> What do you think?
> [1] https://dist.apache.org/repos/dist/release/flink/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385422&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385422
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 20:03
Start Date: 11/Feb/20 20:03
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on pull request #10290: 
[BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377868235
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.io.OutputStream;
+import org.apache.beam.sdk.coders.CoderException;
+import org.apache.beam.sdk.coders.CustomCoder;
+
+public class ThriftCoder<T> extends CustomCoder<T> {
+
+  public static <T> ThriftCoder<T> of() {
+    return new ThriftCoder<>();
+  }
+
+  /**
+   * Encodes the given value of type {@code T} onto the given output stream.
+   *
+   * @param value {@link org.apache.thrift.TBase} to encode.
+   * @param outStream stream to output encoded value to.
+   * @throws IOException if writing to the {@code OutputStream} fails for some reason
+   * @throws CoderException if the value could not be encoded for some reason
+   */
+  @Override
+  public void encode(T value, OutputStream outStream) throws CoderException, IOException {
+    ObjectOutputStream oos = new ObjectOutputStream(outStream);
+    oos.writeObject(value);
+    oos.flush();
+  }
 
 Review comment:
   fwiw the java thrift classes will use the TCompactProtocol to serialize 
themselves when being java serialized.
   
   Personally I would rather see a coder here that explicitly uses a TProtocol 
to serialize the object rather than relying on java serialization to do it.
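The suggestion above amounts to serializing with an explicit protocol and framing the resulting bytes on the stream, instead of delegating to Java serialization. A minimal language-agnostic sketch of that framing (the `serialize`/`deserialize` callables below are stand-ins for Thrift's TSerializer/TDeserializer with a chosen TProtocol factory; all names are illustrative, not Beam's API):

```python
import struct
from io import BytesIO

def encode(value, out_stream, serialize):
    """Write value as a length-prefixed payload produced by an explicit serializer."""
    payload = serialize(value)  # e.g. TSerializer(TCompactProtocol.Factory()).serialize(value)
    out_stream.write(struct.pack(">I", len(payload)))  # 4-byte big-endian length prefix
    out_stream.write(payload)

def decode(in_stream, deserialize):
    """Read one length-prefixed payload and hand it to an explicit deserializer."""
    (length,) = struct.unpack(">I", in_stream.read(4))
    return deserialize(in_stream.read(length))

# Round trip with a trivial stand-in serializer (UTF-8, for illustration only):
buf = BytesIO()
encode("hello", buf, lambda v: v.encode("utf-8"))
buf.seek(0)
assert decode(buf, lambda b: b.decode("utf-8")) == "hello"
```

The length prefix matters because a coder must know where one encoded element ends when several are written back to back on the same stream.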
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385422)
Time Spent: 10h 40m  (was: 10.5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=385421&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385421
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 11/Feb/20 19:58
Start Date: 11/Feb/20 19:58
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10223: [BEAM-8399] Add 
--hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#issuecomment-584822664
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385421)
Time Spent: 2h 50m  (was: 2h 40m)

> Python HDFS implementation should support filenames of the format 
> "hdfs://namenodehost/parent/child"
> 
>
> Key: BEAM-8399
> URL: https://issues.apache.org/jira/browse/BEAM-8399
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> "hdfs://namenodehost/parent/child" and "/parent/child" seems to be the 
> correct filename formats for HDFS based on [1] but we currently support 
> format "hdfs://parent/child".
> To not break existing users, we have to either (1) somehow support both 
> versions by default (based on [2] seems like HDFS does not allow colons in 
> file path so this might be possible) (2) make  
> "hdfs://namenodehost/parent/child" optional for now and change it to default 
> after few versions.
> We should also make sure that Beam Java and Python HDFS file-system 
> implementations are consistent in this regard.
>  
> [1][https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html]
> [2] https://issues.apache.org/jira/browse/HDFS-13
>  
> cc: [~udim]
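The two interpretations discussed above can be sketched behind a flag, roughly analogous to the --hdfs_full_urls option in the linked PR (the function name and return shape here are illustrative, not Beam's actual API):

```python
from urllib.parse import urlparse

def split_hdfs_url(url, full_urls=False):
    """Split an hdfs:// URL into (server, path).

    full_urls=False keeps the legacy behavior, where the "host" component
    is really the first path segment (hdfs://parent/child); full_urls=True
    treats the URL as hdfs://namenodehost/parent/child.
    """
    parsed = urlparse(url)
    if parsed.scheme != "hdfs":
        raise ValueError("not an HDFS URL: %r" % url)
    if full_urls:
        return parsed.netloc, parsed.path              # ('namenodehost', '/parent/child')
    return "", "/%s%s" % (parsed.netloc, parsed.path)  # legacy: ('', '/parent/child')

assert split_hdfs_url("hdfs://parent/child") == ("", "/parent/child")
assert split_hdfs_url("hdfs://namenodehost/parent/child", full_urls=True) == (
    "namenodehost", "/parent/child")
```

Making the new behavior opt-in preserves existing pipelines while letting users with multiple namenodes address them explicitly.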



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=385416&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385416
 ]

ASF GitHub Bot logged work on BEAM-9229:


Author: ASF GitHub Bot
Created on: 11/Feb/20 19:54
Start Date: 11/Feb/20 19:54
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10733: [BEAM-9229] 
Adding dependency information to Environment proto
URL: https://github.com/apache/beam/pull/10733#discussion_r377860779
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1087,6 +1087,44 @@ message SideInput {
   FunctionSpec window_mapping_fn = 3;
 }
 
+message StandardArtifacts {
+  enum Types {
+// A URN for artifacts stored in a local directory.
+// payload: ArtifactFilePayload.
+FILE = 0 [(beam_urn) = "beam:artifact:file:v1"];
+// A URN for artifacts embedded in ArtifactInformation proto.
+// payload: raw data bytes.
+EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"];
+// A URN for artifacts described by HTTP links.
+// payload: a string for an artifact HTTP URL
+HTTP = 2 [(beam_urn) = "beam:artifact:http:v1"];
+// A URN for artifacts hosted on PYPI.
+// artifact_id: a PYPI project name
+// version_range: a PYPI compatible version string
+// payload: None
+PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"];
+// A URN for artifacts hosted on Maven central.
+// artifact_id: [maven group id]:[maven artifact id]
+// version_range: a Maven compatible version string
+// payload: None
+MAVEN= 4 [(beam_urn) = "beam:artifact:maven:v1"];
+  }
+}
+
+message ArtifactFilePayload {
+  // A path to an artifact file on a local system.
+  string local_path = 1;
 
 Review comment:
   One issue here is that the callee may not know if it is on a shared 
filesystem with the caller (e.g. when calling the expansion service). And when 
calling two distinct expansion services, one would like to be able to compare 
between them. 
   
   Also, perhaps we should not limit ourselves to local paths here, but any 
path that can be opened with beam filesystems. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385416)
Time Spent: 2.5h  (was: 2h 20m)

> Adding dependency information to Environment proto
> --
>
> Key: BEAM-9229
> URL: https://issues.apache.org/jira/browse/BEAM-9229
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Adding dependency information to Environment proto.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=385417&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385417
 ]

ASF GitHub Bot logged work on BEAM-9229:


Author: ASF GitHub Bot
Created on: 11/Feb/20 19:54
Start Date: 11/Feb/20 19:54
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10733: [BEAM-9229] 
Adding dependency information to Environment proto
URL: https://github.com/apache/beam/pull/10733#discussion_r377862882
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1087,6 +1087,44 @@ message SideInput {
   FunctionSpec window_mapping_fn = 3;
 }
 
+message StandardArtifacts {
+  enum Types {
+// A URN for artifacts stored in a local directory.
+// payload: ArtifactFilePayload.
+FILE = 0 [(beam_urn) = "beam:artifact:file:v1"];
+// A URN for artifacts embedded in ArtifactInformation proto.
+// payload: raw data bytes.
+EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"];
+// A URN for artifacts described by HTTP links.
+// payload: a string for an artifact HTTP URL
+HTTP = 2 [(beam_urn) = "beam:artifact:http:v1"];
+// A URN for artifacts hosted on PYPI.
+// artifact_id: a PYPI project name
+// version_range: a PYPI compatible version string
+// payload: None
+PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"];
+// A URN for artifacts hosted on Maven central.
+// artifact_id: [maven group id]:[maven artifact id]
+// version_range: a Maven compatible version string
+// payload: None
+MAVEN= 4 [(beam_urn) = "beam:artifact:maven:v1"];
+  }
+}
+
+message ArtifactFilePayload {
+  // A path to an artifact file on a local system.
+  string local_path = 1;
+  // A generated staged name (no path).
+  string staged_name = 2;
+}
+
+message ArtifactInformation {
+  string urn = 1;
+  bytes payload = 2;
+  string artifact_id = 3;
 
 Review comment:
   If we can't come up with a standard format, I think it should be part of the 
payload. Similarly for artifact_id--they should mean the same thing. (We can 
always safely dedup on urn+payloads.)
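The "dedup on urn+payloads" point above can be sketched as follows (the artifact representation here is a simplified (urn, payload) pair rather than the actual ArtifactInformation proto):

```python
def dedup_artifacts(artifacts):
    """Drop duplicate artifacts, keying on (urn, payload).

    Two artifacts with the same URN and identical payload bytes describe the
    same dependency, so keeping only one of them is always safe, even when
    they were reported by different expansion services.
    """
    seen = set()
    unique = []
    for urn, payload in artifacts:
        key = (urn, payload)
        if key not in seen:
            seen.add(key)
            unique.append((urn, payload))
    return unique

artifacts = [
    ("beam:artifact:pypi:v1", b"numpy==1.18.1"),
    ("beam:artifact:pypi:v1", b"numpy==1.18.1"),   # duplicate from a second service
    ("beam:artifact:maven:v1", b"org.apache.beam:beam-sdks-java-core"),
]
assert dedup_artifacts(artifacts) == [
    ("beam:artifact:pypi:v1", b"numpy==1.18.1"),
    ("beam:artifact:maven:v1", b"org.apache.beam:beam-sdks-java-core"),
]
```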
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385417)

> Adding dependency information to Environment proto
> --
>
> Key: BEAM-9229
> URL: https://issues.apache.org/jira/browse/BEAM-9229
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Adding dependency information to Environment proto.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9299) Upgrade Flink Runner to 1.8.3 and 1.9.2

2020-02-11 Thread sunjincheng (Jira)
sunjincheng created BEAM-9299:
-

 Summary: Upgrade Flink Runner to 1.8.3 and 1.9.2
 Key: BEAM-9299
 URL: https://issues.apache.org/jira/browse/BEAM-9299
 Project: Beam
  Issue Type: Task
  Components: runner-flink
Reporter: sunjincheng
Assignee: sunjincheng
 Fix For: 2.20.0


I would like to upgrade the Flink Runner to 1.8.3 and 1.9.2, since both Apache 
Flink 1.8.3 and Apache Flink 1.9.2 have been released [1].

What do you think?


[1] https://dist.apache.org/repos/dist/release/flink/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=385415&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385415
 ]

ASF GitHub Bot logged work on BEAM-9229:


Author: ASF GitHub Bot
Created on: 11/Feb/20 19:54
Start Date: 11/Feb/20 19:54
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10733: [BEAM-9229] 
Adding dependency information to Environment proto
URL: https://github.com/apache/beam/pull/10733#discussion_r377863536
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1087,6 +1087,44 @@ message SideInput {
   FunctionSpec window_mapping_fn = 3;
 }
 
+message StandardArtifacts {
+  enum Types {
+// A URN for artifacts stored in a local directory.
+// payload: ArtifactFilePayload.
+FILE = 0 [(beam_urn) = "beam:artifact:file:v1"];
+// A URN for artifacts embedded in ArtifactInformation proto.
+// payload: raw data bytes.
+EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"];
+// A URN for artifacts described by HTTP links.
+// payload: a string for an artifact HTTP URL
+HTTP = 2 [(beam_urn) = "beam:artifact:http:v1"];
+// A URN for artifacts hosted on PYPI.
+// artifact_id: a PYPI project name
+// version_range: a PYPI compatible version string
+// payload: None
+PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"];
+// A URN for artifacts hosted on Maven central.
+// artifact_id: [maven group id]:[maven artifact id]
+// version_range: a Maven compatible version string
+// payload: None
+MAVEN= 4 [(beam_urn) = "beam:artifact:maven:v1"];
+  }
+}
+
+message ArtifactFilePayload {
+  // A path to an artifact file on a local system.
+  string local_path = 1;
+  // A generated staged name (no path).
+  string staged_name = 2;
+}
+
+message ArtifactInformation {
+  string urn = 1;
+  bytes payload = 2;
+  string artifact_id = 3;
+  string version_range = 4;
+}
 
 Review comment:
   But we'll need it for more than just embedded payload. And the name itself 
may not be enough to determine the role (do we try to install all .tar.gz files 
in Python? Or are some just data?)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385415)
Time Spent: 2.5h  (was: 2h 20m)

> Adding dependency information to Environment proto
> --
>
> Key: BEAM-9229
> URL: https://issues.apache.org/jira/browse/BEAM-9229
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Adding dependency information to Environment proto.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=385414&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385414
 ]

ASF GitHub Bot logged work on BEAM-9229:


Author: ASF GitHub Bot
Created on: 11/Feb/20 19:54
Start Date: 11/Feb/20 19:54
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10733: [BEAM-9229] 
Adding dependency information to Environment proto
URL: https://github.com/apache/beam/pull/10733#discussion_r377862155
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1087,6 +1087,44 @@ message SideInput {
   FunctionSpec window_mapping_fn = 3;
 }
 
+message StandardArtifacts {
+  enum Types {
+// A URN for artifacts stored in a local directory.
+// payload: ArtifactFilePayload.
+FILE = 0 [(beam_urn) = "beam:artifact:file:v1"];
+// A URN for artifacts embedded in ArtifactInformation proto.
+// payload: raw data bytes.
+EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"];
+// A URN for artifacts described by HTTP links.
+// payload: a string for an artifact HTTP URL
+HTTP = 2 [(beam_urn) = "beam:artifact:http:v1"];
+// A URN for artifacts hosted on PYPI.
+// artifact_id: a PYPI project name
+// version_range: a PYPI compatible version string
+// payload: None
+PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"];
+// A URN for artifacts hosted on Maven central.
+// artifact_id: [maven group id]:[maven artifact id]
+// version_range: a Maven compatible version string
+// payload: None
+MAVEN= 4 [(beam_urn) = "beam:artifact:maven:v1"];
+  }
+}
+
+message ArtifactFilePayload {
+  // A path to an artifact file on a local system.
+  string local_path = 1;
+  // A generated staged name (no path).
+  string staged_name = 2;
 
 Review comment:
   But eventually we'll have to give it a name, right? (One could argue that 
dependencies should be a map(name -> artifact). OTOH, for some perhaps we could 
leave it blank and one could be inferred (e.g. for urls). 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385414)
Time Spent: 2.5h  (was: 2h 20m)

> Adding dependency information to Environment proto
> --
>
> Key: BEAM-9229
> URL: https://issues.apache.org/jira/browse/BEAM-9229
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Adding dependency information to Environment proto.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

