[jira] [Work logged] (BEAM-7475) Add Python stateful processing example in blog

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7475?focusedWorklogId=262795&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262795
 ]

ASF GitHub Bot logged work on BEAM-7475:


Author: ASF GitHub Bot
Created on: 19/Jun/19 05:57
Start Date: 19/Jun/19 05:57
Worklog Time Spent: 10m 
  Work Description: rakeshcusat commented on pull request #8803: 
[BEAM-7475] update wordcount example
URL: https://github.com/apache/beam/pull/8803#discussion_r295131284
 
 

 ##
 File path: website/src/get-started/wordcount-example.md
 ##
 @@ -1362,7 +1378,11 @@ PCollection<KV<String, Long>> wordCounts = 
windowedWords.apply(new WordCount.Cou
 ```
 
 ```py
-# This feature is not yet available in the Beam SDK for Python.
+import apache_beam as beam
+from apache_beam.transforms import window
+...
+...
+word_counts = windowed_words | beam.Map(WordCount()) 
 
 Review comment:
   it could be something like this: 
   ```
  class CountWordsFn(DoFn):
    def process(self, element):
      # Add logic here to split the line into words.
      # Call metrics to increment the counter.
      ...

  class CountWords(PTransform):
    def expand(self, pcoll):
      return pcoll | ParDo(CountWordsFn())
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262795)
Time Spent: 2h 50m  (was: 2h 40m)

> Add Python stateful processing example in blog
> --
>
> Key: BEAM-7475
> URL: https://issues.apache.org/jira/browse/BEAM-7475
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rakesh Kumar
>Assignee: Rakesh Kumar
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7475) Add Python stateful processing example in blog

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7475?focusedWorklogId=262794&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262794
 ]

ASF GitHub Bot logged work on BEAM-7475:


Author: ASF GitHub Bot
Created on: 19/Jun/19 05:47
Start Date: 19/Jun/19 05:47
Worklog Time Spent: 10m 
  Work Description: rakeshcusat commented on pull request #8803: 
[BEAM-7475] update wordcount example
URL: https://github.com/apache/beam/pull/8803#discussion_r295128950
 
 

 ##
 File path: website/src/get-started/wordcount-example.md
 ##
 @@ -1362,7 +1378,11 @@ PCollection<KV<String, Long>> wordCounts = 
windowedWords.apply(new WordCount.Cou
 ```
 
 ```py
-# This feature is not yet available in the Beam SDK for Python.
+import apache_beam as beam
+from apache_beam.transforms import window
+...
+...
+word_counts = windowed_words | beam.Map(WordCount()) 
 
 Review comment:
  @pabloem you are right. We can probably provide the implementation in this 
example for clarification. Another option is to provide the actual word count 
implementation and link to it here, as we do for the Java example.
  Let me know which one you prefer. I am fine either way.
 



Issue Time Tracking
---

Worklog Id: (was: 262794)
Time Spent: 2h 40m  (was: 2.5h)

> Add Python stateful processing example in blog
> --
>
> Key: BEAM-7475
> URL: https://issues.apache.org/jira/browse/BEAM-7475
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rakesh Kumar
>Assignee: Rakesh Kumar
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-7560) Python local filesystem match does not work without directory separator.

2019-06-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7560:
---
Status: Open  (was: Triage Needed)

> Python local filesystem match does not work without directory separator.
> 
>
> Key: BEAM-7560
> URL: https://issues.apache.org/jira/browse/BEAM-7560
> Project: Beam
>  Issue Type: Test
>  Components: io-python-files
>Reporter: Robert Bradshaw
>Priority: Major
>  Labels: starter
>
> E.g. {{beam.io.ReadFromText('./*.txt')}} works but 
> {{beam.io.ReadFromText('*.txt')}} throws a "No files found based on the 
> filepattern..." error.
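One plausible cause, stated here as an assumption rather than a confirmed diagnosis: match implementations often split a pattern into a directory part and a glob part at the last path separator, and a bare pattern yields an empty directory component. A stdlib-only sketch of the behavior (`split_pattern` is a hypothetical helper, not Beam code):

```python
import glob
import os
import tempfile


def split_pattern(pattern):
    """Split a file pattern into (directory, glob) parts at the last
    path separator, as a match implementation typically would."""
    return os.path.dirname(pattern), os.path.basename(pattern)


# With a separator the directory part is listable...
assert split_pattern('./*.txt') == ('.', '*.txt')
# ...without one it is empty, and listing '' finds no files.
assert split_pattern('*.txt') == ('', '*.txt')

# glob itself accepts both forms, which suggests normalizing a bare
# pattern to './<pattern>' before matching as a possible fix.
old_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, 'a.txt'), 'w').close()
    os.chdir(d)
    assert glob.glob('*.txt') == ['a.txt']
    assert glob.glob('./*.txt') == ['./a.txt']
    os.chdir(old_cwd)
```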





[jira] [Updated] (BEAM-7562) Remove/Replace Sns's Writer return type PCollectionTuple

2019-06-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7562:
---
Status: Open  (was: Triage Needed)

> Remove/Replace Sns's Writer return type PCollectionTuple
> --
>
> Key: BEAM-7562
> URL: https://issues.apache.org/jira/browse/BEAM-7562
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Cam Mach
>Assignee: Cam Mach
>Priority: Minor
>






[jira] [Updated] (BEAM-7565) support Splittable DoFn in samza runner

2019-06-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7565:
---
Status: Open  (was: Triage Needed)

> support Splittable DoFn in samza runner
> ---
>
> Key: BEAM-7565
> URL: https://issues.apache.org/jira/browse/BEAM-7565
> Project: Beam
>  Issue Type: Task
>  Components: runner-samza
>Reporter: Hai Lu
>Assignee: Hai Lu
>Priority: Major
>
> Consider both portable and regular runner





[jira] [Updated] (BEAM-7564) feature complete for ParDo in Samza Portable Runner

2019-06-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7564:
---
Status: Open  (was: Triage Needed)

> feature complete for ParDo in Samza Portable Runner
> ---
>
> Key: BEAM-7564
> URL: https://issues.apache.org/jira/browse/BEAM-7564
> Project: Beam
>  Issue Type: Task
>  Components: runner-samza
>Reporter: Hai Lu
>Assignee: Hai Lu
>Priority: Major
>
> Need to support timer, state, side input, etc. in samza portable runner





[jira] [Updated] (BEAM-7576) Automatically push flink job-server-container to hub.docker

2019-06-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7576:
---
Status: Open  (was: Triage Needed)

> Automatically push flink job-server-container to hub.docker
> ---
>
> Key: BEAM-7576
> URL: https://issues.apache.org/jira/browse/BEAM-7576
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Chethan UK
>Priority: Major
>
> Hey all,
>  
> It would be really helpful if Beam automatically pushed the Flink 
> job-server-container to Docker Hub.
>  
> Thanks!





[jira] [Updated] (BEAM-7572) Classes ApproximateUnique.Globally and ApproximateUnique.PerKey are inaccessible

2019-06-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7572:
---
Status: Open  (was: Triage Needed)

> Classes ApproximateUnique.Globally and ApproximateUnique.PerKey are 
> inaccessible
> 
>
> Key: BEAM-7572
> URL: https://issues.apache.org/jira/browse/BEAM-7572
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.13.0
>Reporter: Gunnar Schulze
>Priority: Minor
>
> The classes ApproximateUnique.Globally and ApproximateUnique.PerKey returned 
> by ApproximateUnique.globally() and ApproximateUnique.perKey() are 
> inaccessible because of a missing "public" modifier.





[jira] [Updated] (BEAM-7569) filesystem_test is failing for Windows

2019-06-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7569:
---
Status: Open  (was: Triage Needed)

> filesystem_test is failing for Windows
> --
>
> Key: BEAM-7569
> URL: https://issues.apache.org/jira/browse/BEAM-7569
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-7580) Split GroupingBuffer to N pieces

2019-06-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7580:
---
Status: Open  (was: Triage Needed)

> Split GroupingBuffer to N pieces
> 
>
> Key: BEAM-7580
> URL: https://issues.apache.org/jira/browse/BEAM-7580
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>
> Split GroupingBuffer into N pieces to reduce the data size a worker needs to 
> process and improve performance. 





[jira] [Updated] (BEAM-7570) withCustomGcsTempLocation should also be implemented for BigQueryIO.Read

2019-06-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7570:
---
Status: Open  (was: Triage Needed)

> withCustomGcsTempLocation should also be implemented for BigQueryIO.Read
> 
>
> Key: BEAM-7570
> URL: https://issues.apache.org/jira/browse/BEAM-7570
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Aaron Liblong
>Priority: Major
>
> A function in BigQueryIO.Write called withCustomGcsTempLocation allows 
> specification at template execution time of a GCS location used by BigQuery 
> to write temp files. BigQuery also needs to write temp files for _read_ 
> operations, and therefore this function should be available in 
> BigQueryIO.Read.
> This issue blocks the ability to deploy a template with BigQuery read ops to 
> an environment where users (who will execute the template) have only read 
> access.





[jira] [Work logged] (BEAM-3645) Support multi-process execution on the FnApiRunner

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3645?focusedWorklogId=262791&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262791
 ]

ASF GitHub Bot logged work on BEAM-3645:


Author: ASF GitHub Bot
Created on: 19/Jun/19 05:37
Start Date: 19/Jun/19 05:37
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #8872: 
[BEAM-3645] add ParallelBundleManager
URL: https://github.com/apache/beam/pull/8872#discussion_r295126972
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1185,6 +1185,32 @@ def create_pipeline(self):
   def test_register_finalizations(self):
 raise unittest.SkipTest("TODO: Avoid bundle finalizations on repeat.")
 
+class FnApiRunnerTestWithMultiWorkers(FnApiRunnerTest):
+
+  def create_pipeline(self):
+return beam.Pipeline(
+runner=fn_api_runner.FnApiRunner(num_workers=2))
+
+  def test_checkpoint(self):
+raise unittest.SkipTest("Multiworker doesn't support split request.")
+
+  def test_split_half(self):
+raise unittest.SkipTest("Multiworker doesn't support split request.")
+
+class FnApiRunnerTestWithMultiWorkersAndBundleRepeat(FnApiRunnerTest):
+
+  def create_pipeline(self):
+return beam.Pipeline(
+runner=fn_api_runner.FnApiRunner(num_workers=2, bundle_repeat=2))
+
+  def test_checkpoint(self):
+raise unittest.SkipTest("Multiworker doesn't support split request.")
 
 Review comment:
  This isn't an issue anymore with the new class that has a partition() function.
 



Issue Time Tracking
---

Worklog Id: (was: 262791)
Time Spent: 5h 50m  (was: 5h 40m)

> Support multi-process execution on the FnApiRunner
> --
>
> Key: BEAM-3645
> URL: https://issues.apache.org/jira/browse/BEAM-3645
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Charles Chen
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/BEAM-3644 gave us a 15x performance 
> gain over the previous DirectRunner.  We can do even better in multi-core 
> environments by supporting multi-process execution in the FnApiRunner, to 
> scale past Python GIL limitations.





[jira] [Updated] (BEAM-7581) Multiplexing chunked GroupingBuffer

2019-06-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7581:
---
Status: Open  (was: Triage Needed)

> Multiplexing chunked GroupingBuffer
> ---
>
> Key: BEAM-7581
> URL: https://issues.apache.org/jira/browse/BEAM-7581
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>
> Multiplex the chunked GroupingBuffer above to multiple workers and support 
> parallel processing.





[jira] [Updated] (BEAM-7575) Add Wordcount-on-Flink to Python 3.6 and 3.7 test suite

2019-06-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7575:
---
Status: Open  (was: Triage Needed)

> Add Wordcount-on-Flink to Python 3.6 and 3.7 test suite
> ---
>
> Key: BEAM-7575
> URL: https://issues.apache.org/jira/browse/BEAM-7575
> Project: Beam
>  Issue Type: Task
>  Components: runner-flink
>Reporter: Frederik Bode
>Assignee: Frederik Bode
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-3645) Support multi-process execution on the FnApiRunner

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3645?focusedWorklogId=262789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262789
 ]

ASF GitHub Bot logged work on BEAM-3645:


Author: ASF GitHub Bot
Created on: 19/Jun/19 05:36
Start Date: 19/Jun/19 05:36
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #8769: [WIP] 
[BEAM-3645] support multi processes for Python FnApiRunner with 
EmbeddedGrpcWorkerHandler
URL: https://github.com/apache/beam/pull/8769#discussion_r295126791
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -1418,6 +1449,51 @@ def process_bundle(self, inputs, expected_outputs, 
parallel_uid_counter=None):
 
 return result, split_results
 
+class ParallelBundleManager(BundleManager):
+  _uid_counter = 0
+  def process_bundle(self, inputs, expected_outputs):
+input_value = list(inputs.values())[0]
 
 Review comment:
  Today I found a case where we shouldn't split inputs: a transform with a 
timer, when using the gRPC handler. I would like to get your advice on how to 
know when we should split inputs and when we shouldn't. I added a comment with 
examples to the new PR.
 



Issue Time Tracking
---

Worklog Id: (was: 262789)
Time Spent: 5h 40m  (was: 5.5h)

> Support multi-process execution on the FnApiRunner
> --
>
> Key: BEAM-3645
> URL: https://issues.apache.org/jira/browse/BEAM-3645
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Charles Chen
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/BEAM-3644 gave us a 15x performance 
> gain over the previous DirectRunner.  We can do even better in multi-core 
> environments by supporting multi-process execution in the FnApiRunner, to 
> scale past Python GIL limitations.





[jira] [Work logged] (BEAM-3645) Support multi-process execution on the FnApiRunner

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3645?focusedWorklogId=262787&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262787
 ]

ASF GitHub Bot logged work on BEAM-3645:


Author: ASF GitHub Bot
Created on: 19/Jun/19 05:34
Start Date: 19/Jun/19 05:34
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #8769: [WIP] 
[BEAM-3645] support multi processes for Python FnApiRunner with 
EmbeddedGrpcWorkerHandler
URL: https://github.com/apache/beam/pull/8769#discussion_r295126333
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -1418,6 +1449,51 @@ def process_bundle(self, inputs, expected_outputs, 
parallel_uid_counter=None):
 
 return result, split_results
 
+class ParallelBundleManager(BundleManager):
+  _uid_counter = 0
+  def process_bundle(self, inputs, expected_outputs):
+input_value = list(inputs.values())[0]
+if isinstance(input_value, list):
 
 Review comment:
  Thanks for the explanation. Your suggestion brings many advantages. I was able 
to get rid of the thread handling and reuse BundleManager as-is by writing a 
new class with a partition() method. It is changed in the new PR.
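The partition() idea can be sketched in plain Python; the names are illustrative and the real ParallelBundleManager code differs:

```python
def partition(inputs, num_workers):
    """Split a bundle's input elements into num_workers roughly equal
    chunks so each chunk can be handed to a separate BundleManager."""
    chunks = [[] for _ in range(num_workers)]
    for i, element in enumerate(inputs):
        # Round-robin assignment keeps chunk sizes within one element.
        chunks[i % num_workers].append(element)
    return [c for c in chunks if c]  # drop empty chunks


assert partition(list(range(5)), 2) == [[0, 2, 4], [1, 3]]
```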
 



Issue Time Tracking
---

Worklog Id: (was: 262787)
Time Spent: 5.5h  (was: 5h 20m)

> Support multi-process execution on the FnApiRunner
> --
>
> Key: BEAM-3645
> URL: https://issues.apache.org/jira/browse/BEAM-3645
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Charles Chen
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/BEAM-3644 gave us a 15x performance 
> gain over the previous DirectRunner.  We can do even better in multi-core 
> environments by supporting multi-process execution in the FnApiRunner, to 
> scale past Python GIL limitations.





[jira] [Updated] (BEAM-7587) Spark portable runner: Streaming mode

2019-06-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7587:
---
Status: Open  (was: Triage Needed)

> Spark portable runner: Streaming mode
> -
>
> Key: BEAM-7587
> URL: https://issues.apache.org/jira/browse/BEAM-7587
> Project: Beam
>  Issue Type: Wish
>  Components: runner-spark
>Reporter: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>
> So far all work on the Spark portable runner has been in batch mode. This is 
> intended as an uber-issue for tracking progress on adding support for 
> streaming.
> It might be advantageous to wait for the structured streaming (non-portable) 
> runner to be completed (to some reasonable extent) before undertaking this, 
> rather than using the DStream API.





[jira] [Updated] (BEAM-7590) Convert PipelineOptionsMap to PipelineOption

2019-06-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7590:
---
Status: Open  (was: Triage Needed)

> Convert PipelineOptionsMap to PipelineOption
> 
>
> Key: BEAM-7590
> URL: https://issues.apache.org/jira/browse/BEAM-7590
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Alireza Samadianzakaria
>Assignee: Alireza Samadianzakaria
>Priority: Minor
>
> Currently, BeamCalciteTable keeps a map version of PipelineOptions, and that 
> map version is used in JDBCConnection and RelNodes as well. This map is empty 
> when the pipeline is constructed from SqlTransform, and it holds the 
> parameters passed from the JDBC client when the pipeline is started via the 
> JDBC path. Since row-count estimation needs to use PipelineOptions (or its 
> subclasses), and we cannot convert a map created from a PipelineOptions 
> subclass back to PipelineOptions, it is better to keep the PipelineOptions 
> object itself.
>  





[jira] [Updated] (BEAM-7561) HdfsFileSystem is unable to match a directory

2019-06-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7561:
---
Status: Open  (was: Triage Needed)

> HdfsFileSystem is unable to match a directory
> -
>
> Key: BEAM-7561
> URL: https://issues.apache.org/jira/browse/BEAM-7561
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-hadoop-file-system
>Affects Versions: 2.13.0
>Reporter: David Moravek
>Assignee: David Moravek
>Priority: Major
> Fix For: 2.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [FileSystems.match|https://beam.apache.org/releases/javadoc/2.3.0/org/apache/beam/sdk/io/FileSystems.html#match-java.util.List-]
>  method should be able to match a directory according to the javadoc. Unlike 
> _HdfsFileSystems_, _LocalFileSystems_ behaves as I would expect, so I'm 
> assuming this is a bug on the _HdfsFileSystems_ side.
> *Current behavior:*
> _HadoopFileSystem.match("hdfs:///tmp/dir")_ returns a _MatchResult_ with an 
> empty metadata
> *Expected behavior:*
> _HadoopFileSystem.match("hdfs:///tmp/dir")_ returns a _MatchResult_ with 
> metadata about the directory





[jira] [Updated] (BEAM-7588) Python BigQuery sink should be able to handle 15TB load job quota

2019-06-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7588:
---
Status: Open  (was: Triage Needed)

> Python BigQuery sink should be able to handle 15TB load job quota
> -
>
> Key: BEAM-7588
> URL: https://issues.apache.org/jira/browse/BEAM-7588
> Project: Beam
>  Issue Type: Improvement
>  Components: io-python-gcp
>Reporter: Chamikara Jayalath
>Assignee: Pablo Estrada
>Priority: Major
>
> This can be done by issuing multiple load jobs, each under 15TB, when the 
> amount of data to be loaded exceeds 15TB.
>  
> This is already handled by Java SDK.
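A sketch of the chunking logic, under the assumption that the sink sees a list of staged files with known sizes (`group_files_into_load_jobs` is a hypothetical name, not the actual Python SDK API; the 15TB figure comes from the ticket):

```python
_MAX_BYTES_PER_LOAD_JOB = 15 * (1 << 40)  # 15 TB quota from the ticket


def group_files_into_load_jobs(files, max_bytes=_MAX_BYTES_PER_LOAD_JOB):
    """Greedily pack (name, size) pairs into load jobs under max_bytes.

    A single file larger than max_bytes still gets its own job here;
    real code would have to re-shard such a file before loading.
    """
    jobs, current, current_bytes = [], [], 0
    for name, size in files:
        if current and current_bytes + size > max_bytes:
            jobs.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        jobs.append(current)
    return jobs


# Three 7 TB files do not fit in one 15 TB job, so two jobs are issued.
tb = 1 << 40
files = [('f1', 7 * tb), ('f2', 7 * tb), ('f3', 7 * tb)]
assert group_files_into_load_jobs(files) == [['f1', 'f2'], ['f3']]
```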





[jira] [Work logged] (BEAM-7130) convertAvroFieldStrict as public static function could handle more types of value for logical type timestamp-millis

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7130?focusedWorklogId=262786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262786
 ]

ASF GitHub Bot logged work on BEAM-7130:


Author: ASF GitHub Bot
Created on: 19/Jun/19 05:31
Start Date: 19/Jun/19 05:31
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #8376: 
[BEAM-7130]Support Datetime value conversion in convertAvroFieldStrict
URL: https://github.com/apache/beam/pull/8376
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 262786)
Time Spent: 1.5h  (was: 1h 20m)

> convertAvroFieldStrict as public static function could handle more types of 
> value for logical type timestamp-millis
> ---
>
> Key: BEAM-7130
> URL: https://issues.apache.org/jira/browse/BEAM-7130
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> https://lists.apache.org/thread.html/68322dcf9418b1d1640273f1a58f70874b61b4996d08dd982b29492c@%3Cdev.beam.apache.org%3E





[jira] [Updated] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7589:
---
Status: Open  (was: Triage Needed)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> 
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I found what triggers the shard map update now.
> You create a producer per bundle (in the SetUp function), and if I multiply 
> that by the number of workers, it gives a huge number of producers; I believe 
> this triggers the "update shard map" call.
> If I copy your code and create *one* producer for every worker, then this 
> error disappears.
> Can you remove the producer creation from the setUp method and move it to a 
> static field in the class, created once when the class is initialized?
> See the similar issue with JdbcIO: the connection pool was created per setUp 
> method, and we moved it to a static member so there is one pool per JVM. Ask 
> [~iemejia] for more details.
> 
> Alexey Romanenko added a comment - 14/Jun/19 14:31 - edited
>   
>  [~brachi_packter] What kind of error do you have in this case? Could you 
> post an error stacktrace / exception message? 
>  Also, it would be helpful (if it's possible) if you could provide more 
> details about your environment and pipeline, like what is your pipeline 
> topology, which runner do you use, number of workers in your cluster, etc. 
>  For now, I can't reproduce it on my side, so all additional info will be 
> helpful.
> 
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> [0x1728][0x7f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
> for stream "**" failed. Code: LimitExceededException Message: Rate exceeded 
> for stream poc-test under account **.; retrying in 5062 ms
> {code}
> I'm not seeing full stack trace, but can see in log also this:
> {code:java}
> [2019-06-13 08:29:09.427018] [0x07e1][0x7f8d508d3700] [warning] [AWS 
> Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
> {code}
> More details:
>  I'm using the Dataflow runner, Java SDK 2.11.
> 60 workers initially (with autoscaling and also with the flag 
> "enableStreamingEngine").
> Normally I'm producing 4-5k records per second, but when I have latency this 
> can be multiplied by 3-4 times.
> When I'm starting the Dataflow job I have latency, so I produce more data, 
> and I fail immediately.
> Also, I have consumers (a 3rd-party tool); I know they call DescribeStream 
> every 30 seconds.
> My pipeline, running on GCP, reads data from PubSub at around 20,000 records 
> per second (in regular time; in latency time even 100,000 records per 
> second). It does many aggregations and counts based on some dimensions (using 
> Beam SQL). This is done over a 1-minute sliding window, and the results of 
> the aggregations are written to a Kinesis stream.
> My stream has 10 shards, and my partition key logic generates a UUID for each 
> record: 
> UUID.randomUUID().toString()
> Hope this gives you some more context on my problem.
> Another suggestion: can you try fixing the issue as I suggested and provide 
> me a specific version for testing, without merging it to master? (I would do 
> it myself, but I had trouble building the huge Apache Beam repository 
> locally.)
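The fix suggested in the quoted thread, one producer per JVM rather than per bundle, follows the lazily initialized shared-client pattern. The actual KinesisIO fix would be Java; this is a language-neutral sketch in Python with illustrative names (`FakeProducer` stands in for the real client):

```python
import threading


class FakeProducer:
    """Stand-in for an expensive client such as a Kinesis producer."""
    instances = 0

    def __init__(self):
        FakeProducer.instances += 1


_producer = None
_producer_lock = threading.Lock()


def get_shared_producer():
    """Return one producer per process, creating it lazily.

    Many DoFn instances can call this from setUp; only the first call
    actually constructs the client, mirroring the static-member fix
    applied to JdbcIO's connection pool.
    """
    global _producer
    with _producer_lock:
        if _producer is None:
            _producer = FakeProducer()
        return _producer


# Ten "bundles" asking for a producer still create only one.
producers = [get_shared_producer() for _ in range(10)]
assert FakeProducer.instances == 1
assert all(p is producers[0] for p in producers)
```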





[jira] [Work logged] (BEAM-6877) TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode changes

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6877?focusedWorklogId=262778&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262778
 ]

ASF GitHub Bot logged work on BEAM-6877:


Author: ASF GitHub Bot
Created on: 19/Jun/19 05:01
Start Date: 19/Jun/19 05:01
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #8856: [BEAM-6877] Fix 
trivial_inference for Python 3.x
URL: https://github.com/apache/beam/pull/8856#issuecomment-503406845
 
 
   run python postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 262778)
Time Spent: 4.5h  (was: 4h 20m)

> TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode 
> changes
> 
>
> Key: BEAM-6877
> URL: https://issues.apache.org/jira/browse/BEAM-6877
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Type inference doesn't work on Python 3.6 due to [bytecode to wordcode 
> changes|https://docs.python.org/3/whatsnew/3.6.html#cpython-bytecode-changes].
> Type inference always returns Any on Python 3.6, so this is not critical.
> Affected tests are:
>  *transforms.ptransform_test*:
>  - test_combine_properly_pipeline_type_checks_using_decorator
>  - test_mean_globally_pipeline_checking_satisfied
>  - test_mean_globally_runtime_checking_satisfied
>  - test_count_globally_pipeline_type_checking_satisfied
>  - test_count_globally_runtime_type_checking_satisfied
>  - test_pardo_type_inference
>  - test_pipeline_inference
>  - test_inferred_bad_kv_type
> *typehints.trivial_inference_test*:
>  - all tests in TrivialInferenceTest
> *io.gcp.pubsub_test.TestReadFromPubSubOverride*:
> * test_expand_with_other_options
> * test_expand_with_subscription
> * test_expand_with_topic
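The wordcode change behind this can be observed directly with the stdlib `dis` module: on CPython 3.6+ every instruction occupies exactly two bytes, which invalidates offset arithmetic written for the older variable-width (1- or 3-byte) bytecode. A minimal illustration:

```python
import dis


def sample(x):
    return x + 1


offsets = [instr.offset for instr in dis.get_instructions(sample)]
# On CPython >= 3.6 every instruction is exactly two bytes wide
# (wordcode), so all instruction offsets are even.
assert all(off % 2 == 0 for off in offsets)
assert offsets == sorted(offsets)
```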





[jira] [Work logged] (BEAM-6955) Support Dataflow --sdk_location with modified version number

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6955?focusedWorklogId=262765&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262765
 ]

ASF GitHub Bot logged work on BEAM-6955:


Author: ASF GitHub Bot
Created on: 19/Jun/19 04:32
Start Date: 19/Jun/19 04:32
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #8885: [BEAM-6955] Use 
base version component of Beam Python SDK version when choosing Dataflow 
container image to use.
URL: https://github.com/apache/beam/pull/8885#issuecomment-503401887
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 262765)
Time Spent: 1h 10m  (was: 1h)

> Support Dataflow --sdk_location with modified version number
> 
>
> Key: BEAM-6955
> URL: https://issues.apache.org/jira/browse/BEAM-6955
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.11.0
>Reporter: Daniel Lescohier
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Support Dataflow --sdk_location with modified version number
> Determine the version tag to use in the Google Container Registry for the 
> service images to run on the Dataflow worker nodes. Users of Dataflow 
> may be using a locally-modified version of Apache Beam, which they submit to 
> Dataflow with the --sdk_location option. Those users would most likely modify 
> the version number of Apache Beam, so they can distinguish it from the public 
> distribution of Apache Beam. However, the remote nodes in Dataflow still need 
> to bootstrap the worker service with a Docker image for which a version tag 
> exists. 
> The most appropriate way for system integrators to modify the Apache Beam 
> version number would be to add a Local Version Identifier: 
> https://www.python.org/dev/peps/pep-0440/#local-version-identifiers
> If people only use Local Version Identifiers, then we could use the "public" 
> attribute of the pkg_resources version object.
> If people instead use a post-release version identifier: 
> https://www.python.org/dev/peps/pep-0440/#post-releases then only the 
> "base_version" attribute would work both of these version number changes. 
> Since Dataflow documentation does not specify how to modify version numbers, 
> I am choosing to use "base_version" attribute.
> Will shortly submit a PR with the change.
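
The distinction between the `public` and `base_version` attributes described above can be sketched with `pkg_resources` (the version strings here are hypothetical examples, not actual Beam releases):

```python
from pkg_resources import parse_version

# A locally modified SDK using a PEP 440 local version identifier:
local = parse_version("2.11.0+mycompany.1")
# The same modification expressed as a post-release instead:
post = parse_version("2.11.0.post1")

# "public" strips only the local segment, so it still differs for
# post-releases; "base_version" reduces both forms to the release number,
# which is what a container image tag must match.
print(local.public)        # 2.11.0
print(local.base_version)  # 2.11.0
print(post.public)         # 2.11.0.post1
print(post.base_version)   # 2.11.0
```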





[jira] [Work logged] (BEAM-7507) StreamingDataflowWorker attempts to decode non-utf8 binary data as utf8

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7507?focusedWorklogId=262721&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262721
 ]

ASF GitHub Bot logged work on BEAM-7507:


Author: ASF GitHub Bot
Created on: 19/Jun/19 02:10
Start Date: 19/Jun/19 02:10
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on pull request #: 
[BEAM-7507] Avoid more String.format and ByteString.toStringUtf8
URL: https://github.com/apache/beam/pull/
 
 
   Profiling revealed a couple more spots where non-UTF-8 byte strings were 
being interpreted as UTF-8. In this case, the code was also using 
Preconditions.checkNotNull/checkState, which eagerly evaluate the parameters 
to the format string used for the error message.
   
   @lukecwik thanks for looking at all these :)
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | …
[jira] [Work logged] (BEAM-6877) TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode changes

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6877?focusedWorklogId=262714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262714
 ]

ASF GitHub Bot logged work on BEAM-6877:


Author: ASF GitHub Bot
Created on: 19/Jun/19 01:39
Start Date: 19/Jun/19 01:39
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #8856: [BEAM-6877] Fix 
trivial_inference for Python 3.x
URL: https://github.com/apache/beam/pull/8856#issuecomment-503372272
 
 
   run python postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 262714)
Time Spent: 4h 20m  (was: 4h 10m)

> TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode 
> changes
> 
>
> Key: BEAM-6877
> URL: https://issues.apache.org/jira/browse/BEAM-6877
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Type inference doesn't work on Python 3.6 due to [bytecode to wordcode 
> changes|https://docs.python.org/3/whatsnew/3.6.html#cpython-bytecode-changes].
> Type inference always returns Any on Python 3.6, so this is not critical.
> Affected tests are:
>  *transforms.ptransform_test*:
>  - test_combine_properly_pipeline_type_checks_using_decorator
>  - test_mean_globally_pipeline_checking_satisfied
>  - test_mean_globally_runtime_checking_satisfied
>  - test_count_globally_pipeline_type_checking_satisfied
>  - test_count_globally_runtime_type_checking_satisfied
>  - test_pardo_type_inference
>  - test_pipeline_inference
>  - test_inferred_bad_kv_type
> *typehints.trivial_inference_test*:
>  - all tests in TrivialInferenceTest
> *io.gcp.pubsub_test.TestReadFromPubSubOverride*:
> * test_expand_with_other_options
> * test_expand_with_subscription
> * test_expand_with_topic





[jira] [Work logged] (BEAM-7579) Tests fail with "Cannot create a default bucket when --dataflowKmsKey is set."

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7579?focusedWorklogId=262687&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262687
 ]

ASF GitHub Bot logged work on BEAM-7579:


Author: ASF GitHub Bot
Created on: 19/Jun/19 00:36
Start Date: 19/Jun/19 00:36
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #8883: [BEAM-7579] Use bucket 
with default key in ITs
URL: https://github.com/apache/beam/pull/8883#issuecomment-503361230
 
 
   run java postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 262687)
Time Spent: 1.5h  (was: 1h 20m)

> Tests fail with "Cannot create a default bucket when --dataflowKmsKey is set."
> --
>
> Key: BEAM-7579
> URL: https://issues.apache.org/jira/browse/BEAM-7579
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.13.0
>Reporter: Luke Cwik
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Example run: https://builds.apache.org/job/beam_PostCommit_Java/3535/
> Failing tests:
> {code:java}
> org.apache.beam.sdk.io.gcp.storage.GcsKmsKeyIT.testGcsWriteWithKmsKey
> org.apache.beam.sdk.io.gcp.storage.GcsKmsKeyIT.testGcsWriteWithKmsKey
> org.apache.beam.sdk.extensions.gcp.util.GcsUtilIT.testRewriteMultiPart
> {code}
> Example trace:
> {code:java}
> Error Message
> java.lang.IllegalArgumentException: Cannot create a default bucket when 
> --dataflowKmsKey is set.
> Stacktrace
> java.lang.IllegalArgumentException: Cannot create a default bucket when 
> --dataflowKmsKey is set.
>   at 
> org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
>   at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.tryCreateDefaultBucket(GcpOptions.java:297)
>   at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:268)
>   at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:256)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.returnDefaultHelper(ProxyInvocationHandler.java:592)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.getDefault(ProxyInvocationHandler.java:533)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:158)
>   at com.sun.proxy.$Proxy32.getGcpTempLocation(Unknown Source)
>   at 
> org.apache.beam.sdk.io.gcp.storage.GcsKmsKeyIT.testGcsWriteWithKmsKey(GcsKmsKeyIT.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> {code}





[jira] [Work logged] (BEAM-7579) Tests fail with "Cannot create a default bucket when --dataflowKmsKey is set."

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7579?focusedWorklogId=262686&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262686
 ]

ASF GitHub Bot logged work on BEAM-7579:


Author: ASF GitHub Bot
Created on: 19/Jun/19 00:35
Start Date: 19/Jun/19 00:35
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8883: [BEAM-7579] Use 
bucket with default key in ITs
URL: https://github.com/apache/beam/pull/8883#discussion_r295080239
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/storage/GcsKmsKeyIT.java
 ##
 @@ -66,14 +65,16 @@ public static void setup() {
   /**
* Tests writing to gcpTempLocation with --dataflowKmsKey set on the command 
line. Verifies that
 
 Review comment:
   applied
 



Issue Time Tracking
---

Worklog Id: (was: 262686)
Time Spent: 1h 20m  (was: 1h 10m)

> Tests fail with "Cannot create a default bucket when --dataflowKmsKey is set."
> --
>
> Key: BEAM-7579
> URL: https://issues.apache.org/jira/browse/BEAM-7579
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.13.0
>Reporter: Luke Cwik
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Example run: https://builds.apache.org/job/beam_PostCommit_Java/3535/
> Failing tests:
> {code:java}
> org.apache.beam.sdk.io.gcp.storage.GcsKmsKeyIT.testGcsWriteWithKmsKey
> org.apache.beam.sdk.io.gcp.storage.GcsKmsKeyIT.testGcsWriteWithKmsKey
> org.apache.beam.sdk.extensions.gcp.util.GcsUtilIT.testRewriteMultiPart
> {code}
> Example trace:
> {code:java}
> Error Message
> java.lang.IllegalArgumentException: Cannot create a default bucket when 
> --dataflowKmsKey is set.
> Stacktrace
> java.lang.IllegalArgumentException: Cannot create a default bucket when 
> --dataflowKmsKey is set.
>   at 
> org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
>   at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.tryCreateDefaultBucket(GcpOptions.java:297)
>   at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:268)
>   at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:256)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.returnDefaultHelper(ProxyInvocationHandler.java:592)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.getDefault(ProxyInvocationHandler.java:533)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:158)
>   at com.sun.proxy.$Proxy32.getGcpTempLocation(Unknown Source)
>   at 
> org.apache.beam.sdk.io.gcp.storage.GcsKmsKeyIT.testGcsWriteWithKmsKey(GcsKmsKeyIT.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> {code}





[jira] [Work logged] (BEAM-7579) Tests fail with "Cannot create a default bucket when --dataflowKmsKey is set."

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7579?focusedWorklogId=262685&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262685
 ]

ASF GitHub Bot logged work on BEAM-7579:


Author: ASF GitHub Bot
Created on: 19/Jun/19 00:35
Start Date: 19/Jun/19 00:35
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8883: [BEAM-7579] Use 
bucket with default key in ITs
URL: https://github.com/apache/beam/pull/8883#discussion_r295080404
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestPipelineOptions.java
 ##
 @@ -36,6 +36,14 @@
 
   void setTempRoot(String value);
 
+  /**
+   * An alternative tempRoot that has a bucket-default KMS key set. Used for 
GCP CMEK integration
+   * tests.
+   */
+  String getTempRootKms();
 
 Review comment:
   Attribute added and flag removed. Cleaned up test selection a bit.
 



Issue Time Tracking
---

Worklog Id: (was: 262685)
Time Spent: 1h 20m  (was: 1h 10m)

> Tests fail with "Cannot create a default bucket when --dataflowKmsKey is set."
> --
>
> Key: BEAM-7579
> URL: https://issues.apache.org/jira/browse/BEAM-7579
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.13.0
>Reporter: Luke Cwik
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Example run: https://builds.apache.org/job/beam_PostCommit_Java/3535/
> Failing tests:
> {code:java}
> org.apache.beam.sdk.io.gcp.storage.GcsKmsKeyIT.testGcsWriteWithKmsKey
> org.apache.beam.sdk.io.gcp.storage.GcsKmsKeyIT.testGcsWriteWithKmsKey
> org.apache.beam.sdk.extensions.gcp.util.GcsUtilIT.testRewriteMultiPart
> {code}
> Example trace:
> {code:java}
> Error Message
> java.lang.IllegalArgumentException: Cannot create a default bucket when 
> --dataflowKmsKey is set.
> Stacktrace
> java.lang.IllegalArgumentException: Cannot create a default bucket when 
> --dataflowKmsKey is set.
>   at 
> org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
>   at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.tryCreateDefaultBucket(GcpOptions.java:297)
>   at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:268)
>   at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:256)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.returnDefaultHelper(ProxyInvocationHandler.java:592)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.getDefault(ProxyInvocationHandler.java:533)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:158)
>   at com.sun.proxy.$Proxy32.getGcpTempLocation(Unknown Source)
>   at 
> org.apache.beam.sdk.io.gcp.storage.GcsKmsKeyIT.testGcsWriteWithKmsKey(GcsKmsKeyIT.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> {code}





[jira] [Work logged] (BEAM-7579) Tests fail with "Cannot create a default bucket when --dataflowKmsKey is set."

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7579?focusedWorklogId=262684&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262684
 ]

ASF GitHub Bot logged work on BEAM-7579:


Author: ASF GitHub Bot
Created on: 19/Jun/19 00:35
Start Date: 19/Jun/19 00:35
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8883: [BEAM-7579] Use 
bucket with default key in ITs
URL: https://github.com/apache/beam/pull/8883#discussion_r295080229
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/storage/GcsKmsKeyIT.java
 ##
 @@ -66,14 +65,16 @@ public static void setup() {
   /**
 
 Review comment:
   removed
 



Issue Time Tracking
---

Worklog Id: (was: 262684)
Time Spent: 1h 10m  (was: 1h)

> Tests fail with "Cannot create a default bucket when --dataflowKmsKey is set."
> --
>
> Key: BEAM-7579
> URL: https://issues.apache.org/jira/browse/BEAM-7579
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.13.0
>Reporter: Luke Cwik
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Example run: https://builds.apache.org/job/beam_PostCommit_Java/3535/
> Failing tests:
> {code:java}
> org.apache.beam.sdk.io.gcp.storage.GcsKmsKeyIT.testGcsWriteWithKmsKey
> org.apache.beam.sdk.io.gcp.storage.GcsKmsKeyIT.testGcsWriteWithKmsKey
> org.apache.beam.sdk.extensions.gcp.util.GcsUtilIT.testRewriteMultiPart
> {code}
> Example trace:
> {code:java}
> Error Message
> java.lang.IllegalArgumentException: Cannot create a default bucket when 
> --dataflowKmsKey is set.
> Stacktrace
> java.lang.IllegalArgumentException: Cannot create a default bucket when 
> --dataflowKmsKey is set.
>   at 
> org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
>   at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.tryCreateDefaultBucket(GcpOptions.java:297)
>   at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:268)
>   at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:256)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.returnDefaultHelper(ProxyInvocationHandler.java:592)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.getDefault(ProxyInvocationHandler.java:533)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:158)
>   at com.sun.proxy.$Proxy32.getGcpTempLocation(Unknown Source)
>   at 
> org.apache.beam.sdk.io.gcp.storage.GcsKmsKeyIT.testGcsWriteWithKmsKey(GcsKmsKeyIT.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> {code}





[jira] [Created] (BEAM-7590) Convert PipelineOptionsMap to PipelineOption

2019-06-18 Thread Alireza Samadianzakaria (JIRA)
Alireza Samadianzakaria created BEAM-7590:
-

 Summary: Convert PipelineOptionsMap to PipelineOption
 Key: BEAM-7590
 URL: https://issues.apache.org/jira/browse/BEAM-7590
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Alireza Samadianzakaria
Assignee: Alireza Samadianzakaria


Currently, BeamCalciteTable keeps a map version of PipelineOptions, and that map 
is also used in JDBCConnection and the RelNodes. The map is empty when the 
pipeline is constructed from SQLTransform, and it holds the parameters passed 
from the JDBC client when the pipeline is started via the JDBC path.

Since row-count estimation needs to use PipelineOptions (or one of its 
subclasses), and a map created from a PipelineOptions subclass cannot be 
converted back to PipelineOptions, it is better to keep the PipelineOptions 
object itself.
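
The loss described here can be sketched generically: once typed options are flattened into a plain map, the map no longer records which options subclass its entries came from, so the typed view cannot be reconstructed. (Illustrative Python only; these classes are not Beam's actual PipelineOptions.)

```python
from dataclasses import dataclass, asdict

@dataclass
class BaseOptions:
    runner: str = "DirectRunner"

@dataclass
class SqlOptions(BaseOptions):
    # Hypothetical subclass option that row-count estimation might read.
    default_row_count: int = 1000

opts = SqlOptions(runner="DataflowRunner", default_row_count=50)

# Flattening to a map keeps the values...
flat = asdict(opts)
print(flat)  # {'runner': 'DataflowRunner', 'default_row_count': 50}

# ...but the plain dict no longer knows it came from SqlOptions rather than
# some other subclass with overlapping keys, so code that needs the typed
# object must be handed `opts` itself, not the map.
print(type(flat) is dict)  # True
```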

 





[jira] [Resolved] (BEAM-4288) SplittableDoFn: splitAtFraction() API for Python

2019-06-18 Thread Robert Bradshaw (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw resolved BEAM-4288.
---
   Resolution: Fixed
Fix Version/s: 2.14.0

> SplittableDoFn: splitAtFraction() API for Python
> 
>
> Key: BEAM-4288
> URL: https://issues.apache.org/jira/browse/BEAM-4288
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Eugene Kirpichov
>Priority: Major
> Fix For: 2.14.0
>
>
> SDF currently only has a checkpoint() API. This Jira is about adding the 
> splitAtFraction() API and its support in runners that support the respective 
> feature for sources.
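
For context, splitting at a fraction can be sketched with a toy offset-range restriction tracker. This is illustrative only: the class below is not the Beam SDK's implementation, and Beam's actual Python API landed under a different name (`try_split` on restriction trackers).

```python
class OffsetRange:
    """A half-open range of offsets [start, stop)."""
    def __init__(self, start, stop):
        self.start, self.stop = start, stop

class OffsetRestrictionTracker:
    """Toy tracker over an OffsetRange supporting split-at-fraction."""
    def __init__(self, offset_range):
        self._range = offset_range
        self._last_claimed = offset_range.start - 1  # nothing claimed yet

    def try_claim(self, position):
        if position >= self._range.stop:
            return False
        self._last_claimed = position
        return True

    def try_split(self, fraction):
        # Split the *unclaimed* remainder at `fraction` of the way through it.
        remainder_start = self._last_claimed + 1
        split = remainder_start + int(fraction * (self._range.stop - remainder_start))
        if split <= remainder_start or split >= self._range.stop:
            return None  # nothing left to give away
        primary = OffsetRange(self._range.start, split)
        residual = OffsetRange(split, self._range.stop)
        self._range = primary  # this tracker keeps only the primary
        return primary, residual

tracker = OffsetRestrictionTracker(OffsetRange(0, 100))
tracker.try_claim(9)                  # offsets 0..9 done, 90 remain
primary, residual = tracker.try_split(0.5)
print(primary.start, primary.stop)    # 0 55
print(residual.start, residual.stop)  # 55 100
```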



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4288) SplittableDoFn: splitAtFraction() API for Python

2019-06-18 Thread Robert Bradshaw (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867127#comment-16867127
 ] 

Robert Bradshaw commented on BEAM-4288:
---

This was resolved with 
[https://github.com/apache/beam/commit/8bbdc8c8317a57cfda97dcc68c15d89753265011]

> SplittableDoFn: splitAtFraction() API for Python
> 
>
> Key: BEAM-4288
> URL: https://issues.apache.org/jira/browse/BEAM-4288
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Eugene Kirpichov
>Priority: Major
>
> SDF currently only has a checkpoint() API. This Jira is about adding the 
> splitAtFraction() API and its support in runners that support the respective 
> feature for sources.





[jira] [Commented] (BEAM-7424) Retry HTTP 429 errors from GCS w/ exponential backoff when reading data

2019-06-18 Thread Chamikara Jayalath (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867126#comment-16867126
 ] 

Chamikara Jayalath commented on BEAM-7424:
--

I believe the Python fix is still in development.

> Retry HTTP 429 errors from GCS w/ exponential backoff when reading data
> ---
>
> Key: BEAM-7424
> URL: https://issues.apache.org/jira/browse/BEAM-7424
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, io-python-gcp, sdk-py-core
>Reporter: Chamikara Jayalath
>Assignee: Heejong Lee
>Priority: Blocker
> Fix For: 2.14.0
>
>
> This has to be done for both Java and Python SDKs.
> Seems like Java SDK already retries 429 errors w/o backoff (please verify): 
> [https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializer.java#L185]
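
A generic retry-with-exponential-backoff wrapper of the kind this issue asks for might look like the sketch below. None of these names are Beam or GCS client APIs: `TooManyRequests` stands in for whatever exception the client library raises on HTTP 429.

```python
import random
import time

class TooManyRequests(Exception):
    """Stand-in for a client library's HTTP 429 error."""

def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0):
    """Call fn(), retrying on 429 with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TooManyRequests:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted: surface the error
            # Double the delay each attempt, capped, with a little jitter
            # so many workers don't retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay + random.uniform(0, delay / 10))

# Example: a flaky read that returns 429 twice, then succeeds.
attempts = []
def flaky_read():
    attempts.append(1)
    if len(attempts) < 3:
        raise TooManyRequests()
    return b"object contents"

print(call_with_backoff(flaky_read, base_delay=0.001))  # b'object contents'
print(len(attempts))  # 3
```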





[jira] [Work logged] (BEAM-6877) TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode changes

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6877?focusedWorklogId=262674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262674
 ]

ASF GitHub Bot logged work on BEAM-6877:


Author: ASF GitHub Bot
Created on: 18/Jun/19 23:28
Start Date: 18/Jun/19 23:28
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #8856: [BEAM-6877] Fix 
trivial_inference for Python 3.x
URL: https://github.com/apache/beam/pull/8856#issuecomment-503348705
 
 
   run python postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 262674)
Time Spent: 4h 10m  (was: 4h)

> TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode 
> changes
> 
>
> Key: BEAM-6877
> URL: https://issues.apache.org/jira/browse/BEAM-6877
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Type inference doesn't work on Python 3.6 due to [bytecode to wordcode 
> changes|https://docs.python.org/3/whatsnew/3.6.html#cpython-bytecode-changes].
> Type inference always returns Any on Python 3.6, so this is not critical.
> Affected tests are:
>  *transforms.ptransform_test*:
>  - test_combine_properly_pipeline_type_checks_using_decorator
>  - test_mean_globally_pipeline_checking_satisfied
>  - test_mean_globally_runtime_checking_satisfied
>  - test_count_globally_pipeline_type_checking_satisfied
>  - test_count_globally_runtime_type_checking_satisfied
>  - test_pardo_type_inference
>  - test_pipeline_inference
>  - test_inferred_bad_kv_type
> *typehints.trivial_inference_test*:
>  - all tests in TrivialInferenceTest
> *io.gcp.pubsub_test.TestReadFromPubSubOverride*:
> * test_expand_with_other_options
> * test_expand_with_subscription
> * test_expand_with_topic





[jira] [Resolved] (BEAM-7357) Kinesis IO.write throws LimitExceededException

2019-06-18 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin resolved BEAM-7357.
---
Resolution: Fixed

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7357
> URL: https://issues.apache.org/jira/browse/BEAM-7357
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Brachi Packter
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.14.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I used KinesisIO to write to Kinesis. Very quickly, I get many exceptions 
> like:
> [shard_map.cc:150] Shard map update for stream "***" failed. Code: 
> LimitExceededException Message: Rate exceeded for stream *** under account 
> ***; retrying in ..
> Also, I see many exceptions like:
> Caused by: java.lang.IllegalArgumentException: Stream ** does not exist at 
> org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument(Preconditions.java:191)
>  at 
> org.apache.beam.sdk.io.kinesis.KinesisIO$Write$KinesisWriterFn.setup(KinesisIO.java:515)
> I'm sure this stream exists because I can see some data from my pipeline that 
> was successfully ingested to it.
>  
> Here is my code:
>  
>  
> {code:java}
> .apply(KinesisIO.write()
>     .withStreamName("**")
>     .withPartitioner(new KinesisPartitioner() {
>         @Override
>         public String getPartitionKey(byte[] value) {
>             return UUID.randomUUID().toString();
>         }
>         @Override
>         public String getExplicitHashKey(byte[] value) {
>             return null;
>         }
>     })
>     .withAWSClientsProvider("**", "***", Regions.US_EAST_1));{code}
>  
> I tried not using KinesisIO, and everything works well; I can't figure out 
> what went wrong.
> I tried using the same API as the library does.
>  
> {code:java}
> .apply(
>     ParDo.of(new DoFn<byte[], Void>() {
>       private transient IKinesisProducer inlineProducer;
>
>       @Setup
>       public void setup() {
>         KinesisProducerConfiguration config =
>             KinesisProducerConfiguration.fromProperties(new Properties());
>         config.setRegion(Regions.US_EAST_1.getName());
>         config.setCredentialsProvider(new AWSStaticCredentialsProvider(
>             new BasicAWSCredentials("***", "***")));
>         inlineProducer = new KinesisProducer(config);
>       }
>
>       @ProcessElement
>       public void processElement(ProcessContext c) throws Exception {
>         ByteBuffer data = ByteBuffer.wrap(c.element());
>         String partitionKey = UUID.randomUUID().toString();
>         ListenableFuture<UserRecordResult> f =
>             getProducer().addUserRecord("***", partitionKey, data);
>         Futures.addCallback(f, new UserRecordResultFutureCallback());
>       }
>
>       class UserRecordResultFutureCallback
>           implements FutureCallback<UserRecordResult> {
>         @Override
>         public void onFailure(Throwable cause) {
>           throw new RuntimeException("failed produce:" + cause);
>         }
>
>         @Override
>         public void onSuccess(UserRecordResult result) {}
>       }
>     }));
>  
> {code}
>  
> Any idea what I did wrong, or what the error in KinesisIO is?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7357) Kinesis IO.write throws LimitExceededException

2019-06-18 Thread Anton Kedin (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16867120#comment-16867120
 ] 

Anton Kedin commented on BEAM-7357:
---

[~aromanenko] cool, resolving this one in favor of BEAM-7589 then

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7357
> URL: https://issues.apache.org/jira/browse/BEAM-7357
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Brachi Packter
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.14.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I used KinesisIO to write to Kinesis. Very quickly, I get many exceptions 
> like:
> [shard_map.cc:150] Shard map update for stream "***" failed. Code: 
> LimitExceededException Message: Rate exceeded for stream *** under account 
> ***; retrying in ..
> Also, I see many exceptions like:
> Caused by: java.lang.IllegalArgumentException: Stream ** does not exist at 
> org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument(Preconditions.java:191)
>  at 
> org.apache.beam.sdk.io.kinesis.KinesisIO$Write$KinesisWriterFn.setup(KinesisIO.java:515)
> I'm sure this stream exists because I can see some data from my pipeline that 
> was successfully ingested to it.
>  
> Here is my code:
>  
>  
> {code:java}
> .apply(KinesisIO.write()
>     .withStreamName("**")
>     .withPartitioner(new KinesisPartitioner() {
>       @Override
>       public String getPartitionKey(byte[] value) {
>         return UUID.randomUUID().toString();
>       }
>
>       @Override
>       public String getExplicitHashKey(byte[] value) {
>         return null;
>       }
>     })
>     .withAWSClientsProvider("**", "***", Regions.US_EAST_1));{code}
>  
> I tried not using KinesisIO, and everything works well; I can't figure out 
> what went wrong.
> I tried using the same API as the library does.
>  
> {code:java}
> .apply(
>     ParDo.of(new DoFn<byte[], Void>() {
>       private transient IKinesisProducer inlineProducer;
>
>       @Setup
>       public void setup() {
>         KinesisProducerConfiguration config =
>             KinesisProducerConfiguration.fromProperties(new Properties());
>         config.setRegion(Regions.US_EAST_1.getName());
>         config.setCredentialsProvider(new AWSStaticCredentialsProvider(
>             new BasicAWSCredentials("***", "***")));
>         inlineProducer = new KinesisProducer(config);
>       }
>
>       @ProcessElement
>       public void processElement(ProcessContext c) throws Exception {
>         ByteBuffer data = ByteBuffer.wrap(c.element());
>         String partitionKey = UUID.randomUUID().toString();
>         ListenableFuture<UserRecordResult> f =
>             getProducer().addUserRecord("***", partitionKey, data);
>         Futures.addCallback(f, new UserRecordResultFutureCallback());
>       }
>
>       class UserRecordResultFutureCallback
>           implements FutureCallback<UserRecordResult> {
>         @Override
>         public void onFailure(Throwable cause) {
>           throw new RuntimeException("failed produce:" + cause);
>         }
>
>         @Override
>         public void onSuccess(UserRecordResult result) {}
>       }
>     }));
>  
> {code}
>  
> Any idea what I did wrong, or what the error in KinesisIO is?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-18 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-7589:
--
Summary: Kinesis IO.write throws LimitExceededException  (was: []Kinesis 
IO.write throws LimitExceededException)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> 
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I found what causes the shard map update now.
> You create a producer per bundle (in the SetUp function), and if I multiply it 
> by the number of workers, this gives a huge number of producers; I believe 
> this triggers the "update shard map" call.
> If I copy your code and create *one* producer for every worker, then this 
> error disappears.
> Can you just remove the producer creation from the setUp method and move it to 
> some static field in the class, created once when the class is initialized?
> See the similar issue with JdbcIO: a connection pool was created per setup 
> method, and we moved it to a static member so there is one pool per JVM. Ask 
> [~iemejia] for more details.
> 
> Alexey Romanenko added a comment - 14/Jun/19 14:31 - edited
>   
>  [~brachi_packter] What kind of error do you have in this case? Could you 
> post an error stacktrace / exception message? 
>  Also, it would be helpful (if it's possible) if you could provide more 
> details about your environment and pipeline, like what is your pipeline 
> topology, which runner do you use, number of workers in your cluster, etc. 
>  For now, I can't reproduce it on my side, so all additional info will be 
> helpful.
> 
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> [0x1728][0x7f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
> for stream "**" failed. Code: LimitExceededException Message: Rate exceeded 
> for stream poc-test under account **.; retrying in 5062 ms
> {code}
> I'm not seeing a full stack trace, but I can also see this in the log:
> {code:java}
> [2019-06-13 08:29:09.427018] [0x07e1][0x7f8d508d3700] [warning] [AWS 
> Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
> {code}
> More details:
>  I'm using the Dataflow runner, Java SDK 2.11.
> 60 workers initially (with autoscaling and also with the 
> "enableStreamingEngine" flag).
> Normally I'm producing 4-5k records per second, but when I have latency this 
> can multiply by 3-4 times.
> When I'm starting the Dataflow job I have latency, so I produce more data and 
> fail immediately.
> Also, I have consumers (a 3rd-party tool) that I know call DescribeStream 
> every 30 seconds.
> My pipeline runs on GCP, reading around 20,000 records per second from PubSub 
> (in regular times; during latency spikes even 100,000 records per second). It 
> does many aggregations and counts based on some dimensions (using Beam SQL) 
> over a 1-minute sliding window, and writes the aggregation results to a 
> Kinesis stream.
> My stream has 10 shards, and my partition key logic generates a UUID per 
> record: 
> UUID.randomUUID().toString()
> Hope this gives you some more context on my problem.
> Another suggestion: can you fix the issue as I suggested and provide me a 
> specific version for testing, without merging it to master? (I would do it 
> myself, but I had trouble building the huge Apache Beam repository locally.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-7589) []Kinesis IO.write throws LimitExceededException

2019-06-18 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-7589:
-

 Summary: []Kinesis IO.write throws LimitExceededException
 Key: BEAM-7589
 URL: https://issues.apache.org/jira/browse/BEAM-7589
 Project: Beam
  Issue Type: Bug
  Components: io-java-kinesis
Affects Versions: 2.11.0
Reporter: Anton Kedin
Assignee: Alexey Romanenko
 Fix For: 2.15.0


Follow up from https://issues.apache.org/jira/browse/BEAM-7357:

 

Brachi Packter added a comment - 13/Jun/19 09:05
 [~aromanenko] I think I found what causes the shard map update now.

You create a producer per bundle (in the SetUp function), and if I multiply it 
by the number of workers, this gives a huge number of producers; I believe this 
triggers the "update shard map" call.

If I copy your code and create *one* producer for every worker, then this error 
disappears.

Can you just remove the producer creation from the setUp method and move it to 
some static field in the class, created once when the class is initialized?

See the similar issue with JdbcIO: a connection pool was created per setup 
method, and we moved it to a static member so there is one pool per JVM. Ask 
[~iemejia] for more details.

Alexey Romanenko added a comment - 14/Jun/19 14:31 - edited
  
 [~brachi_packter] What kind of error do you have in this case? Could you post 
an error stacktrace / exception message? 
 Also, it would be helpful (if it's possible) if you could provide more details 
about your environment and pipeline, like what is your pipeline topology, which 
runner do you use, number of workers in your cluster, etc. 
 For now, I can't reproduce it on my side, so all additional info will be 
helpful.

Brachi Packter added a comment - 16/Jun/19 06:44
 I get the same error:
{code:java}
[0x1728][0x7f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
for stream "**" failed. Code: LimitExceededException Message: Rate exceeded for 
stream poc-test under account **.; retrying in 5062 ms
{code}
I'm not seeing a full stack trace, but I can also see this in the log:
{code:java}
[2019-06-13 08:29:09.427018] [0x07e1][0x7f8d508d3700] [warning] [AWS 
Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
{code}
More details:
 I'm using the Dataflow runner, Java SDK 2.11.

60 workers initially (with autoscaling and also with the "enableStreamingEngine" 
flag).

Normally I'm producing 4-5k records per second, but when I have latency this can 
multiply by 3-4 times.

When I'm starting the Dataflow job I have latency, so I produce more data and 
fail immediately.

Also, I have consumers (a 3rd-party tool) that I know call DescribeStream every 
30 seconds.

My pipeline runs on GCP, reading around 20,000 records per second from PubSub 
(in regular times; during latency spikes even 100,000 records per second). It 
does many aggregations and counts based on some dimensions (using Beam SQL) over 
a 1-minute sliding window, and writes the aggregation results to a Kinesis 
stream.

My stream has 10 shards, and my partition key logic generates a UUID per record: 

UUID.randomUUID().toString()

Hope this gives you some more context on my problem.

Another suggestion: can you fix the issue as I suggested and provide me a 
specific version for testing, without merging it to master? (I would do it 
myself, but I had trouble building the huge Apache Beam repository locally.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-7493) beam-sdks-testing-nexmark produces corrupt pom.xml

2019-06-18 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-7493.
-
Resolution: Fixed

> beam-sdks-testing-nexmark produces corrupt pom.xml
> --
>
> Key: BEAM-7493
> URL: https://issues.apache.org/jira/browse/BEAM-7493
> Project: Beam
>  Issue Type: Bug
>  Components: testing-nexmark
>Affects Versions: 2.13.0
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: 2.14.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> {code}
> ./gradlew -Prelease -Ppublishing 
> :sdks:java:testing:nexmark:publishMavenJavaPublicationTotestPublicationLocalRepository
> {code}
> and if you inspect the pom you will see
> {code}
> <dependency>
>   <groupId>org.apache.beam</groupId>
>   <artifactId>beam-sdks-java-testing-test-utils</artifactId>
> ...
> {code}
> It appears to be constructed from the directories, but this (and also 
> load-tests) in the {{testing}} directory have their archive base name 
> overridden: 
> https://github.com/apache/beam/blob/master/sdks/java/testing/test-utils/build.gradle#L20
> So after publication, this will result in a dependency not found error.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7561) HdfsFileSystem is unable to match a directory

2019-06-18 Thread Anton Kedin (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16867104#comment-16867104
 ] 

Anton Kedin commented on BEAM-7561:
---

Awesome, thank you

 

> HdfsFileSystem is unable to match a directory
> -
>
> Key: BEAM-7561
> URL: https://issues.apache.org/jira/browse/BEAM-7561
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-hadoop-file-system
>Affects Versions: 2.13.0
>Reporter: David Moravek
>Assignee: David Moravek
>Priority: Major
> Fix For: 2.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [FileSystems.match|https://beam.apache.org/releases/javadoc/2.3.0/org/apache/beam/sdk/io/FileSystems.html#match-java.util.List-]
>  method should be able to match a directory according to its javadoc. Unlike 
> _HdfsFileSystems_, _LocalFileSystems_ behaves as I would expect, so I'm 
> assuming this is a bug on the _HdfsFileSystems_ side.
> *Current behavior:*
> _HadoopFileSystem.match("hdfs:///tmp/dir")_ returns a _MatchResult_ with an 
> empty metadata
> *Expected behavior:*
> _HadoopFileSystem.match("hdfs:///tmp/dir")_ returns a _MatchResult_ with 
> metadata about the directory



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-7588) Python BigQuery sink should be able to handle 15TB load job quota

2019-06-18 Thread Chamikara Jayalath (JIRA)
Chamikara Jayalath created BEAM-7588:


 Summary: Python BigQuery sink should be able to handle 15TB load 
job quota
 Key: BEAM-7588
 URL: https://issues.apache.org/jira/browse/BEAM-7588
 Project: Beam
  Issue Type: Improvement
  Components: io-python-gcp
Reporter: Chamikara Jayalath
Assignee: Pablo Estrada


This can be done by using multiple load jobs, each under 15 TB, when the amount 
of data to be loaded is greater than 15 TB.

 

This is already handled by the Java SDK.
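A minimal sketch of that partitioning idea, assuming a hypothetical `partition_files` helper (this is not the actual Beam BigQuery sink code):

```python
# Hypothetical sketch: split files across several load jobs so that each job
# stays under the 15 TB per-load-job quota.
MAX_LOAD_JOB_BYTES = 15 * 10 ** 12  # 15 TB

def partition_files(file_sizes, max_bytes=MAX_LOAD_JOB_BYTES):
    """Greedily group (name, size) pairs so each group stays under max_bytes."""
    groups, current, current_bytes = [], [], 0
    for name, size in file_sizes:
        if current and current_bytes + size > max_bytes:
            # Current group is full; start a new load job.
            groups.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        groups.append(current)
    return groups

# Three 7 TB files cannot fit in one 15 TB load job, so two jobs are issued.
tb = 10 ** 12
jobs = partition_files([("a.avro", 7 * tb), ("b.avro", 7 * tb), ("c.avro", 7 * tb)])
print(jobs)  # [['a.avro', 'b.avro'], ['c.avro']]
```

Each resulting group would then be submitted as its own BigQuery load job, with the results stitched together afterwards.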



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7586) Add Integration Test for MongoDbIO in python sdk

2019-06-18 Thread Yichi Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yichi Zhang updated BEAM-7586:
--
Status: Open  (was: Triage Needed)

> Add Integration Test for MongoDbIO in python sdk 
> -
>
> Key: BEAM-7586
> URL: https://issues.apache.org/jira/browse/BEAM-7586
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (BEAM-7586) Add Integration Test for MongoDbIO in python sdk

2019-06-18 Thread Yichi Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-7586 started by Yichi Zhang.
-
> Add Integration Test for MongoDbIO in python sdk 
> -
>
> Key: BEAM-7586
> URL: https://issues.apache.org/jira/browse/BEAM-7586
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5148) Implement MongoDB IO for Python SDK

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5148?focusedWorklogId=262625=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262625
 ]

ASF GitHub Bot logged work on BEAM-5148:


Author: ASF GitHub Bot
Created on: 18/Jun/19 21:07
Start Date: 18/Jun/19 21:07
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8826: [BEAM-5148] 
Implement MongoDB IO for Python SDK
URL: https://github.com/apache/beam/pull/8826#issuecomment-503312161
 
 
   Thanks. Merging.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262625)
Time Spent: 8h 10m  (was: 8h)

> Implement MongoDB IO for Python SDK
> ---
>
> Key: BEAM-5148
> URL: https://issues.apache.org/jira/browse/BEAM-5148
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Affects Versions: 3.0.0
>Reporter: Pascal Gula
>Assignee: Yichi Zhang
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Currently the Java SDK has MongoDB support but the Python SDK does not. With 
> current portability efforts, other runners may soon be able to use the Python 
> SDK. Having MongoDB support will allow these runners to execute large-scale 
> jobs using it.
> Since we need these IO components at Peat, we started working on a PyPI 
> package available at this repository: [https://github.com/PEAT-AI/beam-extended]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5148) Implement MongoDB IO for Python SDK

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5148?focusedWorklogId=262626=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262626
 ]

ASF GitHub Bot logged work on BEAM-5148:


Author: ASF GitHub Bot
Created on: 18/Jun/19 21:07
Start Date: 18/Jun/19 21:07
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #8826: 
[BEAM-5148] Implement MongoDB IO for Python SDK
URL: https://github.com/apache/beam/pull/8826
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262626)
Time Spent: 8h 20m  (was: 8h 10m)

> Implement MongoDB IO for Python SDK
> ---
>
> Key: BEAM-5148
> URL: https://issues.apache.org/jira/browse/BEAM-5148
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Affects Versions: 3.0.0
>Reporter: Pascal Gula
>Assignee: Yichi Zhang
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> Currently the Java SDK has MongoDB support but the Python SDK does not. With 
> current portability efforts, other runners may soon be able to use the Python 
> SDK. Having MongoDB support will allow these runners to execute large-scale 
> jobs using it.
> Since we need these IO components at Peat, we started working on a PyPI 
> package available at this repository: [https://github.com/PEAT-AI/beam-extended]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6877) TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode changes

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6877?focusedWorklogId=262620=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262620
 ]

ASF GitHub Bot logged work on BEAM-6877:


Author: ASF GitHub Bot
Created on: 18/Jun/19 21:01
Start Date: 18/Jun/19 21:01
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #8856: [BEAM-6877] Fix 
trivial_inference for Python 3.x
URL: https://github.com/apache/beam/pull/8856#issuecomment-503310234
 
 
   Another instance of streaming WC error on an unrelated change: 
https://builds.apache.org/job/beam_PreCommit_Python_Commit/6973/consoleFull, so 
I am inclined to think it is a flake
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262620)
Time Spent: 3h 50m  (was: 3h 40m)

> TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode 
> changes
> 
>
> Key: BEAM-6877
> URL: https://issues.apache.org/jira/browse/BEAM-6877
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Type inference doesn't work on Python 3.6 due to [bytecode to wordcode 
> changes|https://docs.python.org/3/whatsnew/3.6.html#cpython-bytecode-changes].
> Type inference always returns Any on Python 3.6, so this is not critical.
> Affected tests are:
>  *transforms.ptransform_test*:
>  - test_combine_properly_pipeline_type_checks_using_decorator
>  - test_mean_globally_pipeline_checking_satisfied
>  - test_mean_globally_runtime_checking_satisfied
>  - test_count_globally_pipeline_type_checking_satisfied
>  - test_count_globally_runtime_type_checking_satisfied
>  - test_pardo_type_inference
>  - test_pipeline_inference
>  - test_inferred_bad_kv_type
> *typehints.trivial_inference_test*:
>  - all tests in TrivialInferenceTest
> *io.gcp.pubsub_test.TestReadFromPubSubOverride*:
> * test_expand_with_other_options
> * test_expand_with_subscription
> * test_expand_with_topic
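The bytecode-to-wordcode change behind these failures is easy to observe: from CPython 3.6 onward every instruction occupies exactly two bytes, so instruction offsets advance in steps of two. A small illustration (not Beam's inference code):

```python
import dis

def add_one(x):
    return x + 1

offsets = [ins.offset for ins in dis.get_instructions(add_one)]
print(offsets)

# On CPython >= 3.6 every instruction is a 1-byte opcode plus a 1-byte
# argument ("wordcode"), so all offsets are even. Earlier interpreters used
# variable-length instructions (1 or 3 bytes), which is the offset arithmetic
# that trivial_inference relied on.
assert all(offset % 2 == 0 for offset in offsets)
```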



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6877) TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode changes

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6877?focusedWorklogId=262621=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262621
 ]

ASF GitHub Bot logged work on BEAM-6877:


Author: ASF GitHub Bot
Created on: 18/Jun/19 21:01
Start Date: 18/Jun/19 21:01
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #8856: [BEAM-6877] Fix 
trivial_inference for Python 3.x
URL: https://github.com/apache/beam/pull/8856#issuecomment-503310234
 
 
   Another instance of streaming WC error on an unrelated change: 
https://builds.apache.org/job/beam_PreCommit_Python_Commit/6973/consoleFull, so 
I am inclined to think it is a flake
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262621)
Time Spent: 4h  (was: 3h 50m)

> TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode 
> changes
> 
>
> Key: BEAM-6877
> URL: https://issues.apache.org/jira/browse/BEAM-6877
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Type inference doesn't work on Python 3.6 due to [bytecode to wordcode 
> changes|https://docs.python.org/3/whatsnew/3.6.html#cpython-bytecode-changes].
> Type inference always returns Any on Python 3.6, so this is not critical.
> Affected tests are:
>  *transforms.ptransform_test*:
>  - test_combine_properly_pipeline_type_checks_using_decorator
>  - test_mean_globally_pipeline_checking_satisfied
>  - test_mean_globally_runtime_checking_satisfied
>  - test_count_globally_pipeline_type_checking_satisfied
>  - test_count_globally_runtime_type_checking_satisfied
>  - test_pardo_type_inference
>  - test_pipeline_inference
>  - test_inferred_bad_kv_type
> *typehints.trivial_inference_test*:
>  - all tests in TrivialInferenceTest
> *io.gcp.pubsub_test.TestReadFromPubSubOverride*:
> * test_expand_with_other_options
> * test_expand_with_subscription
> * test_expand_with_topic



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6955) Support Dataflow --sdk_location with modified version number

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6955?focusedWorklogId=262619=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262619
 ]

ASF GitHub Bot logged work on BEAM-6955:


Author: ASF GitHub Bot
Created on: 18/Jun/19 21:00
Start Date: 18/Jun/19 21:00
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #8885: [BEAM-6955] Use 
base version component of Beam Python SDK version when choosing Dataflow 
container image to use.
URL: https://github.com/apache/beam/pull/8885#issuecomment-503309805
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262619)
Time Spent: 1h  (was: 50m)

> Support Dataflow --sdk_location with modified version number
> 
>
> Key: BEAM-6955
> URL: https://issues.apache.org/jira/browse/BEAM-6955
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.11.0
>Reporter: Daniel Lescohier
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Support Dataflow --sdk_location with modified version number
> Determine the version tag to use for the Google Container Registry, for the 
> service image versions to use on the Dataflow worker nodes. Users of Dataflow 
> may be using a locally-modified version of Apache Beam, which they submit to 
> Dataflow with the --sdk_location option. Those users would most likely modify 
> the version number of Apache Beam, so they can distinguish it from the public 
> distribution of Apache Beam. However, the remote nodes in Dataflow still need 
> to bootstrap the worker service with a Docker image for which a version tag 
> exists.
> The most appropriate way for system integrators to modify the Apache Beam 
> version number would be to add a Local Version Identifier: 
> https://www.python.org/dev/peps/pep-0440/#local-version-identifiers
> If people only use Local Version Identifiers, then we could use the "public" 
> attribute of the pkg_resources version object.
> If people instead use a post-release version identifier: 
> https://www.python.org/dev/peps/pep-0440/#post-releases then only the 
> "base_version" attribute works for both of these version number changes. 
> Since the Dataflow documentation does not specify how to modify version 
> numbers, I am choosing to use the "base_version" attribute.
> I will shortly submit a PR with the change.
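The difference between the two attributes can be checked directly with `pkg_resources`; the version strings below are hypothetical examples of a locally modified SDK:

```python
from pkg_resources import parse_version

# A locally modified SDK using a PEP 440 local version identifier:
local = parse_version("2.11.0+mycompany.1")
# The same modification expressed as a post-release instead:
post = parse_version("2.11.0.post1")

# "public" strips only the local part, so it still differs for post-releases:
print(local.public)        # 2.11.0
print(post.public)         # 2.11.0.post1

# "base_version" strips both kinds of modification, yielding a version for
# which a worker container image tag actually exists:
print(local.base_version)  # 2.11.0
print(post.base_version)   # 2.11.0
```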



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7357) Kinesis IO.write throws LimitExceededException

2019-06-18 Thread Alexey Romanenko (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16867028#comment-16867028
 ] 

Alexey Romanenko commented on BEAM-7357:


[~brachi_packter] Thank you for the details, I'll get back to this issue soon.
[~kedin] It's not a regression; it seems there are two different issues. One 
issue was fixed, and I agree to create a separate Jira for the second one.

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7357
> URL: https://issues.apache.org/jira/browse/BEAM-7357
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Brachi Packter
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.14.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I used KinesisIO to write to Kinesis. Very quickly, I get many exceptions 
> like:
> [shard_map.cc:150] Shard map update for stream "***" failed. Code: 
> LimitExceededException Message: Rate exceeded for stream *** under account 
> ***; retrying in ..
> Also, I see many exceptions like:
> Caused by: java.lang.IllegalArgumentException: Stream ** does not exist at 
> org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument(Preconditions.java:191)
>  at 
> org.apache.beam.sdk.io.kinesis.KinesisIO$Write$KinesisWriterFn.setup(KinesisIO.java:515)
> I'm sure this stream exists because I can see some data from my pipeline that 
> was successfully ingested to it.
>  
> Here is my code:
>  
>  
> {code:java}
> .apply(KinesisIO.write()
>     .withStreamName("**")
>     .withPartitioner(new KinesisPartitioner() {
>       @Override
>       public String getPartitionKey(byte[] value) {
>         return UUID.randomUUID().toString();
>       }
>
>       @Override
>       public String getExplicitHashKey(byte[] value) {
>         return null;
>       }
>     })
>     .withAWSClientsProvider("**", "***", Regions.US_EAST_1));{code}
>  
> I tried not using KinesisIO, and everything works well; I can't figure out 
> what went wrong.
> I tried using the same API as the library does.
>  
> {code:java}
> .apply(
>     ParDo.of(new DoFn<byte[], Void>() {
>       private transient IKinesisProducer inlineProducer;
>
>       @Setup
>       public void setup() {
>         KinesisProducerConfiguration config =
>             KinesisProducerConfiguration.fromProperties(new Properties());
>         config.setRegion(Regions.US_EAST_1.getName());
>         config.setCredentialsProvider(new AWSStaticCredentialsProvider(
>             new BasicAWSCredentials("***", "***")));
>         inlineProducer = new KinesisProducer(config);
>       }
>
>       @ProcessElement
>       public void processElement(ProcessContext c) throws Exception {
>         ByteBuffer data = ByteBuffer.wrap(c.element());
>         String partitionKey = UUID.randomUUID().toString();
>         ListenableFuture<UserRecordResult> f =
>             getProducer().addUserRecord("***", partitionKey, data);
>         Futures.addCallback(f, new UserRecordResultFutureCallback());
>       }
>
>       class UserRecordResultFutureCallback
>           implements FutureCallback<UserRecordResult> {
>         @Override
>         public void onFailure(Throwable cause) {
>           throw new RuntimeException("failed produce:" + cause);
>         }
>
>         @Override
>         public void onSuccess(UserRecordResult result) {}
>       }
>     }));
>  
> {code}
>  
> Any idea what I did wrong, or what the error in KinesisIO is?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2953) Timeseries processing extensions using state API

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2953?focusedWorklogId=262608&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262608
 ]

ASF GitHub Bot logged work on BEAM-2953:


Author: ASF GitHub Bot
Created on: 18/Jun/19 20:48
Start Date: 18/Jun/19 20:48
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #6540: [BEAM-2953] 
Timeseries extensions . 
URL: https://github.com/apache/beam/pull/6540#issuecomment-503305681
 
 
   This pull request has been closed due to lack of activity. If you think that 
is incorrect, or the pull request requires review, you can revive the PR at any 
time.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262608)
Time Spent: 4.5h  (was: 4h 20m)

> Timeseries processing extensions using state API
> 
>
> Key: BEAM-2953
> URL: https://issues.apache.org/jira/browse/BEAM-2953
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-ideas
>Affects Versions: 2.7.0
>Reporter: Reza ardeshir rokni
>Priority: Minor
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> A general set of time-series transforms that abstract away, for the user,
> some of the common problems encountered when dealing with time series in
> Beam (in stream or batch mode).
> Beam can be used to build out some very interesting pre-processing stages for
> time-series data. Some examples that will be useful:
>  - Downsampling time series based on simple MIN, MAX, COUNT, SUM, LAST, FIRST
>  - Creating a value for each downsampled window even if no value has been
> emitted for the specific key.
>  - Loading a downsampled window with the previous window's value (used in FX,
> where the previous close is brought into the current open value)
>  
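The downsampling and gap-filling ideas above can be sketched independently of Beam. The window size, aggregate names, and carry-forward rule below are illustrative assumptions for a plain-Python model, not Beam APIs:

```python
from collections import defaultdict

def downsample(points, window_secs):
    """Group (timestamp, value) points into fixed windows and compute
    simple aggregates (MIN, MAX, COUNT, SUM, FIRST, LAST) per window."""
    windows = defaultdict(list)
    for ts, value in sorted(points):
        windows[ts // window_secs * window_secs].append(value)
    return {
        start: {"MIN": min(vs), "MAX": max(vs), "COUNT": len(vs),
                "SUM": sum(vs), "FIRST": vs[0], "LAST": vs[-1]}
        for start, vs in windows.items()
    }

def fill_gaps(agg, start, end, window_secs):
    """Emit a value for every window even when no data arrived, carrying
    the previous window's LAST into the empty window (the 'previous
    close becomes current open' rule mentioned for FX)."""
    filled, prev_last = {}, None
    for w in range(start, end, window_secs):
        if w in agg:
            filled[w] = agg[w]
        elif prev_last is not None:
            filled[w] = {"MIN": prev_last, "MAX": prev_last, "COUNT": 0,
                         "SUM": 0, "FIRST": prev_last, "LAST": prev_last}
        if w in filled:
            prev_last = filled[w]["LAST"]
    return filled
```

In a real Beam pipeline the grouping would be done by the windowing machinery and the carry-forward by state/timers; this only shows the intended semantics of the bullets above.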





[jira] [Work logged] (BEAM-4762) Add postCommit scripts and perfkit dashboards for nexmark on Gearpump runner

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4762?focusedWorklogId=262610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262610
 ]

ASF GitHub Bot logged work on BEAM-4762:


Author: ASF GitHub Bot
Created on: 18/Jun/19 20:48
Start Date: 18/Jun/19 20:48
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on pull request #6808: [BEAM-4762] 
Add Gearpump Nexmark PostCommit scripts
URL: https://github.com/apache/beam/pull/6808
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 262610)
Time Spent: 10.5h  (was: 10h 20m)

> Add postCommit scripts and perfkit dashboards for nexmark on Gearpump runner
> 
>
> Key: BEAM-4762
> URL: https://issues.apache.org/jira/browse/BEAM-4762
> Project: Beam
>  Issue Type: Test
>  Components: testing-nexmark
>Reporter: Etienne Chauchot
>Assignee: Manu Zhang
>Priority: Major
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-2953) Timeseries processing extensions using state API

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2953?focusedWorklogId=262611&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262611
 ]

ASF GitHub Bot logged work on BEAM-2953:


Author: ASF GitHub Bot
Created on: 18/Jun/19 20:48
Start Date: 18/Jun/19 20:48
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on pull request #6540: [BEAM-2953] 
Timeseries extensions . 
URL: https://github.com/apache/beam/pull/6540
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 262611)
Time Spent: 4h 40m  (was: 4.5h)

> Timeseries processing extensions using state API
> 
>
> Key: BEAM-2953
> URL: https://issues.apache.org/jira/browse/BEAM-2953
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-ideas
>Affects Versions: 2.7.0
>Reporter: Reza ardeshir rokni
>Priority: Minor
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> A general set of time-series transforms that abstract away, for the user,
> some of the common problems encountered when dealing with time series in
> Beam (in stream or batch mode).
> Beam can be used to build out some very interesting pre-processing stages for
> time-series data. Some examples that will be useful:
>  - Downsampling time series based on simple MIN, MAX, COUNT, SUM, LAST, FIRST
>  - Creating a value for each downsampled window even if no value has been
> emitted for the specific key.
>  - Loading a downsampled window with the previous window's value (used in FX,
> where the previous close is brought into the current open value)
>  





[jira] [Work logged] (BEAM-5806) Allow to change the PubsubClientFactory when using PubsubIO

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5806?focusedWorklogId=262609&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262609
 ]

ASF GitHub Bot logged work on BEAM-5806:


Author: ASF GitHub Bot
Created on: 18/Jun/19 20:48
Start Date: 18/Jun/19 20:48
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #6769: WIP: [BEAM-5806] 
Update PubsubIO to be able to change the PubsubClientFactory
URL: https://github.com/apache/beam/pull/6769#issuecomment-503305652
 
 
   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   
 



Issue Time Tracking
---

Worklog Id: (was: 262609)
Time Spent: 3h 10m  (was: 3h)
Remaining Estimate: 20h 50m  (was: 21h)

> Allow to change the PubsubClientFactory when using PubsubIO
> ---
>
> Key: BEAM-5806
> URL: https://issues.apache.org/jira/browse/BEAM-5806
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Logan HAUSPIE
>Priority: Minor
>   Original Estimate: 24h
>  Time Spent: 3h 10m
>  Remaining Estimate: 20h 50m
>
> By using PubsubIO to read from or write to Pub/Sub, we are obliged to use
> the PubsubJsonClient to interact with the Pub/Sub API.
> The PubsubJsonClient encodes each message in base64, which increases its size
> by about 30%, and there is no way to change the PubsubClient used by PubsubIO.
>  
> What I suggest is to allow developers to change the PubsubClientFactory by
> specifying it at definition time, like the following:
> {code:java}
> PubsubIO.Read<String> read = PubsubIO.readStrings()
>       .fromTopic(StaticValueProvider.of(topic))
>       .withTimestampAttribute("myTimestamp")
>       .withIdAttribute("myId")
>       .withClientFactory(PubsubGrpcClient.FACTORY);
> {code}
>  
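For reference, the roughly 30% figure matches base64's fixed overhead of 4 output bytes per 3 input bytes, independent of Pub/Sub; a quick illustration:

```python
import base64

payload = b"x" * 300              # a 300-byte message body
encoded = base64.b64encode(payload)

# base64 maps every 3 input bytes to 4 output bytes: a fixed ~33% overhead
assert len(encoded) == 400
```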





[jira] [Work logged] (BEAM-4762) Add postCommit scripts and perfkit dashboards for nexmark on Gearpump runner

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4762?focusedWorklogId=262607&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262607
 ]

ASF GitHub Bot logged work on BEAM-4762:


Author: ASF GitHub Bot
Created on: 18/Jun/19 20:48
Start Date: 18/Jun/19 20:48
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #6808: [BEAM-4762] Add 
Gearpump Nexmark PostCommit scripts
URL: https://github.com/apache/beam/pull/6808#issuecomment-503305675
 
 
   This pull request has been closed due to lack of activity. If you think that 
is incorrect, or the pull request requires review, you can revive the PR at any 
time.
   
 



Issue Time Tracking
---

Worklog Id: (was: 262607)
Time Spent: 10h 20m  (was: 10h 10m)

> Add postCommit scripts and perfkit dashboards for nexmark on Gearpump runner
> 
>
> Key: BEAM-4762
> URL: https://issues.apache.org/jira/browse/BEAM-4762
> Project: Beam
>  Issue Type: Test
>  Components: testing-nexmark
>Reporter: Etienne Chauchot
>Assignee: Manu Zhang
>Priority: Major
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7547) StreamingDataflowWorker can observe inconsistent cache for stale work items

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7547?focusedWorklogId=262583&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262583
 ]

ASF GitHub Bot logged work on BEAM-7547:


Author: ASF GitHub Bot
Created on: 18/Jun/19 19:42
Start Date: 18/Jun/19 19:42
Worklog Time Spent: 10m 
  Work Description: scwhittle commented on issue #8842: [BEAM-7547] Avoid 
WindmillStateCache cache hits for stale work.
URL: https://github.com/apache/beam/pull/8842#issuecomment-503282654
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 262583)
Time Spent: 1h  (was: 50m)

> StreamingDataflowWorker can observe inconsistent cache for stale work items
> ---
>
> Key: BEAM-7547
> URL: https://issues.apache.org/jira/browse/BEAM-7547
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> 1. Dataflow backend generates a work item with a cache token C.
> 2. StreamingDataflowWorker receives the work item and reads the state using 
> C, it either hits the cache or performs a read.
> 3. Dataflow backend sends a retry of the work item (possibly because it 
> thinks original work item never reached the StreamingDataflowWorker).
> 4. StreamingDataflowWorker commits the work item and gets ack from dataflow 
> backend.  It caches the state for the key using C.
> 5. StreamingDataflowWorker receives the retried work item with cache token C.
> It uses the cached state, which can cause user-visible consistency failures,
> because the cached view reflects the state from after the work item completed
> processing.
> Note that this will not corrupt Dataflow persistent state, because the
> commit of the retried work item using the inconsistent cache will fail.
> However, it may cause failures in user logic, for example if it keeps the set
> of all seen items in state and throws an exception on duplicates that should
> have been removed by an upstream stage.
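The guard against this scenario can be sketched as a toy cache keyed by (key, cache token) that also records the work token of the work item that wrote each entry. Given that the streaming backend issues monotonically increasing work tokens per key, a lookup whose work token is not newer than the writer's must be a stale retry. Class and method names here are illustrative, not the actual WindmillStateCache API:

```python
class StateCacheSketch:
    """Toy model of a per-key state cache guarded by work tokens."""

    def __init__(self):
        self._entries = {}          # (key, cache_token) -> (writer_token, state)

    def put(self, key, cache_token, work_token, state):
        """Record state committed by the work item with this work token."""
        self._entries[(key, cache_token)] = (work_token, state)

    def get(self, key, cache_token, work_token):
        """Return cached state, or None to force a fresh read.

        A reader whose work token is <= the writer's token is a retry of
        the same (or an older) attempt, so its cached view would reflect
        state from *after* that attempt committed -- treat it as a miss.
        """
        entry = self._entries.get((key, cache_token))
        if entry is None:
            return None
        writer_token, state = entry
        if work_token <= writer_token:
            return None             # stale retry: ignore cached state
        return state
```

Walking through steps 1-5 above: the commit in step 4 populates the cache under token C, and the retried work item in step 5 (same work token) is refused a cache hit, while a genuinely newer work item can still reuse the entry.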





[jira] [Work logged] (BEAM-7547) StreamingDataflowWorker can observe inconsistent cache for stale work items

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7547?focusedWorklogId=262582&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262582
 ]

ASF GitHub Bot logged work on BEAM-7547:


Author: ASF GitHub Bot
Created on: 18/Jun/19 19:41
Start Date: 18/Jun/19 19:41
Worklog Time Spent: 10m 
  Work Description: scwhittle commented on issue #8842: [BEAM-7547] Avoid 
WindmillStateCache cache hits for stale work.
URL: https://github.com/apache/beam/pull/8842#issuecomment-503282368
 
 
   @reuvenlax Increasing work tokens is guaranteed by dataflow streaming 
backend.
   @dpmills Can you take a look?
 



Issue Time Tracking
---

Worklog Id: (was: 262582)
Time Spent: 50m  (was: 40m)

> StreamingDataflowWorker can observe inconsistent cache for stale work items
> ---
>
> Key: BEAM-7547
> URL: https://issues.apache.org/jira/browse/BEAM-7547
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> 1. Dataflow backend generates a work item with a cache token C.
> 2. StreamingDataflowWorker receives the work item and reads the state using 
> C, it either hits the cache or performs a read.
> 3. Dataflow backend sends a retry of the work item (possibly because it 
> thinks original work item never reached the StreamingDataflowWorker).
> 4. StreamingDataflowWorker commits the work item and gets ack from dataflow 
> backend.  It caches the state for the key using C.
> 5. StreamingDataflowWorker receives the retried work item with cache token C.
> It uses the cached state, which can cause user-visible consistency failures,
> because the cached view reflects the state from after the work item completed
> processing.
> Note that this will not corrupt Dataflow persistent state, because the
> commit of the retried work item using the inconsistent cache will fail.
> However, it may cause failures in user logic, for example if it keeps the set
> of all seen items in state and throws an exception on duplicates that should
> have been removed by an upstream stage.





[jira] [Created] (BEAM-7587) Spark portable runner: Streaming mode

2019-06-18 Thread Kyle Weaver (JIRA)
Kyle Weaver created BEAM-7587:
-

 Summary: Spark portable runner: Streaming mode
 Key: BEAM-7587
 URL: https://issues.apache.org/jira/browse/BEAM-7587
 Project: Beam
  Issue Type: Wish
  Components: runner-spark
Reporter: Kyle Weaver


So far all work on the Spark portable runner has been in batch mode. This is 
intended as an uber-issue for tracking progress on adding support for streaming.

It might be advantageous to wait for the structured streaming (non-portable) 
runner to be completed (to some reasonable extent) before undertaking this, 
rather than using the DStream API.





[jira] [Work logged] (BEAM-7520) DirectRunner timers are not strictly time ordered

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7520?focusedWorklogId=262543&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262543
 ]

ASF GitHub Bot logged work on BEAM-7520:


Author: ASF GitHub Bot
Created on: 18/Jun/19 18:25
Start Date: 18/Jun/19 18:25
Worklog Time Spent: 10m 
  Work Description: je-ik commented on pull request #8815: [BEAM-7520] fire 
timers only for single instant at a moment
URL: https://github.com/apache/beam/pull/8815#discussion_r294964234
 
 

 ##
 File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java
 ##
 @@ -651,16 +652,12 @@ private void processElement(Map 
windowToMergeResult, WindowedValue
   // it but the local output watermark (also for this key) has not. After 
data is emitted and
   // the output watermark hold is released, the output watermark on this 
key will immediately
   // exceed the end of the window (otherwise we could see multiple ON_TIME 
outputs)
-  this.isEndOfWindow =
-  
timerInternals.currentInputWatermarkTime().isAfter(window.maxTimestamp())
-  && outputWatermarkBeforeEOW;
+  this.isEndOfWindow = !timestamp.isBefore(window.maxTimestamp()) && 
outputWatermarkBeforeEOW;
 
   // The "GC time" is reached when the input watermark surpasses the end 
of the window
   // plus allowed lateness. After this, the window is expired and expunged.
   this.isGarbageCollection =
-  timerInternals
-  .currentInputWatermarkTime()
-  .isAfter(LateDataUtils.garbageCollectionTime(window, 
windowingStrategy));
+  !timestamp.isBefore(LateDataUtils.garbageCollectionTime(window, 
windowingStrategy));
 
 Review comment:
   This one was tough. The problem here was that, previously, the input watermark 
moved at the end of each bundle. Each bundle also contained its own timers. But 
- because timers can generate new timers, and these are processed in a different 
bundle - the actual timestamp of the timer must be taken into account. Otherwise 
a timer set up for some earlier time might trigger window garbage collection, 
because the input watermark is already way ahead. I'm not quite sure of the other 
consequences, though. Is it correct that one bundle might generate multiple 
bundles, or is there a bundle atomicity requirement? On the other hand, runners 
that don't have a concept of bundles (and therefore, by definition, have bundles 
of size 1) have to generate multiple bundles from a single bundle, so that might 
be ok. Am I right?
 



Issue Time Tracking
---

Worklog Id: (was: 262543)
Time Spent: 0.5h  (was: 20m)

> DirectRunner timers are not strictly time ordered
> -
>
> Key: BEAM-7520
> URL: https://issues.apache.org/jira/browse/BEAM-7520
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.13.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Let's suppose we have the following situation:
>  - stateful ParDo with two timers - timerA and timerB
>  - timerA is set for window.maxTimestamp() + 1
>  - timerB is set anywhere between  timerB.timestamp
>  - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE
> Then the order of timers is as follows (correct):
>  - timerB
>  - timerA
> But, if timerB sets another timer (say for timerB.timestamp + 1), then the 
> order of timers will be:
>  - timerB (timerB.timestamp)
>  - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE)
>  - timerB (timerB.timestamp + 1)
> Which is not ordered by timestamp. The reason for this is that when the input 
> watermark update is evaluated, WatermarkManager.extractFiredTimers() will 
> produce both timerA and timerB. That would be correct, but when timerB sets 
> another timer, it breaks this.
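The ordering difference can be reproduced with a toy timer queue (plain Python, not the DirectRunner code): extracting all fired timers for the current watermark in one batch lets a timer set by a callback fire after later-timestamped timers, whereas re-extracting one instant at a time preserves timestamp order. Timestamps and callbacks below are illustrative:

```python
import heapq

def fire_all_at_once(timers, watermark, on_fire):
    """Buggy order: extract every timer <= watermark in one batch, so a
    timer set by a firing callback only runs after the whole batch."""
    heapq.heapify(timers)
    fired, batch = [], []
    while timers and timers[0] <= watermark:
        batch.append(heapq.heappop(timers))
    for ts in batch:
        fired.append(ts)
        for new_ts in on_fire(ts):
            heapq.heappush(timers, new_ts)
    while timers and timers[0] <= watermark:   # second extraction pass
        fired.append(heapq.heappop(timers))
    return fired

def fire_one_instant_at_a_time(timers, watermark, on_fire):
    """Fixed order: re-examine the queue after every firing, so timers
    always fire in timestamp order even when callbacks set new timers."""
    heapq.heapify(timers)
    fired = []
    while timers and timers[0] <= watermark:
        ts = heapq.heappop(timers)
        fired.append(ts)
        for new_ts in on_fire(ts):
            heapq.heappush(timers, new_ts)
    return fired

# timerB fires at 5 and sets a follow-up timer for 6; timerA fires at 100
# (standing in for window.maxTimestamp() + 1 / TIMESTAMP_MAX_VALUE).
set_follow_up = lambda ts: [6] if ts == 5 else []
assert fire_all_at_once([5, 100], 100, set_follow_up) == [5, 100, 6]
assert fire_one_instant_at_a_time([5, 100], 100, set_follow_up) == [5, 6, 100]
```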





[jira] [Created] (BEAM-7586) Add Integration Test for MongoDbIO in python sdk

2019-06-18 Thread Yichi Zhang (JIRA)
Yichi Zhang created BEAM-7586:
-

 Summary: Add Integration Test for MongoDbIO in python sdk 
 Key: BEAM-7586
 URL: https://issues.apache.org/jira/browse/BEAM-7586
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-py-core
Reporter: Yichi Zhang
Assignee: Yichi Zhang








[jira] [Comment Edited] (BEAM-7527) Beam Python integration test suites are flaky: ModuleNotFoundError

2019-06-18 Thread Mark Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866900#comment-16866900
 ] 

Mark Liu edited comment on BEAM-7527 at 6/18/19 6:10 PM:
-

It makes beam_PerformanceTests_WordCountIT_Py27 fail consistently.

The gradle command this test executes is:
{code}
/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_WordCountIT_Py27/src/gradlew
 integrationTest 
-Dtests=apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it -p 
sdks/python -Dattr=IT -DpipelineOptions="--project=apache-beam-testing" 
"--staging_location=gs://temp-storage-for-end-to-end-tests/staging-it" 
"--temp_location=gs://temp-storage-for-end-to-end-tests/temp-it" 
"--input=gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*" 
"--output=gs://temp-storage-for-end-to-end-tests/py-it-cloud/output" 
"--expect_checksum=ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710" "--num_workers=10" 
"--autoscaling_algorithm=NONE" "--runner=TestDataflowRunner" 
"--sdk_location=build/apache-beam.tar.gz" --info --scan
{code}

It should run a single task (py27 only), but actually runs several tasks that 
cover multiple Python versions:
{code}
05:19:23 Tasks to be executed: [task ':sdks:python:setupVirtualenv', task 
':sdks:python:sdist', task ':sdks:python:installGcpTest', task 
':sdks:python:integrationTest', task 
':sdks:python:test-suites:dataflow:py35:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py35:sdist', task 
':sdks:python:test-suites:dataflow:py35:installGcpTest', task 
':sdks:python:test-suites:dataflow:py35:integrationTest', task 
':sdks:python:test-suites:dataflow:py36:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py36:sdist', task 
':sdks:python:test-suites:dataflow:py36:installGcpTest', task 
':sdks:python:test-suites:dataflow:py36:integrationTest', task 
':sdks:python:test-suites:dataflow:py37:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py37:sdist', task 
':sdks:python:test-suites:dataflow:py37:installGcpTest', task 
':sdks:python:test-suites:dataflow:py37:integrationTest']
{code} 

Probably some changes happened in the gradle scripts. I'll keep investigating.


was (Author: markflyhigh):
It makes beam_PerformanceTests_WordCountIT_Py27 consistently failed.

The gradle command this test executes is:
{code}
/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_WordCountIT_Py27/src/gradlew
 integrationTest 
-Dtests=apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it -p 
sdks/python -Dattr=IT -DpipelineOptions="--project=apache-beam-testing" 
"--staging_location=gs://temp-storage-for-end-to-end-tests/staging-it" 
"--temp_location=gs://temp-storage-for-end-to-end-tests/temp-it" 
"--input=gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*" 
"--output=gs://temp-storage-for-end-to-end-tests/py-it-cloud/output" 
"--expect_checksum=ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710" "--num_workers=10" 
"--autoscaling_algorithm=NONE" "--runner=TestDataflowRunner" 
"--sdk_location=build/apache-beam.tar.gz" --info --scan
{code}

It should run single task (only in py27), but actually run few that includes 
multiple python versions:
{code}
05:19:23 Tasks to be executed: [task ':sdks:python:setupVirtualenv', task 
':sdks:python:sdist', task ':sdks:python:installGcpTest', task 
':sdks:python:integrationTest', task 
':sdks:python:test-suites:dataflow:py35:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py35:sdist', task 
':sdks:python:test-suites:dataflow:py35:installGcpTest', task 
':sdks:python:test-suites:dataflow:py35:integrationTest', task 
':sdks:python:test-suites:dataflow:py36:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py36:sdist', task 
':sdks:python:test-suites:dataflow:py36:installGcpTest', task 
':sdks:python:test-suites:dataflow:py36:integrationTest', task 
':sdks:python:test-suites:dataflow:py37:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py37:sdist', task 
':sdks:python:test-suites:dataflow:py37:installGcpTest', task 
':sdks:python:test-suites:dataflow:py37:integrationTest']
{code} 

Probably some changes happened in gradle scripts. I'll keep investigate on that.
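The diagnosis above can be sanity-checked mechanically by filtering the reported task list for per-version test suites (plain Python, purely illustrative; the task list is abbreviated from the log above):

```python
# The task list gradle reported for the py27-only job (abbreviated).
tasks = [
    ":sdks:python:integrationTest",
    ":sdks:python:test-suites:dataflow:py35:integrationTest",
    ":sdks:python:test-suites:dataflow:py36:integrationTest",
    ":sdks:python:test-suites:dataflow:py37:integrationTest",
]

def unexpected_version_suites(task_names):
    """Return per-version test-suite tasks, which a py27-only run
    (rooted at :sdks:python) should not schedule."""
    return [t for t in task_names if ":test-suites:dataflow:py" in t]

# The py35/py36/py37 suites are the unexpected extras.
assert unexpected_version_suites(tasks) == tasks[1:]
```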

> Beam Python integration test suites are flaky: ModuleNotFoundError
> --
>
> Key: BEAM-7527
> URL: https://issues.apache.org/jira/browse/BEAM-7527
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Mark Liu
>Priority: Major
>
> I am seeing several errors in Python SDK Integration test suites, such as 
> Dataflow ValidatesRunner and Python PostCommit that fail due to one of the 
> autogenerated files not being found.
> For example:
> {noformat}
> 

[jira] [Comment Edited] (BEAM-7527) Beam Python integration test suites are flaky: ModuleNotFoundError

2019-06-18 Thread Mark Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866900#comment-16866900
 ] 

Mark Liu edited comment on BEAM-7527 at 6/18/19 6:10 PM:
-

It makes beam_PerformanceTests_WordCountIT_Py27 fail consistently.

The gradle command this test executes is:
{code}
/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_WordCountIT_Py27/src/gradlew
 integrationTest 
-Dtests=apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it -p 
sdks/python -Dattr=IT -DpipelineOptions="--project=apache-beam-testing" 
"--staging_location=gs://temp-storage-for-end-to-end-tests/staging-it" 
"--temp_location=gs://temp-storage-for-end-to-end-tests/temp-it" 
"--input=gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*" 
"--output=gs://temp-storage-for-end-to-end-tests/py-it-cloud/output" 
"--expect_checksum=ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710" "--num_workers=10" 
"--autoscaling_algorithm=NONE" "--runner=TestDataflowRunner" 
"--sdk_location=build/apache-beam.tar.gz" --info --scan
{code}

It should run a single test task (py27 only), but actually runs several tasks 
that cover multiple Python versions:
{code}
05:19:23 Tasks to be executed: [task ':sdks:python:setupVirtualenv', task 
':sdks:python:sdist', task ':sdks:python:installGcpTest', task 
':sdks:python:integrationTest', task 
':sdks:python:test-suites:dataflow:py35:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py35:sdist', task 
':sdks:python:test-suites:dataflow:py35:installGcpTest', task 
':sdks:python:test-suites:dataflow:py35:integrationTest', task 
':sdks:python:test-suites:dataflow:py36:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py36:sdist', task 
':sdks:python:test-suites:dataflow:py36:installGcpTest', task 
':sdks:python:test-suites:dataflow:py36:integrationTest', task 
':sdks:python:test-suites:dataflow:py37:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py37:sdist', task 
':sdks:python:test-suites:dataflow:py37:installGcpTest', task 
':sdks:python:test-suites:dataflow:py37:integrationTest']
{code} 

Probably some changes happened in the gradle scripts. I'll keep investigating.


was (Author: markflyhigh):
It makes beam_PerformanceTests_WordCountIT_Py27 consistently failed.

The gradle command this test executes is:
{code}
/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_WordCountIT_Py27/src/gradlew
 integrationTest 
-Dtests=apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it -p 
sdks/python -Dattr=IT -DpipelineOptions="--project=apache-beam-testing" 
"--staging_location=gs://temp-storage-for-end-to-end-tests/staging-it" 
"--temp_location=gs://temp-storage-for-end-to-end-tests/temp-it" 
"--input=gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*" 
"--output=gs://temp-storage-for-end-to-end-tests/py-it-cloud/output" 
"--expect_checksum=ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710" "--num_workers=10" 
"--autoscaling_algorithm=NONE" "--runner=TestDataflowRunner" 
"--sdk_location=build/apache-beam.tar.gz" --info --scan
{code}

It should only run single test WordcountIT in py27, but actually run test in 
multiple python versions:
{code}
05:19:23 Tasks to be executed: [task ':sdks:python:setupVirtualenv', task 
':sdks:python:sdist', task ':sdks:python:installGcpTest', task 
':sdks:python:integrationTest', task 
':sdks:python:test-suites:dataflow:py35:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py35:sdist', task 
':sdks:python:test-suites:dataflow:py35:installGcpTest', task 
':sdks:python:test-suites:dataflow:py35:integrationTest', task 
':sdks:python:test-suites:dataflow:py36:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py36:sdist', task 
':sdks:python:test-suites:dataflow:py36:installGcpTest', task 
':sdks:python:test-suites:dataflow:py36:integrationTest', task 
':sdks:python:test-suites:dataflow:py37:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py37:sdist', task 
':sdks:python:test-suites:dataflow:py37:installGcpTest', task 
':sdks:python:test-suites:dataflow:py37:integrationTest']
{code} 

Probably some changes happened in gradle scripts. I'll keep investigate on that.

> Beam Python integration test suites are flaky: ModuleNotFoundError
> --
>
> Key: BEAM-7527
> URL: https://issues.apache.org/jira/browse/BEAM-7527
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Mark Liu
>Priority: Major
>
> I am seeing several errors in Python SDK Integration test suites, such as 
> Dataflow ValidatesRunner and Python PostCommit that fail due to one of the 
> autogenerated files not being found.
> For example:
> {noformat}
> 

[jira] [Comment Edited] (BEAM-7527) Beam Python integration test suites are flaky: ModuleNotFoundError

2019-06-18 Thread Mark Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866900#comment-16866900
 ] 

Mark Liu edited comment on BEAM-7527 at 6/18/19 6:10 PM:
-

It makes beam_PerformanceTests_WordCountIT_Py27 fail consistently.

The gradle command this test executes is:
{code}
/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_WordCountIT_Py27/src/gradlew
 integrationTest 
-Dtests=apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it -p 
sdks/python -Dattr=IT -DpipelineOptions="--project=apache-beam-testing" 
"--staging_location=gs://temp-storage-for-end-to-end-tests/staging-it" 
"--temp_location=gs://temp-storage-for-end-to-end-tests/temp-it" 
"--input=gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*" 
"--output=gs://temp-storage-for-end-to-end-tests/py-it-cloud/output" 
"--expect_checksum=ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710" "--num_workers=10" 
"--autoscaling_algorithm=NONE" "--runner=TestDataflowRunner" 
"--sdk_location=build/apache-beam.tar.gz" --info --scan
{code}

It should run a single task (py27 only), but actually runs several tasks that 
cover multiple Python versions:
{code}
05:19:23 Tasks to be executed: [task ':sdks:python:setupVirtualenv', task 
':sdks:python:sdist', task ':sdks:python:installGcpTest', task 
':sdks:python:integrationTest', task 
':sdks:python:test-suites:dataflow:py35:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py35:sdist', task 
':sdks:python:test-suites:dataflow:py35:installGcpTest', task 
':sdks:python:test-suites:dataflow:py35:integrationTest', task 
':sdks:python:test-suites:dataflow:py36:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py36:sdist', task 
':sdks:python:test-suites:dataflow:py36:installGcpTest', task 
':sdks:python:test-suites:dataflow:py36:integrationTest', task 
':sdks:python:test-suites:dataflow:py37:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py37:sdist', task 
':sdks:python:test-suites:dataflow:py37:installGcpTest', task 
':sdks:python:test-suites:dataflow:py37:integrationTest']
{code} 

Probably some changes happened in the gradle scripts. I'll keep investigating.


was (Author: markflyhigh):
This makes beam_PerformanceTests_WordCountIT_Py27 fail consistently.

The gradle command this test executes is:
{code}
/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_WordCountIT_Py27/src/gradlew
 integrationTest 
-Dtests=apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it -p 
sdks/python -Dattr=IT -DpipelineOptions="--project=apache-beam-testing" 
"--staging_location=gs://temp-storage-for-end-to-end-tests/staging-it" 
"--temp_location=gs://temp-storage-for-end-to-end-tests/temp-it" 
"--input=gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*" 
"--output=gs://temp-storage-for-end-to-end-tests/py-it-cloud/output" 
"--expect_checksum=ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710" "--num_workers=10" 
"--autoscaling_algorithm=NONE" "--runner=TestDataflowRunner" 
"--sdk_location=build/apache-beam.tar.gz" --info --scan
{code}

It should run a single test task (py27 only), but it actually runs several that 
span multiple Python versions:
{code}
05:19:23 Tasks to be executed: [task ':sdks:python:setupVirtualenv', task 
':sdks:python:sdist', task ':sdks:python:installGcpTest', task 
':sdks:python:integrationTest', task 
':sdks:python:test-suites:dataflow:py35:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py35:sdist', task 
':sdks:python:test-suites:dataflow:py35:installGcpTest', task 
':sdks:python:test-suites:dataflow:py35:integrationTest', task 
':sdks:python:test-suites:dataflow:py36:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py36:sdist', task 
':sdks:python:test-suites:dataflow:py36:installGcpTest', task 
':sdks:python:test-suites:dataflow:py36:integrationTest', task 
':sdks:python:test-suites:dataflow:py37:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py37:sdist', task 
':sdks:python:test-suites:dataflow:py37:installGcpTest', task 
':sdks:python:test-suites:dataflow:py37:integrationTest']
{code} 

Some changes to the Gradle scripts probably caused this. I'll keep investigating.

> Beam Python integration test suites are flaky: ModuleNotFoundError
> --
>
> Key: BEAM-7527
> URL: https://issues.apache.org/jira/browse/BEAM-7527
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Mark Liu
>Priority: Major
>
> I am seeing several errors in Python SDK Integration test suites, such as 
> Dataflow ValidatesRunner and Python PostCommit that fail due to one of the 
> autogenerated files not being found.
> For example:
> {noformat}
> 

[jira] [Commented] (BEAM-7527) Beam Python integration test suites are flaky: ModuleNotFoundError

2019-06-18 Thread Mark Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866900#comment-16866900
 ] 

Mark Liu commented on BEAM-7527:


This makes beam_PerformanceTests_WordCountIT_Py27 fail consistently.

The gradle command this test executes is:
{code}
/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_WordCountIT_Py27/src/gradlew
 integrationTest 
-Dtests=apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it -p 
sdks/python -Dattr=IT -DpipelineOptions="--project=apache-beam-testing" 
"--staging_location=gs://temp-storage-for-end-to-end-tests/staging-it" 
"--temp_location=gs://temp-storage-for-end-to-end-tests/temp-it" 
"--input=gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*" 
"--output=gs://temp-storage-for-end-to-end-tests/py-it-cloud/output" 
"--expect_checksum=ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710" "--num_workers=10" 
"--autoscaling_algorithm=NONE" "--runner=TestDataflowRunner" 
"--sdk_location=build/apache-beam.tar.gz" --info --scan
{code}

It should only run the single test WordCountIT in py27, but it actually runs the 
test in multiple Python versions:
{code}
05:19:23 Tasks to be executed: [task ':sdks:python:setupVirtualenv', task 
':sdks:python:sdist', task ':sdks:python:installGcpTest', task 
':sdks:python:integrationTest', task 
':sdks:python:test-suites:dataflow:py35:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py35:sdist', task 
':sdks:python:test-suites:dataflow:py35:installGcpTest', task 
':sdks:python:test-suites:dataflow:py35:integrationTest', task 
':sdks:python:test-suites:dataflow:py36:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py36:sdist', task 
':sdks:python:test-suites:dataflow:py36:installGcpTest', task 
':sdks:python:test-suites:dataflow:py36:integrationTest', task 
':sdks:python:test-suites:dataflow:py37:setupVirtualenv', task 
':sdks:python:test-suites:dataflow:py37:sdist', task 
':sdks:python:test-suites:dataflow:py37:installGcpTest', task 
':sdks:python:test-suites:dataflow:py37:integrationTest']
{code} 

Some changes to the Gradle scripts probably caused this. I'll keep investigating.

> Beam Python integration test suites are flaky: ModuleNotFoundError
> --
>
> Key: BEAM-7527
> URL: https://issues.apache.org/jira/browse/BEAM-7527
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Mark Liu
>Priority: Major
>
> I am seeing several errors in Python SDK Integration test suites, such as 
> Dataflow ValidatesRunner and Python PostCommit that fail due to one of the 
> autogenerated files not being found.
> For example:
> {noformat}
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/__init__.py:84:
>  UserWarning: Running the Apache Beam SDK on Python 3 is not yet fully 
> supported. You may encounter buggy behavior or missing features.
>   'Running the Apache Beam SDK on Python 3 is not yet fully supported. '
> Failure: ModuleNotFoundError (No module named 'beam_runner_api_pb2') ... 
> ERROR
> ==
> ERROR: Failure: ModuleNotFoundError (No module named 'beam_runner_api_pb2')
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/build/gradleenv/-1734967053/lib/python3.6/site-packages/nose/failure.py",
>  line 39, in runTest
> raise self.exc_val.with_traceback(self.tb)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/build/gradleenv/-1734967053/lib/python3.6/site-packages/nose/loader.py",
>  line 418, in loadTestsFromName
> addr.filename, addr.module)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/build/gradleenv/-1734967053/lib/python3.6/site-packages/nose/importer.py",
>  line 47, in importFromPath
> return self.importFromDir(dir_path, fqname)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/build/gradleenv/-1734967053/lib/python3.6/site-packages/nose/importer.py",
>  line 94, in importFromDir
> mod = load_module(part_fqname, fh, filename, desc)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/build/gradleenv/-1734967053/lib/python3.6/imp.py",
>  line 245, in load_module
> return load_package(name, filename)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/build/gradleenv/-1734967053/lib/python3.6/imp.py",
>  line 217, in load_package
> return _load(spec)
>   File "", line 684, in _load
>   File "", line 665, in _load_unlocked
>   File "", line 678, in 

[jira] [Work logged] (BEAM-7223) Enable Python3 tests for Flink

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7223?focusedWorklogId=262540=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262540
 ]

ASF GitHub Bot logged work on BEAM-7223:


Author: ASF GitHub Bot
Created on: 18/Jun/19 18:06
Start Date: 18/Jun/19 18:06
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #8877: [BEAM-7223] 
Add ValidatesRunner for Flink for python 3.5 - 3.6 - 3.7
URL: https://github.com/apache/beam/pull/8877#discussion_r294956244
 
 

 ##
 File path: sdks/python/test-suites/portable/py36/build.gradle
 ##
 @@ -0,0 +1,79 @@
+/*
 
 Review comment:
  Instead of creating copies for each Python version, can we do something 
like 
https://github.com/apache/beam/blob/a6c9c92544643076c16d06e888c0152e39374564/runners/flink/1.8/job-server/build.gradle#L31
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262540)
Time Spent: 1h  (was: 50m)

> Enable Python3 tests for Flink
> --
>
> Key: BEAM-7223
> URL: https://issues.apache.org/jira/browse/BEAM-7223
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Ankur Goenka
>Assignee: Frederik Bode
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add py3 integration tests for Flink



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5148) Implement MongoDB IO for Python SDK

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5148?focusedWorklogId=262533=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262533
 ]

ASF GitHub Bot logged work on BEAM-5148:


Author: ASF GitHub Bot
Created on: 18/Jun/19 18:00
Start Date: 18/Jun/19 18:00
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #8826: [BEAM-5148] Implement 
MongoDB IO for Python SDK
URL: https://github.com/apache/beam/pull/8826#issuecomment-503246025
 
 
   > LGTM. Thanks.
   > 
   > Please squash commits for merging according to 
https://beam.apache.org/contribute/
   
   Thanks for reviewing!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262533)
Time Spent: 8h  (was: 7h 50m)

> Implement MongoDB IO for Python SDK
> ---
>
> Key: BEAM-5148
> URL: https://issues.apache.org/jira/browse/BEAM-5148
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Affects Versions: 3.0.0
>Reporter: Pascal Gula
>Assignee: Yichi Zhang
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Currently the Java SDK has MongoDB support but the Python SDK does not. With the 
> current portability efforts, other runners may soon be able to use the Python SDK; 
> MongoDB support will allow these runners to execute large-scale jobs with it.
> Since we need this IO component at Peat, we started working on a PyPI package 
> available at this repository: [https://github.com/PEAT-AI/beam-extended]
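A bounded IO source like the one proposed here typically splits its key space into bundles so runners can read in parallel. As a minimal sketch (the function name and the idea of splitting an `_id` range are illustrative assumptions, not the PR's actual API):

```python
def split_range(start, end, bundle_size):
    # Sketch of how a bounded source (for example, a MongoDB _id-range
    # reader) might split its key space into bundles for parallel reads.
    bundles = []
    pos = start
    while pos < end:
        bundles.append((pos, min(pos + bundle_size, end)))
        pos += bundle_size
    return bundles
```

Each `(start, end)` pair would then back one read bundle, with the last bundle clamped to the end of the range.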



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6955) Support Dataflow --sdk_location with modified version number

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6955?focusedWorklogId=262517=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262517
 ]

ASF GitHub Bot logged work on BEAM-6955:


Author: ASF GitHub Bot
Created on: 18/Jun/19 17:35
Start Date: 18/Jun/19 17:35
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #8885: [BEAM-6955] Use 
base version component of Beam Python SDK version when choosing Dataflow 
container image to use.
URL: https://github.com/apache/beam/pull/8885#issuecomment-503236176
 
 
   R: @aaltay 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262517)
Time Spent: 50m  (was: 40m)

> Support Dataflow --sdk_location with modified version number
> 
>
> Key: BEAM-6955
> URL: https://issues.apache.org/jira/browse/BEAM-6955
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.11.0
>Reporter: Daniel Lescohier
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Support Dataflow --sdk_location with modified version number
> Determine the version tag to use for the Google Container Registry, for the 
> service image versions to use on the Dataflow worker nodes. Users of Dataflow 
> may be using a locally-modified version of Apache Beam, which they submit to 
> Dataflow with the --sdk_location option. Those users would most likely modify 
> the version number of Apache Beam, so they can distinguish it from the public 
> distribution of Apache Beam. However, the remote nodes in Dataflow still need 
> to bootstrap the worker service with a Docker image whose version tag exists. 
> The most appropriate way for system integrators to modify the Apache Beam 
> version number would be to add a Local Version Identifier: 
> https://www.python.org/dev/peps/pep-0440/#local-version-identifiers
> If people only use Local Version Identifiers, then we could use the "public" 
> attribute of the pkg_resources version object.
> If people instead use a post-release version identifier: 
> https://www.python.org/dev/peps/pep-0440/#post-releases then only the 
> "base_version" attribute would work for both of these version-number schemes. 
> Since the Dataflow documentation does not specify how to modify version numbers, 
> I am choosing to use the "base_version" attribute.
> Will shortly submit a PR with the change.
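The distinction between the two version attributes can be sketched with a stdlib-only illustration (these helpers only mirror the semantics of the `public` and `base_version` attributes of `pkg_resources.parse_version()`; they are not the real implementation):

```python
import re

def public_version(version):
    # "public" drops only the local identifier (the part after '+', per
    # PEP 440), so a post-release suffix like ".post1" survives.
    return version.split("+", 1)[0]

def base_version(version):
    # "base_version" keeps only the leading release segment (X.Y.Z),
    # stripping local identifiers and pre/post/dev suffixes alike.
    m = re.match(r"\d+(?:\.\d+)*", version)
    return m.group(0) if m else version
```

For a local version identifier like `2.11.0+acme.1` both yield `2.11.0`, but for a post-release like `2.11.0.post1` only `base_version` recovers a tag that exists in the container registry, which is why the PR picks it.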



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6955) Support Dataflow --sdk_location with modified version number

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6955?focusedWorklogId=262510=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262510
 ]

ASF GitHub Bot logged work on BEAM-6955:


Author: ASF GitHub Bot
Created on: 18/Jun/19 17:30
Start Date: 18/Jun/19 17:30
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #8885: [BEAM-6955] 
Use base version component of Beam Python SDK version when choosing Dataflow 
container image to use.
URL: https://github.com/apache/beam/pull/8885
 
 
   When the Beam SDK version is modified, for example when the current version of 
the SDK is a release candidate, the Dataflow runner should use the base version 
component of the SDK, which we can parse via `pkg_resources.parse_version()`, 
for example:
   ```
   >>> pkg_resources.parse_version("2.14.0").base_version
   '2.14.0'
   >>> pkg_resources.parse_version("2.14.0.rc1").base_version
   '2.14.0'
   ```
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
[jira] [Work logged] (BEAM-7130) convertAvroFieldStrict as public static function could handle more types of value for logical type timestamp-millis

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7130?focusedWorklogId=262506=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262506
 ]

ASF GitHub Bot logged work on BEAM-7130:


Author: ASF GitHub Bot
Created on: 18/Jun/19 17:24
Start Date: 18/Jun/19 17:24
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #8376: [BEAM-7130]Support 
Datetime value conversion in convertAvroFieldStrict
URL: https://github.com/apache/beam/pull/8376#issuecomment-503231904
 
 
   @lukecwik Since reuvenlax might not be available at the moment and Beam is 
releasing 2.14, could you help merge this PR if there are no other comments, so 
that I can add this one to 2.14? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262506)
Time Spent: 1h 20m  (was: 1h 10m)

> convertAvroFieldStrict as public static function could handle more types of 
> value for logical type timestamp-millis
> ---
>
> Key: BEAM-7130
> URL: https://issues.apache.org/jira/browse/BEAM-7130
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> https://lists.apache.org/thread.html/68322dcf9418b1d1640273f1a58f70874b61b4996d08dd982b29492c@%3Cdev.beam.apache.org%3E
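The Java utility under discussion (`convertAvroFieldStrict`) is being extended to accept more value representations for the `timestamp-millis` logical type. As a language-neutral illustration of the idea (the function name and accepted types here are assumptions for the sketch, not the PR's actual signature), a converter normalizing epoch millis, ISO-8601 strings, and datetime objects might look like:

```python
from datetime import datetime, timezone

def to_timestamp_millis(value):
    # Accept an epoch-millis int, an ISO-8601 string, or a datetime, and
    # normalize all of them to milliseconds since the Unix epoch.
    if isinstance(value, bool):
        raise TypeError("bool is not a timestamp")
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        value = datetime.fromisoformat(value)
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Treat naive datetimes as UTC for the purpose of this sketch.
            value = value.replace(tzinfo=timezone.utc)
        return int(value.timestamp() * 1000)
    raise TypeError("unsupported type: %r" % type(value))
```

The strict Java counterpart would raise on unsupported types in the same way rather than silently coercing.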



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7130) convertAvroFieldStrict as public static function could handle more types of value for logical type timestamp-millis

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7130?focusedWorklogId=262505=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262505
 ]

ASF GitHub Bot logged work on BEAM-7130:


Author: ASF GitHub Bot
Created on: 18/Jun/19 17:24
Start Date: 18/Jun/19 17:24
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #8376: [BEAM-7130]Support 
Datetime value conversion in convertAvroFieldStrict
URL: https://github.com/apache/beam/pull/8376#issuecomment-503231904
 
 
   @lukecwik Since reuvenlax might not be available at the moment and Beam is 
releasing 2.14, could you help merge this PR if there are no other comments, so 
that I can add this one to 2.14? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262505)
Time Spent: 1h 10m  (was: 1h)

> convertAvroFieldStrict as public static function could handle more types of 
> value for logical type timestamp-millis
> ---
>
> Key: BEAM-7130
> URL: https://issues.apache.org/jira/browse/BEAM-7130
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> https://lists.apache.org/thread.html/68322dcf9418b1d1640273f1a58f70874b61b4996d08dd982b29492c@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7579) Tests fail with "Cannot create a default bucket when --dataflowKmsKey is set."

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7579?focusedWorklogId=262476=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262476
 ]

ASF GitHub Bot logged work on BEAM-7579:


Author: ASF GitHub Bot
Created on: 18/Jun/19 17:02
Start Date: 18/Jun/19 17:02
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8883: [BEAM-7579] Use 
bucket with default key in ITs
URL: https://github.com/apache/beam/pull/8883#discussion_r294929516
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestPipelineOptions.java
 ##
 @@ -36,6 +36,14 @@
 
   void setTempRoot(String value);
 
+  /**
+   * An alternative tempRoot that has a bucket-default KMS key set. Used for 
GCP CMEK integration
+   * tests.
+   */
+  String getTempRootKms();
 
 Review comment:
   I could add an attribute like NeedsKmsBucket
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262476)
Time Spent: 1h  (was: 50m)

> Tests fail with "Cannot create a default bucket when --dataflowKmsKey is set."
> --
>
> Key: BEAM-7579
> URL: https://issues.apache.org/jira/browse/BEAM-7579
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.13.0
>Reporter: Luke Cwik
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Example run: https://builds.apache.org/job/beam_PostCommit_Java/3535/
> Failing tests:
> {code:java}
> org.apache.beam.sdk.io.gcp.storage.GcsKmsKeyIT.testGcsWriteWithKmsKey
> org.apache.beam.sdk.io.gcp.storage.GcsKmsKeyIT.testGcsWriteWithKmsKey
> org.apache.beam.sdk.extensions.gcp.util.GcsUtilIT.testRewriteMultiPart
> {code}
> Example trace:
> {code:java}
> Error Message
> java.lang.IllegalArgumentException: Cannot create a default bucket when 
> --dataflowKmsKey is set.
> Stacktrace
> java.lang.IllegalArgumentException: Cannot create a default bucket when 
> --dataflowKmsKey is set.
>   at 
> org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
>   at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.tryCreateDefaultBucket(GcpOptions.java:297)
>   at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:268)
>   at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:256)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.returnDefaultHelper(ProxyInvocationHandler.java:592)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.getDefault(ProxyInvocationHandler.java:533)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:158)
>   at com.sun.proxy.$Proxy32.getGcpTempLocation(Unknown Source)
>   at 
> org.apache.beam.sdk.io.gcp.storage.GcsKmsKeyIT.testGcsWriteWithKmsKey(GcsKmsKeyIT.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5148) Implement MongoDB IO for Python SDK

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5148?focusedWorklogId=262438=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262438
 ]

ASF GitHub Bot logged work on BEAM-5148:


Author: ASF GitHub Bot
Created on: 18/Jun/19 16:32
Start Date: 18/Jun/19 16:32
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #8826: [BEAM-5148] Implement 
MongoDB IO for Python SDK
URL: https://github.com/apache/beam/pull/8826#issuecomment-503212800
 
 
   Run RAT PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262438)
Time Spent: 7h 50m  (was: 7h 40m)

> Implement MongoDB IO for Python SDK
> ---
>
> Key: BEAM-5148
> URL: https://issues.apache.org/jira/browse/BEAM-5148
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Affects Versions: 3.0.0
>Reporter: Pascal Gula
>Assignee: Yichi Zhang
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> Currently the Java SDK has MongoDB support but the Python SDK does not. With the 
> current portability efforts, other runners may soon be able to use the Python SDK; 
> MongoDB support will allow these runners to execute large-scale jobs with it.
> Since we need this IO component at Peat, we started working on a PyPI package 
> available at this repository: [https://github.com/PEAT-AI/beam-extended]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4736) Support reading decimal from BigQuery in Beam SQL

2019-06-18 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang reassigned BEAM-4736:
--

Assignee: (was: Rui Wang)

> Support reading decimal from BigQuery in Beam SQL
> -
>
> Key: BEAM-4736
> URL: https://issues.apache.org/jira/browse/BEAM-4736
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Rui Wang
>Priority: Major
>
> Right now BigQueryIO returns Avro-format data, and Beam SQL parses it and 
> converts it to Row. More specifically, the decimal value is saved into a 
> HeapByteBuffer.
> 
> We need to investigate how to convert from the byte buffer to a BigDecimal.
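Avro's `decimal` logical type stores the unscaled value as a big-endian two's-complement integer, with the scale declared in the schema, so the conversion is mechanical. A Python sketch of the idea (the Java equivalent would be `new BigDecimal(new BigInteger(bytes), scale)`):

```python
from decimal import Decimal

def avro_bytes_to_decimal(raw, scale):
    # Avro's decimal logical type encodes the unscaled value as a
    # big-endian two's-complement integer; the scale comes from the schema.
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)
```

For example, the two bytes `0x04 0xd2` (unscaled 1234) with scale 2 decode to `12.34`.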



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-7142) Data Driven testing for BeamSQL

2019-06-18 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang reassigned BEAM-7142:
--

Assignee: (was: Rui Wang)

> Data Driven testing for BeamSQL
> ---
>
> Key: BEAM-7142
> URL: https://issues.apache.org/jira/browse/BEAM-7142
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Priority: Major
>
> The current way to write BeamSQL test cases is too heavy: developers need to 
> initialize a pipeline, deal with PCollections, and use PAssert to verify 
> pipeline results (sometimes through INSERT INTO a table and reading data back 
> from the table for assertion). 
> Data-driven testing, instead, should only ask the developer to provide a SQL 
> query and an expected result in the form of a List<Row> (simulated rows from the 
> result table). The test execution interface should just be a static function 
> like "List<Row> run(String query)", and the returned rows can be compared with 
> the expected result by checking equality.
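The proposed interface can be illustrated with a small stand-in. This sketch uses sqlite3 and a fixture table purely as a hypothetical substitute for BeamSQL's real runner, which would build and execute a Beam pipeline instead:

```python
import sqlite3

def run(query):
    # Hypothetical "run(query)" entry point: execute the query against a
    # small fixture table and return plain tuples so a test can compare
    # the result with an expected list by simple equality.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(1, 10), (2, 25), (3, 40)])
    return conn.execute(query).fetchall()
```

A data-driven test then reduces to one line, e.g. asserting that `run("SELECT id FROM orders WHERE amount > 20 ORDER BY id")` equals `[(2,), (3,)]`, with no pipeline or PAssert plumbing.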



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-6000) Support DATETIME type in BeamSQL

2019-06-18 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang reassigned BEAM-6000:
--

Assignee: (was: Rui Wang)

> Support DATETIME type in BeamSQL
> 
>
> Key: BEAM-6000
> URL: https://issues.apache.org/jira/browse/BEAM-6000
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Priority: Major
>
> Datetime type
> Name  Description Range
> DATETIME  Represents a year, month, day, hour, minute, second, and 
> subsecond. 0001-01-01 00:00:00 to 9999-12-31 23:59:59.999999.
> A DATETIME represents a point in time. Each DATETIME contains the following:
> year
> month
> day
> hour
> minute
> second
> subsecond
> Unlike Timestamps, a DATETIME object does not refer to an absolute instance 
> in time. Instead, it is the civil time, or the time that a user would see on 
> a watch or calendar.
> Canonical format
> YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.DDDDDD]]
> YYYY: Four-digit year
> [M]M: One or two digit month
> [D]D: One or two digit day
> ( |T): A space or a T separator
> [H]H: One or two digit hour (valid values from 00 to 23)
> [M]M: One or two digit minutes (valid values from 00 to 59)
> [S]S: One or two digit seconds (valid values from 00 to 59)
> [.DDDDDD]: Up to six fractional digits (i.e. up to microsecond precision)
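The canonical format described above can be captured by a shape-only validator. This regex is a sketch under the stated format (four-digit year, optional time part, up to six fractional digits); it deliberately does not check value ranges such as month <= 12 or hour <= 23:

```python
import re

# Shape check for the canonical DATETIME format: a four-digit year,
# one-or-two-digit month/day, and an optional time part separated by a
# space or 'T', with up to six fractional-second digits.
CANONICAL_DATETIME = re.compile(
    r"^\d{4}-\d{1,2}-\d{1,2}"
    r"(?:[ T]\d{1,2}:\d{1,2}:\d{1,2}(?:\.\d{1,6})?)?$"
)
```

Range validation (e.g. rejecting hour 25) would still need a real parser on top of the shape check.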



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-5046) Implement Missing Standard SQL functions in BeamSQL

2019-06-18 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang closed BEAM-5046.
--
   Resolution: Won't Do
Fix Version/s: Not applicable

> Implement Missing Standard SQL functions in BeamSQL
> ---
>
> Key: BEAM-5046
> URL: https://issues.apache.org/jira/browse/BEAM-5046
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: Not applicable
>
>
> We want BeamSQL to follow Calcite's SQL standard, which means we should 
> support the standard SQL functions defined by Calcite on a best-effort basis.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-7461) [beam_PostCommit_SQL] [TestPubsub.checkIfAnySubscriptionExists] DEADLINE_EXCEEDED: PubsubGrpcClient.listSubscriptions

2019-06-18 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang resolved BEAM-7461.

   Resolution: Fixed
Fix Version/s: Not applicable

> [beam_PostCommit_SQL] [TestPubsub.checkIfAnySubscriptionExists] 
> DEADLINE_EXCEEDED: PubsubGrpcClient.listSubscriptions
> -
>
> Key: BEAM-7461
> URL: https://issues.apache.org/jira/browse/BEAM-7461
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql, test-failures
>Reporter: Andrew Pilloud
>Assignee: Rui Wang
>Priority: Major
>  Labels: currently-failing, flaky-test
> Fix For: Not applicable
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins Job|[https://builds.apache.org/job/beam_PostCommit_SQL/1646/]]
>  * [Gradle Build Scan|[https://scans.gradle.com/s/rncz63gm72ksk]]
>  * [Test source code|TODO]
> Initial investigation:
> Could be related to https://issues.apache.org/jira/browse/BEAM-5122
> {code:java}
> io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: Deadline expired before 
> operation could complete.
>   at 
> io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233)
>   at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214)
>   at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139)
>   at 
> com.google.pubsub.v1.SubscriberGrpc$SubscriberBlockingStub.listSubscriptions(SubscriberGrpc.java:1734)
>   at 
> org.apache.beam.sdk.io.gcp.pubsub.PubsubGrpcClient.listSubscriptions(PubsubGrpcClient.java:373)
>   at 
> org.apache.beam.sdk.io.gcp.pubsub.TestPubsub.listSubscriptions(TestPubsub.java:165)
>   at 
> org.apache.beam.sdk.io.gcp.pubsub.TestPubsub.checkIfAnySubscriptionExists(TestPubsub.java:195)
>   at 
> org.apache.beam.sdk.extensions.sql.meta.provider.pubsub.PubsubJsonIT.testSQLLimit(PubsubJsonIT.java:315)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.beam.sdk.io.gcp.pubsub.TestPubsubSignal$1.evaluate(TestPubsubSignal.java:116)
>   at 
> org.apache.beam.sdk.io.gcp.pubsub.TestPubsub$1.evaluate(TestPubsub.java:92)
>   at 
> org.apache.beam.sdk.io.gcp.pubsub.TestPubsub$1.evaluate(TestPubsub.java:92)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at 
> org.apache.beam.sdk.io.gcp.pubsub.TestPubsubSignal$1.evaluate(TestPubsubSignal.java:116)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:349)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:314)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:312)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:396)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> 

[jira] [Work logged] (BEAM-7428) ReadAllViaFileBasedSource does not output the timestamps of the read elements

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7428?focusedWorklogId=262395&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262395
 ]

ASF GitHub Bot logged work on BEAM-7428:


Author: ASF GitHub Bot
Created on: 18/Jun/19 15:45
Start Date: 18/Jun/19 15:45
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #8741: [BEAM-7428] Output 
the timestamp on elements in ReadAllViaFileBasedSource
URL: https://github.com/apache/beam/pull/8741#issuecomment-503194658
 
 
   In the Beam model, the watermark propagates forward since the input 
watermark >= output watermark of each transform. The input watermark is a bound 
saying that all data that is before me will be considered late while the output 
watermark says that all data that is output before me will be considered late. 
So let's say T1 provides data to T2, then
   ```
   ... >= input_watermark(T1) >= output_watermark(T1) >= input_watermark(T2) >= 
output_watermark(T2) >= ...
   ```
   Note that the input/output watermarks are computed by the runner and aren't 
ever exposed to the SDK. Whether something is late or on time is figured out as 
part of the GroupByKey. The runner also automatically advances the output 
watermark when all the data for a given input watermark has been consumed 
(watermark holds, a special timer that isn't meant to be exposed to SDKs and 
will be modeled properly with the resolution of 
https://issues.apache.org/jira/browse/BEAM-2535, do allow the SDK to hold back 
the watermark).
   
   This seems all great but what about the roots of the pipeline. This is where 
the `UnboundedSource` has the ability to report what it thinks the watermark 
should be and the runner computes the watermark by taking the min across all 
UnboundedSource instances for a particular root.
   
   Another property of the Beam model is that if you output elements with a 
timestamp that is before the output watermark, the data will be classified as 
late (and depending on the trigger, may be dropped).
   
   Combining these two properties of the Beam model, we get to the issue of 
what timestamp we should output with.
   
   My concern is that the proposed solution of using getCurrentTimestamp and if 
it is unknown fallback to using the element timestamp can lead to data being 
marked late by default in the case where ReadAllViaFileBasedSource is used in a 
streaming pipeline.
   
   The reason why I suggest to model this as an SDF is that I believe SDFs 
should be able to hold the output watermark just like an UnboundedSource by 
invoking ProcessContext.updateWatermark() which would then allow for 
ReadAllViaFileBasedSource to output data without any of it being marked late.
   
   Eugene believes that ProcessContext.updateWatermark() should only be used to 
advance the output watermark faster than the input and not to be able to hold 
the output watermark and hence the data could be marked as late.
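
Combining the min-across-sources rule and the lateness rule discussed above, a rough sketch in plain Python (the function names are hypothetical, not Beam APIs):

```python
# Illustrative sketch of the two model properties discussed above
# (function names are hypothetical, not Beam APIs).

def root_watermark(source_watermarks):
    # The runner computes the root watermark as the min across all
    # UnboundedSource instances reporting for a particular root.
    return min(source_watermarks)

def is_late(element_timestamp, output_watermark):
    # Elements output with a timestamp before the output watermark are
    # classified as late (and, depending on the trigger, may be dropped).
    return element_timestamp < output_watermark

watermark = root_watermark([1000, 1200, 900])
print(watermark)                # 900
print(is_late(950, watermark))  # False: 950 >= 900
print(is_late(850, watermark))  # True
```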
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262395)
Time Spent: 5.5h  (was: 5h 20m)

> ReadAllViaFileBasedSource does not output the timestamps of the read elements
> -
>
> Key: BEAM-7428
> URL: https://issues.apache.org/jira/browse/BEAM-7428
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> This differs from the implementation of JavaReadViaImpulse that tackles a 
> similar problem but does output the timestamps correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7499) ReifyTest.test_window fails in DirectRunner due to 'assign_context.window should not be None.'

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7499?focusedWorklogId=262365&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262365
 ]

ASF GitHub Bot logged work on BEAM-7499:


Author: ASF GitHub Bot
Created on: 18/Jun/19 15:05
Start Date: 18/Jun/19 15:05
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #8866: [BEAM-7499] unskip 
ReifyTest.test_window
URL: https://github.com/apache/beam/pull/8866#issuecomment-503177401
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262365)
Time Spent: 1h 40m  (was: 1.5h)

> ReifyTest.test_window fails in DirectRunner due to 'assign_context.window 
> should not be None.'
> --
>
> Key: BEAM-7499
> URL: https://issues.apache.org/jira/browse/BEAM-7499
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, test-failures
>Reporter: Luke Cwik
>Assignee: Pablo Estrada
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
>  
> [PR 8717|https://github.com/apache/beam/pull/8717] added 
> ReifyWindow.test_window which fails on the DirectRunner.
> {code:java}
> ERROR:root:Exception at bundle 
> , 
> due to an exception.
>  Traceback (most recent call last):
>  File "apache_beam/runners/direct/executor.py", line 343, in call
>  finish_state)
>  File "apache_beam/runners/direct/executor.py", line 380, in attempt_call
>  evaluator.process_element(value)
>  File "apache_beam/runners/direct/transform_evaluator.py", line 636, in 
> process_element
>  self.runner.process(element)
>  File "apache_beam/runners/common.py", line 780, in 
> apache_beam.runners.common.DoFnRunner.process
>  def process(self, windowed_value):
>  File "apache_beam/runners/common.py", line 784, in 
> apache_beam.runners.common.DoFnRunner.process
>  self._reraise_augmented(exn)
>  File "apache_beam/runners/common.py", line 851, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented
>  raise_with_traceback(new_exn)
>  File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>  return self.do_fn_invoker.invoke_process(windowed_value)
>  File "apache_beam/runners/common.py", line 453, in 
> apache_beam.runners.common.SimpleInvoker.invoke_process
>  output_processor.process_outputs(
>  File "apache_beam/runners/common.py", line 915, in 
> apache_beam.runners.common._OutputProcessor.process_outputs
>  self.window_fn.assign(assign_context))
>  File "apache_beam/transforms/util.py", line 557, in assign
>  'assign_context.window should not be None. '
> ValueError: assign_context.window should not be None. This might be due to a 
> DoFn returning a TimestampedValue. [while running 'add_timestamps2']
> Traceback (most recent call last):
>  File "apache_beam/transforms/util_test.py", line 501, in test_window
>  assert_that(reified_pc, equal_to(expected), reify_windows=True)
>  File "apache_beam/pipeline.py", line 426, in __exit__
>  self.run().wait_until_finish()
>  File "apache_beam/testing/test_pipeline.py", line 109, in run
>  state = result.wait_until_finish()
>  File "apache_beam/runners/direct/direct_runner.py", line 430, in 
> wait_until_finish
>  self._executor.await_completion()
>  File "apache_beam/runners/direct/executor.py", line 400, in await_completion
>  self._executor.await_completion()
>  File "apache_beam/runners/direct/executor.py", line 446, in await_completion
>  raise_(t, v, tb)
>  File "apache_beam/runners/direct/executor.py", line 343, in call
>  finish_state)
>  File "apache_beam/runners/direct/executor.py", line 380, in attempt_call
>  evaluator.process_element(value)
>  File "apache_beam/runners/direct/transform_evaluator.py", line 636, in 
> process_element
>  self.runner.process(element)
>  File "apache_beam/runners/common.py", line 780, in 
> apache_beam.runners.common.DoFnRunner.process
>  def process(self, windowed_value):
>  File "apache_beam/runners/common.py", line 784, in 
> apache_beam.runners.common.DoFnRunner.process
>  self._reraise_augmented(exn)
>  File "apache_beam/runners/common.py", line 851, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented
>  raise_with_traceback(new_exn)
>  File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>  return self.do_fn_invoker.invoke_process(windowed_value)
>  File "apache_beam/runners/common.py", line 454, 

[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=262358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262358
 ]

ASF GitHub Bot logged work on BEAM-4046:


Author: ASF GitHub Bot
Created on: 18/Jun/19 14:58
Start Date: 18/Jun/19 14:58
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #8881: [BEAM-4046] revert 
PR 8581, migrate jenkins tests to use directory based names instead of artifact 
names
URL: https://github.com/apache/beam/pull/8881#issuecomment-503174806
 
 
   Run Java_Examples_Dataflow PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262358)
Time Spent: 33.5h  (was: 33h 20m)

> Decouple gradle project names and maven artifact ids
> 
>
> Key: BEAM-4046
> URL: https://issues.apache.org/jira/browse/BEAM-4046
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Michael Luckey
>Priority: Major
>  Time Spent: 33.5h
>  Remaining Estimate: 0h
>
> In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. 
> It is clumsy and requires a hacky settings.gradle that is not idiomatic.
> In our second draft, we changed them to names that work well with Gradle, 
> like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky.
> In our third draft, we regressed to the first draft to get the Maven artifact 
> ids right.
> These should be able to be decoupled. It seems there are many StackOverflow 
> questions on the subject.
> Since it is unidiomatic and a poor user experience, if it does turn out to be 
> mandatory then it needs to be documented inline everywhere - the 
> settings.gradle should say why it is so bizarre, and each build.gradle should 
> indicate what its project id is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5519) Spark Streaming Duplicated Encoding/Decoding Effort

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5519?focusedWorklogId=262347&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262347
 ]

ASF GitHub Bot logged work on BEAM-5519:


Author: ASF GitHub Bot
Created on: 18/Jun/19 14:48
Start Date: 18/Jun/19 14:48
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #6511: [BEAM-5519] Remove 
call to groupByKey in Spark Streaming.
URL: https://github.com/apache/beam/pull/6511#issuecomment-503169824
 
 
   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262347)
Time Spent: 5h 20m  (was: 5h 10m)

> Spark Streaming Duplicated Encoding/Decoding Effort
> ---
>
> Key: BEAM-5519
> URL: https://issues.apache.org/jira/browse/BEAM-5519
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Winkelman
>Assignee: Kyle Winkelman
>Priority: Major
>  Labels: spark, spark-streaming
> Fix For: 2.15.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> When using the SparkRunner in streaming mode. There is a call to groupByKey 
> followed by a call to updateStateByKey. BEAM-1815 fixed an issue where this 
> used to cause 2 shuffles but it still causes 2 encode/decode cycles.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7018) Regex transform for Python SDK

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7018?focusedWorklogId=262345&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262345
 ]

ASF GitHub Bot logged work on BEAM-7018:


Author: ASF GitHub Bot
Created on: 18/Jun/19 14:46
Start Date: 18/Jun/19 14:46
Worklog Time Spent: 10m 
  Work Description: mszb commented on issue #8859: [BEAM-7018] Added Regex 
transform for PythonSDK
URL: https://github.com/apache/beam/pull/8859#issuecomment-503169078
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262345)
Time Spent: 1.5h  (was: 1h 20m)

> Regex transform for Python SDK
> --
>
> Key: BEAM-7018
> URL: https://issues.apache.org/jira/browse/BEAM-7018
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Rose Nguyen
>Assignee: Shehzaad Nakhoda
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> PTransforms to use regular expressions to process elements in a PCollection.
> It should offer the same API as its Java counterpart: 
> [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Regex.java]
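
As a sketch of the kind of per-element logic such a transform could wrap (the function name and signature below are hypothetical, not the proposed Beam API; they only mirror the matches/group style of the Java counterpart):

```python
import re

# Hypothetical per-element logic a Python Regex transform could wrap;
# elements that match the pattern yield the requested capture group,
# non-matching elements are dropped.
def regex_matches(elements, pattern, group=0):
    compiled = re.compile(pattern)
    for element in elements:
        match = compiled.match(element)
        if match:
            yield match.group(group)

print(list(regex_matches(["BEAM-7018", "no-id"], r"BEAM-(\d+)", group=1)))  # ['7018']
```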



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-7585) Ipython usage raises AttributeError

2019-06-18 Thread SBlackwell (JIRA)
SBlackwell created BEAM-7585:


 Summary: Ipython usage raises AttributeError
 Key: BEAM-7585
 URL: https://issues.apache.org/jira/browse/BEAM-7585
 Project: Beam
  Issue Type: Bug
  Components: runner-direct
Reporter: SBlackwell


[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/display/display_manager.py#L36]

import IPython

_display_progress = IPython.display.display

 

This doesn't work:

In [1]: import IPython; IPython.display.display('test')
---
AttributeError Traceback (most recent call last)
 in ()
> 1 import IPython; IPython.display.display('test')

AttributeError: 'module' object has no attribute 'display'
In [2]: from IPython import display; display.display('test')
'test'
In [3]: import IPython; IPython.display.display('test')
'test'

 

 

Should be:

import IPython

from IPython import display

_display_progress = display.display
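
The root cause is standard Python import behavior: importing a package does not import its submodules, so the attribute only exists once the submodule has been imported somewhere in the process. A self-contained demonstration (the stdlib `xml` package stands in for `IPython` here, purely for illustration):

```python
import sys

# Demonstration (using the stdlib 'xml' package purely for illustration):
# importing a package does NOT import its submodules, so attribute access
# fails unless the submodule was imported somewhere in the process.
# This mirrors the IPython.display failure reported above.

# Start from a clean slate so the demo is deterministic.
for name in [m for m in list(sys.modules) if m == "xml" or m.startswith("xml.")]:
    del sys.modules[name]

import xml  # package import only; does not bind xml.dom

try:
    xml.dom  # attribute probe; raises because the submodule isn't imported
    submodule_visible = True
except AttributeError:
    submodule_visible = False

import xml.dom  # explicit submodule import binds the attribute

fixed = hasattr(xml, "dom")
print(submodule_visible, fixed)  # False True
```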



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=262321&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262321
 ]

ASF GitHub Bot logged work on BEAM-4046:


Author: ASF GitHub Bot
Created on: 18/Jun/19 14:08
Start Date: 18/Jun/19 14:08
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #8881: [BEAM-4046] revert 
PR 8581, migrate jenkins tests to use directory based names instead of artifact 
names
URL: https://github.com/apache/beam/pull/8881#issuecomment-503152006
 
 
   Run Java_Examples_Dataflow PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262321)
Time Spent: 33h 20m  (was: 33h 10m)

> Decouple gradle project names and maven artifact ids
> 
>
> Key: BEAM-4046
> URL: https://issues.apache.org/jira/browse/BEAM-4046
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Michael Luckey
>Priority: Major
>  Time Spent: 33h 20m
>  Remaining Estimate: 0h
>
> In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. 
> It is clumsy and requires a hacky settings.gradle that is not idiomatic.
> In our second draft, we changed them to names that work well with Gradle, 
> like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky.
> In our third draft, we regressed to the first draft to get the Maven artifact 
> ids right.
> These should be able to be decoupled. It seems there are many StackOverflow 
> questions on the subject.
> Since it is unidiomatic and a poor user experience, if it does turn out to be 
> mandatory then it needs to be documented inline everywhere - the 
> settings.gradle should say why it is so bizarre, and each build.gradle should 
> indicate what its project id is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7018) Regex transform for Python SDK

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7018?focusedWorklogId=262257&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262257
 ]

ASF GitHub Bot logged work on BEAM-7018:


Author: ASF GitHub Bot
Created on: 18/Jun/19 12:23
Start Date: 18/Jun/19 12:23
Worklog Time Spent: 10m 
  Work Description: mszb commented on issue #8859: [BEAM-7018] Added Regex 
transform for PythonSDK
URL: https://github.com/apache/beam/pull/8859#issuecomment-503080279
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262257)
Time Spent: 1h 20m  (was: 1h 10m)

> Regex transform for Python SDK
> --
>
> Key: BEAM-7018
> URL: https://issues.apache.org/jira/browse/BEAM-7018
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Rose Nguyen
>Assignee: Shehzaad Nakhoda
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> PTransforms to use regular expressions to process elements in a PCollection.
> It should offer the same API as its Java counterpart: 
> [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Regex.java]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3645) Support multi-process execution on the FnApiRunner

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3645?focusedWorklogId=262204&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262204
 ]

ASF GitHub Bot logged work on BEAM-3645:


Author: ASF GitHub Bot
Created on: 18/Jun/19 10:52
Start Date: 18/Jun/19 10:52
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #8769: [WIP] 
[BEAM-3645] support multi processes for Python FnApiRunner with 
EmbeddedGrpcWorkerHandler
URL: https://github.com/apache/beam/pull/8769#discussion_r294728596
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -1418,6 +1449,51 @@ def process_bundle(self, inputs, expected_outputs, 
parallel_uid_counter=None):
 
 return result, split_results
 
+class ParallelBundleManager(BundleManager):
+  _uid_counter = 0
+  def process_bundle(self, inputs, expected_outputs):
+input_value = list(inputs.values())[0]
+if isinstance(input_value, list):
 
 Review comment:
   I was suggesting that we create a new class (that could subclass list if we 
want, but has a partition method) to accomplish this. This way the fact that 
there is parallelism doesn't leak up and down through the stack: e.g. the 
BundleManager code can be used unchanged, rather than being passed all the 
inputs and then an easy-to-forget flag of which ones to ignore, and it also 
simplifies the Buffer class in that it doesn't need an extra attribute 
redundantly remembering the parallelism of the context it must be used in 
(plus all the locking, state-tracking, sleeping etc.).
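
The suggestion can be sketched as follows (the class and method names are hypothetical; the actual PR may differ):

```python
# Hypothetical sketch of the reviewer's suggestion: a list subclass that
# knows how to partition itself, so parallelism doesn't leak into callers.
class PartitionableBuffer(list):
    def partition(self, n):
        # Round-robin split into n shards (some may be empty if n > len).
        return [self[i::n] for i in range(n)]

buf = PartitionableBuffer([1, 2, 3, 4, 5])
print(buf.partition(2))  # [[1, 3, 5], [2, 4]]
```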
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262204)
Time Spent: 5h 10m  (was: 5h)

> Support multi-process execution on the FnApiRunner
> --
>
> Key: BEAM-3645
> URL: https://issues.apache.org/jira/browse/BEAM-3645
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Charles Chen
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/BEAM-3644 gave us a 15x performance 
> gain over the previous DirectRunner.  We can do even better in multi-core 
> environments by supporting multi-process execution in the FnApiRunner, to 
> scale past Python GIL limitations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3645) Support multi-process execution on the FnApiRunner

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3645?focusedWorklogId=262205&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262205
 ]

ASF GitHub Bot logged work on BEAM-3645:


Author: ASF GitHub Bot
Created on: 18/Jun/19 10:52
Start Date: 18/Jun/19 10:52
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #8769: [WIP] 
[BEAM-3645] support multi processes for Python FnApiRunner with 
EmbeddedGrpcWorkerHandler
URL: https://github.com/apache/beam/pull/8769#discussion_r294728709
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -1418,6 +1449,51 @@ def process_bundle(self, inputs, expected_outputs, 
parallel_uid_counter=None):
 
 return result, split_results
 
+class ParallelBundleManager(BundleManager):
+  _uid_counter = 0
+  def process_bundle(self, inputs, expected_outputs):
+input_value = list(inputs.values())[0]
 
 Review comment:
   This may not be the case in the future. If we want to make this assumption, 
we should at least assert it. But it's preferable IMHO to be general, as that's 
not too hard. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262205)
Time Spent: 5h 20m  (was: 5h 10m)

> Support multi-process execution on the FnApiRunner
> --
>
> Key: BEAM-3645
> URL: https://issues.apache.org/jira/browse/BEAM-3645
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Charles Chen
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/BEAM-3644 gave us a 15x performance 
> gain over the previous DirectRunner.  We can do even better in multi-core 
> environments by supporting multi-process execution in the FnApiRunner, to 
> scale past Python GIL limitations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7561) HdfsFileSystem is unable to match a directory

2019-06-18 Thread David Moravek (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866353#comment-16866353
 ] 

David Moravek commented on BEAM-7561:
-

I'm at the conference, but I'll do my best to finish it tonight (CET), so you 
can cut the branch tomorrow.

> HdfsFileSystem is unable to match a directory
> -
>
> Key: BEAM-7561
> URL: https://issues.apache.org/jira/browse/BEAM-7561
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-hadoop-file-system
>Affects Versions: 2.13.0
>Reporter: David Moravek
>Assignee: David Moravek
>Priority: Major
> Fix For: 2.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [FileSystems.match|https://beam.apache.org/releases/javadoc/2.3.0/org/apache/beam/sdk/io/FileSystems.html#match-java.util.List-]
>  method should be able to match a directory according to javadoc. Unlike 
> _HdfsFileSystems_, _LocalFileSystems_ behaves as I would expect, so I'm 
> assuming this is a bug on the _HdfsFileSystems_ side.
> *Current behavior:*
> _HadoopFileSystem.match("hdfs:///tmp/dir")_ returns a _MatchResult_ with an 
> empty metadata
> *Expected behavior:*
> _HadoopFileSystem.match("hdfs:///tmp/dir")_ returns a _MatchResult_ with 
> metadata about the directory



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7577) Allow the use of ValueProviders in datastore.v1new.datastoreio.ReadFromDatastore query

2019-06-18 Thread EDjur (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

EDjur updated BEAM-7577:

Description: 
The current implementation of ReadFromDatastore does not support specifying the 
query parameter at runtime. This could potentially be fixed through the usage 
of a ValueProvider to specify and build the Datastore query.

Allowing specifying the query at runtime makes it easier to use dynamic queries 
in Dataflow templates. Currently, there is no way to have a Dataflow template 
that includes a dynamic query (such as filtering by a timestamp or similar).

  was:
The current implementation of ReadFromDatastore does not support specifying the 
query parameter at runtime. This could potentially be fixed through the usage 
of a ValueProvider to specify and build the Datastore query at runtime.

 

Allowing specifying the query at runtime makes it easier to use dynamic queries 
in Dataflow templates. Currently, there is no way to have a Dataflow template 
that includes a dynamic query (such as filtering by a timestamp or similar).


> Allow the use of ValueProviders in 
> datastore.v1new.datastoreio.ReadFromDatastore query
> --
>
> Key: BEAM-7577
> URL: https://issues.apache.org/jira/browse/BEAM-7577
> Project: Beam
>  Issue Type: New Feature
>  Components: io-python-gcp
>Affects Versions: 2.13.0
>Reporter: EDjur
>Priority: Minor
>
> The current implementation of ReadFromDatastore does not support specifying 
> the query parameter at runtime. This could potentially be fixed through the 
> usage of a ValueProvider to specify and build the Datastore query.
> Allowing the query to be specified at runtime makes it easier to use dynamic 
> queries in Dataflow templates. Currently, there is no way to have a Dataflow 
> template that includes a dynamic query (such as filtering by a timestamp or 
> similar).





[jira] [Commented] (BEAM-7577) Allow the use of ValueProviders in datastore.v1new.datastoreio.ReadFromDatastore query

2019-06-18 Thread EDjur (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866332#comment-16866332
 ] 

EDjur commented on BEAM-7577:
-

I'd be happy to give a go at adding this functionality for the query filters, 
but I might need someone to point me in the right direction first.

> Allow the use of ValueProviders in 
> datastore.v1new.datastoreio.ReadFromDatastore query
> --
>
> Key: BEAM-7577
> URL: https://issues.apache.org/jira/browse/BEAM-7577
> Project: Beam
>  Issue Type: New Feature
>  Components: io-python-gcp
>Affects Versions: 2.13.0
>Reporter: EDjur
>Priority: Minor
>
> The current implementation of ReadFromDatastore does not support specifying 
> the query parameter at runtime. This could potentially be fixed through the 
> usage of a ValueProvider to specify and build the Datastore query at runtime.
>  
> Allowing specifying the query at runtime makes it easier to use dynamic 
> queries in Dataflow templates. Currently, there is no way to have a Dataflow 
> template that includes a dynamic query (such as filtering by a timestamp or 
> similar).





[jira] [Work logged] (BEAM-7579) Tests fail with "Cannot create a default bucket when --dataflowKmsKey is set."

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7579?focusedWorklogId=262070&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262070
 ]

ASF GitHub Bot logged work on BEAM-7579:


Author: ASF GitHub Bot
Created on: 18/Jun/19 06:12
Start Date: 18/Jun/19 06:12
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #8883: [BEAM-7579] 
Use bucket with default key in ITs
URL: https://github.com/apache/beam/pull/8883#discussion_r294621529
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/storage/GcsKmsKeyIT.java
 ##
 @@ -66,14 +65,16 @@ public static void setup() {
   /**
* Tests writing to gcpTempLocation with --dataflowKmsKey set on the command 
line. Verifies that
 
 Review comment:
   ```suggestion
  * Tests writing to tempLocation with --dataflowKmsKey set on the command 
line. Verifies that
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262070)
Time Spent: 50m  (was: 40m)

> Tests fail with "Cannot create a default bucket when --dataflowKmsKey is set."
> --
>
> Key: BEAM-7579
> URL: https://issues.apache.org/jira/browse/BEAM-7579
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.13.0
>Reporter: Luke Cwik
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Example run: https://builds.apache.org/job/beam_PostCommit_Java/3535/
> Failing tests:
> {code:java}
> org.apache.beam.sdk.io.gcp.storage.GcsKmsKeyIT.testGcsWriteWithKmsKey
> org.apache.beam.sdk.io.gcp.storage.GcsKmsKeyIT.testGcsWriteWithKmsKey
> org.apache.beam.sdk.extensions.gcp.util.GcsUtilIT.testRewriteMultiPart
> {code}
> Example trace:
> {code:java}
> Error Message
> java.lang.IllegalArgumentException: Cannot create a default bucket when 
> --dataflowKmsKey is set.
> Stacktrace
> java.lang.IllegalArgumentException: Cannot create a default bucket when 
> --dataflowKmsKey is set.
>   at 
> org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
>   at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.tryCreateDefaultBucket(GcpOptions.java:297)
>   at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:268)
>   at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:256)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.returnDefaultHelper(ProxyInvocationHandler.java:592)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.getDefault(ProxyInvocationHandler.java:533)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:158)
>   at com.sun.proxy.$Proxy32.getGcpTempLocation(Unknown Source)
>   at 
> org.apache.beam.sdk.io.gcp.storage.GcsKmsKeyIT.testGcsWriteWithKmsKey(GcsKmsKeyIT.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> {code}
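The failure mode in the quoted trace can be mirrored in a small sketch (hypothetical names, not Beam's actual code): `GcpTempLocationFactory` only tries to create a default bucket when no temp location is given, and that fallback is forbidden once a KMS key is set, so supplying an explicit temp location, as the PR does for the KMS ITs, avoids the precondition entirely.

```python
def get_gcp_temp_location(options):
    """Hypothetical mirror of GcpTempLocationFactory.create(): fall back
    to creating a default bucket only when no temp location is set, and
    refuse that fallback when --dataflowKmsKey is configured."""
    if options.get("gcpTempLocation"):
        return options["gcpTempLocation"]
    if options.get("dataflowKmsKey"):
        # Mirrors the Preconditions.checkArgument(...) in the stack trace.
        raise ValueError(
            "Cannot create a default bucket when --dataflowKmsKey is set.")
    return "gs://hypothetical-default-bucket/tmp"


# An explicit temp location sidesteps default-bucket creation even with a key set.
print(get_gcp_temp_location({
    "dataflowKmsKey": "projects/p/locations/l/keyRings/r/cryptoKeys/k",
    "gcpTempLocation": "gs://kms-bucket/tmp",
}))  # gs://kms-bucket/tmp
```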





[jira] [Work logged] (BEAM-7579) Tests fail with "Cannot create a default bucket when --dataflowKmsKey is set."

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7579?focusedWorklogId=262069&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262069
 ]

ASF GitHub Bot logged work on BEAM-7579:


Author: ASF GitHub Bot
Created on: 18/Jun/19 06:12
Start Date: 18/Jun/19 06:12
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #8883: [BEAM-7579] 
Use bucket with default key in ITs
URL: https://github.com/apache/beam/pull/8883#discussion_r294622993
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestPipelineOptions.java
 ##
 @@ -36,6 +36,14 @@
 
   void setTempRoot(String value);
 
+  /**
+   * An alternative tempRoot that has a bucket-default KMS key set. Used for 
GCP CMEK integration
+   * tests.
+   */
+  String getTempRootKms();
 
 Review comment:
   We shouldn't define this here since this is really specific to GCP.
   
   Two ways you could solve this:
   * You can create an interface called GcpTestPipelineOptions within the gcp 
extensions package and make sure it gets registered with a 
PipelineOptionsRegistrar.
   * Continue to use tempRoot, but make sure that the CMEK-related ITs are a 
separate gradle test task (with a CMEK-appropriate temp root defined) that 
filters its run down to only the CMEK tests needing a different temp root, 
while those ITs are excluded from every other IT suite.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262069)
Time Spent: 40m  (was: 0.5h)






[jira] [Work logged] (BEAM-7579) Tests fail with "Cannot create a default bucket when --dataflowKmsKey is set."

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7579?focusedWorklogId=262068&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262068
 ]

ASF GitHub Bot logged work on BEAM-7579:


Author: ASF GitHub Bot
Created on: 18/Jun/19 06:12
Start Date: 18/Jun/19 06:12
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #8883: [BEAM-7579] 
Use bucket with default key in ITs
URL: https://github.com/apache/beam/pull/8883#discussion_r294622537
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/storage/GcsKmsKeyIT.java
 ##
 @@ -66,14 +65,16 @@ public static void setup() {
   /**
 
 Review comment:
   The `@BeforeClass` method shouldn't be needed since TestPipelineOptions 
should be automatically registered with PipelineOptionsFactory
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262068)
Time Spent: 40m  (was: 0.5h)






[jira] [Work logged] (BEAM-7579) Tests fail with "Cannot create a default bucket when --dataflowKmsKey is set."

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7579?focusedWorklogId=262060&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262060
 ]

ASF GitHub Bot logged work on BEAM-7579:


Author: ASF GitHub Bot
Created on: 18/Jun/19 06:02
Start Date: 18/Jun/19 06:02
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #8883: [BEAM-7579] Use 
bucket with default key in ITs
URL: https://github.com/apache/beam/pull/8883#issuecomment-502959746
 
 
   Run Java_Examples_Dataflow PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262060)
Time Spent: 0.5h  (was: 20m)






[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=262058&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262058
 ]

ASF GitHub Bot logged work on BEAM-4046:


Author: ASF GitHub Bot
Created on: 18/Jun/19 05:59
Start Date: 18/Jun/19 05:59
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #8881: [BEAM-4046] revert 
PR 8581, migrate jenkins tests to use directory based names instead of artifact 
names
URL: https://github.com/apache/beam/pull/8881#issuecomment-502959117
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262058)
Time Spent: 33h  (was: 32h 50m)

> Decouple gradle project names and maven artifact ids
> 
>
> Key: BEAM-4046
> URL: https://issues.apache.org/jira/browse/BEAM-4046
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Michael Luckey
>Priority: Major
>  Time Spent: 33h
>  Remaining Estimate: 0h
>
> In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. 
> It is clumsy and requires a hacky settings.gradle that is not idiomatic.
> In our second draft, we changed them to names that work well with Gradle, 
> like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky.
> In our third draft, we regressed to the first draft to get the Maven artifact 
> ids right.
> These should be able to be decoupled. It seems there are many StackOverflow 
> questions on the subject.
> Since it is unidiomatic and a poor user experience, if it does turn out to be 
> mandatory then it needs to be documented inline everywhere - the 
> settings.gradle should say why it is so bizarre, and each build.gradle should 
> indicate what its project id is.
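The decoupling the quoted issue asks for can be sketched as build configuration (hypothetical file contents, not Beam's actual build scripts): keep idiomatic nested project paths in settings.gradle and pin the Maven coordinate explicitly in each project's build.gradle.

```groovy
// settings.gradle (sketch): idiomatic nested Gradle project paths
include ":sdks:java:core"

// sdks/java/core/build.gradle (sketch): the Maven artifact id is set
// explicitly, so it no longer has to match the Gradle project path
publishing {
  publications {
    mavenJava(MavenPublication) {
      artifactId = "beam-sdks-java-core"  // decoupled from ":sdks:java:core"
      from components.java
    }
  }
}
```

Each build.gradle then documents its own published coordinate inline, which addresses the issue's request that the mapping be visible everywhere.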





[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=262059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262059
 ]

ASF GitHub Bot logged work on BEAM-4046:


Author: ASF GitHub Bot
Created on: 18/Jun/19 05:59
Start Date: 18/Jun/19 05:59
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #8881: [BEAM-4046] revert 
PR 8581, migrate jenkins tests to use directory based names instead of artifact 
names
URL: https://github.com/apache/beam/pull/8881#issuecomment-502959227
 
 
   Run Java_Examples_Dataflow PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262059)
Time Spent: 33h 10m  (was: 33h)



