[jira] [Work logged] (BEAM-9869) adding self-contained Kafka service jar for testing

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9869?focusedWorklogId=438563=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438563
 ]

ASF GitHub Bot logged work on BEAM-9869:


Author: ASF GitHub Bot
Created on: 29/May/20 04:32
Start Date: 29/May/20 04:32
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11846:
URL: https://github.com/apache/beam/pull/11846#issuecomment-635752434


   Run Python2_PVR_Flink PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438563)
Time Spent: 1h 20m  (was: 1h 10m)

> adding self-contained Kafka service jar for testing
> ---
>
> Key: BEAM-9869
> URL: https://issues.apache.org/jira/browse/BEAM-9869
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> adding self-contained Kafka service jar for testing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10125) adding cross-language KafkaIO integration test

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10125?focusedWorklogId=438564=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438564
 ]

ASF GitHub Bot logged work on BEAM-10125:
-

Author: ASF GitHub Bot
Created on: 29/May/20 04:32
Start Date: 29/May/20 04:32
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11847:
URL: https://github.com/apache/beam/pull/11847#issuecomment-635752517


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438564)
Time Spent: 2h 20m  (was: 2h 10m)

> adding cross-language KafkaIO integration test
> --
>
> Key: BEAM-10125
> URL: https://issues.apache.org/jira/browse/BEAM-10125
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> adding cross-language KafkaIO integration test



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9869) adding self-contained Kafka service jar for testing

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9869?focusedWorklogId=438562=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438562
 ]

ASF GitHub Bot logged work on BEAM-9869:


Author: ASF GitHub Bot
Created on: 29/May/20 04:31
Start Date: 29/May/20 04:31
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11846:
URL: https://github.com/apache/beam/pull/11846#issuecomment-635752368


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438562)
Time Spent: 1h 10m  (was: 1h)

> adding self-contained Kafka service jar for testing
> ---
>
> Key: BEAM-9869
> URL: https://issues.apache.org/jira/browse/BEAM-9869
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> adding self-contained Kafka service jar for testing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10125) adding cross-language KafkaIO integration test

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10125?focusedWorklogId=438560=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438560
 ]

ASF GitHub Bot logged work on BEAM-10125:
-

Author: ASF GitHub Bot
Created on: 29/May/20 04:31
Start Date: 29/May/20 04:31
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11847:
URL: https://github.com/apache/beam/pull/11847#issuecomment-635752138


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438560)
Time Spent: 2h  (was: 1h 50m)

> adding cross-language KafkaIO integration test
> --
>
> Key: BEAM-10125
> URL: https://issues.apache.org/jira/browse/BEAM-10125
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> adding cross-language KafkaIO integration test



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10125) adding cross-language KafkaIO integration test

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10125?focusedWorklogId=438561=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438561
 ]

ASF GitHub Bot logged work on BEAM-10125:
-

Author: ASF GitHub Bot
Created on: 29/May/20 04:31
Start Date: 29/May/20 04:31
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11847:
URL: https://github.com/apache/beam/pull/11847#issuecomment-635752197


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438561)
Time Spent: 2h 10m  (was: 2h)

> adding cross-language KafkaIO integration test
> --
>
> Key: BEAM-10125
> URL: https://issues.apache.org/jira/browse/BEAM-10125
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> adding cross-language KafkaIO integration test



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10005) Unable to use ApproximateQuantiles.globally/ApproximateUnique.globally when inputs not windowed by GlobalWindows

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10005?focusedWorklogId=438554=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438554
 ]

ASF GitHub Bot logged work on BEAM-10005:
-

Author: ASF GitHub Bot
Created on: 29/May/20 03:22
Start Date: 29/May/20 03:22
Worklog Time Spent: 10m 
  Work Description: darshanj commented on pull request #11855:
URL: https://github.com/apache/beam/pull/11855#issuecomment-635734663


   R: @kennknowles



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438554)
Time Spent: 20m  (was: 10m)

> Unable to use ApproximateQuantiles.globally/ApproximateUnique.globally when 
> inputs not windowed by GlobalWindows
> 
>
> Key: BEAM-10005
> URL: https://issues.apache.org/jira/browse/BEAM-10005
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.20.0
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Unable to use ApproximateQuantiles.globally or ApproximateUnique.globally 
> with input windowed not using GlobalWindows.
> To make it run we need to set either 
> {code:java}
> .withoutDefaults()
> {code}
> or
> {code:java}
> .asSingletonView()
> {code}
> Currently we can't call any of the above on 
> ApproximateQuantiles.globally()/ApproximateUnique.globally as it does not 
> return underlying Combine.globally, but PTransform or Globally in case of 
> ApproximateUnique.
> Example failing case:
> {code:java}
> PCollection elements = p.apply(GenerateSequence.from(0).to(100)
>   .withRate(1,Duration.millis(1)).withTimestampFn(Instant::new));
>   PCollection> input = elements
>   
> .apply(Window.into(SlidingWindows.of(Duration.millis(3)).every(Duration.millis(1
>   .apply(ApproximateQuantiles.globally(17));
> {code}
> It throws expected error from internal Combine.globally() transform:
> {code:java}
> Default values are not supported in Combine.globally() if the input 
> PCollection is not windowed by GlobalWindows. Instead, use 
> Combine.globally().withoutDefaults() to output an empty PCollection if the 
> input PCollection is empty, or Combine.globally().asSingletonView() to get 
> the default output of the CombineFn if the input PCollection is empty.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10005) Unable to use ApproximateQuantiles.globally/ApproximateUnique.globally when inputs not windowed by GlobalWindows

2020-05-28 Thread Darshan Jani (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119239#comment-17119239
 ] 

Darshan Jani commented on BEAM-10005:
-

Hi Kenneth,

I have created a PR for this. Please review. Thanks


> Unable to use ApproximateQuantiles.globally/ApproximateUnique.globally when 
> inputs not windowed by GlobalWindows
> 
>
> Key: BEAM-10005
> URL: https://issues.apache.org/jira/browse/BEAM-10005
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.20.0
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Unable to use ApproximateQuantiles.globally or ApproximateUnique.globally 
> with input windowed not using GlobalWindows.
> To make it run we need to set either 
> {code:java}
> .withoutDefaults()
> {code}
> or
> {code:java}
> .asSingletonView()
> {code}
> Currently we can't call any of the above on 
> ApproximateQuantiles.globally()/ApproximateUnique.globally as it does not 
> return underlying Combine.globally, but PTransform or Globally in case of 
> ApproximateUnique.
> Example failing case:
> {code:java}
> PCollection elements = p.apply(GenerateSequence.from(0).to(100)
>   .withRate(1,Duration.millis(1)).withTimestampFn(Instant::new));
>   PCollection> input = elements
>   
> .apply(Window.into(SlidingWindows.of(Duration.millis(3)).every(Duration.millis(1
>   .apply(ApproximateQuantiles.globally(17));
> {code}
> It throws expected error from internal Combine.globally() transform:
> {code:java}
> Default values are not supported in Combine.globally() if the input 
> PCollection is not windowed by GlobalWindows. Instead, use 
> Combine.globally().withoutDefaults() to output an empty PCollection if the 
> input PCollection is empty, or Combine.globally().asSingletonView() to get 
> the default output of the CombineFn if the input PCollection is empty.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10005) Unable to use ApproximateQuantiles.globally/ApproximateUnique.globally when inputs not windowed by GlobalWindows

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10005?focusedWorklogId=438553=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438553
 ]

ASF GitHub Bot logged work on BEAM-10005:
-

Author: ASF GitHub Bot
Created on: 29/May/20 03:21
Start Date: 29/May/20 03:21
Worklog Time Spent: 10m 
  Work Description: darshanj opened a new pull request #11855:
URL: https://github.com/apache/beam/pull/11855


   Added api combineFn that can be used for inputs windowed with non global 
windows
   
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 

[jira] [Assigned] (BEAM-1754) Will Dataflow ever support Node.js with an SDK similar to Java or Python?

2020-05-28 Thread Ashwin Ramaswami (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashwin Ramaswami reassigned BEAM-1754:
--

Assignee: Ashwin Ramaswami

> Will Dataflow ever support Node.js with an SDK similar to Java or Python?
> -
>
> Key: BEAM-1754
> URL: https://issues.apache.org/jira/browse/BEAM-1754
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-ideas
>Reporter: Diego Zuluaga
>Assignee: Ashwin Ramaswami
>Priority: P2
>  Labels: node.js
>
> I like the philosophy behind DataFlow and found the Java and Python samples 
> highly comprehensible. However, I have to admit that for most Node.js 
> developers who have little background on typed languages and are used to get 
> up to speed with frameworks incredibly fast, learning Dataflow might take 
> some learning curve that they/we're not used to. So, I wonder if at any point 
> in time Dataflow will provide a Node.js SDK. Maybe this is out of the 
> question, but I wanted to run it by the team as it would be awesome to have 
> something along these lines!
> Thanks,
> Diego
> Question originaly posted in SO:
> http://stackoverflow.com/questions/42893436/will-dataflow-ever-support-node-js-with-and-sdk-similar-to-java-or-python



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=438551=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438551
 ]

ASF GitHub Bot logged work on BEAM-10097:
-

Author: ASF GitHub Bot
Created on: 29/May/20 02:38
Start Date: 29/May/20 02:38
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-635722845


   Run Spark ValidatesRunner



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438551)
Time Spent: 3.5h  (was: 3h 20m)

> Migrate PCollection views to use both iterable and multimap 
> materializations/access patterns
> 
>
> Key: BEAM-10097
> URL: https://issues.apache.org/jira/browse/BEAM-10097
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Currently all the PCollection views have a trival mapping from KV Iterable> to the view that is being requested (singleton, iterable, list, 
> map, multimap.
> We should be using the primitive views (iterable, multimap) directly without 
> going through the naive mapping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10115) Staging requirements.txt fails but staging setup.py succeeds

2020-05-28 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10115:
---
Status: Open  (was: Triage Needed)

> Staging requirements.txt fails but staging setup.py succeeds
> 
>
> Key: BEAM-10115
> URL: https://issues.apache.org/jira/browse/BEAM-10115
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Kenneth Knowles
>Priority: P2
>
> User reports on StackOverflow: 
> https://stackoverflow.com/questions/62032382/dataflow-fails-when-i-add-requirements-txt-python
> The issue appears to be a problem with staging, and a difference between 
> using `requirements.txt` and `setup.py` for some reason.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9869) adding self-contained Kafka service jar for testing

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9869?focusedWorklogId=438540=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438540
 ]

ASF GitHub Bot logged work on BEAM-9869:


Author: ASF GitHub Bot
Created on: 29/May/20 01:21
Start Date: 29/May/20 01:21
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11846:
URL: https://github.com/apache/beam/pull/11846#issuecomment-635702305


   LGTM. Thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438540)
Time Spent: 40m  (was: 0.5h)

> adding self-contained Kafka service jar for testing
> ---
>
> Key: BEAM-9869
> URL: https://issues.apache.org/jira/browse/BEAM-9869
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> adding self-contained Kafka service jar for testing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9869) adding self-contained Kafka service jar for testing

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9869?focusedWorklogId=438541=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438541
 ]

ASF GitHub Bot logged work on BEAM-9869:


Author: ASF GitHub Bot
Created on: 29/May/20 01:21
Start Date: 29/May/20 01:21
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11846:
URL: https://github.com/apache/beam/pull/11846#issuecomment-635702360


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438541)
Time Spent: 50m  (was: 40m)

> adding self-contained Kafka service jar for testing
> ---
>
> Key: BEAM-9869
> URL: https://issues.apache.org/jira/browse/BEAM-9869
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> adding self-contained Kafka service jar for testing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9869) adding self-contained Kafka service jar for testing

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9869?focusedWorklogId=438543=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438543
 ]

ASF GitHub Bot logged work on BEAM-9869:


Author: ASF GitHub Bot
Created on: 29/May/20 01:21
Start Date: 29/May/20 01:21
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11846:
URL: https://github.com/apache/beam/pull/11846#issuecomment-635702404


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438543)
Time Spent: 1h  (was: 50m)

> adding self-contained Kafka service jar for testing
> ---
>
> Key: BEAM-9869
> URL: https://issues.apache.org/jira/browse/BEAM-9869
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> adding self-contained Kafka service jar for testing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9964) Setting workerCacheMb to make its way to the WindmillStateCache Constructor

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9964?focusedWorklogId=438538=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438538
 ]

ASF GitHub Bot logged work on BEAM-9964:


Author: ASF GitHub Bot
Created on: 29/May/20 01:03
Start Date: 29/May/20 01:03
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on pull request #11849:
URL: https://github.com/apache/beam/pull/11849#issuecomment-635697264


   =/ looks like the dataflow precommit succeeded but the API call to update it 
here failed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438538)
Time Spent: 5h 10m  (was: 5h)

> Setting workerCacheMb to make its way to the WindmillStateCache Constructor
> ---
>
> Key: BEAM-9964
> URL: https://issues.apache.org/jira/browse/BEAM-9964
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Omar Ismail
>Assignee: Omar Ismail
>Priority: P3
> Fix For: 2.22.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Setting --workerCacheMB seems to affect batch pipelines only. For Streaming, 
> the cache seems to be hardcoded to 100Mb [1]. If possible, I would like to 
> make it allowable to change the cache value in Streaming when setting 
> -workerCacheMB.
> I've never made changes to the Beam SDK, so I am super excited to work on 
> this! 
>  
> [[1] 
> https://github.com/apache/beam/blob/5e659bb80bcbf70795f6806e05a255ee72706d9f/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java#L73|https://github.com/apache/beam/blob/5e659bb80bcbf70795f6806e05a255ee72706d9f/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java#L73]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9785) Add PostCommit suite for Python 3.8

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9785?focusedWorklogId=438536=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438536
 ]

ASF GitHub Bot logged work on BEAM-9785:


Author: ASF GitHub Bot
Created on: 29/May/20 01:02
Start Date: 29/May/20 01:02
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #11788:
URL: https://github.com/apache/beam/pull/11788#issuecomment-635696787


   @epicfaace Thanks for your initiative to help with Python 3.8.
   Please see the discussion on introducing high-priority/low priority 
versions: 
https://lists.apache.org/thread.html/r643cae69e5be136e6bca75bf896991fa313f79623ca056271588c87d%40%3Cdev.beam.apache.org%3E
   cc: Yoshiki (@lazylynx) who is was also working on this and may have some 
thoughts how to best integrate these changes.   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438536)
Time Spent: 1h 10m  (was: 1h)

> Add PostCommit suite for Python 3.8
> ---
>
> Key: BEAM-9785
> URL: https://issues.apache.org/jira/browse/BEAM-9785
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: yoshiki obata
>Assignee: Ashwin Ramaswami
>Priority: P2
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Add PostCommit suites for Python 3.8.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9964) Setting workerCacheMb to make its way to the WindmillStateCache Constructor

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9964?focusedWorklogId=438535=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438535
 ]

ASF GitHub Bot logged work on BEAM-9964:


Author: ASF GitHub Bot
Created on: 29/May/20 01:00
Start Date: 29/May/20 01:00
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on pull request #11849:
URL: https://github.com/apache/beam/pull/11849#issuecomment-635696194


   > gahh so sorry that I missed this. I guess you did have to end up 
contributing this : )
   
   heh no problem, teamwork! :highfive:
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438535)
Time Spent: 5h  (was: 4h 50m)

> Setting workerCacheMb to make its way to the WindmillStateCache Constructor
> ---
>
> Key: BEAM-9964
> URL: https://issues.apache.org/jira/browse/BEAM-9964
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Omar Ismail
>Assignee: Omar Ismail
>Priority: P3
> Fix For: 2.22.0
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Setting --workerCacheMB seems to affect batch pipelines only. For Streaming, 
> the cache seems to be hardcoded to 100Mb [1]. If possible, I would like to 
> make it allowable to change the cache value in Streaming when setting 
> -workerCacheMB.
> I've never made changes to the Beam SDK, so I am super excited to work on 
> this! 
>  
> [[1] 
> https://github.com/apache/beam/blob/5e659bb80bcbf70795f6806e05a255ee72706d9f/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java#L73|https://github.com/apache/beam/blob/5e659bb80bcbf70795f6806e05a255ee72706d9f/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java#L73]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10125) adding cross-language KafkaIO integration test

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10125?focusedWorklogId=438532=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438532
 ]

ASF GitHub Bot logged work on BEAM-10125:
-

Author: ASF GitHub Bot
Created on: 29/May/20 00:58
Start Date: 29/May/20 00:58
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11847:
URL: https://github.com/apache/beam/pull/11847#issuecomment-635695687


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438532)
Time Spent: 1h 50m  (was: 1h 40m)

> adding cross-language KafkaIO integration test
> --
>
> Key: BEAM-10125
> URL: https://issues.apache.org/jira/browse/BEAM-10125
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> adding cross-language KafkaIO integration test



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10078) uniquify Dataflow specific jars when staging

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10078?focusedWorklogId=438534=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438534
 ]

ASF GitHub Bot logged work on BEAM-10078:
-

Author: ASF GitHub Bot
Created on: 29/May/20 00:58
Start Date: 29/May/20 00:58
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11814:
URL: https://github.com/apache/beam/pull/11814#issuecomment-635695779


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438534)
Time Spent: 2h 40m  (was: 2.5h)

> uniquify Dataflow specific jars when staging
> 
>
> Key: BEAM-10078
> URL: https://issues.apache.org/jira/browse/BEAM-10078
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> After BEAM-9383, Dataflow specific jars (dataflow-worker.jar, windmill_main) 
> could be overwritten when two or more jobs share the same staging location. 
> Since they 1) should have specific predefined names AND 2) should have unique 
> location for avoiding collision, they need special handling when staging 
> artifacts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10078) uniquify Dataflow specific jars when staging

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10078?focusedWorklogId=438524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438524
 ]

ASF GitHub Bot logged work on BEAM-10078:
-

Author: ASF GitHub Bot
Created on: 28/May/20 23:55
Start Date: 28/May/20 23:55
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11814:
URL: https://github.com/apache/beam/pull/11814#issuecomment-635677737







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438524)
Time Spent: 2.5h  (was: 2h 20m)

> uniquify Dataflow specific jars when staging
> 
>
> Key: BEAM-10078
> URL: https://issues.apache.org/jira/browse/BEAM-10078
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> After BEAM-9383, Dataflow specific jars (dataflow-worker.jar, windmill_main) 
> could be overwritten when two or more jobs share the same staging location. 
> Since they 1) should have specific predefined names AND 2) should have unique 
> location for avoiding collision, they need special handling when staging 
> artifacts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10125) adding cross-language KafkaIO integration test

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10125?focusedWorklogId=438523=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438523
 ]

ASF GitHub Bot logged work on BEAM-10125:
-

Author: ASF GitHub Bot
Created on: 28/May/20 23:51
Start Date: 28/May/20 23:51
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11847:
URL: https://github.com/apache/beam/pull/11847#issuecomment-635676513


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438523)
Time Spent: 1h 40m  (was: 1.5h)

> adding cross-language KafkaIO integration test
> --
>
> Key: BEAM-10125
> URL: https://issues.apache.org/jira/browse/BEAM-10125
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> adding cross-language KafkaIO integration test



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10125) adding cross-language KafkaIO integration test

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10125?focusedWorklogId=438522=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438522
 ]

ASF GitHub Bot logged work on BEAM-10125:
-

Author: ASF GitHub Bot
Created on: 28/May/20 23:51
Start Date: 28/May/20 23:51
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11847:
URL: https://github.com/apache/beam/pull/11847#issuecomment-635676452


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438522)
Time Spent: 1.5h  (was: 1h 20m)

> adding cross-language KafkaIO integration test
> --
>
> Key: BEAM-10125
> URL: https://issues.apache.org/jira/browse/BEAM-10125
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> adding cross-language KafkaIO integration test



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9785) Add PostCommit suite for Python 3.8

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9785?focusedWorklogId=438520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438520
 ]

ASF GitHub Bot logged work on BEAM-9785:


Author: ASF GitHub Bot
Created on: 28/May/20 23:48
Start Date: 28/May/20 23:48
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11788:
URL: https://github.com/apache/beam/pull/11788#issuecomment-635675870


   /cc @tvalentyn 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438520)
Time Spent: 1h  (was: 50m)

> Add PostCommit suite for Python 3.8
> ---
>
> Key: BEAM-9785
> URL: https://issues.apache.org/jira/browse/BEAM-9785
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: yoshiki obata
>Assignee: Ashwin Ramaswami
>Priority: P2
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add PostCommit suites for Python 3.8.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9500) Refactor load tests

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9500?focusedWorklogId=438518=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438518
 ]

ASF GitHub Bot logged work on BEAM-9500:


Author: ASF GitHub Bot
Created on: 28/May/20 23:45
Start Date: 28/May/20 23:45
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11181:
URL: https://github.com/apache/beam/pull/11181#issuecomment-635674912


   @piotr-szuberski - what is the next step for this PR? Is it still active? 
Should we close it?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438518)
Time Spent: 50m  (was: 40m)

> Refactor load tests
> ---
>
> Key: BEAM-9500
> URL: https://issues.apache.org/jira/browse/BEAM-9500
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michał Walenia
>Assignee: Piotr Szuberski
>Priority: P3
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> `make {{LoadTest}} parameterized instead of a base class.
> Each subclass of {{LoadTest}} is really just the main {{loadTest}} function 
> and really that function is about the same as writing a {{PTransform}}. If 
> you eliminate subclassing you can have {{LoadTest}} own the pipeline setup 
> with so it will never be possible to forget or mess up 
> {{readSourceFromOptions}} and {{ParDo.of(runtimeMonitor)}}. It will be less 
> repeat boilerplate.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=438517=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438517
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 28/May/20 23:42
Start Date: 28/May/20 23:42
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #11632:
URL: https://github.com/apache/beam/pull/11632#issuecomment-635673990


   LGTM!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438517)
Time Spent: 85h 10m  (was: 85h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: P2
>  Time Spent: 85h 10m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10078) uniquify Dataflow specific jars when staging

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10078?focusedWorklogId=438516=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438516
 ]

ASF GitHub Bot logged work on BEAM-10078:
-

Author: ASF GitHub Bot
Created on: 28/May/20 23:42
Start Date: 28/May/20 23:42
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11814:
URL: https://github.com/apache/beam/pull/11814#issuecomment-635674009


   I don't think so. We changed from using "key=value" strings to StagedFile 
objects in https://github.com/apache/beam/pull/11039/files.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438516)
Time Spent: 2h 20m  (was: 2h 10m)

> uniquify Dataflow specific jars when staging
> 
>
> Key: BEAM-10078
> URL: https://issues.apache.org/jira/browse/BEAM-10078
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> After BEAM-9383, Dataflow specific jars (dataflow-worker.jar, windmill_main) 
> could be overwritten when two or more jobs share the same staging location. 
> Since they 1) should have specific predefined names AND 2) should have unique 
> location for avoiding collision, they need special handling when staging 
> artifacts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9956) Bad formatting on environments page

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9956?focusedWorklogId=438515=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438515
 ]

ASF GitHub Bot logged work on BEAM-9956:


Author: ASF GitHub Bot
Created on: 28/May/20 23:39
Start Date: 28/May/20 23:39
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11704:
URL: https://github.com/apache/beam/pull/11704#issuecomment-635673169


   Could you rebase? Is this still an issue after the website change?
   
   R: @rosetn 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438515)
Time Spent: 0.5h  (was: 20m)

> Bad formatting on environments page
> ---
>
> Key: BEAM-9956
> URL: https://issues.apache.org/jira/browse/BEAM-9956
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Kyle Weaver
>Assignee: Brent Worden
>Priority: P3
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://beam.apache.org/documentation/runtime/environments/
> At the bottom of the page, it looks like there are text instructions that are 
> formatted as code and vice-versa.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8451) Interactive Beam example failing from stack overflow

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8451?focusedWorklogId=438514=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438514
 ]

ASF GitHub Bot logged work on BEAM-8451:


Author: ASF GitHub Bot
Created on: 28/May/20 23:38
Start Date: 28/May/20 23:38
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11706:
URL: https://github.com/apache/beam/pull/11706#issuecomment-635672892


   R: @rosetn 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438514)
Time Spent: 2h 10m  (was: 2h)

> Interactive Beam example failing from stack overflow
> 
>
> Key: BEAM-8451
> URL: https://issues.apache.org/jira/browse/BEAM-8451
> Project: Beam
>  Issue Type: Bug
>  Components: examples-python, runner-py-interactive
>Reporter: Igor Durovic
>Assignee: Chun Yang
>Priority: P2
> Fix For: 2.18.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
>  
> RecursionError: maximum recursion depth exceeded in __instancecheck__
> at 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py#L405]
>  
> This occurred after the execution of the last cell in 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/examples/Interactive%20Beam%20Example.ipynb]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10078) uniquify Dataflow specific jars when staging

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10078?focusedWorklogId=438511=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438511
 ]

ASF GitHub Bot logged work on BEAM-10078:
-

Author: ASF GitHub Bot
Created on: 28/May/20 23:36
Start Date: 28/May/20 23:36
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #11814:
URL: https://github.com/apache/beam/pull/11814#issuecomment-635672193


   @chamikaramj, @ihji is the `filesToStage` change a problem?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438511)
Time Spent: 2h 10m  (was: 2h)

> uniquify Dataflow specific jars when staging
> 
>
> Key: BEAM-10078
> URL: https://issues.apache.org/jira/browse/BEAM-10078
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> After BEAM-9383, Dataflow specific jars (dataflow-worker.jar, windmill_main) 
> could be overwritten when two or more jobs share the same staging location. 
> Since they 1) should have specific predefined names AND 2) should have unique 
> location for avoiding collision, they need special handling when staging 
> artifacts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9825) Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9825?focusedWorklogId=438509=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438509
 ]

ASF GitHub Bot logged work on BEAM-9825:


Author: ASF GitHub Bot
Created on: 28/May/20 23:35
Start Date: 28/May/20 23:35
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11610:
URL: https://github.com/apache/beam/pull/11610#issuecomment-635672074


   All tests passed. I do not see a LGTM, maybe I am missing. Is this ready to 
be merged?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438509)
Remaining Estimate: 85h 20m  (was: 85.5h)
Time Spent: 10h 40m  (was: 10.5h)

> Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll
> --
>
> Key: BEAM-9825
> URL: https://issues.apache.org/jira/browse/BEAM-9825
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: P2
>   Original Estimate: 96h
>  Time Spent: 10h 40m
>  Remaining Estimate: 85h 20m
>
> I'd like to propose following new high-level transforms.
>  * Intersect
> Compute the intersection between elements of two PCollection.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> collection containing elements that common to both _leftCollection_ and 
> _rightCollection_
>  
>  * Except
> Compute the difference between elements of two PCollection.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> collection containing elements that are in _leftCollection_ but not in 
> _rightCollection_
>  * Union
> Find the elements that are either of two PCollection.
> Implement IntersetAll, ExceptAll and UnionAll variants of transforms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10055) Add --region to 3 of the python examples

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10055?focusedWorklogId=438508=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438508
 ]

ASF GitHub Bot logged work on BEAM-10055:
-

Author: ASF GitHub Bot
Created on: 28/May/20 23:34
Start Date: 28/May/20 23:34
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11779:
URL: https://github.com/apache/beam/pull/11779#issuecomment-635671668


   LGTM. I will merge after tests pass. Thank you @tedromer 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438508)
Remaining Estimate: 20m  (was: 0.5h)
Time Spent: 40m  (was: 0.5h)

> Add --region to 3 of the python examples
> 
>
> Key: BEAM-10055
> URL: https://issues.apache.org/jira/browse/BEAM-10055
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ted Romer
>Assignee: Ted Romer
>Priority: P3
>   Original Estimate: 1h
>  Time Spent: 40m
>  Remaining Estimate: 20m
>
> Proposed fix: 
> {color:#FF}[https://github.com/tedromer/beam/compare/tedromer:ef811fe...tedromer:1f39865]{color}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10055) Add --region to 3 of the python examples

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10055?focusedWorklogId=438507=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438507
 ]

ASF GitHub Bot logged work on BEAM-10055:
-

Author: ASF GitHub Bot
Created on: 28/May/20 23:34
Start Date: 28/May/20 23:34
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11779:
URL: https://github.com/apache/beam/pull/11779#issuecomment-635671566


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438507)
Remaining Estimate: 0.5h  (was: 40m)
Time Spent: 0.5h  (was: 20m)

> Add --region to 3 of the python examples
> 
>
> Key: BEAM-10055
> URL: https://issues.apache.org/jira/browse/BEAM-10055
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ted Romer
>Assignee: Ted Romer
>Priority: P3
>   Original Estimate: 1h
>  Time Spent: 0.5h
>  Remaining Estimate: 0.5h
>
> Proposed fix: 
> {color:#FF}[https://github.com/tedromer/beam/compare/tedromer:ef811fe...tedromer:1f39865]{color}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10149) Support OnTimeBehavior and closing_behavior for triggers in Python.

2020-05-28 Thread Robert Bradshaw (Jira)
Robert Bradshaw created BEAM-10149:
--

 Summary: Support OnTimeBehavior and closing_behavior for triggers 
in Python.
 Key: BEAM-10149
 URL: https://issues.apache.org/jira/browse/BEAM-10149
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Robert Bradshaw






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8532) Beam Python trigger driver sets incorrect timestamp for output windows.

2020-05-28 Thread Robert Bradshaw (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw resolved BEAM-8532.
---
Fix Version/s: 2.20.0
   Resolution: Fixed

> Beam Python trigger driver sets incorrect timestamp for output windows.
> ---
>
> Key: BEAM-8532
> URL: https://issues.apache.org/jira/browse/BEAM-8532
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P1
> Fix For: 2.20.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The timestamp should lie in the window, otherwise re-windowing will not be 
> idempotent. 
> https://github.com/apache/beam/blob/release-2.16.0/sdks/python/apache_beam/transforms/trigger.py#L1183
>  should be using {{window.max_timestamp()}} rather than {{.end}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10125) adding cross-language KafkaIO integration test

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10125?focusedWorklogId=438505=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438505
 ]

ASF GitHub Bot logged work on BEAM-10125:
-

Author: ASF GitHub Bot
Created on: 28/May/20 23:30
Start Date: 28/May/20 23:30
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11847:
URL: https://github.com/apache/beam/pull/11847#issuecomment-635670461


   LGTM. Thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438505)
Time Spent: 1h 20m  (was: 1h 10m)

> adding cross-language KafkaIO integration test
> --
>
> Key: BEAM-10125
> URL: https://issues.apache.org/jira/browse/BEAM-10125
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> adding cross-language KafkaIO integration test



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8371) Sunset Beam Python 2 support in new releases in 2020.

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8371?focusedWorklogId=438503=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438503
 ]

ASF GitHub Bot logged work on BEAM-8371:


Author: ASF GitHub Bot
Created on: 28/May/20 23:27
Start Date: 28/May/20 23:27
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11819:
URL: https://github.com/apache/beam/pull/11819#issuecomment-635669412


   Should we close this pull request for now?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438503)
Time Spent: 1h  (was: 50m)

> Sunset Beam Python 2 support in new releases in 2020.
> -
>
> Key: BEAM-8371
> URL: https://issues.apache.org/jira/browse/BEAM-8371
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Ashwin Ramaswami
>Priority: P2
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Creating this Jira to track eventual sunset of Python 2 support in Beam.
> This was previously discussed in [1], [2], [3].
> We can use this issue to communicate next steps, collect feedback and give 
> updates on current status.
> [1] 
> https://lists.apache.org/thread.html/eba6caa58ea79a7ecbc8560d1c680a366b44c531d96ce5c699d41535@%3Cdev.beam.apache.org%3E
> [2] 
> https://lists.apache.org/thread.html/456631fe1a696c537ef8ebfee42cd3ea8121bf7c639c52da5f7032e7@%3Cdev.beam.apache.org%3E
> [3] 
> https://lists.apache.org/thread.html/r0d5c309a7e3107854f4892ccfeb1a17c0cec25dfce188678ab8df072%40%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9946) Enhance Partition transform to provide partitionfn with SideInputs

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9946?focusedWorklogId=438502=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438502
 ]

ASF GitHub Bot logged work on BEAM-9946:


Author: ASF GitHub Bot
Created on: 28/May/20 23:26
Start Date: 28/May/20 23:26
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11682:
URL: https://github.com/apache/beam/pull/11682#issuecomment-635668922


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438502)
Remaining Estimate: 92.5h  (was: 92h 40m)
Time Spent: 3.5h  (was: 3h 20m)

> Enhance Partition transform to provide partitionfn with SideInputs
> --
>
> Key: BEAM-9946
> URL: https://issues.apache.org/jira/browse/BEAM-9946
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: P2
>   Original Estimate: 96h
>  Time Spent: 3.5h
>  Remaining Estimate: 92.5h
>
> Currently _Partition_ transform can partition a collection into n collections 
> based on only _element_ value in _PartitionFn_ to decide on which partition a 
> particular element belongs to.
> {code:java}
> public interface PartitionFn extends Serializable {
> int partitionFor(T elem, int numPartitions);
>   }
> public static  Partition of(int numPartitions, PartitionFn 
> partitionFn) {
> return new Partition<>(new PartitionDoFn(numPartitions, partitionFn));
>   }
> {code}
> It will be useful to introduce new API with additional _sideInputs_ provided 
> to partition function. User will be able to write logic to use both _element_ 
> value and _sideInputs_ to decide on which partition a particular element 
> belongs to.
> Option-1: Proposed new API:
> {code:java}
>   public interface PartitionWithSideInputsFn extends Serializable {
> int partitionFor(T elem, int numPartitions, Context c);
>   }
> public static  Partition of(int numPartitions, 
> PartitionWithSideInputsFn partitionFn, Requirements requirements) {
>  ...
>   }
> {code}
> User can use any of the two APIs as per there partitioning function logic.
> Option-2: Redesign old API with Builder Pattern which can provide optionally 
> a _Requirements_ with _sideInputs._ Deprecate old API.
> {code:java}
> // using sideviews
> Partition.into(numberOfPartitions).via(
> fn(
>   (input,c) ->  {
> // use c.sideInput(view)
> // use input
> // return partitionnumber
>  },requiresSideInputs(view))
> )
> // without using sideviews
> Partition.into(numberOfPartitions).via(
> fn((input,c) ->  {
> // use input
> // return partitionnumber
>  })
> )
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10148) Data loaded using ReadAllFromText is not projected properly as side input

2020-05-28 Thread Prathap Kumar Parvathareddy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prathap Kumar Parvathareddy updated BEAM-10148:
---
Description: 
*Context:*

Data Enrichment Pattern with 2 Sources. 

Source 1:  Google Cloud PubSub delivering JSON messages

Source 2:  Google Cloud Storage Files (acts as a Side Input) 

*Steps:* 

1. Load the data from GCS (Google Cloud Storage) using ReadAllFromText based on 
file path inside a PCollection and convert each record as a tuple with 2 fields

2. Project the Tuple loaded in step 1 as a side input using Pvalue.ASDict to 
the main input that is being loaded from PubSub.

3. Expectation is that the side inputs should be available but for some reason 
AsDict is not containing all the data that was loaded from GCS

 

*Possible Issues:*

      Below are few possible issues that can be ruled out as I already 
validated them.
 # Window Mismatches -  Main Input window is within the scope of Side Input 
window.
 # Delay in updating Side Input State -  Pipeline has just 1 VM and Side Input 
has only 8 json messages and total size of side input is around 40 KB

 

*Troubleshooting:*
 # Working fine in a DirectRunner   
 # Validated that ReadAllFromText transform is loading all the data properly 
from multiple files with subsequent transform building KV as well. However when 
the output of KV transform is used as a sideinput using ASDict() for some 
reason certain elements are skipped and not available inside look up.

*Code* 

  Will update the link pointing to complete code shortly which helps in 
providing more visibility.

 

 

 

  was:
*Context:*

Data Enrichment Pattern with 2 Sources. 

Source 1:  Google Cloud PubSub delivering JSON messages

Source 2:  Google Cloud Storage Files (acts as a Side Input) 

*Steps:* 

1. Load the data from GCS (Google Cloud Storage) using ReadAllFromText based on 
file path inside a PCollection and convert each record as a tuple with 2 fields

2. Project the Tuple loaded in step 1 as a side input using Pvalue.ASDict to 
the main input that is being loaded from PubSub.

3. Expectation is that the side inputs should be available but for some reason 
AsDict is not containing all the data that was loaded from GCS

*Possible Issues:*
 # Window Mismatches -  Main Input window is within the scope of Side Input 
window.
 # Delay in updating Side Input State -  Pipeline has just 1 VM and Side Input 
has only 8 json messages and total size of side input is around 40 KB

 

*Troubleshooting:*
 # Working fine in a DirectRunner   
 # Validated that ReadAllFromText transform is loading all the data properly 
from multiple files with subsequent transform building KV as well. However when 
the output of KV transform is used as a sideinput using ASDict() for some 
reason certain elements are skipped and not available inside look up.

*Code* 

  Will update the link pointing to complete code shortly which helps in 
providing more visibility.

 

 

 


> Data loaded using ReadAllFromText is not projected properly as side input
> -
>
> Key: BEAM-10148
> URL: https://issues.apache.org/jira/browse/BEAM-10148
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-files
>Affects Versions: 2.19.0
> Environment: Runner: Dataflow Runner
> Beam SDK:  Python 2.19
>Reporter: Prathap Kumar Parvathareddy
>Priority: P2
>
> *Context:*
> Data Enrichment Pattern with 2 Sources. 
> Source 1:  Google Cloud PubSub delivering JSON messages
> Source 2:  Google Cloud Storage Files (acts as a Side Input) 
> *Steps:* 
> 1. Load the data from GCS (Google Cloud Storage) using ReadAllFromText based 
> on file path inside a PCollection and convert each record as a tuple with 2 
> fields
> 2. Project the Tuple loaded in step 1 as a side input using Pvalue.ASDict to 
> the main input that is being loaded from PubSub.
> 3. Expectation is that the side inputs should be available but for some 
> reason AsDict is not containing all the data that was loaded from GCS
>  
> *Possible Issues:*
>       Below are few possible issues that can be ruled out as I already 
> validated them.
>  # Window Mismatches -  Main Input window is within the scope of Side Input 
> window.
>  # Delay in updating Side Input State -  Pipeline has just 1 VM and Side 
> Input has only 8 json messages and total size of side input is around 40 KB
>  
> *Troubleshooting:*
>  # Working fine in a DirectRunner   
>  # Validated that ReadAllFromText transform is loading all the data properly 
> from multiple files with subsequent transform building KV as well. However 
> when the output of KV transform is used as a sideinput using ASDict() for 
> some reason certain elements are skipped and not available inside look up.
> *Code* 
>   Will update the link 

[jira] [Updated] (BEAM-10148) Data loaded using ReadAllFromText is not projected properly as side input

2020-05-28 Thread Prathap Kumar Parvathareddy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prathap Kumar Parvathareddy updated BEAM-10148:
---
Description: 
*Context:*

Data Enrichment Pattern with 2 Sources. 

Source 1:  Google Cloud PubSub delivering JSON messages

Source 2:  Google Cloud Storage Files (acts as a Side Input) 

*Steps:* 

1. Load the data from GCS (Google Cloud Storage) using ReadAllFromText based on 
file path inside a PCollection and convert each record as a tuple with 2 fields

2. Project the Tuple loaded in step 1 as a side input using Pvalue.ASDict to 
the main input that is being loaded from PubSub.

3. Expectation is that the side inputs should be available but for some reason 
AsDict is not containing all the data that was loaded from GCS

*Possible Issues:*
 # Window Mismatches -  Main Input window is within the scope of Side Input 
window.
 # Delay in updating Side Input State -  Pipeline has just 1 VM and Side Input 
has only 8 json messages and total size of side input is around 40 KB

 

*Troubleshooting:*
 # Working fine in a DirectRunner   
 # Validated that ReadAllFromText transform is loading all the data properly 
from multiple files with subsequent transform building KV as well. However when 
the output of KV transform is used as a sideinput using ASDict() for some 
reason certain elements are skipped and not available inside look up.

*Code* 

  Will update the link pointing to complete code shortly that provides more 
visibility

 

 

 

  was:
*Context:*

Data Enrichment Pattern with 2 Sources. 

Source 1:  Google Cloud PubSub delivering JSON messages

Source 2:  Google Cloud Storage Files (acts as a Side Input) 

*Steps:*  

1. Load the data from GCS (Google Cloud Storage) using ReadAllFromText based on 
file path inside a PCollection and convert each record as a tuple with 2 fields

2. Project the Tuple loaded in step 1 as a side input using Pvalue.ASDict to 
the main input that is being loaded from PubSub.

3. Expectation is that the side inputs should be available but for some reason 
AsDict is not containing all the data that was loaded from GCS

*Possible Issues:*
 # Window Mismatches -  Main Input window is within the scope of Side Input 
window.
 # Delay in updating Side Input State -  Pipeline has just 1 VM and Side Input 
has only 8 json messages and total size of side input is around 40 KB

 

*Troubleshooting:*
 # Working fine in a DirectRunner   
 # Validated that ReadAllFromText transform is loading all the data properly 
from multiple files with subsequent transform building KV as well. However when 
the output of KV transform is used as a sideinput using ASDict() for some 
reason certain elements are skipped and not available inside look up.

*Code* 

  Will update the link pointing to complete code that provides more visibility

 

 

 


> Data loaded using ReadAllFromText is not projected properly as side input
> -
>
> Key: BEAM-10148
> URL: https://issues.apache.org/jira/browse/BEAM-10148
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-files
>Affects Versions: 2.19.0
> Environment: Runner: Dataflow Runner
> Beam SDK:  Python 2.19
>Reporter: Prathap Kumar Parvathareddy
>Priority: P2
>
> *Context:*
> Data Enrichment Pattern with 2 Sources. 
> Source 1:  Google Cloud PubSub delivering JSON messages
> Source 2:  Google Cloud Storage Files (acts as a Side Input) 
> *Steps:* 
> 1. Load the data from GCS (Google Cloud Storage) using ReadAllFromText based 
> on file path inside a PCollection and convert each record as a tuple with 2 
> fields
> 2. Project the Tuple loaded in step 1 as a side input using Pvalue.ASDict to 
> the main input that is being loaded from PubSub.
> 3. Expectation is that the side inputs should be available but for some 
> reason AsDict is not containing all the data that was loaded from GCS
> *Possible Issues:*
>  # Window Mismatches -  Main Input window is within the scope of Side Input 
> window.
>  # Delay in updating Side Input State -  Pipeline has just 1 VM and Side 
> Input has only 8 json messages and total size of side input is around 40 KB
>  
> *Troubleshooting:*
>  # Working fine in a DirectRunner   
>  # Validated that ReadAllFromText transform is loading all the data properly 
> from multiple files with subsequent transform building KV as well. However 
> when the output of KV transform is used as a sideinput using ASDict() for 
> some reason certain elements are skipped and not available inside look up.
> *Code* 
>   Will update the link pointing to complete code shortly that provides more 
> visibility
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=438498=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438498
 ]

ASF GitHub Bot logged work on BEAM-10097:
-

Author: ASF GitHub Bot
Created on: 28/May/20 23:14
Start Date: 28/May/20 23:14
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-635664997


   Run Spark ValidatesRunner



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438498)
Time Spent: 3h 20m  (was: 3h 10m)

> Migrate PCollection views to use both iterable and multimap 
> materializations/access patterns
> 
>
> Key: BEAM-10097
> URL: https://issues.apache.org/jira/browse/BEAM-10097
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Currently all the PCollection views have a trival mapping from KV Iterable> to the view that is being requested (singleton, iterable, list, 
> map, multimap.
> We should be using the primitive views (iterable, multimap) directly without 
> going through the naive mapping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10148) Data loaded using ReadAllFromText is not projected properly as side input

2020-05-28 Thread Prathap Kumar Parvathareddy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prathap Kumar Parvathareddy updated BEAM-10148:
---
Description: 
*Context:*

Data Enrichment Pattern with 2 Sources. 

Source 1:  Google Cloud PubSub delivering JSON messages

Source 2:  Google Cloud Storage Files (acts as a Side Input) 

*Steps:* 

1. Load the data from GCS (Google Cloud Storage) using ReadAllFromText based on 
file path inside a PCollection and convert each record as a tuple with 2 fields

2. Project the Tuple loaded in step 1 as a side input using Pvalue.ASDict to 
the main input that is being loaded from PubSub.

3. Expectation is that the side inputs should be available but for some reason 
AsDict is not containing all the data that was loaded from GCS

*Possible Issues:*
 # Window Mismatches -  Main Input window is within the scope of Side Input 
window.
 # Delay in updating Side Input State -  Pipeline has just 1 VM and Side Input 
has only 8 json messages and total size of side input is around 40 KB

 

*Troubleshooting:*
 # Working fine in a DirectRunner   
 # Validated that ReadAllFromText transform is loading all the data properly 
from multiple files with subsequent transform building KV as well. However when 
the output of KV transform is used as a sideinput using ASDict() for some 
reason certain elements are skipped and not available inside look up.

*Code* 

  Will update the link pointing to complete code shortly which helps in 
providing more visibility.

 

 

 

  was:
*Context:*

Data Enrichment Pattern with 2 Sources. 

Source 1:  Google Cloud PubSub delivering JSON messages

Source 2:  Google Cloud Storage Files (acts as a Side Input) 

*Steps:* 

1. Load the data from GCS (Google Cloud Storage) using ReadAllFromText based on 
file path inside a PCollection and convert each record as a tuple with 2 fields

2. Project the Tuple loaded in step 1 as a side input using Pvalue.ASDict to 
the main input that is being loaded from PubSub.

3. Expectation is that the side inputs should be available but for some reason 
AsDict is not containing all the data that was loaded from GCS

*Possible Issues:*
 # Window Mismatches -  Main Input window is within the scope of Side Input 
window.
 # Delay in updating Side Input State -  Pipeline has just 1 VM and Side Input 
has only 8 json messages and total size of side input is around 40 KB

 

*Troubleshooting:*
 # Working fine in a DirectRunner   
 # Validated that ReadAllFromText transform is loading all the data properly 
from multiple files with subsequent transform building KV as well. However when 
the output of KV transform is used as a sideinput using ASDict() for some 
reason certain elements are skipped and not available inside look up.

*Code* 

  Will update the link pointing to complete code shortly that provides more 
visibility

 

 

 


> Data loaded using ReadAllFromText is not projected properly as side input
> -
>
> Key: BEAM-10148
> URL: https://issues.apache.org/jira/browse/BEAM-10148
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-files
>Affects Versions: 2.19.0
> Environment: Runner: Dataflow Runner
> Beam SDK:  Python 2.19
>Reporter: Prathap Kumar Parvathareddy
>Priority: P2
>
> *Context:*
> Data Enrichment Pattern with 2 Sources. 
> Source 1:  Google Cloud PubSub delivering JSON messages
> Source 2:  Google Cloud Storage Files (acts as a Side Input) 
> *Steps:* 
> 1. Load the data from GCS (Google Cloud Storage) using ReadAllFromText based 
> on file path inside a PCollection and convert each record as a tuple with 2 
> fields
> 2. Project the Tuple loaded in step 1 as a side input using Pvalue.ASDict to 
> the main input that is being loaded from PubSub.
> 3. Expectation is that the side inputs should be available but for some 
> reason AsDict is not containing all the data that was loaded from GCS
> *Possible Issues:*
>  # Window Mismatches -  Main Input window is within the scope of Side Input 
> window.
>  # Delay in updating Side Input State -  Pipeline has just 1 VM and Side 
> Input has only 8 json messages and total size of side input is around 40 KB
>  
> *Troubleshooting:*
>  # Working fine in a DirectRunner   
>  # Validated that ReadAllFromText transform is loading all the data properly 
> from multiple files with subsequent transform building KV as well. However 
> when the output of KV transform is used as a sideinput using ASDict() for 
> some reason certain elements are skipped and not available inside look up.
> *Code* 
>   Will update the link pointing to complete code shortly which helps in 
> providing more visibility.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10148) Data loaded using ReadAllFromText is not projected properly as side input

2020-05-28 Thread Prathap Kumar Parvathareddy (Jira)
Prathap Kumar Parvathareddy created BEAM-10148:
--

 Summary: Data loaded using ReadAllFromText is not projected 
properly as side input
 Key: BEAM-10148
 URL: https://issues.apache.org/jira/browse/BEAM-10148
 Project: Beam
  Issue Type: Bug
  Components: io-py-files
Affects Versions: 2.19.0
 Environment: Runner: Dataflow Runner
Beam SDK:  Python 2.19
Reporter: Prathap Kumar Parvathareddy


*Context:*

Data Enrichment Pattern with 2 Sources. 

Source 1:  Google Cloud PubSub delivering JSON messages

Source 2:  Google Cloud Storage Files (acts as a Side Input) 

*Steps:*  

1. Load the data from GCS (Google Cloud Storage) using ReadAllFromText based on 
file path inside a PCollection and convert each record as a tuple with 2 fields

2. Project the Tuple loaded in step 1 as a side input using Pvalue.ASDict to 
the main input that is being loaded from PubSub.

3. Expectation is that the side inputs should be available but for some reason 
AsDict is not containing all the data that was loaded from GCS

*Possible Issues:*
 # Window Mismatches -  Main Input window is within the scope of Side Input 
window.
 # Delay in updating Side Input State -  Pipeline has just 1 VM and Side Input 
has only 8 json messages and total size of side input is around 40 KB

 

*Troubleshooting:*
 # Working fine in a DirectRunner   
 # Validated that ReadAllFromText transform is loading all the data properly 
from multiple files with subsequent transform building KV as well. However when 
the output of KV transform is used as a sideinput using ASDict() for some 
reason certain elements are skipped and not available inside look up.

*Code* 

  Will update the link pointing to complete code that provides more visibility

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9692) Clean Python DataflowRunner to use portable pipelines

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9692?focusedWorklogId=438497=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438497
 ]

ASF GitHub Bot logged work on BEAM-9692:


Author: ASF GitHub Bot
Created on: 28/May/20 23:11
Start Date: 28/May/20 23:11
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #11594:
URL: https://github.com/apache/beam/pull/11594#issuecomment-635664357


   Looks like the PreCommit failed with "Exception: Dataflow only supports 
Python versions 2 and 3.5+, got: (3, 8)". Is that a known failure?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438497)
Time Spent: 5h 40m  (was: 5.5h)

> Clean Python DataflowRunner to use portable pipelines
> -
>
> Key: BEAM-9692
> URL: https://issues.apache.org/jira/browse/BEAM-9692
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: P2
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9964) Setting workerCacheMb to make its way to the WindmillStateCache Constructor

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9964?focusedWorklogId=438496=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438496
 ]

ASF GitHub Bot logged work on BEAM-9964:


Author: ASF GitHub Bot
Created on: 28/May/20 23:10
Start Date: 28/May/20 23:10
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11849:
URL: https://github.com/apache/beam/pull/11849#issuecomment-635663865


   gahh so sorry that I missed this. I guess you did have to end up 
contributing this : )



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438496)
Time Spent: 4h 50m  (was: 4h 40m)

> Setting workerCacheMb to make its way to the WindmillStateCache Constructor
> ---
>
> Key: BEAM-9964
> URL: https://issues.apache.org/jira/browse/BEAM-9964
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Omar Ismail
>Assignee: Omar Ismail
>Priority: P3
> Fix For: 2.22.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Setting --workerCacheMB seems to affect batch pipelines only. For Streaming, 
> the cache seems to be hardcoded to 100Mb [1]. If possible, I would like to 
> make it allowable to change the cache value in Streaming when setting 
> -workerCacheMB.
> I've never made changes to the Beam SDK, so I am super excited to work on 
> this! 
>  
> [[1] 
> https://github.com/apache/beam/blob/5e659bb80bcbf70795f6806e05a255ee72706d9f/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java#L73|https://github.com/apache/beam/blob/5e659bb80bcbf70795f6806e05a255ee72706d9f/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java#L73]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10144) Update pipeline options snippets for best practices

2020-05-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-10144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-10144:

Status: Open  (was: Triage Needed)

> Update pipeline options snippets for best practices
> ---
>
> Key: BEAM-10144
> URL: https://issues.apache.org/jira/browse/BEAM-10144
> Project: Beam
>  Issue Type: Improvement
>  Components: examples-python
>Reporter: David Cavazos
>Assignee: David Cavazos
>Priority: P3
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10112) Add python sdk state and timer examples to website

2020-05-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-10112:

Status: Open  (was: Triage Needed)

> Add python sdk state and timer examples to website
> --
>
> Key: BEAM-10112
> URL: https://issues.apache.org/jira/browse/BEAM-10112
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: P2
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=438493=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438493
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 28/May/20 23:04
Start Date: 28/May/20 23:04
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #11632:
URL: https://github.com/apache/beam/pull/11632#discussion_r432170796



##
File path: sdks/python/apache_beam/dataframe/transforms.py
##
@@ -16,13 +16,28 @@
 
 from __future__ import absolute_import
 
+import typing
+from typing import Any
+from typing import Dict
+from typing import List
+from typing import Mapping
+from typing import Tuple
+from typing import TypeVar
+from typing import Union
+
 import pandas as pd
 
 import apache_beam as beam
 from apache_beam import transforms
 from apache_beam.dataframe import expressions
 from apache_beam.dataframe import frames  # pylint: disable=unused-import
 
+if typing.TYPE_CHECKING:

Review comment:
   +1 for consistency. Changed. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438493)
Time Spent: 85h  (was: 84h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: P2
>  Time Spent: 85h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=438492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438492
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 28/May/20 23:03
Start Date: 28/May/20 23:03
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11854:
URL: https://github.com/apache/beam/pull/11854#issuecomment-635657611


   R: @TheNeuralBit 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438492)
Time Spent: 23h 40m  (was: 23.5h)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 23h 40m
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=438491=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438491
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 28/May/20 23:02
Start Date: 28/May/20 23:02
Worklog Time Spent: 10m 
  Work Description: chamikaramj opened a new pull request #11854:
URL: https://github.com/apache/beam/pull/11854


   Without this cross-language KafkaIO users may have to do
   pipeline.run(False)
   
   instead of
   pipeline.run()
   
   when executing a pipeline using Dataflow.
   
   @TheNeuralBit this should be a pretty safe change to cherry-pick if you are 
OK with it.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=438488=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438488
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 28/May/20 22:57
Start Date: 28/May/20 22:57
Worklog Time Spent: 10m 
  Work Description: robertwb merged pull request #11844:
URL: https://github.com/apache/beam/pull/11844


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438488)
Time Spent: 23h 10m  (was: 23h)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 23h 10m
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=438489=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438489
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 28/May/20 22:57
Start Date: 28/May/20 22:57
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11844:
URL: https://github.com/apache/beam/pull/11844#issuecomment-635651784


   Thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438489)
Time Spent: 23h 20m  (was: 23h 10m)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 23h 20m
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10136) Add cross-language wrapper for Java's JdbcIO Write

2020-05-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-10136:

Status: Open  (was: Triage Needed)

> Add cross-language wrapper for Java's JdbcIO Write
> --
>
> Key: BEAM-10136
> URL: https://issues.apache.org/jira/browse/BEAM-10136
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Priority: P2
>  Labels: portability
> Fix For: Not applicable
>
>
> Add cross-language wrapper for Java's Jdbc Write



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10140) Add cross-language wrapper for Java's SpannerIO Read

2020-05-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-10140:

Status: Open  (was: Triage Needed)

> Add cross-language wrapper for Java's SpannerIO Read
> 
>
> Key: BEAM-10140
> URL: https://issues.apache.org/jira/browse/BEAM-10140
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Priority: P2
>  Labels: portability
> Fix For: Not applicable
>
>
> Add cross-language wrapper for Java's SpannerIO Read



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10137) Add cross-language wrapper for Java's KinesisIO Read

2020-05-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-10137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-10137:

Status: Open  (was: Triage Needed)

> Add cross-language wrapper for Java's KinesisIO Read
> 
>
> Key: BEAM-10137
> URL: https://issues.apache.org/jira/browse/BEAM-10137
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Priority: P2
>  Labels: portability
> Fix For: Not applicable
>
>
> Add cross-language wrapper for Java's KinesisIO Read



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10138) Add cross-language wrapper for Java's KinesisIO Write

2020-05-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-10138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-10138:

Status: Open  (was: Triage Needed)

> Add cross-language wrapper for Java's KinesisIO Write
> -
>
> Key: BEAM-10138
> URL: https://issues.apache.org/jira/browse/BEAM-10138
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Priority: P2
>  Labels: portability
> Fix For: Not applicable
>
>
> Add cross-language wrapper for Java's KinesisIO Write



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10139) Add cross-language wrapper for Java's SpannerIO Write

2020-05-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-10139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-10139:

Status: Open  (was: Triage Needed)

> Add cross-language wrapper for Java's SpannerIO Write
> -
>
> Key: BEAM-10139
> URL: https://issues.apache.org/jira/browse/BEAM-10139
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Priority: P2
>  Labels: portability
> Fix For: Not applicable
>
>
> Add cross-language wrapper for Java's SpannerIO Write



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10144) Update pipeline options snippets for best practices

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10144?focusedWorklogId=438486=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438486
 ]

ASF GitHub Bot logged work on BEAM-10144:
-

Author: ASF GitHub Bot
Created on: 28/May/20 22:49
Start Date: 28/May/20 22:49
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11851:
URL: https://github.com/apache/beam/pull/11851#issuecomment-635648807


   Run Python PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438486)
Time Spent: 1h 20m  (was: 1h 10m)

> Update pipeline options snippets for best practices
> ---
>
> Key: BEAM-10144
> URL: https://issues.apache.org/jira/browse/BEAM-10144
> Project: Beam
>  Issue Type: Improvement
>  Components: examples-python
>Reporter: David Cavazos
>Assignee: David Cavazos
>Priority: P3
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10135) Add cross-language wrapper for Java's JdbcIO Read

2020-05-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-10135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-10135:

Status: Open  (was: Triage Needed)

> Add cross-language wrapper for Java's JdbcIO Read
> -
>
> Key: BEAM-10135
> URL: https://issues.apache.org/jira/browse/BEAM-10135
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Priority: P2
>  Labels: portability
> Fix For: Not applicable
>
>
> Add cross-language wrapper for Java's Jdbc Read



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10134) Add Cross-language wrappers for Java IOs

2020-05-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-10134:

Status: Open  (was: Triage Needed)

> Add Cross-language wrappers for Java IOs
> 
>
> Key: BEAM-10134
> URL: https://issues.apache.org/jira/browse/BEAM-10134
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Priority: P2
>  Labels: portability
> Fix For: Not applicable
>
>
> Add cross-language wrappers for Java IOs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9679) Core Transforms | Go SDK Code Katas

2020-05-28 Thread Damon Douglas (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damon Douglas updated BEAM-9679:

Description: 
A kata devoted to core beam transforms patterns after 
[https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms]
 where the take away is an individual's ability to master the following using 
an Apache Beam pipeline using the Golang SDK.

 
||Transform||Pull Request||Status||
|Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
|GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
|CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Open|
|Combine| | |
|Flatten|[11806|https://github.com/apache/beam/pull/11806]|Closed|
|Partition| | |
|Side Input| | |
|Side Output| | |
|Branching| | |
|Composite Transform| | |
|DoFn Additional Parameters| | |

  was:
A kata devoted to core beam transforms patterns after 
[https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms]
 where the take away is an individual's ability to master the following using 
an Apache Beam pipeline using the Golang SDK.

 
||Transform||Pull Request||Status||
|Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
|GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
|CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Open|
|Combine| | |
|Flatten|[11806|https://github.com/apache/beam/pull/11806]|Open|
|Partition| | |
|Side Input| | |
|Side Output| | |
|Branching| | |
|Composite Transform| | |
|DoFn Additional Parameters| | |


> Core Transforms | Go SDK Code Katas
> ---
>
> Key: BEAM-9679
> URL: https://issues.apache.org/jira/browse/BEAM-9679
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: P2
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> A kata devoted to core beam transforms patterns after 
> [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms]
>  where the take away is an individual's ability to master the following using 
> an Apache Beam pipeline using the Golang SDK.
>  
> ||Transform||Pull Request||Status||
> |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
> |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
> |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Open|
> |Combine| | |
> |Flatten|[11806|https://github.com/apache/beam/pull/11806]|Closed|
> |Partition| | |
> |Side Input| | |
> |Side Output| | |
> |Branching| | |
> |Composite Transform| | |
> |DoFn Additional Parameters| | |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9679) Core Transforms | Go SDK Code Katas

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9679?focusedWorklogId=438481=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438481
 ]

ASF GitHub Bot logged work on BEAM-9679:


Author: ASF GitHub Bot
Created on: 28/May/20 22:37
Start Date: 28/May/20 22:37
Worklog Time Spent: 10m 
  Work Description: damondouglas commented on pull request #11803:
URL: https://github.com/apache/beam/pull/11803#issuecomment-635644834


   I updated [the stepik course](https://stepik.org/course/70387) and commited 
the updated `*-remote.yaml` files to this PR.  It is ready to merge into 
master.  Thank you, @henryken and @lostluck for your help.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438481)
Time Spent: 5h  (was: 4h 50m)

> Core Transforms | Go SDK Code Katas
> ---
>
> Key: BEAM-9679
> URL: https://issues.apache.org/jira/browse/BEAM-9679
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: P2
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> A kata devoted to core beam transforms patterns after 
> [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms]
>  where the take away is an individual's ability to master the following using 
> an Apache Beam pipeline using the Golang SDK.
>  
> ||Transform||Pull Request||Status||
> |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
> |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
> |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Open|
> |Combine| | |
> |Flatten|[11806|https://github.com/apache/beam/pull/11806]|Open|
> |Partition| | |
> |Side Input| | |
> |Side Output| | |
> |Branching| | |
> |Composite Transform| | |
> |DoFn Additional Parameters| | |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=438480=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438480
 ]

ASF GitHub Bot logged work on BEAM-10097:
-

Author: ASF GitHub Bot
Created on: 28/May/20 22:37
Start Date: 28/May/20 22:37
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-635644599


   Run Spark ValidatesRunner



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438480)
Time Spent: 3h 10m  (was: 3h)

> Migrate PCollection views to use both iterable and multimap 
> materializations/access patterns
> 
>
> Key: BEAM-10097
> URL: https://issues.apache.org/jira/browse/BEAM-10097
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Currently all the PCollection views have a trival mapping from KV Iterable> to the view that is being requested (singleton, iterable, list, 
> map, multimap.
> We should be using the primitive views (iterable, multimap) directly without 
> going through the naive mapping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9692) Clean Python DataflowRunner to use portable pipelines

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9692?focusedWorklogId=438479=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438479
 ]

ASF GitHub Bot logged work on BEAM-9692:


Author: ASF GitHub Bot
Created on: 28/May/20 22:26
Start Date: 28/May/20 22:26
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11594:
URL: https://github.com/apache/beam/pull/11594#issuecomment-635640892


   Run Python 2 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438479)
Time Spent: 5.5h  (was: 5h 20m)

> Clean Python DataflowRunner to use portable pipelines
> -
>
> Key: BEAM-9692
> URL: https://issues.apache.org/jira/browse/BEAM-9692
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: P2
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9692) Clean Python DataflowRunner to use portable pipelines

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9692?focusedWorklogId=438478=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438478
 ]

ASF GitHub Bot logged work on BEAM-9692:


Author: ASF GitHub Bot
Created on: 28/May/20 22:25
Start Date: 28/May/20 22:25
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11594:
URL: https://github.com/apache/beam/pull/11594#issuecomment-635640799







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438478)
Time Spent: 5h 20m  (was: 5h 10m)

> Clean Python DataflowRunner to use portable pipelines
> -
>
> Key: BEAM-9692
> URL: https://issues.apache.org/jira/browse/BEAM-9692
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: P2
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9692) Clean Python DataflowRunner to use portable pipelines

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9692?focusedWorklogId=438477=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438477
 ]

ASF GitHub Bot logged work on BEAM-9692:


Author: ASF GitHub Bot
Created on: 28/May/20 22:23
Start Date: 28/May/20 22:23
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #11594:
URL: https://github.com/apache/beam/pull/11594#issuecomment-635640073


   R: @pabloem 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438477)
Time Spent: 5h 10m  (was: 5h)

> Clean Python DataflowRunner to use portable pipelines
> -
>
> Key: BEAM-9692
> URL: https://issues.apache.org/jira/browse/BEAM-9692
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: P2
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8280) re-enable IOTypeHints.from_callable

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8280?focusedWorklogId=438476=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438476
 ]

ASF GitHub Bot logged work on BEAM-8280:


Author: ASF GitHub Bot
Created on: 28/May/20 22:14
Start Date: 28/May/20 22:14
Worklog Time Spent: 10m 
  Work Description: udim merged pull request #11070:
URL: https://github.com/apache/beam/pull/11070


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438476)
Time Spent: 15h 50m  (was: 15h 40m)

> re-enable IOTypeHints.from_callable
> ---
>
> Key: BEAM-8280
> URL: https://issues.apache.org/jira/browse/BEAM-8280
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: P2
> Fix For: 2.21.0
>
>  Time Spent: 15h 50m
>  Remaining Estimate: 0h
>
> See https://issues.apache.org/jira/browse/BEAM-8279



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-10115) Staging requirements.txt fails but staging setup.py succeeds

2020-05-28 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119112#comment-17119112
 ] 

Ankur Goenka edited comment on BEAM-10115 at 5/28/20, 10:12 PM:


An action item from this seems to be cleaning /tmp/dataflow-requirements-cache  
before uploading the requirements as it might contain old artifacts.

https://issues.apache.org/jira/browse/BEAM-10147


was (Author: angoenka):
An action item from this seems to be cleaning /tmp/dataflow-requirements-cache  
before uploading the requirements as it might contain old artifacts.

> Staging requirements.txt fails but staging setup.py succeeds
> 
>
> Key: BEAM-10115
> URL: https://issues.apache.org/jira/browse/BEAM-10115
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Kenneth Knowles
>Priority: P2
>
> User reports on StackOverflow: 
> https://stackoverflow.com/questions/62032382/dataflow-fails-when-i-add-requirements-txt-python
> The issue appears to be a problem with staging, and a difference between 
> using `requirements.txt` and `setup.py` for some reason.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10147) Clean /tmp/dataflow-requirements-cache before downloading the requirements from requirements.txt

2020-05-28 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-10147:
---

 Summary: Clean /tmp/dataflow-requirements-cache before downloading 
the requirements from requirements.txt
 Key: BEAM-10147
 URL: https://issues.apache.org/jira/browse/BEAM-10147
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow, sdk-py-core
Reporter: Ankur Goenka


when using requirements.txt with beam pipelines, 
/tmp/dataflow-requirements-cache is not cleaned and all the new dependencies 
are downloaded and then all the files in /tmp/dataflow-requirements-cache  are 
staged which is wasteful and unecessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10115) Staging requirements.txt fails but staging setup.py succeeds

2020-05-28 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119112#comment-17119112
 ] 

Ankur Goenka commented on BEAM-10115:
-

An action item from this seems to be cleaning /tmp/dataflow-requirements-cache  
before uploading the requirements as it might contain old artifacts.

> Staging requirements.txt fails but staging setup.py succeeds
> 
>
> Key: BEAM-10115
> URL: https://issues.apache.org/jira/browse/BEAM-10115
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Kenneth Knowles
>Priority: P2
>
> User reports on StackOverflow: 
> https://stackoverflow.com/questions/62032382/dataflow-fails-when-i-add-requirements-txt-python
> The issue appears to be a problem with staging, and a difference between 
> using `requirements.txt` and `setup.py` for some reason.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=438470=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438470
 ]

ASF GitHub Bot logged work on BEAM-10097:
-

Author: ASF GitHub Bot
Created on: 28/May/20 21:59
Start Date: 28/May/20 21:59
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-635630717


   Run Spark ValidatesRunner



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438470)
Time Spent: 3h  (was: 2h 50m)

> Migrate PCollection views to use both iterable and multimap 
> materializations/access patterns
> 
>
> Key: BEAM-10097
> URL: https://issues.apache.org/jira/browse/BEAM-10097
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Currently all the PCollection views have a trival mapping from KV Iterable> to the view that is being requested (singleton, iterable, list, 
> map, multimap.
> We should be using the primitive views (iterable, multimap) directly without 
> going through the naive mapping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10125) adding cross-language KafkaIO integration test

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10125?focusedWorklogId=438468=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438468
 ]

ASF GitHub Bot logged work on BEAM-10125:
-

Author: ASF GitHub Bot
Created on: 28/May/20 21:47
Start Date: 28/May/20 21:47
Worklog Time Spent: 10m 
  Work Description: ihji commented on a change in pull request #11847:
URL: https://github.com/apache/beam/pull/11847#discussion_r432142992



##
File path: sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py
##
@@ -0,0 +1,145 @@
+"""Integration test for Python cross-language pipelines for Java KafkaIO."""
+
+from __future__ import absolute_import
+
+import contextlib
+import logging
+import os
+import socket
+import subprocess
+import time
+import typing
+import unittest
+
+import grpc
+
+import apache_beam as beam
+from apache_beam.io.external.kafka import ReadFromKafka
+from apache_beam.io.external.kafka import WriteToKafka
+from apache_beam.metrics import Metrics
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.testing.test_pipeline import TestPipeline
+
+
+class CrossLanguageKafkaIO(object):
+  def __init__(self, bootstrap_servers, topic, expansion_service=None):
+self.bootstrap_servers = bootstrap_servers
+self.topic = topic
+self.expansion_service = expansion_service or (
+'localhost:%s' % os.environ.get('EXPANSION_PORT'))
+self.sum_counter = Metrics.counter('source', 'elements_sum')
+
+  def build_write_pipeline(self, pipeline):
+_ = (
+pipeline
+| 'Impulse' >> beam.Impulse()
+| 'Generate' >> beam.FlatMap(lambda x: range(1000)) # pylint: 
disable=range-builtin-not-iterating
+| 'Reshuffle' >> beam.Reshuffle()
+| 'MakeKV' >> beam.Map(lambda x:
+   (b'', str(x).encode())).with_output_types(
+   typing.Tuple[bytes, bytes])
+| 'WriteToKafka' >> WriteToKafka(
+producer_config={'bootstrap.servers': self.bootstrap_servers},
+topic=self.topic,
+expansion_service=self.expansion_service))
+
+  def build_read_pipeline(self, pipeline):
+_ = (
+pipeline
+| 'ReadFromKafka' >> ReadFromKafka(
+consumer_config={
+'bootstrap.servers': self.bootstrap_servers,
+'auto.offset.reset': 'earliest'
+},
+topics=[self.topic],
+expansion_service=self.expansion_service)
+| 'Windowing' >> beam.WindowInto(
+beam.window.FixedWindows(300),
+trigger=beam.transforms.trigger.AfterProcessingTime(60),
+accumulation_mode=beam.transforms.trigger.AccumulationMode.
+DISCARDING)
+| 'DecodingValue' >> beam.Map(lambda elem: int(elem[1].decode()))
+| 'CombineGlobally' >> beam.CombineGlobally(sum).without_defaults()
+| 'SetSumCounter' >> beam.Map(self.sum_counter.inc))
+
+  def run_xlang_kafkaio(self, pipeline):
+self.build_write_pipeline(pipeline)
+self.build_read_pipeline(pipeline)
+pipeline.run(False)
+
+
+@unittest.skipUnless(
+os.environ.get('LOCAL_KAFKA_JAR'),
+"LOCAL_KAFKA_JAR environment var is not provided.")
+@unittest.skipUnless(
+os.environ.get('EXPANSION_JAR'),
+"EXPANSION_JAR environment var is not provided.")
+class CrossLanguageKafkaIOTest(unittest.TestCase):
+  def get_open_port(self):
+s = None
+try:
+  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+except:  # pylint: disable=bare-except
+  # Above call will fail for nodes that only support IPv6.
+  s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
+s.bind(('localhost', 0))
+s.listen(1)
+port = s.getsockname()[1]
+s.close()
+return port
+
+  @contextlib.contextmanager
+  def local_services(self, expansion_service_jar_file, local_kafka_jar_file):
+expansion_service_port = str(self.get_open_port())
+kafka_port = str(self.get_open_port())
+zookeeper_port = str(self.get_open_port())
+
+expansion_server = None
+kafka_server = None
+try:
+  expansion_server = subprocess.Popen(

Review comment:
   updated.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438468)
Time Spent: 1h 10m  (was: 1h)

> adding cross-language KafkaIO integration test
> --
>
> Key: BEAM-10125
> URL: https://issues.apache.org/jira/browse/BEAM-10125
> Project: Beam
> 

[jira] [Work logged] (BEAM-10125) adding cross-language KafkaIO integration test

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10125?focusedWorklogId=438466=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438466
 ]

ASF GitHub Bot logged work on BEAM-10125:
-

Author: ASF GitHub Bot
Created on: 28/May/20 21:47
Start Date: 28/May/20 21:47
Worklog Time Spent: 10m 
  Work Description: ihji commented on a change in pull request #11847:
URL: https://github.com/apache/beam/pull/11847#discussion_r432142695



##
File path: sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py
##
@@ -0,0 +1,145 @@
+"""Integration test for Python cross-language pipelines for Java KafkaIO."""
+
+from __future__ import absolute_import
+
+import contextlib
+import logging
+import os
+import socket
+import subprocess
+import time
+import typing
+import unittest
+
+import grpc
+
+import apache_beam as beam
+from apache_beam.io.external.kafka import ReadFromKafka
+from apache_beam.io.external.kafka import WriteToKafka
+from apache_beam.metrics import Metrics
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.testing.test_pipeline import TestPipeline
+
+
+class CrossLanguageKafkaIO(object):
+  def __init__(self, bootstrap_servers, topic, expansion_service=None):
+self.bootstrap_servers = bootstrap_servers
+self.topic = topic
+self.expansion_service = expansion_service or (
+'localhost:%s' % os.environ.get('EXPANSION_PORT'))
+self.sum_counter = Metrics.counter('source', 'elements_sum')
+
+  def build_write_pipeline(self, pipeline):
+_ = (
+pipeline
+| 'Impulse' >> beam.Impulse()
+| 'Generate' >> beam.FlatMap(lambda x: range(1000)) # pylint: 
disable=range-builtin-not-iterating
+| 'Reshuffle' >> beam.Reshuffle()
+| 'MakeKV' >> beam.Map(lambda x:
+   (b'', str(x).encode())).with_output_types(
+   typing.Tuple[bytes, bytes])
+| 'WriteToKafka' >> WriteToKafka(
+producer_config={'bootstrap.servers': self.bootstrap_servers},
+topic=self.topic,
+expansion_service=self.expansion_service))
+
+  def build_read_pipeline(self, pipeline):
+_ = (
+pipeline
+| 'ReadFromKafka' >> ReadFromKafka(
+consumer_config={
+'bootstrap.servers': self.bootstrap_servers,
+'auto.offset.reset': 'earliest'
+},
+topics=[self.topic],
+expansion_service=self.expansion_service)
+| 'Windowing' >> beam.WindowInto(
+beam.window.FixedWindows(300),
+trigger=beam.transforms.trigger.AfterProcessingTime(60),
+accumulation_mode=beam.transforms.trigger.AccumulationMode.
+DISCARDING)
+| 'DecodingValue' >> beam.Map(lambda elem: int(elem[1].decode()))
+| 'CombineGlobally' >> beam.CombineGlobally(sum).without_defaults()
+| 'SetSumCounter' >> beam.Map(self.sum_counter.inc))
+
+  def run_xlang_kafkaio(self, pipeline):
+self.build_write_pipeline(pipeline)
+self.build_read_pipeline(pipeline)
+pipeline.run(False)
+
+
+@unittest.skipUnless(
+os.environ.get('LOCAL_KAFKA_JAR'),
+"LOCAL_KAFKA_JAR environment var is not provided.")
+@unittest.skipUnless(
+os.environ.get('EXPANSION_JAR'),
+"EXPANSION_JAR environment var is not provided.")
+class CrossLanguageKafkaIOTest(unittest.TestCase):
+  def get_open_port(self):
+s = None
+try:
+  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+except:  # pylint: disable=bare-except
+  # Above call will fail for nodes that only support IPv6.
+  s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
+s.bind(('localhost', 0))
+s.listen(1)
+port = s.getsockname()[1]
+s.close()
+return port
+
+  @contextlib.contextmanager
+  def local_services(self, expansion_service_jar_file, local_kafka_jar_file):
+expansion_service_port = str(self.get_open_port())
+kafka_port = str(self.get_open_port())
+zookeeper_port = str(self.get_open_port())
+
+expansion_server = None
+kafka_server = None
+try:
+  expansion_server = subprocess.Popen(
+  ['java', '-jar', expansion_service_jar_file, expansion_service_port])
+  kafka_server = subprocess.Popen(

Review comment:
   Yes, this is for external testing (probably only for small scale 
correctness tests, we may still need kubernetes cluster for large scale 
performance tests).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438466)

[jira] [Work logged] (BEAM-10125) adding cross-language KafkaIO integration test

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10125?focusedWorklogId=438467=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438467
 ]

ASF GitHub Bot logged work on BEAM-10125:
-

Author: ASF GitHub Bot
Created on: 28/May/20 21:47
Start Date: 28/May/20 21:47
Worklog Time Spent: 10m 
  Work Description: ihji commented on a change in pull request #11847:
URL: https://github.com/apache/beam/pull/11847#discussion_r432142814



##
File path: sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py
##
@@ -0,0 +1,145 @@
+"""Integration test for Python cross-language pipelines for Java KafkaIO."""
+
+from __future__ import absolute_import
+
+import contextlib
+import logging
+import os
+import socket
+import subprocess
+import time
+import typing
+import unittest
+
+import grpc
+
+import apache_beam as beam
+from apache_beam.io.external.kafka import ReadFromKafka
+from apache_beam.io.external.kafka import WriteToKafka
+from apache_beam.metrics import Metrics
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.testing.test_pipeline import TestPipeline
+
+
+class CrossLanguageKafkaIO(object):
+  def __init__(self, bootstrap_servers, topic, expansion_service=None):
+self.bootstrap_servers = bootstrap_servers
+self.topic = topic
+self.expansion_service = expansion_service or (
+'localhost:%s' % os.environ.get('EXPANSION_PORT'))
+self.sum_counter = Metrics.counter('source', 'elements_sum')
+
+  def build_write_pipeline(self, pipeline):
+_ = (
+pipeline
+| 'Impulse' >> beam.Impulse()
+| 'Generate' >> beam.FlatMap(lambda x: range(1000)) # pylint: 
disable=range-builtin-not-iterating
+| 'Reshuffle' >> beam.Reshuffle()
+| 'MakeKV' >> beam.Map(lambda x:
+   (b'', str(x).encode())).with_output_types(
+   typing.Tuple[bytes, bytes])
+| 'WriteToKafka' >> WriteToKafka(
+producer_config={'bootstrap.servers': self.bootstrap_servers},
+topic=self.topic,
+expansion_service=self.expansion_service))
+
+  def build_read_pipeline(self, pipeline):
+_ = (
+pipeline
+| 'ReadFromKafka' >> ReadFromKafka(
+consumer_config={
+'bootstrap.servers': self.bootstrap_servers,
+'auto.offset.reset': 'earliest'
+},
+topics=[self.topic],
+expansion_service=self.expansion_service)
+| 'Windowing' >> beam.WindowInto(
+beam.window.FixedWindows(300),
+trigger=beam.transforms.trigger.AfterProcessingTime(60),
+accumulation_mode=beam.transforms.trigger.AccumulationMode.
+DISCARDING)
+| 'DecodingValue' >> beam.Map(lambda elem: int(elem[1].decode()))
+| 'CombineGlobally' >> beam.CombineGlobally(sum).without_defaults()
+| 'SetSumCounter' >> beam.Map(self.sum_counter.inc))
+
+  def run_xlang_kafkaio(self, pipeline):
+self.build_write_pipeline(pipeline)
+self.build_read_pipeline(pipeline)
+pipeline.run(False)
+
+
+@unittest.skipUnless(
+os.environ.get('LOCAL_KAFKA_JAR'),
+"LOCAL_KAFKA_JAR environment var is not provided.")
+@unittest.skipUnless(
+os.environ.get('EXPANSION_JAR'),
+"EXPANSION_JAR environment var is not provided.")
+class CrossLanguageKafkaIOTest(unittest.TestCase):
+  def get_open_port(self):
+s = None
+try:
+  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+except:  # pylint: disable=bare-except
+  # Above call will fail for nodes that only support IPv6.
+  s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
+s.bind(('localhost', 0))
+s.listen(1)
+port = s.getsockname()[1]
+s.close()
+return port
+
+  @contextlib.contextmanager
+  def local_services(self, expansion_service_jar_file, local_kafka_jar_file):
+expansion_service_port = str(self.get_open_port())
+kafka_port = str(self.get_open_port())
+zookeeper_port = str(self.get_open_port())
+
+expansion_server = None
+kafka_server = None
+try:
+  expansion_server = subprocess.Popen(
+  ['java', '-jar', expansion_service_jar_file, expansion_service_port])
+  kafka_server = subprocess.Popen(
+  ['java', '-jar', local_kafka_jar_file, kafka_port, zookeeper_port])
+  time.sleep(3)
+  channel_creds = grpc.local_channel_credentials()
+  with grpc.secure_channel('localhost:%s' % expansion_service_port,
+   channel_creds) as channel:
+grpc.channel_ready_future(channel).result()
+
+  yield expansion_service_port, kafka_port
+finally:
+  if expansion_server:
+expansion_server.kill()
+  if kafka_server:
+kafka_server.kill()
+
+  def get_options(self):
+options = PipelineOptions([
+

[jira] [Work logged] (BEAM-9198) BeamSQL aggregation analytics functionality

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9198?focusedWorklogId=438465=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438465
 ]

ASF GitHub Bot logged work on BEAM-9198:


Author: ASF GitHub Bot
Created on: 28/May/20 21:44
Start Date: 28/May/20 21:44
Worklog Time Spent: 10m 
  Work Description: jhnmora000 closed pull request #11845:
URL: https://github.com/apache/beam/pull/11845


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438465)
Time Spent: 50m  (was: 40m)

> BeamSQL aggregation analytics functionality 
> 
>
> Key: BEAM-9198
> URL: https://issues.apache.org/jira/browse/BEAM-9198
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: John Mora
>Priority: P2
>  Labels: gsoc, gsoc2020, mentor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Mentor email: ruw...@google.com. Feel free to send emails for your questions.
> Project Information
> -
> BeamSQL has a long list of of aggregation/aggregation analytics 
> functionalities to support. 
> To begin with, you will need to support this syntax:
> {code:sql}
> analytic_function_name ( [ argument_list ] )
>   OVER (
> [ PARTITION BY partition_expression_list ]
> [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
> [ window_frame_clause ]
>   )
> {code}
> As there is a long list of analytics functions, a good start point is support 
> rank() first.
> This will requires touch core components of BeamSQL:
> 1. SQL parser to support the syntax above.
> 2. SQL core to implement physical relational operator.
> 3. Distributed algorithms to implement a list of functions in a distributed 
> manner. 
> 4. Enable in ZetaSQL dialect.
> To understand what SQL analytics functionality is, you could check this great 
> explanation doc: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts.
> To know about Beam's programming model, check: 
> https://beam.apache.org/documentation/programming-guide/#overview



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9198) BeamSQL aggregation analytics functionality

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9198?focusedWorklogId=438464=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438464
 ]

ASF GitHub Bot logged work on BEAM-9198:


Author: ASF GitHub Bot
Created on: 28/May/20 21:44
Start Date: 28/May/20 21:44
Worklog Time Spent: 10m 
  Work Description: jhnmora000 commented on pull request #11845:
URL: https://github.com/apache/beam/pull/11845#issuecomment-635624778


   Thanks for your help @amaliujia . I will close this PR and continue 
experimenting with BeamSQL/Calcite. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438464)
Time Spent: 40m  (was: 0.5h)

> BeamSQL aggregation analytics functionality 
> 
>
> Key: BEAM-9198
> URL: https://issues.apache.org/jira/browse/BEAM-9198
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: John Mora
>Priority: P2
>  Labels: gsoc, gsoc2020, mentor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Mentor email: ruw...@google.com. Feel free to send emails for your questions.
> Project Information
> -
> BeamSQL has a long list of of aggregation/aggregation analytics 
> functionalities to support. 
> To begin with, you will need to support this syntax:
> {code:sql}
> analytic_function_name ( [ argument_list ] )
>   OVER (
> [ PARTITION BY partition_expression_list ]
> [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
> [ window_frame_clause ]
>   )
> {code}
> As there is a long list of analytics functions, a good start point is support 
> rank() first.
> This will requires touch core components of BeamSQL:
> 1. SQL parser to support the syntax above.
> 2. SQL core to implement physical relational operator.
> 3. Distributed algorithms to implement a list of functions in a distributed 
> manner. 
> 4. Enable in ZetaSQL dialect.
> To understand what SQL analytics functionality is, you could check this great 
> explanation doc: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts.
> To know about Beam's programming model, check: 
> https://beam.apache.org/documentation/programming-guide/#overview



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9876) Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-28 Thread Aizhamal Nurmamat kyzy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aizhamal Nurmamat kyzy resolved BEAM-9876.
--
Fix Version/s: 2.21.0
   Resolution: Fixed

> Migrate the Beam website from Jekyll to Hugo to enable localization of the 
> site content
> ---
>
> Key: BEAM-9876
> URL: https://issues.apache.org/jira/browse/BEAM-9876
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Aizhamal Nurmamat kyzy
>Assignee: Aizhamal Nurmamat kyzy
>Priority: P2
> Fix For: 2.21.0
>
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> Enable internationalization of the Apache Beam website to increase the reach 
> of the project, and facilitate adoption and growth of its community.
> The proposal was to do this by migrating the current Apache Beam website from 
> Jekyll do Hugo [1]. Hugo supports internationalization out-of-the-box, making 
> it easier both for contributors and maintainers support the 
> internationalization effort.
> The further discussion on implementation can be viewed here  [2]
> [1] 
> [https://lists.apache.org/thread.html/rfab4cc1411318c3f4667bee051df68f37be11846ada877f3576c41a9%40%3Cdev.beam.apache.org%3E]
> [2] 
> [https://lists.apache.org/thread.html/r6b999b6d7d1f6cbb94e16bb2deed2b65098a6b14c4ac98707fe0c36a%40%3Cdev.beam.apache.org%3E]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=438462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438462
 ]

ASF GitHub Bot logged work on BEAM-10097:
-

Author: ASF GitHub Bot
Created on: 28/May/20 21:35
Start Date: 28/May/20 21:35
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-635620991


   Run Java Flink PortableValidatesRunner Streaming



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438462)
Time Spent: 2h 40m  (was: 2.5h)

> Migrate PCollection views to use both iterable and multimap 
> materializations/access patterns
> 
>
> Key: BEAM-10097
> URL: https://issues.apache.org/jira/browse/BEAM-10097
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently all the PCollection views have a trival mapping from KV Iterable> to the view that is being requested (singleton, iterable, list, 
> map, multimap.
> We should be using the primitive views (iterable, multimap) directly without 
> going through the naive mapping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=438463=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438463
 ]

ASF GitHub Bot logged work on BEAM-10097:
-

Author: ASF GitHub Bot
Created on: 28/May/20 21:35
Start Date: 28/May/20 21:35
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-635621052







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438463)
Time Spent: 2h 50m  (was: 2h 40m)

> Migrate PCollection views to use both iterable and multimap 
> materializations/access patterns
> 
>
> Key: BEAM-10097
> URL: https://issues.apache.org/jira/browse/BEAM-10097
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently all the PCollection views have a trival mapping from KV Iterable> to the view that is being requested (singleton, iterable, list, 
> map, multimap.
> We should be using the primitive views (iterable, multimap) directly without 
> going through the naive mapping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=438461=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438461
 ]

ASF GitHub Bot logged work on BEAM-10097:
-

Author: ASF GitHub Bot
Created on: 28/May/20 21:35
Start Date: 28/May/20 21:35
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-635620885







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438461)
Time Spent: 2.5h  (was: 2h 20m)

> Migrate PCollection views to use both iterable and multimap 
> materializations/access patterns
> 
>
> Key: BEAM-10097
> URL: https://issues.apache.org/jira/browse/BEAM-10097
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Currently all the PCollection views have a trival mapping from KV Iterable> to the view that is being requested (singleton, iterable, list, 
> map, multimap.
> We should be using the primitive views (iterable, multimap) directly without 
> going through the naive mapping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9935) Resolve differences in allowed_split_point implementations

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9935?focusedWorklogId=438460=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438460
 ]

ASF GitHub Bot logged work on BEAM-9935:


Author: ASF GitHub Bot
Created on: 28/May/20 21:31
Start Date: 28/May/20 21:31
Worklog Time Spent: 10m 
  Work Description: youngoli merged pull request #11791:
URL: https://github.com/apache/beam/pull/11791


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438460)
Time Spent: 3h 20m  (was: 3h 10m)

> Resolve differences in allowed_split_point implementations
> --
>
> Key: BEAM-9935
> URL: https://issues.apache.org/jira/browse/BEAM-9935
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Daniel Oliveira
>Priority: P2
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> [Java SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/BeamFnDataReadRunner.java#L223]
>  doesn't support it yet which is also safe.
> [Go SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L273]
>  only supports splits if points are specified and it doesn't use the fraction 
> at all.
> [Python SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/python/apache_beam/runners/worker/bundle_processor.py#L947]
>  ignores the split points meaning that it may return an invalid split 
> location based upon the runners limitations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10144) Update pipeline options snippets for best practices

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10144?focusedWorklogId=438458=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438458
 ]

ASF GitHub Bot logged work on BEAM-10144:
-

Author: ASF GitHub Bot
Created on: 28/May/20 21:29
Start Date: 28/May/20 21:29
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #11851:
URL: https://github.com/apache/beam/pull/11851#issuecomment-635618216


   Run Python PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438458)
Time Spent: 1h 10m  (was: 1h)

> Update pipeline options snippets for best practices
> ---
>
> Key: BEAM-10144
> URL: https://issues.apache.org/jira/browse/BEAM-10144
> Project: Beam
>  Issue Type: Improvement
>  Components: examples-python
>Reporter: David Cavazos
>Assignee: David Cavazos
>Priority: P3
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10078) uniquify Dataflow specific jars when staging

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10078?focusedWorklogId=438453=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438453
 ]

ASF GitHub Bot logged work on BEAM-10078:
-

Author: ASF GitHub Bot
Created on: 28/May/20 21:23
Start Date: 28/May/20 21:23
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #11814:
URL: https://github.com/apache/beam/pull/11814#issuecomment-635615410


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438453)
Time Spent: 2h  (was: 1h 50m)

> uniquify Dataflow specific jars when staging
> 
>
> Key: BEAM-10078
> URL: https://issues.apache.org/jira/browse/BEAM-10078
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> After BEAM-9383, Dataflow specific jars (dataflow-worker.jar, windmill_main) 
> could be overwritten when two or more jobs share the same staging location. 
> Since they 1) should have specific predefined names AND 2) should have unique 
> location for avoiding collision, they need special handling when staging 
> artifacts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9935) Resolve differences in allowed_split_point implementations

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9935?focusedWorklogId=438452=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438452
 ]

ASF GitHub Bot logged work on BEAM-9935:


Author: ASF GitHub Bot
Created on: 28/May/20 21:22
Start Date: 28/May/20 21:22
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #11791:
URL: https://github.com/apache/beam/pull/11791#issuecomment-635615190


   Run Go PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438452)
Time Spent: 3h 10m  (was: 3h)

> Resolve differences in allowed_split_point implementations
> --
>
> Key: BEAM-9935
> URL: https://issues.apache.org/jira/browse/BEAM-9935
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Daniel Oliveira
>Priority: P2
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> [Java SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/BeamFnDataReadRunner.java#L223]
>  doesn't support it yet which is also safe.
> [Go SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L273]
>  only supports splits if points are specified and it doesn't use the fraction 
> at all.
> [Python SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/python/apache_beam/runners/worker/bundle_processor.py#L947]
>  ignores the split points meaning that it may return an invalid split 
> location based upon the runners limitations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=438450=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438450
 ]

ASF GitHub Bot logged work on BEAM-10097:
-

Author: ASF GitHub Bot
Created on: 28/May/20 21:19
Start Date: 28/May/20 21:19
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-635613803


   Run Spark ValidatesRunner



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438450)
Time Spent: 2h 20m  (was: 2h 10m)

> Migrate PCollection views to use both iterable and multimap 
> materializations/access patterns
> 
>
> Key: BEAM-10097
> URL: https://issues.apache.org/jira/browse/BEAM-10097
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently all the PCollection views have a trival mapping from KV Iterable> to the view that is being requested (singleton, iterable, list, 
> map, multimap.
> We should be using the primitive views (iterable, multimap) directly without 
> going through the naive mapping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8910) Use AVRO instead of JSON in BigQuery bounded source.

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8910?focusedWorklogId=438440=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438440
 ]

ASF GitHub Bot logged work on BEAM-8910:


Author: ASF GitHub Bot
Created on: 28/May/20 21:05
Start Date: 28/May/20 21:05
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #11086:
URL: https://github.com/apache/beam/pull/11086#discussion_r432112443



##
File path: sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py
##
@@ -254,11 +256,36 @@ def test_big_query_new_types(self):
 'output_schema': NEW_TYPES_OUTPUT_SCHEMA,
 'use_standard_sql': False,
 'wait_until_finish_duration': WAIT_UNTIL_FINISH_DURATION_MS,
+'use_json_exports': True,
 'on_success_matcher': all_of(*pipeline_verifiers)
 }
 options = self.test_pipeline.get_full_options_as_args(**extra_opts)
 big_query_query_to_table_pipeline.run_bq_pipeline(options)
 
+  @attr('IT')
+  def test_big_query_new_types(self):

Review comment:
   Looks like we run this test internally using this name. Since this test 
now changes to  `use_beam_bq_sink`, please check that  maintain reasonable 
internal test coverage for the native sink, or use a new test name for new 
behavior.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438440)
Time Spent: 14h 20m  (was: 14h 10m)

> Use AVRO instead of JSON in BigQuery bounded source.
> 
>
> Key: BEAM-8910
> URL: https://issues.apache.org/jira/browse/BEAM-8910
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kamil Wasilewski
>Assignee: Pablo Estrada
>Priority: P3
>  Time Spent: 14h 20m
>  Remaining Estimate: 0h
>
> The proposed BigQuery bounded source in Python SDK (see PR: 
> [https://github.com/apache/beam/pull/9772)] uses a BigQuery export job to 
> take a snapshot of the table and read from each produced JSON file. A 
> performance improvement can be gain by switching to AVRO instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9852) C-ares status is not ARES_SUCCESS: Misformatted domain name

2020-05-28 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119077#comment-17119077
 ] 

Kyle Weaver commented on BEAM-9852:
---

The error message was improved in GRPC, so hopefully we will get more debugging 
information in the future. https://github.com/grpc/grpc/pull/22865 In the mean 
time, it's difficult to ascertain the root cause.

> C-ares status is not ARES_SUCCESS: Misformatted domain name
> ---
>
> Key: BEAM-9852
> URL: https://issues.apache.org/jira/browse/BEAM-9852
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink, runner-spark
>Reporter: Kyle Weaver
>Priority: P2
>  Labels: portability-flink, portability-spark
>
> This affects both Flink and Spark portable runners. It does not appear to 
> cause pipelines to fail.
> Exception in thread read_grpc_client_inputs:
> Traceback (most recent call last):
> File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
> self.run()
> File "/usr/local/lib/python3.7/threading.py", line 870, in run
> self._target(*self._args, **self._kwargs)
> File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 545, in 
> target=lambda: self._read_inputs(elements_iterator),
> File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 528, in _read_inputs
> for elements in elements_iterator:
> File "/usr/local/lib/python3.7/site-packages/grpc/channel.py", line 388, in 
> __next_
> return self._next()
> File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 365, in 
> _next
> raise self
> grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
> status = StatusCode.UNAVAILABLE
> details = "DNS resolution failed"
> debug_error_string = 
> "{"created":"@1587426512.443144965","description":"Failed to pick 
> subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3876,"referenced_errors":[{"created":"@1587426512.443142363","description":"Resolver
>  transient 
> failure","file":"src/core/ext/filters/client_channel/resolving_lb_policy.cc","file_line":263,"referenced_errors":[{"created":"@1587426512.443141313","description":"DNS
>  resolution 
> failed","file":"src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc","file_line":357,"grpc_status":14,"referenced_errors":[{"created":"@1587426512.443136986","description":"C-ares
>  status is not ARES_SUCCESS: Misformatted domain 
> name","file":"src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc","file_line":244,"referenced_errors":[
> {"created":"@1587426512.443126564","description":"C-ares status is not 
> ARES_SUCCESS: Misformatted domain 
> name","file":"src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc","file_line":244}
> ]}]}]}]}"
> >



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7774) Stop usign Perfkit Benchmarker tool in Python performance tests

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7774?focusedWorklogId=438426=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438426
 ]

ASF GitHub Bot logged work on BEAM-7774:


Author: ASF GitHub Bot
Created on: 28/May/20 20:36
Start Date: 28/May/20 20:36
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #11661:
URL: https://github.com/apache/beam/pull/11661#issuecomment-635593321


   @kamilwu - please merge once this looks good to you, I don't have other 
input here.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438426)
Time Spent: 5h 50m  (was: 5h 40m)

> Stop usign Perfkit Benchmarker tool in Python performance tests
> ---
>
> Key: BEAM-7774
> URL: https://issues.apache.org/jira/browse/BEAM-7774
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Piotr Szuberski
>Priority: P2
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7774) Stop usign Perfkit Benchmarker tool in Python performance tests

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7774?focusedWorklogId=438425=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438425
 ]

ASF GitHub Bot logged work on BEAM-7774:


Author: ASF GitHub Bot
Created on: 28/May/20 20:34
Start Date: 28/May/20 20:34
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #11661:
URL: https://github.com/apache/beam/pull/11661#discussion_r432107251



##
File path: 
.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_Performance_Tests.json
##
@@ -77,7 +77,7 @@
   ],
   "orderByTime": "ASC",
   "policy": "default",
-  "query": "SELECT mean(\"value\") FROM \"wordcount_py27_results\" 
WHERE metric = 'Python performance test' AND $timeFilter GROUP BY 
time($__interval),  \"metric\"",
+  "query": "SELECT mean(\"value\") FROM \"wordcount_py27_results\" 
WHERE metric = 'wordcount_it_runtime' AND $timeFilter GROUP BY 
time($__interval),  \"metric\"",

Review comment:
   Seeing this query now - yes, I wound just keep the metric 'runtime', 
since we already know it is wordcount_py27_results, and it would be simpler 
that pipeline does not need to know the name of the suite. In the future we 
might add different metrics like 'cost' or total cputime consumed by other 
workers as opposed to runtime.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438425)
Time Spent: 5h 40m  (was: 5.5h)

> Stop usign Perfkit Benchmarker tool in Python performance tests
> ---
>
> Key: BEAM-7774
> URL: https://issues.apache.org/jira/browse/BEAM-7774
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Piotr Szuberski
>Priority: P2
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10078) uniquify Dataflow specific jars when staging

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10078?focusedWorklogId=438424=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438424
 ]

ASF GitHub Bot logged work on BEAM-10078:
-

Author: ASF GitHub Bot
Created on: 28/May/20 20:32
Start Date: 28/May/20 20:32
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11814:
URL: https://github.com/apache/beam/pull/11814#issuecomment-635591331







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438424)
Time Spent: 1h 50m  (was: 1h 40m)

> uniquify Dataflow specific jars when staging
> 
>
> Key: BEAM-10078
> URL: https://issues.apache.org/jira/browse/BEAM-10078
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> After BEAM-9383, Dataflow specific jars (dataflow-worker.jar, windmill_main) 
> could be overwritten when two or more jobs share the same staging location. 
> Since they 1) should have specific predefined names AND 2) should have unique 
> location for avoiding collision, they need special handling when staging 
> artifacts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7774) Stop usign Perfkit Benchmarker tool in Python performance tests

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7774?focusedWorklogId=438423=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438423
 ]

ASF GitHub Bot logged work on BEAM-7774:


Author: ASF GitHub Bot
Created on: 28/May/20 20:31
Start Date: 28/May/20 20:31
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #11661:
URL: https://github.com/apache/beam/pull/11661#discussion_r432106093



##
File path: sdks/python/apache_beam/examples/wordcount_it_test.py
##
@@ -84,11 +87,45 @@ def _run_wordcount_it(self, run_wordcount, **opts):
 # Register clean up before pipeline execution
 self.addCleanup(delete_files, [test_output + '*'])
 
+publish_to_bq = bool(
+test_pipeline.get_option('publish_to_big_query') or False)
+
+# Start measure time for performance test
+start_time = time.time()
+
 # Get pipeline options from command argument: --test-pipeline-options,
 # and start pipeline job by calling pipeline main function.
 run_wordcount(
 test_pipeline.get_full_options_as_args(**extra_opts),
-save_main_session=False)
+save_main_session=False,
+)
+
+end_time = time.time()
+run_time = end_time - start_time
+
+if publish_to_bq:
+  self._publish_metrics(test_pipeline, run_time)
+
+  def _publish_metrics(self, pipeline, metric_value):
+influx_options = InfluxDBMetricsPublisherOptions(
+pipeline.get_option('influx_measurement'),
+pipeline.get_option('influx_db_name'),
+pipeline.get_option('influx_hostname'),
+os.getenv('INFLUXDB_USER'),
+os.getenv('INFLUXDB_USER_PASSWORD'),
+)
+metric_reader = MetricsReader(
+project_name=pipeline.get_option('project'),
+bq_table=pipeline.get_option('metrics_table'),
+bq_dataset=pipeline.get_option('metrics_dataset'),
+publish_to_bq=True,
+influxdb_options=influx_options,
+)
+
+metric_reader.publish_values((
+metric_value,

Review comment:
   Do we need "wordcount_it" in the metric name? It depends on how these 
metrics will be stored, if they are already associated in the database with a 
test suite that launches the pipeline  `Python WordCount IT Benchmarks`, then 
this information is captured and perhaps we don't need to repeat it in two 
different places. Leaving this up to you and @kamilwu.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438423)
Time Spent: 5.5h  (was: 5h 20m)

> Stop usign Perfkit Benchmarker tool in Python performance tests
> ---
>
> Key: BEAM-7774
> URL: https://issues.apache.org/jira/browse/BEAM-7774
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Piotr Szuberski
>Priority: P2
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9082) "Socket closed" Spurious GRPC errors in Flink/Spark runner log output

2020-05-28 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119056#comment-17119056
 ] 

Kyle Weaver commented on BEAM-9082:
---

Thanks Daniel. FYI I filed a separate issue for 1. (BEAM-9852).

> "Socket closed" Spurious GRPC errors in Flink/Spark runner log output
> -
>
> Key: BEAM-9082
> URL: https://issues.apache.org/jira/browse/BEAM-9082
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink, runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: portability-flink, portability-spark
>
> We often see "Socket closed" errors on job shutdown, even though the pipeline 
> has finished successfully. They are misleading and especially annoying at 
> scale.
> ERROR:root:Failed to read inputs in the data plane.
> ...
> grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that 
> terminated with:
>   status = StatusCode.UNAVAILABLE
>   details = "Socket closed"
>   debug_error_string = 
> "{"created":"@1578597616.309419460","description":"Error received from peer 
> ipv6:[::1]:37211","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Socket
>  closed","grpc_status":14}"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9220) Add use_runner_v2 argument for dataflow

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9220?focusedWorklogId=438422=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438422
 ]

ASF GitHub Bot logged work on BEAM-9220:


Author: ASF GitHub Bot
Created on: 28/May/20 20:29
Start Date: 28/May/20 20:29
Worklog Time Spent: 10m 
  Work Description: lostluck merged pull request #11207:
URL: https://github.com/apache/beam/pull/11207


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438422)
Time Spent: 4h  (was: 3h 50m)

> Add use_runner_v2 argument for dataflow
> ---
>
> Key: BEAM-9220
> URL: https://issues.apache.org/jira/browse/BEAM-9220
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Ankur Goenka
>Priority: P2
> Fix For: 2.20.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10027) Support for Kotlin-based Beam Katas

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10027?focusedWorklogId=438421=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438421
 ]

ASF GitHub Bot logged work on BEAM-10027:
-

Author: ASF GitHub Bot
Created on: 28/May/20 20:28
Start Date: 28/May/20 20:28
Worklog Time Spent: 10m 
  Work Description: rionmonster commented on a change in pull request 
#11761:
URL: https://github.com/apache/beam/pull/11761#discussion_r432104332



##
File path: learning/katas/kotlin/Windowing/Fixed Time Window/Fixed Time 
Window/test/org/apache/beam/learning/katas/windowing/fixedwindow/WindowedEvent.kt
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.learning.katas.windowing.fixedwindow
+
+import java.io.Serializable
+import java.util.*
+
+class WindowedEvent(private val event: String?, private val count: Long, 
private val window: String) : Serializable {

Review comment:
   Done!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438421)
Time Spent: 11h 20m  (was: 11h 10m)

> Support for Kotlin-based Beam Katas
> ---
>
> Key: BEAM-10027
> URL: https://issues.apache.org/jira/browse/BEAM-10027
> Project: Beam
>  Issue Type: Improvement
>  Components: katas
>Reporter: Rion Williams
>Assignee: Rion Williams
>Priority: P2
>   Original Estimate: 8h
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> Currently, there are a series of examples available demonstrating the use of 
> Apache Beam with Kotlin. It would be nice to have support for the same Beam 
> Katas that exist for Python, Go, and Java to also support Kotlin. 
> The port itself shouldn't be that involved since it can still target the JVM, 
> so it would likely just require the inclusion for Kotlin dependencies and a 
> conversion for all of the existing Java examples. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10027) Support for Kotlin-based Beam Katas

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10027?focusedWorklogId=438420=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438420
 ]

ASF GitHub Bot logged work on BEAM-10027:
-

Author: ASF GitHub Bot
Created on: 28/May/20 20:25
Start Date: 28/May/20 20:25
Worklog Time Spent: 10m 
  Work Description: rionmonster commented on a change in pull request 
#11761:
URL: https://github.com/apache/beam/pull/11761#discussion_r432103074



##
File path: learning/katas/kotlin/Core 
Transforms/Combine/CombineFn/src/org/apache/beam/learning/katas/coretransforms/combine/combinefn/Task.kt
##
@@ -0,0 +1,99 @@
+/*
+ *  Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+package org.apache.beam.learning.katas.coretransforms.combine.combinefn
+
+import 
org.apache.beam.learning.katas.coretransforms.combine.combinefn.Task.AverageFn.Accum
+import org.apache.beam.learning.katas.util.Log
+import org.apache.beam.sdk.Pipeline
+import org.apache.beam.sdk.options.PipelineOptionsFactory
+import org.apache.beam.sdk.transforms.Combine
+import org.apache.beam.sdk.transforms.Combine.CombineFn
+import org.apache.beam.sdk.transforms.Create
+import org.apache.beam.sdk.values.PCollection
+import java.io.Serializable
+import java.util.*
+
+object Task {
+@JvmStatic
+fun main(args: Array) {
+val options = PipelineOptionsFactory.fromArgs(*args).create()
+val pipeline = Pipeline.create(options)
+
+val numbers = pipeline.apply(Create.of(10, 20, 50, 70, 90))
+
+val output = applyTransform(numbers)
+
+output.apply(Log.ofElements())
+
+pipeline.run()
+}
+
+@JvmStatic
+fun applyTransform(input: PCollection): PCollection {
+return input.apply(Combine.globally(AverageFn()))
+}
+
+internal class AverageFn : CombineFn() {
+
+internal inner class Accum : Serializable {

Review comment:
   Done

##
File path: 
learning/katas/kotlin/util/test/org/apache/beam/learning/katas/util/ContainsKvs.kt
##
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.learning.katas.util
+
+import com.google.common.collect.ImmutableList
+import com.google.common.collect.Iterables
+import org.apache.beam.sdk.transforms.SerializableFunction
+import org.apache.beam.sdk.values.KV
+import org.hamcrest.CoreMatchers
+import org.hamcrest.Matcher
+import org.hamcrest.collection.IsIterableContainingInAnyOrder
+import org.junit.Assert
+import java.util.*
+
+class ContainsKvs private constructor(private val expectedKvs: List>>) : SerializableFunction>>, Void?> {
+
+companion object {
+@SafeVarargs
+fun containsKvs(vararg kvs: KV>): 
SerializableFunction>>, Void?> {
+return ContainsKvs(ImmutableList.copyOf(kvs))
+}
+}
+
+override fun apply(input: Iterable>>): Void? {
+val matchers: MutableList>>> = 
ArrayList()
+
+for (expected in expectedKvs) {
+val values = Iterables.toArray(expected.value, String::class.java)
+
matchers.add(KvMatcher.Companion.isKv(CoreMatchers.equalTo(expected.key), 
IsIterableContainingInAnyOrder.containsInAnyOrder(*values)))

Review comment:
   Done!





This is an automated message from the Apache Git Service.

[jira] [Work logged] (BEAM-10144) Update pipeline options snippets for best practices

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10144?focusedWorklogId=438419=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438419
 ]

ASF GitHub Bot logged work on BEAM-10144:
-

Author: ASF GitHub Bot
Created on: 28/May/20 20:24
Start Date: 28/May/20 20:24
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11851:
URL: https://github.com/apache/beam/pull/11851#issuecomment-635587627


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438419)
Time Spent: 1h  (was: 50m)

> Update pipeline options snippets for best practices
> ---
>
> Key: BEAM-10144
> URL: https://issues.apache.org/jira/browse/BEAM-10144
> Project: Beam
>  Issue Type: Improvement
>  Components: examples-python
>Reporter: David Cavazos
>Assignee: David Cavazos
>Priority: P3
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7774) Stop usign Perfkit Benchmarker tool in Python performance tests

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7774?focusedWorklogId=438418=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438418
 ]

ASF GitHub Bot logged work on BEAM-7774:


Author: ASF GitHub Bot
Created on: 28/May/20 20:22
Start Date: 28/May/20 20:22
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #11661:
URL: https://github.com/apache/beam/pull/11661#discussion_r432101289



##
File path: 
.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_Performance_Tests.json
##
@@ -0,0 +1,297 @@
+{

Review comment:
   >  exporting a new dashboard to JSON file
   
   Thanks, does this mean: creating a new dashboard manually via UI?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438418)
Time Spent: 5h 20m  (was: 5h 10m)

> Stop usign Perfkit Benchmarker tool in Python performance tests
> ---
>
> Key: BEAM-7774
> URL: https://issues.apache.org/jira/browse/BEAM-7774
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Piotr Szuberski
>Priority: P2
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10078) uniquify Dataflow specific jars when staging

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10078?focusedWorklogId=438417=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438417
 ]

ASF GitHub Bot logged work on BEAM-10078:
-

Author: ASF GitHub Bot
Created on: 28/May/20 20:19
Start Date: 28/May/20 20:19
Worklog Time Spent: 10m 
  Work Description: ihji commented on a change in pull request #11814:
URL: https://github.com/apache/beam/pull/11814#discussion_r432099933



##
File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java
##
@@ -397,10 +397,21 @@ public static PackageAttributes forFileToStage(
 String.format("Non-existent file to stage: %s", 
file.getAbsolutePath()));
   }
   checkState(!file.isDirectory(), "Source file must not be a directory.");
+  String target;
+  switch (dest) {
+case "dataflow-worker.jar":

Review comment:
   Put some comments and logging





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 438417)
Time Spent: 1h 40m  (was: 1.5h)

> uniquify Dataflow specific jars when staging
> 
>
> Key: BEAM-10078
> URL: https://issues.apache.org/jira/browse/BEAM-10078
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> After BEAM-9383, Dataflow specific jars (dataflow-worker.jar, windmill_main) 
> could be overwritten when two or more jobs share the same staging location. 
> Since they 1) should have specific predefined names AND 2) should have unique 
> location for avoiding collision, they need special handling when staging 
> artifacts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10146) Clean Docker images after tests

2020-05-28 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-10146:
--

 Summary: Clean Docker images after tests
 Key: BEAM-10146
 URL: https://issues.apache.org/jira/browse/BEAM-10146
 Project: Beam
  Issue Type: Improvement
  Components: testing
Reporter: Kyle Weaver


We build Docker images for many tests, but as far as I know we never clean them 
up. On Jenkins, we prune Docker images every day 
(https://github.com/apache/beam/blob/b0844c9326841f8ff30950b526015b23e6c3af9b/.test-infra/jenkins/job_Inventory.groovy#L69).
 But this is an issue for developers' workstations. Over time this can result 
in O(100GB) disk usage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10107) beam website PR listed twice in release guide with contradictory instructions

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10107?focusedWorklogId=438416=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438416
 ]

ASF GitHub Bot logged work on BEAM-10107:
-

Author: ASF GitHub Bot
Created on: 28/May/20 20:10
Start Date: 28/May/20 20:10
Worklog Time Spent: 10m 
  Work Description: ibzib opened a new pull request #11852:
URL: https://github.com/apache/beam/pull/11852


   …ase guide.
   
   Just some minor docs cleanup.
   
   R: @TheNeuralBit 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 

[jira] [Updated] (BEAM-10145) Kafka IO performance tests leaving behind unused disks on apache-beam-testing

2020-05-28 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri updated BEAM-10145:
-
Attachment: VfRJduZigE1.png

> Kafka IO performance tests leaving behind unused disks on apache-beam-testing
> -
>
> Key: BEAM-10145
> URL: https://issues.apache.org/jira/browse/BEAM-10145
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Udi Meiri
>Priority: P2
> Attachments: VfRJduZigE1.png
>
>
> Sample disk description:
> {"kubernetes.io/created-for/pv/name":"pvc-97dd8abb-a0ac-11ea-aa65-42010a80013b","kubernetes.io/created-for/pvc/name":"data-pzoo-0","kubernetes.io/created-for/pvc/namespace":"beam-performancetests-kafka-io-826"}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   3   >