[jira] [Work logged] (BEAM-9770) Add BigQuery DeadLetter pattern to Patterns Page

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9770?focusedWorklogId=437665=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437665
 ]

ASF GitHub Bot logged work on BEAM-9770:


Author: ASF GitHub Bot
Created on: 27/May/20 05:57
Start Date: 27/May/20 05:57
Worklog Time Spent: 10m 
  Work Description: aaltay merged pull request #11815:
URL: https://github.com/apache/beam/pull/11815


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437665)
Time Spent: 5h 50m  (was: 5h 40m)

> Add BigQuery DeadLetter pattern to Patterns Page
> 
>
> Key: BEAM-9770
> URL: https://issues.apache.org/jira/browse/BEAM-9770
> Project: Beam
>  Issue Type: New Feature
>  Components: website
>Reporter: Reza ardeshir rokni
>Assignee: Reza ardeshir rokni
>Priority: P4
>  Labels: pipeline-patterns
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10090) Java 11 Precommit failing tasks

2020-05-26 Thread Pawel Pasterz (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pawel Pasterz updated BEAM-10090:
-
Description: 
This is the aggregate for all failing tasks in Java 11 Precommit job

 

Jenkins job URL [https://builds.apache.org/job/beam_PreCommit_Java11_Phrase/11/]

Groovy script
{code:groovy}
import PrecommitJobBuilder
import CommonJobProperties as properties

PrecommitJobBuilder builder = new PrecommitJobBuilder(
scope: this,
nameBase: 'Java11',
gradleTask: ':javaPreCommit',
commitTriggering: false,
gradleSwitches: [
'-Pdockerfile=Dockerfile-java11',
'-PdisableSpotlessCheck=true',
'-PcompileAndRunTestsWithJava11',
"-Pjava11Home=${properties.JAVA_11_HOME}",
'--info' // for debug purposes
], // spotless checked in separate pre-commit
triggerPathPatterns: [
  '^model/.*$',
  '^sdks/java/.*$',
  '^runners/.*$',
  '^examples/java/.*$',
  '^examples/kotlin/.*$',
  '^release/.*$',
],
excludePathPatterns: [
  '^sdks/java/extensions/sql/.*$'
]
)
builder.build {
  publishers {
archiveJunit('**/build/test-results/**/*.xml')
recordIssues {
  tools {
errorProne()
java()
checkStyle {
  pattern('**/build/reports/checkstyle/*.xml')
}
configure { node ->
  node / 'spotBugs' << 'io.jenkins.plugins.analysis.warnings.SpotBugs' {
pattern('**/build/reports/spotbugs/*.xml')
  }
   }
  }
  enabledForFailure(true)
}
jacocoCodeCoverage {
  execPattern('**/build/jacoco/*.exec')
}
  }
}
{code}

  was:
This is aggregate for all failing tasks in Java 11 Precommit job

 

Jenkins job URL [https://builds.apache.org/job/beam_PreCommit_Java11_Phrase/11/]

Groovy script
{code:groovy}
import PrecommitJobBuilder
import CommonJobProperties as properties

PrecommitJobBuilder builder = new PrecommitJobBuilder(
scope: this,
nameBase: 'Java11',
gradleTask: ':javaPreCommit',
commitTriggering: false,
gradleSwitches: [
'-Pdockerfile=Dockerfile-java11',
'-PdisableSpotlessCheck=true',
'-PcompileAndRunTestsWithJava11',
"-Pjava11Home=${properties.JAVA_11_HOME}",
'--info' // for debug purposes
], // spotless checked in separate pre-commit
triggerPathPatterns: [
  '^model/.*$',
  '^sdks/java/.*$',
  '^runners/.*$',
  '^examples/java/.*$',
  '^examples/kotlin/.*$',
  '^release/.*$',
],
excludePathPatterns: [
  '^sdks/java/extensions/sql/.*$'
]
)
builder.build {
  publishers {
archiveJunit('**/build/test-results/**/*.xml')
recordIssues {
  tools {
errorProne()
java()
checkStyle {
  pattern('**/build/reports/checkstyle/*.xml')
}
configure { node ->
  node / 'spotBugs' << 'io.jenkins.plugins.analysis.warnings.SpotBugs' {
pattern('**/build/reports/spotbugs/*.xml')
  }
   }
  }
  enabledForFailure(true)
}
jacocoCodeCoverage {
  execPattern('**/build/jacoco/*.exec')
}
  }
}
{code}


> Java 11 Precommit failing tasks
> ---
>
> Key: BEAM-10090
> URL: https://issues.apache.org/jira/browse/BEAM-10090
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Pawel Pasterz
>Priority: P2
>
> This is the aggregate for all failing tasks in Java 11 Precommit job
>  
> Jenkins job URL 
> [https://builds.apache.org/job/beam_PreCommit_Java11_Phrase/11/]
> Groovy script
> {code:groovy}
> import PrecommitJobBuilder
> import CommonJobProperties as properties
> PrecommitJobBuilder builder = new PrecommitJobBuilder(
> scope: this,
> nameBase: 'Java11',
> gradleTask: ':javaPreCommit',
> commitTriggering: false,
> gradleSwitches: [
> '-Pdockerfile=Dockerfile-java11',
> '-PdisableSpotlessCheck=true',
> '-PcompileAndRunTestsWithJava11',
> "-Pjava11Home=${properties.JAVA_11_HOME}",
> '--info' // for debug purposes
> ], // spotless checked in separate pre-commit
> triggerPathPatterns: [
>   '^model/.*$',
>   '^sdks/java/.*$',
>   '^runners/.*$',
>   '^examples/java/.*$',
>   '^examples/kotlin/.*$',
>   '^release/.*$',
> ],
> excludePathPatterns: [
>   '^sdks/java/extensions/sql/.*$'
> ]
> )
> builder.build {
>   publishers {
> archiveJunit('**/build/test-results/**/*.xml')
> recordIssues {
>   tools {
> errorProne()
> java()
> checkStyle {
>   pattern('**/build/reports/checkstyle/*.xml')
> }
> configure { node ->
>   node / 'spotBugs' << 
> 

[jira] [Work logged] (BEAM-9916) Update IOs documentation links

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9916?focusedWorklogId=437657=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437657
 ]

ASF GitHub Bot logged work on BEAM-9916:


Author: ASF GitHub Bot
Created on: 27/May/20 05:30
Start Date: 27/May/20 05:30
Worklog Time Spent: 10m 
  Work Description: datancoffee commented on pull request #11802:
URL: https://github.com/apache/beam/pull/11802#issuecomment-634437572


   > FhirIO and HL7v2IO are not yet part of a release (they will be in 2.22.0 
and 2.21.0 respectively), so we can skip that update for now. @jaketf will add 
them once their corresponding releases are out.
   
   Sounds good. Thanks, Pablo!
   New page looks much more readable than before (and can be extended with new 
sections)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437657)
Time Spent: 1.5h  (was: 1h 20m)

> Update IOs documentation links
> --
>
> Key: BEAM-9916
> URL: https://issues.apache.org/jira/browse/BEAM-9916
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Alexey Romanenko
>Assignee: Ashwin Ramaswami
>Priority: P3
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, on the webpage 
> [https://beam.apache.org/documentation/io/built-in/] , we link all IOs to 
> their code on github, which could be quite odd for users. After a 
> [discussion|https://lists.apache.org/thread.html/r2c7bb9a2e13f1b8c296930fc8a36edafe731f49b5182567989afcb98%40%3Cdev.beam.apache.org%3E]
>  on dev@, it was decided to link them to the latest doc reference (like 
> [https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileSystems.html]
>  for FileIO).
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10067) Minify website assets with --minify flag

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10067?focusedWorklogId=437655=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437655
 ]

ASF GitHub Bot logged work on BEAM-10067:
-

Author: ASF GitHub Bot
Created on: 27/May/20 05:28
Start Date: 27/May/20 05:28
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11801:
URL: https://github.com/apache/beam/pull/11801#issuecomment-634437165


   @bntnam - Any additional feedback on this PR?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437655)
Time Spent: 50m  (was: 40m)

> Minify website assets with --minify flag
> 
>
> Key: BEAM-10067
> URL: https://issues.apache.org/jira/browse/BEAM-10067
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Ashwin Ramaswami
>Priority: P2
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We should minify website assets with Hugo's --minify flag.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10067) Minify website assets with --minify flag

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10067?focusedWorklogId=437654=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437654
 ]

ASF GitHub Bot logged work on BEAM-10067:
-

Author: ASF GitHub Bot
Created on: 27/May/20 05:27
Start Date: 27/May/20 05:27
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11801:
URL: https://github.com/apache/beam/pull/11801#issuecomment-634436791


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437654)
Time Spent: 40m  (was: 0.5h)

> Minify website assets with --minify flag
> 
>
> Key: BEAM-10067
> URL: https://issues.apache.org/jira/browse/BEAM-10067
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Ashwin Ramaswami
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We should minify website assets with Hugo's --minify flag.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9421) AI Platform pipeline patterns

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9421?focusedWorklogId=437653=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437653
 ]

ASF GitHub Bot logged work on BEAM-9421:


Author: ASF GitHub Bot
Created on: 27/May/20 05:26
Start Date: 27/May/20 05:26
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #11776:
URL: https://github.com/apache/beam/pull/11776#discussion_r430863465



##
File path: sdks/python/apache_beam/examples/snippets/snippets.py
##
@@ -1520,3 +1528,90 @@ def bigqueryio_deadletter():
   # [END BigQueryIODeadLetter]
 
   return result
+
+
+def extract_sentiments(response):
+  # [START nlp_extract_sentiments]
+  return {
+  'sentences': [{
+  sentence.text.content: sentence.sentiment.score
+  } for sentence in response.sentences],
+  'document_sentiment': response.document_sentiment.score,
+  }
+

Review comment:
   Could you remove trailing empty lines?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437653)
Time Spent: 12h 20m  (was: 12h 10m)

> AI Platform pipeline patterns
> -
>
> Key: BEAM-9421
> URL: https://issues.apache.org/jira/browse/BEAM-9421
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: P2
>  Labels: pipeline-patterns
>  Time Spent: 12h 20m
>  Remaining Estimate: 0h
>
> New pipeline patterns should be contributed to the Beam's website in order to 
> demonstrate how newly implemented Google Cloud AI PTransforms can be used in 
> pipelines.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10067) Minify website assets with --minify flag

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10067?focusedWorklogId=437652=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437652
 ]

ASF GitHub Bot logged work on BEAM-10067:
-

Author: ASF GitHub Bot
Created on: 27/May/20 05:24
Start Date: 27/May/20 05:24
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11801:
URL: https://github.com/apache/beam/pull/11801#issuecomment-634436198


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437652)
Time Spent: 0.5h  (was: 20m)

> Minify website assets with --minify flag
> 
>
> Key: BEAM-10067
> URL: https://issues.apache.org/jira/browse/BEAM-10067
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Ashwin Ramaswami
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We should minify website assets with Hugo's --minify flag.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10067) Minify website assets with --minify flag

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10067?focusedWorklogId=437651=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437651
 ]

ASF GitHub Bot logged work on BEAM-10067:
-

Author: ASF GitHub Bot
Created on: 27/May/20 05:22
Start Date: 27/May/20 05:22
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11801:
URL: https://github.com/apache/beam/pull/11801#issuecomment-634435582


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437651)
Time Spent: 20m  (was: 10m)

> Minify website assets with --minify flag
> 
>
> Key: BEAM-10067
> URL: https://issues.apache.org/jira/browse/BEAM-10067
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Ashwin Ramaswami
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We should minify website assets with Hugo's --minify flag.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9770) Add BigQuery DeadLetter pattern to Patterns Page

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9770?focusedWorklogId=437650=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437650
 ]

ASF GitHub Bot logged work on BEAM-9770:


Author: ASF GitHub Bot
Created on: 27/May/20 05:22
Start Date: 27/May/20 05:22
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11815:
URL: https://github.com/apache/beam/pull/11815#issuecomment-634435440


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437650)
Time Spent: 5h 40m  (was: 5.5h)

> Add BigQuery DeadLetter pattern to Patterns Page
> 
>
> Key: BEAM-9770
> URL: https://issues.apache.org/jira/browse/BEAM-9770
> Project: Beam
>  Issue Type: New Feature
>  Components: website
>Reporter: Reza ardeshir rokni
>Assignee: Reza ardeshir rokni
>Priority: P4
>  Labels: pipeline-patterns
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9946) Enhance Partition transform to provide partitionfn with SideInputs

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9946?focusedWorklogId=437649=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437649
 ]

ASF GitHub Bot logged work on BEAM-9946:


Author: ASF GitHub Bot
Created on: 27/May/20 05:20
Start Date: 27/May/20 05:20
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11682:
URL: https://github.com/apache/beam/pull/11682#issuecomment-634434904


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437649)
Remaining Estimate: 93h 10m  (was: 93h 20m)
Time Spent: 2h 50m  (was: 2h 40m)

> Enhance Partition transform to provide partitionfn with SideInputs
> --
>
> Key: BEAM-9946
> URL: https://issues.apache.org/jira/browse/BEAM-9946
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: P2
>   Original Estimate: 96h
>  Time Spent: 2h 50m
>  Remaining Estimate: 93h 10m
>
> Currently _Partition_ transform can partition a collection into n collections 
> based on only _element_ value in _PartitionFn_ to decide on which partition a 
> particular element belongs to.
> {code:java}
> public interface PartitionFn extends Serializable {
> int partitionFor(T elem, int numPartitions);
>   }
> public static  Partition of(int numPartitions, PartitionFn 
> partitionFn) {
> return new Partition<>(new PartitionDoFn(numPartitions, partitionFn));
>   }
> {code}
> It will be useful to introduce new API with additional _sideInputs_ provided 
> to partition function. User will be able to write logic to use both _element_ 
> value and _sideInputs_ to decide on which partition a particular element 
> belongs to.
> Option-1: Proposed new API:
> {code:java}
>   public interface PartitionWithSideInputsFn extends Serializable {
> int partitionFor(T elem, int numPartitions, Context c);
>   }
> public static  Partition of(int numPartitions, 
> PartitionWithSideInputsFn partitionFn, Requirements requirements) {
>  ...
>   }
> {code}
> User can use any of the two APIs as per there partitioning function logic.
> Option-2: Redesign old API with Builder Pattern which can provide optionally 
> a _Requirements_ with _sideInputs._ Deprecate old API.
> {code:java}
> // using sideviews
> Partition.into(numberOfPartitions).via(
> fn(
>   (input,c) ->  {
> // use c.sideInput(view)
> // use input
> // return partitionnumber
>  },requiresSideInputs(view))
> )
> // without using sideviews
> Partition.into(numberOfPartitions).via(
> fn((input,c) ->  {
> // use input
> // return partitionnumber
>  })
> )
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=437648=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437648
 ]

ASF GitHub Bot logged work on BEAM-7390:


Author: ASF GitHub Bot
Created on: 27/May/20 05:18
Start Date: 27/May/20 05:18
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10165:
URL: https://github.com/apache/beam/pull/10165#issuecomment-634434431


   Last 3 test runs failed, could you check the logs?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437648)
Time Spent: 12h  (was: 11h 50m)

> Colab examples for aggregation transforms (Python)
> --
>
> Key: BEAM-7390
> URL: https://issues.apache.org/jira/browse/BEAM-7390
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: P3
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> Merge aggregation Colabs into the transform catalog



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10016?focusedWorklogId=437643=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437643
 ]

ASF GitHub Bot logged work on BEAM-10016:
-

Author: ASF GitHub Bot
Created on: 27/May/20 05:00
Start Date: 27/May/20 05:00
Worklog Time Spent: 10m 
  Work Description: lukecwik removed a comment on pull request #11825:
URL: https://github.com/apache/beam/pull/11825#issuecomment-634428835







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437643)
Time Spent: 1h 20m  (was: 1h 10m)

> Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
> ---
>
> Key: BEAM-10016
> URL: https://issues.apache.org/jira/browse/BEAM-10016
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Both beam_PostCommit_Java_PVR_Flink_Batch and 
> beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.
> SEVERE: Error in task code:  CHAIN MapPartition (MapPartition at [6]{Values, 
> FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map 
> (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: 
> PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) 
> (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be 
> cast to [B
>   at 
> org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
>   at 
> org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203)
>   at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10016?focusedWorklogId=437640=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437640
 ]

ASF GitHub Bot logged work on BEAM-10016:
-

Author: ASF GitHub Bot
Created on: 27/May/20 04:59
Start Date: 27/May/20 04:59
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11825:
URL: https://github.com/apache/beam/pull/11825#issuecomment-634428769







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437640)
Time Spent: 1h  (was: 50m)

> Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
> ---
>
> Key: BEAM-10016
> URL: https://issues.apache.org/jira/browse/BEAM-10016
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Both beam_PostCommit_Java_PVR_Flink_Batch and 
> beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.
> SEVERE: Error in task code:  CHAIN MapPartition (MapPartition at [6]{Values, 
> FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map 
> (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: 
> PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) 
> (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be 
> cast to [B
>   at 
> org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
>   at 
> org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203)
>   at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10016?focusedWorklogId=437641=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437641
 ]

ASF GitHub Bot logged work on BEAM-10016:
-

Author: ASF GitHub Bot
Created on: 27/May/20 04:59
Start Date: 27/May/20 04:59
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11825:
URL: https://github.com/apache/beam/pull/11825#issuecomment-634428961


   Run Java Flink PortableValidatesRunner Batch



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437641)
Time Spent: 1h 10m  (was: 1h)

> Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
> ---
>
> Key: BEAM-10016
> URL: https://issues.apache.org/jira/browse/BEAM-10016
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Both beam_PostCommit_Java_PVR_Flink_Batch and 
> beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.
> SEVERE: Error in task code:  CHAIN MapPartition (MapPartition at [6]{Values, 
> FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map 
> (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: 
> PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) 
> (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be 
> cast to [B
>   at 
> org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
>   at 
> org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203)
>   at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10016?focusedWorklogId=437639=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437639
 ]

ASF GitHub Bot logged work on BEAM-10016:
-

Author: ASF GitHub Bot
Created on: 27/May/20 04:57
Start Date: 27/May/20 04:57
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11825:
URL: https://github.com/apache/beam/pull/11825#issuecomment-634428219


   R: @mxm @ibzib 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437639)
Time Spent: 50m  (was: 40m)

> Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
> ---
>
> Key: BEAM-10016
> URL: https://issues.apache.org/jira/browse/BEAM-10016
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Both beam_PostCommit_Java_PVR_Flink_Batch and 
> beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.
> SEVERE: Error in task code:  CHAIN MapPartition (MapPartition at [6]{Values, 
> FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map 
> (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: 
> PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) 
> (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be 
> cast to [B
>   at 
> org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
>   at 
> org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203)
>   at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10016?focusedWorklogId=437637=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437637
 ]

ASF GitHub Bot logged work on BEAM-10016:
-

Author: ASF GitHub Bot
Created on: 27/May/20 04:56
Start Date: 27/May/20 04:56
Worklog Time Spent: 10m 
  Work Description: lukecwik removed a comment on pull request #11825:
URL: https://github.com/apache/beam/pull/11825#issuecomment-634427643


   Run PortableValidatesRunner Flink



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437637)
Time Spent: 0.5h  (was: 20m)

> Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
> ---
>
> Key: BEAM-10016
> URL: https://issues.apache.org/jira/browse/BEAM-10016
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Both beam_PostCommit_Java_PVR_Flink_Batch and 
> beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.
> SEVERE: Error in task code:  CHAIN MapPartition (MapPartition at [6]{Values, 
> FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map 
> (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: 
> PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) 
> (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be 
> cast to [B
>   at 
> org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
>   at 
> org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203)
>   at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10016?focusedWorklogId=437638=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437638
 ]

ASF GitHub Bot logged work on BEAM-10016:
-

Author: ASF GitHub Bot
Created on: 27/May/20 04:56
Start Date: 27/May/20 04:56
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11825:
URL: https://github.com/apache/beam/pull/11825#issuecomment-634428036







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437638)
Time Spent: 40m  (was: 0.5h)

> Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
> ---
>
> Key: BEAM-10016
> URL: https://issues.apache.org/jira/browse/BEAM-10016
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Both beam_PostCommit_Java_PVR_Flink_Batch and 
> beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.
> SEVERE: Error in task code:  CHAIN MapPartition (MapPartition at [6]{Values, 
> FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map 
> (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: 
> PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) 
> (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be 
> cast to [B
>   at 
> org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
>   at 
> org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203)
>   at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10016?focusedWorklogId=437635=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437635
 ]

ASF GitHub Bot logged work on BEAM-10016:
-

Author: ASF GitHub Bot
Created on: 27/May/20 04:54
Start Date: 27/May/20 04:54
Worklog Time Spent: 10m 
  Work Description: lukecwik opened a new pull request #11825:
URL: https://github.com/apache/beam/pull/11825


   Follow-up on fixing the issue should happen separately.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10016?focusedWorklogId=437636=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437636
 ]

ASF GitHub Bot logged work on BEAM-10016:
-

Author: ASF GitHub Bot
Created on: 27/May/20 04:54
Start Date: 27/May/20 04:54
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11825:
URL: https://github.com/apache/beam/pull/11825#issuecomment-634427643


   Run PortableValidatesRunner Flink



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437636)
Time Spent: 20m  (was: 10m)

> Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
> ---
>
> Key: BEAM-10016
> URL: https://issues.apache.org/jira/browse/BEAM-10016
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Both beam_PostCommit_Java_PVR_Flink_Batch and 
> beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.
> SEVERE: Error in task code:  CHAIN MapPartition (MapPartition at [6]{Values, 
> FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map 
> (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: 
> PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) 
> (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be 
> cast to [B
>   at 
> org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
>   at 
> org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203)
>   at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-26 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117378#comment-17117378
 ] 

Luke Cwik edited comment on BEAM-10016 at 5/27/20, 4:19 AM:


The issue is that the GreedyPipelineFuser (and related classes) doesn't take 
into account the change in the encoding from the flattens input to the flattens 
output in certain scenarios where the flatten isn't being merged with an 
existing stage.

Normally one could copy the coder from the flatten's output PCollection to all 
the input PCollections to fix this but this doesn't hold when dealing with 
cross language pipelines because we could have
{code:java}
ParDo(Java) -> PC(big endian int coder)   -> Flatten(Python) -> 
PC(varint coder)
ParDo(Go) -> PCollection(little endian int coder) /{code}
The Python SDK in this case would know big endian int coder, little endian int 
coder and varint coder but Java/Go would only know the big endian int coder and 
little endian int coder respectively.

The solution in the above example is to make the Python SDK perform the 
transcoding by having it execute the flatten. Only flattens where the 
input/output coder matches can be done by a runner since no transcoding is 
necessary.

An alternative would be to require flattens have the same input and output 
coders but this has its own problems since it would be a backwards incompatible 
change or to insert identity ParDo's within SDKs to make sure that input/output 
coders match whenever there is a Flatten.


was (Author: lcwik):
The issue is that the GreedyPipelineFuser (and related classes) doesn't take 
into account the change in the encoding from the flattens input to the flattens 
output in certain scenarios where the flatten isn't being merged with an 
existing stage.

Normally one could copy the coder from the flatten's output PCollection to all 
the input PCollections to fix this but this doesn't hold when dealing with 
cross language pipelines because we could have
{code:java}
ParDo(Java) -> PC(big endian int coder)   -> Flatten(Python) -> 
PC(varint coder)
ParDo(Go) -> PCollection(little endian int coder) /{code}
The Python SDK in this case would know big endian int coder, little endian int 
coder and varint coder but Java/Go would only know the big endian int coder and 
little endian int coder respectively.

The solution in the above example is to make the Python SDK perform the 
transcoding by having it execute the flatten. Only flattens where the 
input/output coder matches can be done by a runner since no transcoding is 
necessary.

> Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
> ---
>
> Key: BEAM-10016
> URL: https://issues.apache.org/jira/browse/BEAM-10016
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: P2
>
> Both beam_PostCommit_Java_PVR_Flink_Batch and 
> beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.
> SEVERE: Error in task code:  CHAIN MapPartition (MapPartition at [6]{Values, 
> FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map 
> (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: 
> PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) 
> (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be 
> cast to [B
>   at 
> org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
>   at 
> org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203)
>   at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at 

[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=437631=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437631
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 27/May/20 04:19
Start Date: 27/May/20 04:19
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #11632:
URL: https://github.com/apache/beam/pull/11632#discussion_r430848092



##
File path: sdks/python/apache_beam/dataframe/convert.py
##
@@ -16,13 +16,23 @@
 
 from __future__ import absolute_import
 
+import typing
+
 import inspect
 
 from apache_beam import pvalue
 from apache_beam.dataframe import expressions
 from apache_beam.dataframe import frame_base
 from apache_beam.dataframe import transforms
 
+if typing.TYPE_CHECKING:
+  # pylint: disable=ungrouped-imports
+  from typing import Any
+  from typing import Dict
+  from typing import Tuple
+  from typing import Union

Review comment:
   [sigh] It still didn't like PCollection. 
`apache_beam/dataframe/transforms.py:30:0: W0611: Unused PCollection imported 
from apache_beam.pvalue (unused-import)`. But the rest are OK.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437631)
Time Spent: 84h 20m  (was: 84h 10m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: P2
>  Time Spent: 84h 20m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-26 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117378#comment-17117378
 ] 

Luke Cwik commented on BEAM-10016:
--

The issue is that the GreedyPipelineFuser (and related classes) doesn't take 
into account the change in the encoding from the flattens input to the flattens 
output in certain scenarios where the flatten isn't being merged with an 
existing stage.

Normally one could copy the coder from the flatten's output PCollection to all 
the input PCollections to fix this but this doesn't hold when dealing with 
cross language pipelines because we could have
{code:java}
ParDo(Java) -> PC(big endian int coder)   -> Flatten(Python) -> 
PC(varint coder)
ParDo(Go) -> PCollection(little endian int coder) /{code}
The Python SDK in this case would know big endian int coder, little endian int 
coder and varint coder but Java/Go would only know the big endian int coder and 
little endian int coder respectively.

The solution in the above example is to make the Python SDK perform the 
transcoding by having it execute the flatten. Only flattens where the 
input/output coder matches can be done by a runner since no transcoding is 
necessary.

> Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
> ---
>
> Key: BEAM-10016
> URL: https://issues.apache.org/jira/browse/BEAM-10016
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: P2
>
> Both beam_PostCommit_Java_PVR_Flink_Batch and 
> beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.
> SEVERE: Error in task code:  CHAIN MapPartition (MapPartition at [6]{Values, 
> FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map 
> (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: 
> PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) 
> (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be 
> cast to [B
>   at 
> org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
>   at 
> org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203)
>   at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10075) Allow users to tune the grouping table size in batch dataflow pipelines

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10075?focusedWorklogId=437629=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437629
 ]

ASF GitHub Bot logged work on BEAM-10075:
-

Author: ASF GitHub Bot
Created on: 27/May/20 04:16
Start Date: 27/May/20 04:16
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11811:
URL: https://github.com/apache/beam/pull/11811#issuecomment-634415558


   > looks like 
`org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInFinishBundleStateful`
 failed in the precommit here, I can't imagine that was from this change?
   
   There are several flakes in the project which need to be fixed. Feel free to 
rerun the precommit as necessary and/or fix the flakes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437629)
Time Spent: 1h  (was: 50m)

> Allow users to tune the grouping table size in batch dataflow pipelines
> ---
>
> Key: BEAM-10075
> URL: https://issues.apache.org/jira/browse/BEAM-10075
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The dataflow worker hard-codes the grouping table size to 100 MB.  We should 
> allow users to specify this as a pipeline parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7568) Java dataflow harness re-encodes value state cells even if they haven't changed

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7568?focusedWorklogId=437623=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437623
 ]

ASF GitHub Bot logged work on BEAM-7568:


Author: ASF GitHub Bot
Created on: 27/May/20 04:15
Start Date: 27/May/20 04:15
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #11823:
URL: https://github.com/apache/beam/pull/11823#discussion_r430836474



##
File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternals.java
##
@@ -410,14 +416,18 @@ protected WorkItemCommitRequest 
persistDirectly(WindmillStateCache.ForKey cache)
 return WorkItemCommitRequest.newBuilder().buildPartial();
   }
 
-  ByteString.Output stream = ByteString.newOutput();
-  if (value != null) {
-coder.encode(value, stream, Coder.Context.OUTER);
+  ByteString encoded = null;
+  if (cachedSize == -1 || modified) {

Review comment:
   This logic is really frustrating that we are encoding because we aren't 
returning the size of what we read as part of the state future from 
WindmillStateReader.
   
   This could be better but this is definitely an improvement to not re-encode 
the value needlessly.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437623)
Time Spent: 0.5h  (was: 20m)

> Java dataflow harness re-encodes value state cells even if they haven't 
> changed
> ---
>
> Key: BEAM-7568
> URL: https://issues.apache.org/jira/browse/BEAM-7568
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Affects Versions: 2.13.0
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The java dataflow worker seems to re-encode ValueState cells after every work 
> item, even they weren't modified.
> You can see here 
> [https://github.com/apache/beam/blob/a71bfda77df36aa1531f01533c372233cfba0dd9/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternals.java#L413]
>  that the value is always encoded (and used to weight the cache entry) even 
> if it won't be persisted back to windmill. 
> This can have some large performance implications if they values being stored 
> are expensive/large to encode, and infrequently modified.  Ideally, the 
> weight would be also cached, and the value would only need to be modified if 
> it was changed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10075) Allow users to tune the grouping table size in batch dataflow pipelines

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10075?focusedWorklogId=437620=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437620
 ]

ASF GitHub Bot logged work on BEAM-10075:
-

Author: ASF GitHub Bot
Created on: 27/May/20 04:14
Start Date: 27/May/20 04:14
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on pull request #11811:
URL: https://github.com/apache/beam/pull/11811#issuecomment-634414369







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437620)
Time Spent: 50m  (was: 40m)

> Allow users to tune the grouping table size in batch dataflow pipelines
> ---
>
> Key: BEAM-10075
> URL: https://issues.apache.org/jira/browse/BEAM-10075
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The dataflow worker hard-codes the grouping table size to 100 MB.  We should 
> allow users to specify this as a pipeline parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7568) Java dataflow harness re-encodes value state cells even if they haven't changed

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7568?focusedWorklogId=437626=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437626
 ]

ASF GitHub Bot logged work on BEAM-7568:


Author: ASF GitHub Bot
Created on: 27/May/20 04:15
Start Date: 27/May/20 04:15
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11823:
URL: https://github.com/apache/beam/pull/11823#issuecomment-634405269


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437626)
Time Spent: 40m  (was: 0.5h)

> Java dataflow harness re-encodes value state cells even if they haven't 
> changed
> ---
>
> Key: BEAM-7568
> URL: https://issues.apache.org/jira/browse/BEAM-7568
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Affects Versions: 2.13.0
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The java dataflow worker seems to re-encode ValueState cells after every work 
> item, even they weren't modified.
> You can see here 
> [https://github.com/apache/beam/blob/a71bfda77df36aa1531f01533c372233cfba0dd9/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternals.java#L413]
>  that the value is always encoded (and used to weight the cache entry) even 
> if it won't be persisted back to windmill. 
> This can have some large performance implications if they values being stored 
> are expensive/large to encode, and infrequently modified.  Ideally, the 
> weight would be also cached, and the value would only need to be modified if 
> it was changed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2939) Fn API SDF support

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=437614=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437614
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 27/May/20 04:14
Start Date: 27/May/20 04:14
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #11818:
URL: https://github.com/apache/beam/pull/11818#issuecomment-634309798


   Looks like the only failure in Java PreCommit was VideoIntelligenceIT. A 
known flake fixed on master 
[BEAM-10050](https://issues.apache.org/jira/browse/BEAM-10050). I'll go ahead 
and merge.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437614)
Time Spent: 35h 50m  (was: 35h 40m)

> Fn API SDF support
> --
>
> Key: BEAM-2939
> URL: https://issues.apache.org/jira/browse/BEAM-2939
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: P2
>  Labels: portability
>  Time Spent: 35h 50m
>  Remaining Estimate: 0h
>
> The Fn API should support streaming SDF. Detailed design TBD.
> Once design is ready, expand subtasks similarly to BEAM-2822.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7336) KafkaIO should support inferring schemas when reading Avro

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7336?focusedWorklogId=437617=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437617
 ]

ASF GitHub Bot logged work on BEAM-7336:


Author: ASF GitHub Bot
Created on: 27/May/20 04:14
Start Date: 27/May/20 04:14
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #10978:
URL: https://github.com/apache/beam/pull/10978#issuecomment-634057582


   @reuvenlax Could you take a look on this?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437617)
Time Spent: 2h 40m  (was: 2.5h)

> KafkaIO should support inferring schemas when reading Avro
> --
>
> Key: BEAM-7336
> URL: https://issues.apache.org/jira/browse/BEAM-7336
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka
>Reporter: Reuven Lax
>Assignee: Ismaël Mejía
>Priority: P2
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> PubSubIO already supports this.
> It would also be nice to be able to look up Avro schemas in the Kafka schema 
> registry.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7568) Java dataflow harness re-encodes value state cells even if they haven't changed

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7568?focusedWorklogId=437619=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437619
 ]

ASF GitHub Bot logged work on BEAM-7568:


Author: ASF GitHub Bot
Created on: 27/May/20 04:14
Start Date: 27/May/20 04:14
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on a change in pull request 
#11823:
URL: https://github.com/apache/beam/pull/11823#discussion_r430837481



##
File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternals.java
##
@@ -410,14 +416,18 @@ protected WorkItemCommitRequest 
persistDirectly(WindmillStateCache.ForKey cache)
 return WorkItemCommitRequest.newBuilder().buildPartial();
   }
 
-  ByteString.Output stream = ByteString.newOutput();
-  if (value != null) {
-coder.encode(value, stream, Coder.Context.OUTER);
+  ByteString encoded = null;
+  if (cachedSize == -1 || modified) {

Review comment:
   agreed, it took me awhile to reason this out w/ the different code paths 
that can affect the values here.  I'm reasonably sure you don't need the 
`cachedSize == -1` branch but, it's been working like this for ~1 year so I 
didn't want to tempt fate.
   
   There's also the case where this is the first write, but `valueIsKnown = 
false` (because `isNewKey = false`), so we end up with a read that doesn't 
return anything.
   
   Thanks for the quick reviews!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437619)
Time Spent: 20m  (was: 10m)

> Java dataflow harness re-encodes value state cells even if they haven't 
> changed
> ---
>
> Key: BEAM-7568
> URL: https://issues.apache.org/jira/browse/BEAM-7568
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Affects Versions: 2.13.0
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The java dataflow worker seems to re-encode ValueState cells after every work 
> item, even they weren't modified.
> You can see here 
> [https://github.com/apache/beam/blob/a71bfda77df36aa1531f01533c372233cfba0dd9/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternals.java#L413]
>  that the value is always encoded (and used to weight the cache entry) even 
> if it won't be persisted back to windmill. 
> This can have some large performance implications if they values being stored 
> are expensive/large to encode, and infrequently modified.  Ideally, the 
> weight would be also cached, and the value would only need to be modified if 
> it was changed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8371) Sunset Beam Python 2 support in new releases in 2020.

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8371?focusedWorklogId=437611=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437611
 ]

ASF GitHub Bot logged work on BEAM-8371:


Author: ASF GitHub Bot
Created on: 27/May/20 04:13
Start Date: 27/May/20 04:13
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #11819:
URL: https://github.com/apache/beam/pull/11819#issuecomment-634282371







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437611)
Time Spent: 50m  (was: 40m)

> Sunset Beam Python 2 support in new releases in 2020.
> -
>
> Key: BEAM-8371
> URL: https://issues.apache.org/jira/browse/BEAM-8371
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Ashwin Ramaswami
>Priority: P2
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Creating this Jira to track eventual sunset of Python 2 support in Beam.
> This was previously discussed in [1], [2], [3].
> We can use this issue to communicate next steps, collect feedback and give 
> updates on current status.
> [1] 
> https://lists.apache.org/thread.html/eba6caa58ea79a7ecbc8560d1c680a366b44c531d96ce5c699d41535@%3Cdev.beam.apache.org%3E
> [2] 
> https://lists.apache.org/thread.html/456631fe1a696c537ef8ebfee42cd3ea8121bf7c639c52da5f7032e7@%3Cdev.beam.apache.org%3E
> [3] 
> https://lists.apache.org/thread.html/r0d5c309a7e3107854f4892ccfeb1a17c0cec25dfce188678ab8df072%40%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9916) Update IOs documentation links

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9916?focusedWorklogId=437596=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437596
 ]

ASF GitHub Bot logged work on BEAM-9916:


Author: ASF GitHub Bot
Created on: 27/May/20 04:12
Start Date: 27/May/20 04:12
Worklog Time Spent: 10m 
  Work Description: datancoffee commented on pull request #11802:
URL: https://github.com/apache/beam/pull/11802#issuecomment-634267799


   Can we also add a section on "Applications" for application-specific 
connectors?
   
   ---
   
   Add a section "Applications" after the "Database" section but before 
"Miscellaneous"
   
   FhirIO provides an API for reading and writing resources to https://cloud.google.com/healthcare/docs/concepts/fhir;>Google Cloud 
Healthcare Fhir API. 
   
   
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java
   
   HL7v2IO provides an API for reading from and writing to https://cloud.google.com/healthcare/docs/concepts/hl7v2;>Google Cloud 
Healthcare HL7v2 API. 
   
   
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437596)
Time Spent: 1h 20m  (was: 1h 10m)

> Update IOs documentation links
> --
>
> Key: BEAM-9916
> URL: https://issues.apache.org/jira/browse/BEAM-9916
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Alexey Romanenko
>Assignee: Ashwin Ramaswami
>Priority: P3
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, on the webpage 
> [https://beam.apache.org/documentation/io/built-in/] , we link all IOs to 
> their code on github, which could be quite odd for users. After a 
> [discussion|https://lists.apache.org/thread.html/r2c7bb9a2e13f1b8c296930fc8a36edafe731f49b5182567989afcb98%40%3Cdev.beam.apache.org%3E]
>  on dev@, it was decided to link them to the latest doc reference (like 
> [https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileSystems.html]
>  for FileIO).
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10078) uniquify Dataflow specific jars when staging

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10078?focusedWorklogId=437591=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437591
 ]

ASF GitHub Bot logged work on BEAM-10078:
-

Author: ASF GitHub Bot
Created on: 27/May/20 04:11
Start Date: 27/May/20 04:11
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #11814:
URL: https://github.com/apache/beam/pull/11814#issuecomment-633786896







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437591)
Time Spent: 0.5h  (was: 20m)

> uniquify Dataflow specific jars when staging
> 
>
> Key: BEAM-10078
> URL: https://issues.apache.org/jira/browse/BEAM-10078
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> After BEAM-9383, Dataflow specific jars (dataflow-worker.jar, windmill_main) 
> could be overwritten when two or more jobs share the same staging location. 
> Since they 1) should have specific predefined names AND 2) should have unique 
> location for avoiding collision, they need special handling when staging 
> artifacts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7520) DirectRunner timers are not strictly time ordered

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7520?focusedWorklogId=437587=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437587
 ]

ASF GitHub Bot logged work on BEAM-7520:


Author: ASF GitHub Bot
Created on: 27/May/20 04:10
Start Date: 27/May/20 04:10
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #9190:
URL: https://github.com/apache/beam/pull/9190#discussion_r430620228



##
File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java
##
@@ -3488,6 +3499,158 @@ public void onTimer(OutputReceiver r) {
   pipeline.run();
 }
 
+/** A test makes sure that an event time timers are correctly ordered. */
+@Test
+@Category({
+  ValidatesRunner.class,
+  UsesTimersInParDo.class,
+  UsesTestStream.class,
+  UsesStatefulParDo.class,
+  UsesStrictTimerOrdering.class
+})
+public void testEventTimeTimerOrdering() throws Exception {
+  final int numTestElements = 100;
+  final Instant now = new Instant(15000L);
+  TestStream.Builder> builder =
+  TestStream.create(KvCoder.of(StringUtf8Coder.of(), 
StringUtf8Coder.of()))
+  .advanceWatermarkTo(new Instant(0));
+
+  for (int i = 0; i < numTestElements; i++) {
+builder = builder.addElements(TimestampedValue.of(KV.of("dummy", "" + 
i), now.plus(i)));
+builder = builder.advanceWatermarkTo(now.plus(i / 10 * 10));
+  }
+
+  testEventTimeTimerOrderingWithInputPTransform(
+  now, numTestElements, builder.advanceWatermarkToInfinity());
+}
+
+/** A test makes sure that an event time timers are correctly ordered 
using Create transform. */
+@Test
+@Category({
+  ValidatesRunner.class,
+  UsesTimersInParDo.class,
+  UsesStatefulParDo.class,
+  UsesStrictTimerOrdering.class
+})
+public void testEventTimeTimerOrderingWithCreate() throws Exception {
+  final int numTestElements = 100;
+  final Instant now = new Instant(15000L);
+
+  List>> elements = new ArrayList<>();
+  for (int i = 0; i < numTestElements; i++) {
+elements.add(TimestampedValue.of(KV.of("dummy", "" + i), now.plus(i)));
+  }
+
+  testEventTimeTimerOrderingWithInputPTransform(

Review comment:
   The elements in the PCollection produced by the Create transform are 
considered unordered so the stateful DoFn could see (dummy, 1) before it sees 
(dummy, 0). Doesn't the DoFn need to be marked with `@RequiresTimeSortedInput` 
in this case?

##
File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java
##
@@ -3488,6 +3499,158 @@ public void onTimer(OutputReceiver r) {
   pipeline.run();
 }
 
+/** A test makes sure that an event time timers are correctly ordered. */
+@Test
+@Category({
+  ValidatesRunner.class,
+  UsesTimersInParDo.class,
+  UsesTestStream.class,
+  UsesStatefulParDo.class,
+  UsesStrictTimerOrdering.class
+})
+public void testEventTimeTimerOrdering() throws Exception {
+  final int numTestElements = 100;
+  final Instant now = new Instant(15000L);
+  TestStream.Builder> builder =
+  TestStream.create(KvCoder.of(StringUtf8Coder.of(), 
StringUtf8Coder.of()))
+  .advanceWatermarkTo(new Instant(0));
+
+  for (int i = 0; i < numTestElements; i++) {
+builder = builder.addElements(TimestampedValue.of(KV.of("dummy", "" + 
i), now.plus(i)));
+builder = builder.advanceWatermarkTo(now.plus(i / 10 * 10));
+  }
+
+  testEventTimeTimerOrderingWithInputPTransform(
+  now, numTestElements, builder.advanceWatermarkToInfinity());
+}
+
+/** A test makes sure that an event time timers are correctly ordered 
using Create transform. */
+@Test
+@Category({
+  ValidatesRunner.class,
+  UsesTimersInParDo.class,
+  UsesStatefulParDo.class,
+  UsesStrictTimerOrdering.class
+})
+public void testEventTimeTimerOrderingWithCreate() throws Exception {
+  final int numTestElements = 100;
+  final Instant now = new Instant(15000L);
+
+  List>> elements = new ArrayList<>();
+  for (int i = 0; i < numTestElements; i++) {
+elements.add(TimestampedValue.of(KV.of("dummy", "" + i), now.plus(i)));
+  }
+
+  testEventTimeTimerOrderingWithInputPTransform(

Review comment:
   @je-ik @y1chi @kennknowles 
   The elements in the PCollection produced by the Create transform are 
considered unordered so the stateful DoFn could see `(dummy, 1)` before it sees 
`(dummy, 0)`. Doesn't the DoFn need to be marked with 
`@RequiresTimeSortedInput` in this case?





This is an automated message from the Apache Git Service.
To respond to the 

[jira] [Work logged] (BEAM-9679) Core Transforms | Go SDK Code Katas

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9679?focusedWorklogId=437575=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437575
 ]

ASF GitHub Bot logged work on BEAM-9679:


Author: ASF GitHub Bot
Created on: 27/May/20 04:09
Start Date: 27/May/20 04:09
Worklog Time Spent: 10m 
  Work Description: henryken commented on a change in pull request #11803:
URL: https://github.com/apache/beam/pull/11803#discussion_r429965845



##
File path: learning/katas/go/Core 
Transforms/CoGroupByKey/CoGroupByKey/pkg/task/task.go
##
@@ -0,0 +1,52 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package task
+
+import (
+   "fmt"
+   "github.com/apache/beam/sdks/go/pkg/beam"
+)
+
+func ApplyTransform(s beam.Scope, fruits beam.PCollection, countries 
beam.PCollection) beam.PCollection {
+   fruitsKV := beam.ParDo(s, func(e string) (string, string) {
+   return string(e[0]), e
+   }, fruits)
+
+   countriesKV := beam.ParDo(s, func(e string) (string, string) {
+   return string(e[0]), e
+   }, countries)
+
+   grouped := beam.CoGroupByKey(s, fruitsKV, countriesKV)
+   return beam.ParDo(s, func(key string, f func(*string) bool, c 
func(*string) bool, emit func(string)) {
+   v := {

Review comment:
   Why not using `wa` here so that not to confuse with Value?

##
File path: learning/katas/go/Core Transforms/CoGroupByKey/CoGroupByKey/task.md
##
@@ -0,0 +1,104 @@
+
+
+# CoGroupByKey
+
+CoGroupByKey performs a relational join of two or more key/value PCollections 
that have the same 
+key type.
+
+**Kata:** Implement a 
[beam.CoGroupByKey](https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#CoGroupByKey)
 
+transform that join words by the first alphabetical letter, and then produces 
the string representation of the 
+WordsAlphabet model.
+
+
+Refer to
+https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#CoGroupByKey;>beam.CoGroupByKey
+to solve this problem.
+
+
+
+  Refer to the Beam Programming Guide
+  https://beam.apache.org/documentation/programming-guide/#cogroupbykey;>
+"CoGroupByKey" section for more information.
+
+
+
+  Think of this problem in three stages.  First, create key/value pairs of 
PCollections called KV
+  for fruits and countries, pairing the first character with the word.  Next, 
apply CoGroupByKey to the KVs
+  followed by a ParDo.
+
+
+
+  In the last lesson we learned how to make key/value PCollections called KV.  
Now we have 
+  two to make from fruits and countries.
+  
+  To return as a KV, you can return two values from your DoFn. The first 
return value represents the Key, and 
+  the second return value represents the Value.  An example is shown below.
+  
+```
+func doFn(element string) (string, string) {
+key := string(element[0])
+value := element
+return key, value
+}
+``` 
+
+
+
+  In the last lesson we learned that 
+  https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#GroupByKey;>
+  beam.GroupByKey takes a single KV.
+  https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#CoGroupByKey;>beam.CoGroupByKey
+  takes more than one KV.
+
+
+
+  Our final step in this problem requires a
+  https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#ParDo;>beam.ParDo
+  with a DoFn that's different than what we've seen in previous lessons.  In 
the previous step we should
+  have a PCollection acquired from CoGroupByKey.  A ParDo for that PCollection 
expects a DoFn that looks
+  like the following. 
+  
+  ```
+  func doFn(key string, aKV func(*string) bool, anotherKV func(*string) bool, 
emit func(string)){

Review comment:
   If I understand correctly, it seems that `func(*string) bool` should 
assign the value of V to the passed-in variable pointer, not the KV?
   Borrowing the concept of the Java version, these functions behave similarly 
to CoGbkResult.get() based on the sequence of the passed in PCollection?

##
File path: learning/katas/go/Core Transforms/CoGroupByKey/CoGroupByKey/go.mod
##
@@ -0,0 +1,26 @@
+// Licensed to the Apache Software 

[jira] [Work logged] (BEAM-9603) Support Dynamic Timer in Java SDK over FnApi

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9603?focusedWorklogId=437578=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437578
 ]

ASF GitHub Bot logged work on BEAM-9603:


Author: ASF GitHub Bot
Created on: 27/May/20 04:09
Start Date: 27/May/20 04:09
Worklog Time Spent: 10m 
  Work Description: boyuanzz merged pull request #11756:
URL: https://github.com/apache/beam/pull/11756


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437578)
Time Spent: 7h  (was: 6h 50m)

> Support Dynamic Timer in Java SDK over FnApi
> 
>
> Key: BEAM-9603
> URL: https://issues.apache.org/jira/browse/BEAM-9603
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness
>Reporter: Boyuan Zhang
>Assignee: Yichi Zhang
>Priority: P2
>  Time Spent: 7h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10064) Fix google3 import error for BEAM-9383

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10064?focusedWorklogId=437577=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437577
 ]

ASF GitHub Bot logged work on BEAM-10064:
-

Author: ASF GitHub Bot
Created on: 27/May/20 04:09
Start Date: 27/May/20 04:09
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11793:
URL: https://github.com/apache/beam/pull/11793#issuecomment-634353785


   retest this please
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437577)
Time Spent: 0.5h  (was: 20m)

> Fix google3 import error for BEAM-9383
> --
>
> Key: BEAM-10064
> URL: https://issues.apache.org/jira/browse/BEAM-10064
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Fix google3 importing error for BEAM-9383
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9220) Add use_runner_v2 argument for dataflow

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9220?focusedWorklogId=437561=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437561
 ]

ASF GitHub Bot logged work on BEAM-9220:


Author: ASF GitHub Bot
Created on: 27/May/20 04:08
Start Date: 27/May/20 04:08
Worklog Time Spent: 10m 
  Work Description: lostluck removed a comment on pull request #11207:
URL: https://github.com/apache/beam/pull/11207#issuecomment-634131283







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437561)
Time Spent: 3h 10m  (was: 3h)

> Add use_runner_v2 argument for dataflow
> ---
>
> Key: BEAM-9220
> URL: https://issues.apache.org/jira/browse/BEAM-9220
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Ankur Goenka
>Priority: P2
> Fix For: 2.20.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2939) Fn API SDF support

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=437564=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437564
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 27/May/20 04:08
Start Date: 27/May/20 04:08
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit opened a new pull request #11818:
URL: https://github.com/apache/beam/pull/11818


   R: @lukecwik 
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)
 | --- | [![Build 

[jira] [Work logged] (BEAM-2939) Fn API SDF support

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=437558=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437558
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 27/May/20 04:07
Start Date: 27/May/20 04:07
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #11781:
URL: https://github.com/apache/beam/pull/11781#issuecomment-634096823


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437558)
Time Spent: 35.5h  (was: 35h 20m)

> Fn API SDF support
> --
>
> Key: BEAM-2939
> URL: https://issues.apache.org/jira/browse/BEAM-2939
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: P2
>  Labels: portability
>  Time Spent: 35.5h
>  Remaining Estimate: 0h
>
> The Fn API should support streaming SDF. Detailed design TBD.
> Once design is ready, expand subtasks similarly to BEAM-2822.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9603) Support Dynamic Timer in Java SDK over FnApi

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9603?focusedWorklogId=437555=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437555
 ]

ASF GitHub Bot logged work on BEAM-9603:


Author: ASF GitHub Bot
Created on: 27/May/20 04:07
Start Date: 27/May/20 04:07
Worklog Time Spent: 10m 
  Work Description: boyuanzz merged pull request #11822:
URL: https://github.com/apache/beam/pull/11822


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437555)
Time Spent: 6h 50m  (was: 6h 40m)

> Support Dynamic Timer in Java SDK over FnApi
> 
>
> Key: BEAM-9603
> URL: https://issues.apache.org/jira/browse/BEAM-9603
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness
>Reporter: Boyuan Zhang
>Assignee: Yichi Zhang
>Priority: P2
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9220) Add use_runner_v2 argument for dataflow

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9220?focusedWorklogId=437557=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437557
 ]

ASF GitHub Bot logged work on BEAM-9220:


Author: ASF GitHub Bot
Created on: 27/May/20 04:07
Start Date: 27/May/20 04:07
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #11207:
URL: https://github.com/apache/beam/pull/11207#issuecomment-634131283







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437557)
Time Spent: 3h  (was: 2h 50m)

> Add use_runner_v2 argument for dataflow
> ---
>
> Key: BEAM-9220
> URL: https://issues.apache.org/jira/browse/BEAM-9220
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Ankur Goenka
>Priority: P2
> Fix For: 2.20.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7370) Beam Dependency Update Request: Sphinx

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7370?focusedWorklogId=437544=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437544
 ]

ASF GitHub Bot logged work on BEAM-7370:


Author: ASF GitHub Bot
Created on: 27/May/20 04:05
Start Date: 27/May/20 04:05
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #11798:
URL: https://github.com/apache/beam/pull/11798#issuecomment-634178059


   Thanks! LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437544)
Time Spent: 0.5h  (was: 20m)

> Beam Dependency Update Request: Sphinx
> --
>
> Key: BEAM-7370
> URL: https://issues.apache.org/jira/browse/BEAM-7370
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
>  - 2019-05-20 16:38:07.937770 
> -
> Please consider upgrading the dependency Sphinx. 
> The current version is 1.8.5. The latest version is 2.0.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-06-17 12:32:27.855338 
> -
> Please consider upgrading the dependency Sphinx. 
> The current version is 1.8.5. The latest version is 2.1.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-06-24 12:02:59.052884 
> -
> Please consider upgrading the dependency Sphinx. 
> The current version is 1.8.5. The latest version is 2.1.2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-07-01 12:04:13.113613 
> -
> Please consider upgrading the dependency Sphinx. 
> The current version is 1.8.5. The latest version is 2.1.2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-07-08 12:03:15.091005 
> -
> Please consider upgrading the dependency Sphinx. 
> The current version is 1.8.5. The latest version is 2.1.2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-07-15 12:03:09.406918 
> -
> Please consider upgrading the dependency Sphinx. 
> The current version is 1.8.5. The latest version is 2.1.2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-07-22 12:03:31.157859 
> -
> Please consider upgrading the dependency Sphinx. 
> The current version is 1.8.5. The latest version is 2.1.2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-07-29 12:05:13.023604 
> -
> Please consider upgrading the dependency Sphinx. 
> The current version is 1.8.5. The latest version is 2.1.2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-08-05 12:03:03.242767 
> -
> Please consider upgrading the dependency Sphinx. 
> The current version is 1.8.5. The latest version is 2.1.2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-08-12 12:04:01.647619 
> -
> Please consider upgrading the dependency 

[jira] [Work logged] (BEAM-8910) Use AVRO instead of JSON in BigQuery bounded source.

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8910?focusedWorklogId=437537=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437537
 ]

ASF GitHub Bot logged work on BEAM-8910:


Author: ASF GitHub Bot
Created on: 27/May/20 04:04
Start Date: 27/May/20 04:04
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #11086:
URL: https://github.com/apache/beam/pull/11086#issuecomment-634324376


   @pabloem, would you mind squashing the commits that happened after the last
   review iteration (but please don't squash earlier commits until review is
   finalized) and resolving any comments that are already addressed? You can
   also delete the "Run postcommit..." which would make  it easier to follow
   up on remaining comment threads.
   
   On Thu, May 21, 2020 at 12:52 PM Pablo  wrote:
   
   > *@pabloem* commented on this pull request.
   > --
   >
   > In sdks/python/apache_beam/io/gcp/bigquery.py
   > :
   >
   > > @@ -610,7 +611,8 @@ def __init__(
   >coder=None,
   >use_standard_sql=False,
   >flatten_results=True,
   > -  kms_key=None):
   > +  kms_key=None,
   > +  use_json_exports=False):
   >
   > Restarting this discussion after some rebasing and preparing...
   >
   > I think I'd prefer to allow users to choose the export file format. This
   > is what we do for writing to BQ in Java and Python. Allowing users to
   > choose output type for specific column types would add overhead of (if avro
   > and json_format_bytes: formatbytes; if avro and json_datetime_format:
   > formatdatetime; if not avro and not json_format_bytes: formatbytes; ...).
   >
   > We can discourage users from using JSON (AVRO is already the default) -
   > and eventually stop supporting it if that makes sense. Thoughts?
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub
   > , or
   > unsubscribe
   > 

   > .
   >
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437537)
Time Spent: 12h 40m  (was: 12.5h)

> Use AVRO instead of JSON in BigQuery bounded source.
> 
>
> Key: BEAM-8910
> URL: https://issues.apache.org/jira/browse/BEAM-8910
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kamil Wasilewski
>Assignee: Pablo Estrada
>Priority: P3
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>
> The proposed BigQuery bounded source in Python SDK (see PR: 
> [https://github.com/apache/beam/pull/9772)] uses a BigQuery export job to 
> take a snapshot of the table and read from each produced JSON file. A 
> performance improvement can be gain by switching to AVRO instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=437530=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437530
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 27/May/20 04:02
Start Date: 27/May/20 04:02
Worklog Time Spent: 10m 
  Work Description: chadrik commented on a change in pull request #11632:
URL: https://github.com/apache/beam/pull/11632#discussion_r430719974



##
File path: sdks/python/apache_beam/dataframe/convert.py
##
@@ -16,13 +16,23 @@
 
 from __future__ import absolute_import
 
+import typing
+
 import inspect
 
 from apache_beam import pvalue
 from apache_beam.dataframe import expressions
 from apache_beam.dataframe import frame_base
 from apache_beam.dataframe import transforms
 
+if typing.TYPE_CHECKING:
+  # pylint: disable=ungrouped-imports
+  from typing import Any
+  from typing import Dict
+  from typing import Tuple
+  from typing import Union

Review comment:
   I think it still needs to be fixed for `dataframe.convert`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437530)
Time Spent: 84h 10m  (was: 84h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: P2
>  Time Spent: 84h 10m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=437531=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437531
 ]

ASF GitHub Bot logged work on BEAM-9650:


Author: ASF GitHub Bot
Created on: 27/May/20 04:02
Start Date: 27/May/20 04:02
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on a change in pull request #11582:
URL: https://github.com/apache/beam/pull/11582#discussion_r430612821



##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -1641,3 +1644,314 @@ def process(self, unused_element, signal):
 *self._args,
 **self._kwargs))
 | _PassThroughThenCleanup(RemoveJsonFiles(gcs_location)))
+
+
+class _ExtractBQData(DoFn):
+  '''
+  PTransform:ReadAllFromBigQueryRequest->FileMetadata that fetches BQ data into
+  a temporary storage and returns metadata for created files.
+  '''
+  def __init__(
+  self,
+  gcs_location_pattern=None,
+  project=None,
+  coder=None,
+  schema=None,
+  kms_key=None):
+
+self.gcs_location_pattern = gcs_location_pattern
+self.project = project
+self.coder = coder or _JsonToDictCoder
+self.kms_key = kms_key
+self.split_result = None
+self.schema = schema
+self.target_schema = None
+
+  def process(self, element):
+'''
+:param element(ReadAllFromBigQueryRequest):
+:return:
+'''
+element.validate()
+if element.table is not None:
+  table_reference = bigquery_tools.parse_table_reference(element.table)
+  query = None
+  use_legacy_sql = True
+else:
+  query = element.query
+  use_legacy_sql = element.use_legacy_sql
+
+flatten_results = element.flatten_results
+
+bq = bigquery_tools.BigQueryWrapper()
+
+try:
+  if element.query is not None:
+self._setup_temporary_dataset(bq, query, use_legacy_sql)
+table_reference = self._execute_query(
+bq, query, use_legacy_sql, flatten_results)
+
+  gcs_location = self.gcs_location_pattern.format(uuid.uuid4().hex)
+
+  table_schema = bq.get_table(
+  table_reference.projectId,
+  table_reference.datasetId,
+  table_reference.tableId).schema
+
+  if self.target_schema is None:
+self.target_schema = bigquery_tools.parse_table_schema_from_json(
+json.dumps(self.schema))
+
+  if not self.target_schema == table_schema:
+raise ValueError((
+"Schema generated by reading from BQ doesn't match expected"
+"schema.\nExpected: {}\nActual: {}").format(
+self.target_schema, table_schema))
+
+  metadata_list = self._export_files(bq, table_reference, gcs_location)
+
+  yield pvalue.TaggedOutput('location_to_cleanup', gcs_location)
+  for metadata in metadata_list:
+yield metadata.path
+
+finally:
+  if query is not None:
+bq.clean_up_temporary_dataset(self.project)
+
+  def _setup_temporary_dataset(self, bq, query, use_legacy_sql):
+location = bq.get_query_location(self.project, query, use_legacy_sql)
+bq.create_temporary_dataset(self.project, location)
+
+  def _execute_query(self, bq, query, use_legacy_sql, flatten_results):
+job = bq._start_query_job(
+self.project,
+query,
+use_legacy_sql,
+flatten_results,
+job_id=uuid.uuid4().hex,
+kms_key=self.kms_key)
+job_ref = job.jobReference
+bq.wait_for_bq_job(job_ref)
+return bq._get_temp_table(self.project)
+
+  def _export_files(self, bq, table_reference, gcs_location):
+"""Runs a BigQuery export job.
+
+Returns:
+  a list of FileMetadata instances
+"""
+job_id = uuid.uuid4().hex
+job_ref = bq.perform_extract_job([gcs_location],
+ job_id,
+ table_reference,
+ bigquery_tools.FileFormat.JSON,
+ include_header=False)
+bq.wait_for_bq_job(job_ref)
+metadata_list = FileSystems.match([gcs_location])[0].metadata_list
+
+return metadata_list
+
+
+class _PassThroughThenCleanupWithSI(PTransform):
+  """A PTransform that invokes a DoFn after the input PCollection has been
+processed.
+
+DoFn should have arguments (element, side_input, cleanup_signal).
+
+Utilizes readiness of PCollection to trigger DoFn.
+  """
+  def __init__(self, cleanup_dofn, side_input):
+self.cleanup_dofn = cleanup_dofn
+self.side_input = side_input
+
+  def expand(self, input):
+class PassThrough(beam.DoFn):
+  def process(self, element):
+yield element
+
+main_output, cleanup_signal = input | beam.ParDo(
+  PassThrough()).with_outputs(
+  'cleanup_signal', main='main')
+
+_ = (
+input.pipeline
+| beam.Create([None])
+| 

[jira] [Work logged] (BEAM-10075) Allow users to tune the grouping table size in batch dataflow pipelines

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10075?focusedWorklogId=437523=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437523
 ]

ASF GitHub Bot logged work on BEAM-10075:
-

Author: ASF GitHub Bot
Created on: 27/May/20 04:01
Start Date: 27/May/20 04:01
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on pull request #11811:
URL: https://github.com/apache/beam/pull/11811#issuecomment-634097057


   looks like the precommit failed on flink 1.9 tests: 
   ``` 
   
org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest
 > 
org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest.testWatermarkEmission[numTasks
 = 4; numSplits=2] FAILED
   ```
   
   Which I'm going to guess are unrelated to this change ;)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437523)
Time Spent: 40m  (was: 0.5h)

> Allow users to tune the grouping table size in batch dataflow pipelines
> ---
>
> Key: BEAM-10075
> URL: https://issues.apache.org/jira/browse/BEAM-10075
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The dataflow worker hard-codes the grouping table size to 100 MB.  We should 
> allow users to specify this as a pipeline parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9770) Add BigQuery DeadLetter pattern to Patterns Page

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9770?focusedWorklogId=437516=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437516
 ]

ASF GitHub Bot logged work on BEAM-9770:


Author: ASF GitHub Bot
Created on: 27/May/20 04:00
Start Date: 27/May/20 04:00
Worklog Time Spent: 10m 
  Work Description: rezarokni opened a new pull request #11815:
URL: https://github.com/apache/beam/pull/11815


   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-2939) Fn API SDF support

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=437518=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437518
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 27/May/20 04:00
Start Date: 27/May/20 04:00
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit merged pull request #11818:
URL: https://github.com/apache/beam/pull/11818


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437518)
Time Spent: 35h 20m  (was: 35h 10m)

> Fn API SDF support
> --
>
> Key: BEAM-2939
> URL: https://issues.apache.org/jira/browse/BEAM-2939
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: P2
>  Labels: portability
>  Time Spent: 35h 20m
>  Remaining Estimate: 0h
>
> The Fn API should support streaming SDF. Detailed design TBD.
> Once design is ready, expand subtasks similarly to BEAM-2822.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10076) Dataflow worker status page incorrectly displays work item statuses

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10076?focusedWorklogId=437515=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437515
 ]

ASF GitHub Bot logged work on BEAM-10076:
-

Author: ASF GitHub Bot
Created on: 27/May/20 04:00
Start Date: 27/May/20 04:00
Worklog Time Spent: 10m 
  Work Description: lukecwik merged pull request #11812:
URL: https://github.com/apache/beam/pull/11812


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437515)
Time Spent: 20m  (was: 10m)

> Dataflow worker status page incorrectly displays work item statuses
> ---
>
> Key: BEAM-10076
> URL: https://issues.apache.org/jira/browse/BEAM-10076
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: P2
> Fix For: 2.23.0
>
> Attachments: image-2020-05-25-17-13-49-465.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The work item status page incorrectly renders its table due to an incorrectly 
> placed  tag.
>  (see attached screenshot)
> !image-2020-05-25-17-13-49-465.png|width=512,height=94!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10073) Add dashboard for pubsub performance tests to grafana

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10073?focusedWorklogId=437513=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437513
 ]

ASF GitHub Bot logged work on BEAM-10073:
-

Author: ASF GitHub Bot
Created on: 27/May/20 04:00
Start Date: 27/May/20 04:00
Worklog Time Spent: 10m 
  Work Description: piotr-szuberski commented on a change in pull request 
#11809:
URL: https://github.com/apache/beam/pull/11809#discussion_r430022185



##
File path: 
.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_IO_IT_Tests_Dataflow.json
##
@@ -35,7 +35,7 @@
   },
   "hiddenSeries": false,
   "id": 2,
-  "interval": "",
+  "interval": "15h",

Review comment:
   I chose 15h because this interval is set to the AutoJob in jenkins. The 
java FileBased dashboards have it set to this value (in this case 6h)

##
File path: 
.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_IO_IT_Tests_Dataflow.json
##
@@ -261,6 +261,250 @@
 "align": false,
 "alignLevel": null
   }
+},
+{
+  "aliasColors": {},
+  "bars": false,
+  "cacheTimeout": null,
+  "dashLength": 10,
+  "dashes": false,
+  "datasource": "BeamInfluxDB",
+  "fill": 1,
+  "fillGradient": 0,
+  "gridPos": {
+"h": 9,
+"w": 12,
+"x": 0,
+"y": 9
+  },
+  "hiddenSeries": false,
+  "id": 4,
+  "interval": "15h",
+  "legend": {
+"avg": false,
+"current": false,
+"max": false,
+"min": false,
+"show": false,
+"total": false,
+"values": false
+  },
+  "lines": true,
+  "linewidth": 2,
+  "links": [],
+  "nullPointMode": "connected",
+  "options": {
+"dataLinks": []
+  },
+  "percentage": false,
+  "pluginVersion": "6.7.2",
+  "pointradius": 2,
+  "points": true,
+  "renderer": "flot",
+  "seriesOverrides": [],
+  "spaceLength": 10,
+  "stack": false,
+  "steppedLine": false,
+  "targets": [
+{
+  "alias": "read_time",
+  "groupBy": [
+{
+  "params": [
+"$__interval"
+  ],
+  "type": "time"
+}
+  ],
+  "measurement": "python_bqio_read",
+  "orderByTime": "ASC",
+  "policy": "default",
+  "query": "SELECT mean(\"value\") FROM \"python_psio_2GB_results\" 
WHERE \"metric\" = \"pubsub_io_perf_read_runtime\" AND $timeFilter GROUP BY 
time($__interval), \"metric\"",
+  "rawQuery": true,
+  "refId": "A",
+  "resultFormat": "time_series",
+  "select": [
+[
+  {
+"params": [
+  "value"
+],
+"type": "field"
+  },
+  {
+"params": [],
+"type": "mean"
+  }
+]
+  ],
+  "tags": []
+}
+  ],
+  "thresholds": [],
+  "timeFrom": null,
+  "timeRegions": [],
+  "timeShift": null,
+  "title": "Reading 2GB of data | Pubsub IO",

Review comment:
   Done

##
File path: 
.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_IO_IT_Tests_Dataflow.json
##
@@ -35,7 +35,7 @@
   },
   "hiddenSeries": false,
   "id": 2,
-  "interval": "",
+  "interval": "15h",

Review comment:
   I didn't dig into it, bu now it's clear. I changed to 24h

##
File path: 
.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_IO_IT_Tests_Dataflow.json
##
@@ -35,7 +35,7 @@
   },
   "hiddenSeries": false,
   "id": 2,
-  "interval": "",
+  "interval": "15h",

Review comment:
   I didn't dig into it and was distracted by other jobs that indeed run 
every few hours. But now it's clear. I changed to 24h

##
File path: 
.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_IO_IT_Tests_Dataflow.json
##
@@ -35,7 +35,7 @@
   },
   "hiddenSeries": false,
   "id": 2,
-  "interval": "",
+  "interval": "15h",

Review comment:
   I didn't dig into it and was distracted by other jobs that indeed run 
every few hours. But now the syntax of schedules is clear. I changed to 24h





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437513)
Time Spent: 50m  (was: 40m)

> Add dashboard for pubsub performance tests to grafana
> 

[jira] [Work logged] (BEAM-7568) Java dataflow harness re-encodes value state cells even if they haven't changed

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7568?focusedWorklogId=437509=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437509
 ]

ASF GitHub Bot logged work on BEAM-7568:


Author: ASF GitHub Bot
Created on: 27/May/20 03:59
Start Date: 27/May/20 03:59
Worklog Time Spent: 10m 
  Work Description: steveniemitz opened a new pull request #11823:
URL: https://github.com/apache/beam/pull/11823


   The dataflow runner will re-encode ValueState cells every commit, even if 
they haven't changed (and won't be persisted).  This can be expensive for 
large, but infrequently changed values.
   
   This simply adds a field to the value state and caches the size when 
encoding the value the first time.
   
   R: @pabloem @lukecwik 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 

[jira] [Work logged] (BEAM-10065) Docs - Beam "Release guide" template is broken

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10065?focusedWorklogId=437506=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437506
 ]

ASF GitHub Bot logged work on BEAM-10065:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:58
Start Date: 27/May/20 03:58
Worklog Time Spent: 10m 
  Work Description: ibzib commented on a change in pull request #11797:
URL: https://github.com/apache/beam/pull/11797#discussion_r430480420



##
File path: website/www/site/static/js/language-switch.js
##
@@ -113,7 +113,7 @@ $(document).ready(function() {
}
 
 // Swapping visibility of code blocks.
-$(this.selector).hide();
+$(this.selector).not(".language-md").hide();

Review comment:
   Why is this change needed?

##
File path: website/www/site/static/js/language-switch.js
##
@@ -113,7 +113,7 @@ $(document).ready(function() {
}
 
 // Swapping visibility of code blocks.
-$(this.selector).hide();
+$(this.selector).not(".language-md").hide();

Review comment:
   This is a bigger problem than just markdown. Every code block that 
specifies a language that is not toggled on will be hidden. I filed BEAM-10092.
   
   For now, since syntax highlighting is not important here, please remove the 
`md` and we can merge the rest of your changes.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437506)
Time Spent: 40m  (was: 0.5h)

> Docs - Beam "Release guide" template is broken
> --
>
> Key: BEAM-10065
> URL: https://issues.apache.org/jira/browse/BEAM-10065
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Ashwin Ramaswami
>Assignee: Ashwin Ramaswami
>Priority: P2
> Attachments: Screen Shot 2020-05-22 at 9.09.35 AM.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> It just shows "e>" for the template.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10093) Add ZetaSQL Nexmark variant

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10093?focusedWorklogId=437501=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437501
 ]

ASF GitHub Bot logged work on BEAM-10093:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:57
Start Date: 27/May/20 03:57
Worklog Time Spent: 10m 
  Work Description: kennknowles opened a new pull request #11820:
URL: https://github.com/apache/beam/pull/11820


   Copy/paste/modify job. Split the test classes to make it easy to run just 
one of them. After this we can add a Jenkins run to report benchmarks.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-10077) using filename + hash instead of UUID for staging name

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10077?focusedWorklogId=437496=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437496
 ]

ASF GitHub Bot logged work on BEAM-10077:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:57
Start Date: 27/May/20 03:57
Worklog Time Spent: 10m 
  Work Description: ihji commented on a change in pull request #11813:
URL: https://github.com/apache/beam/pull/11813#discussion_r430576598



##
File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java
##
@@ -264,6 +259,14 @@ public static Environment createProcessEnvironment(
   .build()
   .toByteString());
 }
+if (stagedName == null) {
+  stagedName = createStagingFileName(file, hashCode);

Review comment:
   I believe existing unit test covers this path. This is the path when a 
given path doesn't have special '=' character. In that case, we generate the 
staged name.

##
File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
##
@@ -211,8 +210,8 @@
 
   @VisibleForTesting static final int GCS_UPLOAD_BUFFER_SIZE_BYTES_DEFAULT = 
1024 * 1024;
 
-  @VisibleForTesting static final String PIPELINE_FILE_FORMAT = 
"pipeline-%s.pb";
-  @VisibleForTesting static final String DATAFLOW_GRAPH_FILE_FORMAT = 
"dataflow_graph-%s.json";
+  @VisibleForTesting static final String PIPELINE_FILE_NAME = "pipeline.pb";

Review comment:
   In the previous PR, `forBytesToStage` only respected the target name as 
is so we needed to make the target name unique. Now `forBytesToStage` generates 
unique names by itself so we don't need to put UUID suffix manually.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437496)
Time Spent: 1h  (was: 50m)

> using filename + hash instead of UUID for staging name
> --
>
> Key: BEAM-10077
> URL: https://issues.apache.org/jira/browse/BEAM-10077
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Recent change BEAM-9383 disabled the artifact caching logic for GCS by object 
> names. Changing staging name generation from UUID to filename + hash will 
> re-enable the artifact caching so we can avoid re-uploading same artifact.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8371) Sunset Beam Python 2 support in new releases in 2020.

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8371?focusedWorklogId=437494=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437494
 ]

ASF GitHub Bot logged work on BEAM-8371:


Author: ASF GitHub Bot
Created on: 27/May/20 03:57
Start Date: 27/May/20 03:57
Worklog Time Spent: 10m 
  Work Description: epicfaace commented on pull request #11819:
URL: https://github.com/apache/beam/pull/11819#issuecomment-634347950


   @tvalentyn Got it. I didn't think that the actual build files for python 2 
would change much until the time we drop Py2 support, so that's why I thought 
this PR would be useful for when we actually drop Py2 support.
   
   What parts of this PR would be fine to do for now? Do you think we could 
move some tests (such as the runner tests) to py3 for now, as @pabloem 
mentioned?
   
   Also, do you have an idea of when we might want to drop Py2 support? We did 
commit to the fact that we would drop Py2 support in 2020 -- are you thinking 
sometime in the fall?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437494)
Time Spent: 40m  (was: 0.5h)

> Sunset Beam Python 2 support in new releases in 2020.
> -
>
> Key: BEAM-8371
> URL: https://issues.apache.org/jira/browse/BEAM-8371
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Ashwin Ramaswami
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Creating this Jira to track eventual sunset of Python 2 support in Beam.
> This was previously discussed in [1], [2], [3].
> We can use this issue to communicate next steps, collect feedback and give 
> updates on current status.
> [1] 
> https://lists.apache.org/thread.html/eba6caa58ea79a7ecbc8560d1c680a366b44c531d96ce5c699d41535@%3Cdev.beam.apache.org%3E
> [2] 
> https://lists.apache.org/thread.html/456631fe1a696c537ef8ebfee42cd3ea8121bf7c639c52da5f7032e7@%3Cdev.beam.apache.org%3E
> [3] 
> https://lists.apache.org/thread.html/r0d5c309a7e3107854f4892ccfeb1a17c0cec25dfce188678ab8df072%40%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7770) Add ReadAll transform for SolrIO

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7770?focusedWorklogId=437492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437492
 ]

ASF GitHub Bot logged work on BEAM-7770:


Author: ASF GitHub Bot
Created on: 27/May/20 03:56
Start Date: 27/May/20 03:56
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #11357:
URL: https://github.com/apache/beam/pull/11357#issuecomment-633987107







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437492)
Time Spent: 40m  (was: 0.5h)

> Add ReadAll transform for SolrIO
> 
>
> Key: BEAM-7770
> URL: https://issues.apache.org/jira/browse/BEAM-7770
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-solr
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: P3
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> SolrIO already uses internally a composable approach but we need to expose an 
> explicit ReadAll transform that allows user to create reads in the middle of 
> the Pipeline to improve composability (e.g. Reads in the middle of a 
> Pipeline).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10073) Add dashboard for pubsub performance tests to grafana

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10073?focusedWorklogId=437488=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437488
 ]

ASF GitHub Bot logged work on BEAM-10073:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:56
Start Date: 27/May/20 03:56
Worklog Time Spent: 10m 
  Work Description: kamilwu merged pull request #11809:
URL: https://github.com/apache/beam/pull/11809


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437488)
Time Spent: 40m  (was: 0.5h)

> Add dashboard for pubsub performance tests to grafana
> -
>
> Key: BEAM-10073
> URL: https://issues.apache.org/jira/browse/BEAM-10073
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Assignee: Piotr Szuberski
>Priority: P1
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Pubsub IO performance tests are not published to influx and not displayed in 
> grafana dashboards



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9916) Update IOs documentation links

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9916?focusedWorklogId=437489=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437489
 ]

ASF GitHub Bot logged work on BEAM-9916:


Author: ASF GitHub Bot
Created on: 27/May/20 03:56
Start Date: 27/May/20 03:56
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11802:
URL: https://github.com/apache/beam/pull/11802#issuecomment-634186277







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437489)
Time Spent: 1h 10m  (was: 1h)

> Update IOs documentation links
> --
>
> Key: BEAM-9916
> URL: https://issues.apache.org/jira/browse/BEAM-9916
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Alexey Romanenko
>Assignee: Ashwin Ramaswami
>Priority: P3
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently, on the webpage 
> [https://beam.apache.org/documentation/io/built-in/] , we link all IOs to 
> their code on github, which could be quite odd for users. After a 
> [discussion|https://lists.apache.org/thread.html/r2c7bb9a2e13f1b8c296930fc8a36edafe731f49b5182567989afcb98%40%3Cdev.beam.apache.org%3E]
>  on dev@, it was decided to link them to the latest doc reference (like 
> [https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileSystems.html]
>  for FileIO).
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9946) Enhance Partition transform to provide partitionfn with SideInputs

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9946?focusedWorklogId=437490=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437490
 ]

ASF GitHub Bot logged work on BEAM-9946:


Author: ASF GitHub Bot
Created on: 27/May/20 03:56
Start Date: 27/May/20 03:56
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #11682:
URL: https://github.com/apache/beam/pull/11682#issuecomment-634091221


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437490)
Remaining Estimate: 93h 20m  (was: 93.5h)
Time Spent: 2h 40m  (was: 2.5h)

> Enhance Partition transform to provide partitionfn with SideInputs
> --
>
> Key: BEAM-9946
> URL: https://issues.apache.org/jira/browse/BEAM-9946
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: P2
>   Original Estimate: 96h
>  Time Spent: 2h 40m
>  Remaining Estimate: 93h 20m
>
> Currently _Partition_ transform can partition a collection into n collections 
> based on only _element_ value in _PartitionFn_ to decide on which partition a 
> particular element belongs to.
> {code:java}
> public interface PartitionFn extends Serializable {
> int partitionFor(T elem, int numPartitions);
>   }
> public static  Partition of(int numPartitions, PartitionFn 
> partitionFn) {
> return new Partition<>(new PartitionDoFn(numPartitions, partitionFn));
>   }
> {code}
> It will be useful to introduce new API with additional _sideInputs_ provided 
> to partition function. User will be able to write logic to use both _element_ 
> value and _sideInputs_ to decide on which partition a particular element 
> belongs to.
> Option-1: Proposed new API:
> {code:java}
>   public interface PartitionWithSideInputsFn extends Serializable {
> int partitionFor(T elem, int numPartitions, Context c);
>   }
> public static  Partition of(int numPartitions, 
> PartitionWithSideInputsFn partitionFn, Requirements requirements) {
>  ...
>   }
> {code}
> User can use any of the two APIs as per there partitioning function logic.
> Option-2: Redesign old API with Builder Pattern which can provide optionally 
> a _Requirements_ with _sideInputs._ Deprecate old API.
> {code:java}
> // using sideviews
> Partition.into(numberOfPartitions).via(
> fn(
>   (input,c) ->  {
> // use c.sideInput(view)
> // use input
> // return partitionnumber
>  },requiresSideInputs(view))
> )
> // without using sideviews
> Partition.into(numberOfPartitions).via(
> fn((input,c) ->  {
> // use input
> // return partitionnumber
>  })
> )
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9926) Certain code examples for programming guide are not showing

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9926?focusedWorklogId=437479=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437479
 ]

ASF GitHub Bot logged work on BEAM-9926:


Author: ASF GitHub Bot
Created on: 27/May/20 03:55
Start Date: 27/May/20 03:55
Worklog Time Spent: 10m 
  Work Description: epicfaace commented on a change in pull request #11790:
URL: https://github.com/apache/beam/pull/11790#discussion_r430653038



##
File path: website/www/site/content/en/documentation/programming-guide.md
##
@@ -683,18 +687,18 @@ that for you. Your `@ProcessElement` method should accept 
a parameter tagged wit
 `@Element`, which will be populated with the input element. In order to output
 elements, the method can also take a parameter of type `OutputReceiver` which
 provides a method for emitting elements. The parameter types must match the 
input
-and output types of your `DoFn` or the framework will raise an error. Note: 
@Element and
-OutputReceiver were introduced in Beam 2.5.0; if using an earlier release of 
Beam, a
-ProcessContext parameter should be used instead.
+and output types of your `DoFn` or the framework will raise an error. Note: 
`@Element` and
+`OutputReceiver` were introduced in Beam 2.5.0; if using an earlier release of 
Beam, a
+`ProcessContext` parameter should be used instead.
 {{< /paragraph >}}
 
 {{< paragraph class="language-py" >}}
 Inside your `DoFn` subclass, you'll write a method `process` where you provide
 the actual processing logic. You don't need to manually extract the elements
 from the input collection; the Beam SDKs handle that for you. Your `process`
-method should accept an object of type `element`. This is the input element and
-output is emitted by using `yield` or `return` statement inside `process`
-method.
+method should accept an argument `element`, which is the input element, emit
+output value(s) by using `yield` statements. You can also use a `return`
+statement if you are sure the method will return only a single output value.

Review comment:
   That sounds good!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437479)
Time Spent: 1h 20m  (was: 1h 10m)

> Certain code examples for programming guide are not showing
> ---
>
> Key: BEAM-9926
> URL: https://issues.apache.org/jira/browse/BEAM-9926
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Ashwin Ramaswami
>Priority: P2
> Attachments: screenshot-1.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Seems like the code examples for the entire State section are missing. See 
> [https://beam.apache.org/documentation/programming-guide/#state-and-timers]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10074) Hash Functions in BeamSQL

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10074?focusedWorklogId=437480=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437480
 ]

ASF GitHub Bot logged work on BEAM-10074:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:55
Start Date: 27/May/20 03:55
Worklog Time Spent: 10m 
  Work Description: darshanj opened a new pull request #11817:
URL: https://github.com/apache/beam/pull/11817


   Implemented Hashing functions MD5, SHA1, SHA256 and SHA512.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-9679) Core Transforms | Go SDK Code Katas

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9679?focusedWorklogId=437405=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437405
 ]

ASF GitHub Bot logged work on BEAM-9679:


Author: ASF GitHub Bot
Created on: 27/May/20 03:45
Start Date: 27/May/20 03:45
Worklog Time Spent: 10m 
  Work Description: damondouglas commented on a change in pull request 
#11803:
URL: https://github.com/apache/beam/pull/11803#discussion_r430474979



##
File path: learning/katas/go/Core 
Transforms/CoGroupByKey/CoGroupByKey/pkg/task/task.go
##
@@ -0,0 +1,52 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package task
+
+import (
+   "fmt"
+   "github.com/apache/beam/sdks/go/pkg/beam"
+)
+
+func ApplyTransform(s beam.Scope, fruits beam.PCollection, countries 
beam.PCollection) beam.PCollection {
+   fruitsKV := beam.ParDo(s, func(e string) (string, string) {
+   return string(e[0]), e
+   }, fruits)
+
+   countriesKV := beam.ParDo(s, func(e string) (string, string) {
+   return string(e[0]), e
+   }, countries)
+
+   grouped := beam.CoGroupByKey(s, fruitsKV, countriesKV)
+   return beam.ParDo(s, func(key string, f func(*string) bool, c 
func(*string) bool, emit func(string)) {

Review comment:
   I agree with you and realizing for code readability purposes it makes 
sense.  I created 
[BEAM-10091](https://issues.apache.org/jira/browse/BEAM-10091) when we are all 
complete with the series of Go SDK katas.  I'd like to perform an overall 
cleanup for naming consistency and code readability.

##
File path: learning/katas/go/Core 
Transforms/CoGroupByKey/CoGroupByKey/pkg/task/task.go
##
@@ -0,0 +1,52 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package task
+
+import (
+   "fmt"
+   "github.com/apache/beam/sdks/go/pkg/beam"
+)
+
+func ApplyTransform(s beam.Scope, fruits beam.PCollection, countries 
beam.PCollection) beam.PCollection {
+   fruitsKV := beam.ParDo(s, func(e string) (string, string) {
+   return string(e[0]), e
+   }, fruits)
+
+   countriesKV := beam.ParDo(s, func(e string) (string, string) {
+   return string(e[0]), e
+   }, countries)
+
+   grouped := beam.CoGroupByKey(s, fruitsKV, countriesKV)
+   return beam.ParDo(s, func(key string, f func(*string) bool, c 
func(*string) bool, emit func(string)) {

Review comment:
   I agree with you and realizing for code readability purposes it makes 
sense.  I created 
[BEAM-10091](https://issues.apache.org/jira/browse/BEAM-10091) when we are all 
complete with the series of Go SDK katas.  I'd like to perform an overall 
cleanup for naming consistency and code readability.
   
   For now, I will adjust the current CoGroupByKey lesson task for code 
readability.  Thank you, Henry.

##
File path: learning/katas/go/Core Transforms/CoGroupByKey/CoGroupByKey/task.md
##
@@ -0,0 +1,104 @@
+
+
+# CoGroupByKey
+
+CoGroupByKey performs a relational join of two or more key/value PCollections 
that have the same 
+key type.
+
+**Kata:** Implement a 
[beam.CoGroupByKey](https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#CoGroupByKey)
 
+transform that join words by the first alphabetical letter, and then produces 
the string representation of the 
+WordsAlphabet model.
+
+
+Refer to
+

[jira] [Work logged] (BEAM-10076) Dataflow worker status page incorrectly displays work item statuses

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10076?focusedWorklogId=437421=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437421
 ]

ASF GitHub Bot logged work on BEAM-10076:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:46
Start Date: 27/May/20 03:46
Worklog Time Spent: 10m 
  Work Description: steveniemitz opened a new pull request #11812:
URL: https://github.com/apache/beam/pull/11812


   R: @lukecwik 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-9971) beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9971?focusedWorklogId=437477=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437477
 ]

ASF GitHub Bot logged work on BEAM-9971:


Author: ASF GitHub Bot
Created on: 27/May/20 03:54
Start Date: 27/May/20 03:54
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #11784:
URL: https://github.com/apache/beam/pull/11784#issuecomment-634118385


   There's a ReflectHelpers class in core with some utility functions for 
loading services and finding class loaders: 
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/common/ReflectHelpers.java#L247
   
   Sorry if this is just noise, I don't have much context here, just thought 
I'd point it out in case its helpful.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437477)
Time Spent: 50m  (was: 40m)

> beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)
> --
>
> Key: BEAM-9971
> URL: https://issues.apache.org/jira/browse/BEAM-9971
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: portability-spark
> Fix For: 2.22.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This happens sporadically. One time the issue affected 14 tests; another time 
> it affected 112 tests.
> It looks like the ClassLoader is sometimes contaminated with jars from 
> /tmp/spark-*, which have already been deleted.
> 20/05/21 13:54:27 ERROR org.apache.beam.runners.jobsubmission.JobInvocation: 
> Error during job invocation 
> pipelinetest0testidentitytransform-kcweaver-0521205426-f4de06c4_51aced77-c171-4842-be1f-6c79226872e5.
> java.util.ServiceConfigurationError: 
> org.apache.beam.runners.core.construction.NativeTransforms$IsNativeTransform: 
> Error reading configuration file
>   at java.util.ServiceLoader.fail(ServiceLoader.java:232)
>   at java.util.ServiceLoader.parse(ServiceLoader.java:309)
>   at java.util.ServiceLoader.access$200(ServiceLoader.java:185)
>   at 
> java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:357)
>   at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
>   at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
>   at 
> org.apache.beam.runners.core.construction.NativeTransforms.isNative(NativeTransforms.java:50)
>   at 
> org.apache.beam.runners.core.construction.graph.QueryablePipeline.isPrimitiveTransform(QueryablePipeline.java:189)
>   at 
> org.apache.beam.runners.core.construction.graph.QueryablePipeline.getPrimitiveTransformIds(QueryablePipeline.java:137)
>   at 
> org.apache.beam.runners.core.construction.graph.QueryablePipeline.forPrimitivesIn(QueryablePipeline.java:90)
>   at 
> org.apache.beam.runners.core.construction.graph.GreedyPipelineFuser.(GreedyPipelineFuser.java:67)
>   at 
> org.apache.beam.runners.core.construction.graph.GreedyPipelineFuser.fuse(GreedyPipelineFuser.java:90)
>   at 
> org.apache.beam.runners.spark.SparkPipelineRunner.run(SparkPipelineRunner.java:94)
>   at 
> org.apache.beam.runners.jobsubmission.JobInvocation.runPipeline(JobInvocation.java:83)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: 
> /tmp/spark-5e8a8a9a-22d6-48d5-b398-1a4f5582d954/userFiles-ec74cac1-21b5-4127-b764-540636b733d0/beam-runners-core-construction-java-2.22.0-SNAPSHOT-tests.jar
>  (No such file or directory)
>   at java.util.zip.ZipFile.open(Native Method)
>   at java.util.zip.ZipFile.(ZipFile.java:230)
>   at java.util.zip.ZipFile.(ZipFile.java:155)
>   at 

[jira] [Work logged] (BEAM-10078) uniquify Dataflow specific jars when staging

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10078?focusedWorklogId=437469=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437469
 ]

ASF GitHub Bot logged work on BEAM-10078:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:54
Start Date: 27/May/20 03:54
Worklog Time Spent: 10m 
  Work Description: ihji opened a new pull request #11814:
URL: https://github.com/apache/beam/pull/11814


   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-5863) Automate Community Metrics infrastructure deployment

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5863?focusedWorklogId=437471=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437471
 ]

ASF GitHub Bot logged work on BEAM-5863:


Author: ASF GitHub Bot
Created on: 27/May/20 03:54
Start Date: 27/May/20 03:54
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on a change in pull request #11816:
URL: https://github.com/apache/beam/pull/11816#discussion_r430630493



##
File path: .test-infra/metrics/build_and_publish_containers.sh
##
@@ -36,8 +33,8 @@ fi
 
 echo
 echo ===Start==
-CONTAINER_VERSION_NAME=$1
-DO_PUSH=$2
+DO_PUSH=$1
+CONTAINER_VERSION_NAME=${2:-latest}

Review comment:
   will latest overwrite container image version? If not, we should add 
cleanup script for old versions.

##
File path: .test-infra/metrics/build.gradle
##
@@ -50,9 +51,49 @@ dockerCompose {
 
 dockerCompose.isRequiredBy(testMetricsStack)
 
-task preCommit { dependsOn testMetricsStack }
+task validateConfiguration(type: Exec) {
+  commandLine 'sh', '-c', 'kubectl apply --dry-run=true -Rf kubernetes'
+}
+
+task preCommit {
+  dependsOn validateConfiguration
+  dependsOn testMetricsStack
+}
+
+task buildAndPublishContainers(type: Exec) {
+  commandLine './build_and_publish_containers.sh', 'true'
+}
+
+// Applies new configuration to all resources labeled with `app=beammetrics`
+// and forces Kubernetes to re-pull images.
+task applyConfiguration() {
+  doLast {
+assert grgit : 'Cannot use outside of git repository'
+
+def git = grgit.open()
+def commitedChanges = git.log(paths: ['.test-infra/metrics']).findAll {
+  it.dateTime > ZonedDateTime.now().minusHours(6)

Review comment:
   Is there a reason to detect timee vs automatic job trigger by commit to 
master with relevant folder?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437471)
Time Spent: 0.5h  (was: 20m)

> Automate Community Metrics infrastructure deployment
> 
>
> Key: BEAM-5863
> URL: https://issues.apache.org/jira/browse/BEAM-5863
> Project: Beam
>  Issue Type: Sub-task
>  Components: community-metrics, project-management
>Reporter: Scott Wegner
>Assignee: Kamil Wasilewski
>Priority: P3
>  Labels: community-metrics
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the deployment process for the production Community Metrics stack 
> is manual (documented 
> [here|https://cwiki.apache.org/confluence/display/BEAM/Community+Metrics]). 
> If we end up having to deploy more than a few times a year, it would be nice 
> to automate these steps.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10077) using filename + hash instead of UUID for staging name

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10077?focusedWorklogId=437466=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437466
 ]

ASF GitHub Bot logged work on BEAM-10077:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:53
Start Date: 27/May/20 03:53
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11813:
URL: https://github.com/apache/beam/pull/11813#issuecomment-634085265







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437466)
Time Spent: 50m  (was: 40m)

> using filename + hash instead of UUID for staging name
> --
>
> Key: BEAM-10077
> URL: https://issues.apache.org/jira/browse/BEAM-10077
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Recent change BEAM-9383 disabled the artifact caching logic for GCS by object 
> names. Changing staging name generation from UUID to filename + hash will 
> re-enable the artifact caching so we can avoid re-uploading same artifact.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10073) Add dashboard for pubsub performance tests to grafana

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10073?focusedWorklogId=437461=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437461
 ]

ASF GitHub Bot logged work on BEAM-10073:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:52
Start Date: 27/May/20 03:52
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #11809:
URL: https://github.com/apache/beam/pull/11809#issuecomment-633935810







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437461)
Time Spent: 0.5h  (was: 20m)

> Add dashboard for pubsub performance tests to grafana
> -
>
> Key: BEAM-10073
> URL: https://issues.apache.org/jira/browse/BEAM-10073
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Assignee: Piotr Szuberski
>Priority: P1
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Pubsub IO performance tests are not published to influx and not displayed in 
> grafana dashboards



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8371) Sunset Beam Python 2 support in new releases in 2020.

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8371?focusedWorklogId=437448=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437448
 ]

ASF GitHub Bot logged work on BEAM-8371:


Author: ASF GitHub Bot
Created on: 27/May/20 03:51
Start Date: 27/May/20 03:51
Worklog Time Spent: 10m 
  Work Description: epicfaace commented on a change in pull request #11819:
URL: https://github.com/apache/beam/pull/11819#discussion_r430705916



##
File path: sdks/python/findSupportedPython.groovy
##
@@ -18,7 +18,7 @@
 
 /* This (groovy-maven-plugin) script finds the supported python interpreter 
and pip
  * binary in the path. As there is no strict naming convention exists amongst 
OSes
- * for Python & pip (some call it python2.7, others name it python-2.7),
+ * for Python & pip (some call it python3.5, others name it python-3.5),

Review comment:
   This file only seems to be using Python 2, so I wasn't sure if we still 
needed it.

##
File path: sdks/python/findSupportedPython.groovy
##
@@ -18,7 +18,7 @@
 
 /* This (groovy-maven-plugin) script finds the supported python interpreter 
and pip
  * binary in the path. As there is no strict naming convention exists amongst 
OSes
- * for Python & pip (some call it python2.7, others name it python-2.7),
+ * for Python & pip (some call it python3.5, others name it python-3.5),

Review comment:
   This file only seems to be using Python 2, so I wasn't sure if we still 
needed it. Can we safely delete it?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437448)
Time Spent: 0.5h  (was: 20m)

> Sunset Beam Python 2 support in new releases in 2020.
> -
>
> Key: BEAM-8371
> URL: https://issues.apache.org/jira/browse/BEAM-8371
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Ashwin Ramaswami
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Creating this Jira to track eventual sunset of Python 2 support in Beam.
> This was previously discussed in [1], [2], [3].
> We can use this issue to communicate next steps, collect feedback and give 
> updates on current status.
> [1] 
> https://lists.apache.org/thread.html/eba6caa58ea79a7ecbc8560d1c680a366b44c531d96ce5c699d41535@%3Cdev.beam.apache.org%3E
> [2] 
> https://lists.apache.org/thread.html/456631fe1a696c537ef8ebfee42cd3ea8121bf7c639c52da5f7032e7@%3Cdev.beam.apache.org%3E
> [3] 
> https://lists.apache.org/thread.html/r0d5c309a7e3107854f4892ccfeb1a17c0cec25dfce188678ab8df072%40%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10077) using filename + hash instead of UUID for staging name

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10077?focusedWorklogId=437441=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437441
 ]

ASF GitHub Bot logged work on BEAM-10077:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:50
Start Date: 27/May/20 03:50
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#11813:
URL: https://github.com/apache/beam/pull/11813#discussion_r430505636



##
File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
##
@@ -211,8 +210,8 @@
 
   @VisibleForTesting static final int GCS_UPLOAD_BUFFER_SIZE_BYTES_DEFAULT = 
1024 * 1024;
 
-  @VisibleForTesting static final String PIPELINE_FILE_FORMAT = 
"pipeline-%s.pb";
-  @VisibleForTesting static final String DATAFLOW_GRAPH_FILE_FORMAT = 
"dataflow_graph-%s.json";
+  @VisibleForTesting static final String PIPELINE_FILE_NAME = "pipeline.pb";

Review comment:
   Seems like this is just reverting to old behavior. Why would these names 
being unique an issue ? Did the change break Dataflow somehow ?

##
File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java
##
@@ -264,6 +259,14 @@ public static Environment createProcessEnvironment(
   .build()
   .toByteString());
 }
+if (stagedName == null) {
+  stagedName = createStagingFileName(file, hashCode);

Review comment:
   Where are we setting the hashCoder for this path ? Also are we missing a 
unit test for this ?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437441)
Time Spent: 40m  (was: 0.5h)

> using filename + hash instead of UUID for staging name
> --
>
> Key: BEAM-10077
> URL: https://issues.apache.org/jira/browse/BEAM-10077
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Recent change BEAM-9383 disabled the artifact caching logic for GCS by object 
> names. Changing staging name generation from UUID to filename + hash will 
> re-enable the artifact caching so we can avoid re-uploading same artifact.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7370) Beam Dependency Update Request: Sphinx

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7370?focusedWorklogId=437444=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437444
 ]

ASF GitHub Bot logged work on BEAM-7370:


Author: ASF GitHub Bot
Created on: 27/May/20 03:50
Start Date: 27/May/20 03:50
Worklog Time Spent: 10m 
  Work Description: udim merged pull request #11798:
URL: https://github.com/apache/beam/pull/11798


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437444)
Time Spent: 20m  (was: 10m)

> Beam Dependency Update Request: Sphinx
> --
>
> Key: BEAM-7370
> URL: https://issues.apache.org/jira/browse/BEAM-7370
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>  - 2019-05-20 16:38:07.937770 
> -
> Please consider upgrading the dependency Sphinx. 
> The current version is 1.8.5. The latest version is 2.0.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-06-17 12:32:27.855338 
> -
> Please consider upgrading the dependency Sphinx. 
> The current version is 1.8.5. The latest version is 2.1.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-06-24 12:02:59.052884 
> -
> Please consider upgrading the dependency Sphinx. 
> The current version is 1.8.5. The latest version is 2.1.2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-07-01 12:04:13.113613 
> -
> Please consider upgrading the dependency Sphinx. 
> The current version is 1.8.5. The latest version is 2.1.2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-07-08 12:03:15.091005 
> -
> Please consider upgrading the dependency Sphinx. 
> The current version is 1.8.5. The latest version is 2.1.2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-07-15 12:03:09.406918 
> -
> Please consider upgrading the dependency Sphinx. 
> The current version is 1.8.5. The latest version is 2.1.2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-07-22 12:03:31.157859 
> -
> Please consider upgrading the dependency Sphinx. 
> The current version is 1.8.5. The latest version is 2.1.2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-07-29 12:05:13.023604 
> -
> Please consider upgrading the dependency Sphinx. 
> The current version is 1.8.5. The latest version is 2.1.2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-08-05 12:03:03.242767 
> -
> Please consider upgrading the dependency Sphinx. 
> The current version is 1.8.5. The latest version is 2.1.2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-08-12 12:04:01.647619 
> -
> Please consider upgrading the dependency Sphinx. 
> The current version is 

[jira] [Work logged] (BEAM-10073) Add dashboard for pubsub performance tests to grafana

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10073?focusedWorklogId=437403=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437403
 ]

ASF GitHub Bot logged work on BEAM-10073:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:44
Start Date: 27/May/20 03:44
Worklog Time Spent: 10m 
  Work Description: piotr-szuberski commented on pull request #11809:
URL: https://github.com/apache/beam/pull/11809#issuecomment-633966513


   > Thanks, LGTM. We have to publish new images and add one more commit that 
updates image tags in deployment resource file. I will do it for you and merge 
afterwards.
   
   Sounds good to me.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437403)
Time Spent: 20m  (was: 10m)

> Add dashboard for pubsub performance tests to grafana
> -
>
> Key: BEAM-10073
> URL: https://issues.apache.org/jira/browse/BEAM-10073
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Assignee: Piotr Szuberski
>Priority: P1
> Fix For: Not applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Pubsub IO performance tests are not published to influx and not displayed in 
> grafana dashboards



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10077) using filename + hash instead of UUID for staging name

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10077?focusedWorklogId=437390=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437390
 ]

ASF GitHub Bot logged work on BEAM-10077:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:43
Start Date: 27/May/20 03:43
Worklog Time Spent: 10m 
  Work Description: ihji opened a new pull request #11813:
URL: https://github.com/apache/beam/pull/11813


   For avoiding artifact re-uploading: 
https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java#L147
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-9363) BeamSQL windowing as TVF

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9363?focusedWorklogId=437439=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437439
 ]

ASF GitHub Bot logged work on BEAM-9363:


Author: ASF GitHub Bot
Created on: 27/May/20 03:49
Start Date: 27/May/20 03:49
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #11807:
URL: https://github.com/apache/beam/pull/11807#issuecomment-634182322


   also R: @robinyqiu to check if you could review?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437439)
Time Spent: 7h  (was: 6h 50m)

> BeamSQL windowing as TVF
> 
>
> Key: BEAM-9363
> URL: https://issues.apache.org/jira/browse/BEAM-9363
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: P2
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
>  This Jira tracks the implementation for 
> https://s.apache.org/streaming-beam-sql
> TVF is table-valued function, which is a SQL feature that produce a table as 
> function's output.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9742) Add ability to pass FluentBackoff to JdbcIo.Write

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9742?focusedWorklogId=437432=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437432
 ]

ASF GitHub Bot logged work on BEAM-9742:


Author: ASF GitHub Bot
Created on: 27/May/20 03:49
Start Date: 27/May/20 03:49
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #11396:
URL: https://github.com/apache/beam/pull/11396#issuecomment-633894816


   Hi @Akshay-Iyangar, kind ping regarding this PR.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437432)
Time Spent: 2.5h  (was: 2h 20m)

> Add ability to pass FluentBackoff to JdbcIo.Write
> -
>
> Key: BEAM-9742
> URL: https://issues.apache.org/jira/browse/BEAM-9742
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Akshay Iyangar
>Assignee: Akshay Iyangar
>Priority: P3
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Currently, the FluentBackoff is hardcoded with `maxRetries` and 
> `initialBackoff` .
> It would be helpful if the client were able to pass these values.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10075) Allow users to tune the grouping table size in batch dataflow pipelines

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10075?focusedWorklogId=437437=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437437
 ]

ASF GitHub Bot logged work on BEAM-10075:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:49
Start Date: 27/May/20 03:49
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #11811:
URL: https://github.com/apache/beam/pull/11811#discussion_r430831035



##
File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.java
##
@@ -177,6 +177,20 @@ public Dataflow create(PipelineOptions options) {
 
   void setNumberOfWorkerHarnessThreads(int value);
 
+  /**
+   * Size (in MB) of each grouping table used to pre-combine elements. If 
unset, defaults to 100 MB.
+   *
+   * CAUTION: If set too large, workers may run into OOM conditions more 
easily, each worker may
+   * have many grouping tables in-memory concurrently.
+   */
+  @Description(
+  "The size (in MB) of the grouping tables used to pre-combine elements 
before "
+  + "shuffling.  Larger values may reduce the amount of data 
shuffled.")
+  @Default.Long(100)
+  Long getGroupingTableMaxSizeMb();
+
+  void setGroupingTableMaxSizeMb(Long value);

Review comment:
   Please add this option to 
https://github.com/apache/beam/blob/32c657955d05cb5c558def91c2d1e7edb80fb123/sdks/java/core/src/main/java/org/apache/beam/sdk/options/SdkHarnessOptions.java#L32
 instead of here.
   
   Please also plumb the value to 
https://github.com/apache/beam/blob/32c657955d05cb5c558def91c2d1e7edb80fb123/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/PrecombineGroupingTable.java#L44
   
   We should be able to use a primitive type (also long seems really large for 
specifying amount of MBs):
   ```suggestion
 int getGroupingTableMaxSizeMb();
   
 void setGroupingTableMaxSizeMb(int value);
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437437)
Time Spent: 0.5h  (was: 20m)

> Allow users to tune the grouping table size in batch dataflow pipelines
> ---
>
> Key: BEAM-10075
> URL: https://issues.apache.org/jira/browse/BEAM-10075
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The dataflow worker hard-codes the grouping table size to 100 MB.  We should 
> allow users to specify this as a pipeline parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10054) Direct Runner execution stalls with test pipeline

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10054?focusedWorklogId=437431=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437431
 ]

ASF GitHub Bot logged work on BEAM-10054:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:48
Start Date: 27/May/20 03:48
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11777:
URL: https://github.com/apache/beam/pull/11777#issuecomment-634231656


   I think this fix should be OK. Could you add a test? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437431)
Time Spent: 1h 10m  (was: 1h)

> Direct Runner execution stalls with test pipeline
> -
>
> Key: BEAM-10054
> URL: https://issues.apache.org/jira/browse/BEAM-10054
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Internally, we have a test pipeline which runs with the DirectRunner. When 
> upgrading from 2.18.0 to 2.21.0 the test failed with the following exception:
> {noformat}
> tp = Exception('Monitor task detected a pipeline stall.',), value = None, tb 
> = None
> def raise_(tp, value=None, tb=None):
> """
> A function that matches the Python 2.x ``raise`` statement. This
> allows re-raising exceptions with the cls value and traceback on
> Python 2 and 3.
> """
> if value is not None and isinstance(tp, Exception):
> raise TypeError("instance exception may not have a separate 
> value")
> if value is not None:
> exc = tp(value)
> else:
> exc = tp
> if exc.__traceback__ is not tb:
> raise exc.with_traceback(tb)
> >   raise exc
> E   Exception: Monitor task detected a pipeline stall.
> {noformat}
> I was able to bisect the error. This commit introduced the failure: 
> https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731
> If the following conditions evaluates to False, the pipeline runs correctly: 
> https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731#diff-2bb845e226f3a97c0f0f737d0558c5dbR1273



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9916) Update IOs documentation links

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9916?focusedWorklogId=437423=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437423
 ]

ASF GitHub Bot logged work on BEAM-9916:


Author: ASF GitHub Bot
Created on: 27/May/20 03:47
Start Date: 27/May/20 03:47
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev removed a comment on pull request #11802:
URL: https://github.com/apache/beam/pull/11802#issuecomment-634093933







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437423)
Time Spent: 1h  (was: 50m)

> Update IOs documentation links
> --
>
> Key: BEAM-9916
> URL: https://issues.apache.org/jira/browse/BEAM-9916
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Alexey Romanenko
>Assignee: Ashwin Ramaswami
>Priority: P3
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, on the webpage 
> [https://beam.apache.org/documentation/io/built-in/] , we link all IOs to 
> their code on github, which could be quite odd for users. After a 
> [discussion|https://lists.apache.org/thread.html/r2c7bb9a2e13f1b8c296930fc8a36edafe731f49b5182567989afcb98%40%3Cdev.beam.apache.org%3E]
>  on dev@, it was decided to link them to the latest doc reference (like 
> [https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileSystems.html]
>  for FileIO).
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9785) Add PostCommit suite for Python 3.8

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9785?focusedWorklogId=437424=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437424
 ]

ASF GitHub Bot logged work on BEAM-9785:


Author: ASF GitHub Bot
Created on: 27/May/20 03:47
Start Date: 27/May/20 03:47
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11788:
URL: https://github.com/apache/beam/pull/11788#issuecomment-634117541







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437424)
Time Spent: 50m  (was: 40m)

> Add PostCommit suite for Python 3.8
> ---
>
> Key: BEAM-9785
> URL: https://issues.apache.org/jira/browse/BEAM-9785
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: yoshiki obata
>Priority: P2
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add PostCommit suites for Python 3.8.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=437418=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437418
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 27/May/20 03:46
Start Date: 27/May/20 03:46
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #11632:
URL: https://github.com/apache/beam/pull/11632#discussion_r430715044



##
File path: sdks/python/apache_beam/dataframe/convert.py
##
@@ -16,13 +16,23 @@
 
 from __future__ import absolute_import
 
+import typing
+
 import inspect
 
 from apache_beam import pvalue
 from apache_beam.dataframe import expressions
 from apache_beam.dataframe import frame_base
 from apache_beam.dataframe import transforms
 
+if typing.TYPE_CHECKING:
+  # pylint: disable=ungrouped-imports
+  from typing import Any
+  from typing import Dict
+  from typing import Tuple
+  from typing import Union

Review comment:
   Oh, yes. Thanks for catching this. Fixed. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437418)
Time Spent: 84h  (was: 83h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: P2
>  Time Spent: 84h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=437413=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437413
 ]

ASF GitHub Bot logged work on BEAM-7390:


Author: ASF GitHub Bot
Created on: 27/May/20 03:46
Start Date: 27/May/20 03:46
Worklog Time Spent: 10m 
  Work Description: aaltay removed a comment on pull request #10165:
URL: https://github.com/apache/beam/pull/10165#issuecomment-634326058


   Run Python PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437413)
Time Spent: 11h 50m  (was: 11h 40m)

> Colab examples for aggregation transforms (Python)
> --
>
> Key: BEAM-7390
> URL: https://issues.apache.org/jira/browse/BEAM-7390
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: P3
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> Merge aggregation Colabs into the transform catalog



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8648) Euphoria: Deprecate OutputHints from public API

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8648?focusedWorklogId=437414=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437414
 ]

ASF GitHub Bot logged work on BEAM-8648:


Author: ASF GitHub Bot
Created on: 27/May/20 03:46
Start Date: 27/May/20 03:46
Worklog Time Spent: 10m 
  Work Description: je-ik commented on pull request #10084:
URL: https://github.com/apache/beam/pull/10084#issuecomment-633861553


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437414)
Time Spent: 1h 50m  (was: 1h 40m)

> Euphoria: Deprecate OutputHints from public API
> ---
>
> Key: BEAM-8648
> URL: https://issues.apache.org/jira/browse/BEAM-8648
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-euphoria
>Reporter: David Morávek
>Assignee: David Morávek
>Priority: P2
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Deprecate OutputHints as they are no longer used during translation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10050) VideoIntelligenceIT.annotateVideoFromURINoContext is flaky

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10050?focusedWorklogId=437417=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437417
 ]

ASF GitHub Bot logged work on BEAM-10050:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:46
Start Date: 27/May/20 03:46
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #11775:
URL: https://github.com/apache/beam/pull/11775#issuecomment-634121510


   Thanks @mwalenia!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437417)
Time Spent: 1h 20m  (was: 1h 10m)

> VideoIntelligenceIT.annotateVideoFromURINoContext is flaky
> --
>
> Key: BEAM-10050
> URL: https://issues.apache.org/jira/browse/BEAM-10050
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Assignee: Michał Walenia
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I've seen this fail a few times in precommits [Example 
> failure|https://builds.apache.org/job/beam_PreCommit_Java_Commit/11515/]
> {code}
> java.lang.AssertionError: Annotate 
> video/ParDo(AnnotateVideoFromURI)/ParMultiDo(AnnotateVideoFromURI).output: 
> expected: but was:
>   at 
> org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:169)
>   at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:411)
>   at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:403)
>   at 
> org.apache.beam.sdk.extensions.ml.VideoIntelligenceIT.annotateVideoFromURINoContext(VideoIntelligenceIT.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10027) Support for Kotlin-based Beam Katas

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10027?focusedWorklogId=437419=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437419
 ]

ASF GitHub Bot logged work on BEAM-10027:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:46
Start Date: 27/May/20 03:46
Worklog Time Spent: 10m 
  Work Description: henryken commented on pull request #11761:
URL: https://github.com/apache/beam/pull/11761#issuecomment-633608920







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437419)
Time Spent: 9h 40m  (was: 9.5h)

> Support for Kotlin-based Beam Katas
> ---
>
> Key: BEAM-10027
> URL: https://issues.apache.org/jira/browse/BEAM-10027
> Project: Beam
>  Issue Type: Improvement
>  Components: katas
>Reporter: Rion Williams
>Assignee: Rion Williams
>Priority: P2
>   Original Estimate: 8h
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> Currently, there are a series of examples available demonstrating the use of 
> Apache Beam with Kotlin. It would be nice to have support for the same Beam 
> Katas that exist for Python, Go, and Java to also support Kotlin. 
> The port itself shouldn't be that involved since it can still target the JVM, 
> so it would likely just require the inclusion for Kotlin dependencies and a 
> conversion for all of the existing Java examples. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=437406=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437406
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 27/May/20 03:45
Start Date: 27/May/20 03:45
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11632:
URL: https://github.com/apache/beam/pull/11632#issuecomment-634286508







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437406)
Time Spent: 83h 50m  (was: 83h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: P2
>  Time Spent: 83h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10073) Add dashboard for pubsub performance tests to grafana

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10073?focusedWorklogId=437397=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437397
 ]

ASF GitHub Bot logged work on BEAM-10073:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:44
Start Date: 27/May/20 03:44
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on a change in pull request #11809:
URL: https://github.com/apache/beam/pull/11809#discussion_r430015390



##
File path: 
.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_IO_IT_Tests_Dataflow.json
##
@@ -35,7 +35,7 @@
   },
   "hiddenSeries": false,
   "id": 2,
-  "interval": "",
+  "interval": "15h",

Review comment:
   Is there any particular reason why "15"?

##
File path: 
.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_IO_IT_Tests_Dataflow.json
##
@@ -261,6 +261,250 @@
 "align": false,
 "alignLevel": null
   }
+},
+{
+  "aliasColors": {},
+  "bars": false,
+  "cacheTimeout": null,
+  "dashLength": 10,
+  "dashes": false,
+  "datasource": "BeamInfluxDB",
+  "fill": 1,
+  "fillGradient": 0,
+  "gridPos": {
+"h": 9,
+"w": 12,
+"x": 0,
+"y": 9
+  },
+  "hiddenSeries": false,
+  "id": 4,
+  "interval": "15h",
+  "legend": {
+"avg": false,
+"current": false,
+"max": false,
+"min": false,
+"show": false,
+"total": false,
+"values": false
+  },
+  "lines": true,
+  "linewidth": 2,
+  "links": [],
+  "nullPointMode": "connected",
+  "options": {
+"dataLinks": []
+  },
+  "percentage": false,
+  "pluginVersion": "6.7.2",
+  "pointradius": 2,
+  "points": true,
+  "renderer": "flot",
+  "seriesOverrides": [],
+  "spaceLength": 10,
+  "stack": false,
+  "steppedLine": false,
+  "targets": [
+{
+  "alias": "read_time",
+  "groupBy": [
+{
+  "params": [
+"$__interval"
+  ],
+  "type": "time"
+}
+  ],
+  "measurement": "python_bqio_read",
+  "orderByTime": "ASC",
+  "policy": "default",
+  "query": "SELECT mean(\"value\") FROM \"python_psio_2GB_results\" 
WHERE \"metric\" = \"pubsub_io_perf_read_runtime\" AND $timeFilter GROUP BY 
time($__interval), \"metric\"",
+  "rawQuery": true,
+  "refId": "A",
+  "resultFormat": "time_series",
+  "select": [
+[
+  {
+"params": [
+  "value"
+],
+"type": "field"
+  },
+  {
+"params": [],
+"type": "mean"
+  }
+]
+  ],
+  "tags": []
+}
+  ],
+  "thresholds": [],
+  "timeFrom": null,
+  "timeRegions": [],
+  "timeShift": null,
+  "title": "Reading 2GB of data | Pubsub IO",

Review comment:
   We should also specify that the test is a streaming job and uses a 
native Dataflow IO. These are very important information for everyone seeing 
that chart.

##
File path: 
.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_IO_IT_Tests_Dataflow.json
##
@@ -35,7 +35,7 @@
   },
   "hiddenSeries": false,
   "id": 2,
-  "interval": "",
+  "interval": "15h",

Review comment:
   "H 15 * * *" means the job is executed daily at 15:00 (3 p.m.) UTC. So 
the interval should be 24h

##
File path: 
.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_IO_IT_Tests_Dataflow.json
##
@@ -35,7 +35,7 @@
   },
   "hiddenSeries": false,
   "id": 2,
-  "interval": "",
+  "interval": "15h",

Review comment:
   Take a look at the job history: 
https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PubsubIO_Performance_Test_Python/





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437397)
Remaining Estimate: 0h
Time Spent: 10m

> Add dashboard for pubsub performance tests to grafana
> -
>
> Key: BEAM-10073
> URL: https://issues.apache.org/jira/browse/BEAM-10073
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr 

[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=437402=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437402
 ]

ASF GitHub Bot logged work on BEAM-10097:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:44
Start Date: 27/May/20 03:44
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-634312177


   R: @mxm @tweise 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437402)
Time Spent: 20m  (was: 10m)

> Migrate PCollection views to use both iterable and multimap 
> materializations/access patterns
> 
>
> Key: BEAM-10097
> URL: https://issues.apache.org/jira/browse/BEAM-10097
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently all the PCollection views have a trival mapping from KV Iterable> to the view that is being requested (singleton, iterable, list, 
> map, multimap.
> We should be using the primitive views (iterable, multimap) directly without 
> going through the naive mapping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10066) Support ValueProvider for RedisIO

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10066?focusedWorklogId=437391=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437391
 ]

ASF GitHub Bot logged work on BEAM-10066:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:43
Start Date: 27/May/20 03:43
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #11799:
URL: https://github.com/apache/beam/pull/11799#issuecomment-633858452


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437391)
Remaining Estimate: 40m  (was: 50m)
Time Spent: 20m  (was: 10m)

> Support ValueProvider for RedisIO
> -
>
> Key: BEAM-10066
> URL: https://issues.apache.org/jira/browse/BEAM-10066
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-redis
>Affects Versions: 2.20.0
>Reporter: Teije van Sloten
>Assignee: Teije van Sloten
>Priority: P3
>   Original Estimate: 1h
>  Time Spent: 20m
>  Remaining Estimate: 40m
>
> RedisIO doesn't have support for `ValueProvider` when setting up the 
> connection with Redis, therefore I cannot provide the connection at runtime 
> of the application only at compile time.
> This will involve wrapping the RedisConnectionConfiguration with 
> ValueProvider and ensuring that the building the configuration still supports 
> values without ValueProvider.
> E.g.:
>  
> {code:java}
> public abstract class RedisConnectionConfiguration implements Serializable {
>   abstract ValueProvider host();
>   abstract ValueProvider port();
>   @Nullable
>   abstract ValueProvider auth();
>   abstract ValueProvider timeout();
>   abstract ValueProvider ssl();
>   abstract Builder builder();
> }
>  
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10075) Allow users to tune the grouping table size in batch dataflow pipelines

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10075?focusedWorklogId=437394=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437394
 ]

ASF GitHub Bot logged work on BEAM-10075:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:43
Start Date: 27/May/20 03:43
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on a change in pull request 
#11811:
URL: https://github.com/apache/beam/pull/11811#discussion_r430834337



##
File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.java
##
@@ -177,6 +177,20 @@ public Dataflow create(PipelineOptions options) {
 
   void setNumberOfWorkerHarnessThreads(int value);
 
+  /**
+   * Size (in MB) of each grouping table used to pre-combine elements. If 
unset, defaults to 100 MB.
+   *
+   * CAUTION: If set too large, workers may run into OOM conditions more 
easily, each worker may
+   * have many grouping tables in-memory concurrently.
+   */
+  @Description(
+  "The size (in MB) of the grouping tables used to pre-combine elements 
before "
+  + "shuffling.  Larger values may reduce the amount of data 
shuffled.")
+  @Default.Long(100)
+  Long getGroupingTableMaxSizeMb();
+
+  void setGroupingTableMaxSizeMb(Long value);

Review comment:
    done!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437394)
Time Spent: 20m  (was: 10m)

> Allow users to tune the grouping table size in batch dataflow pipelines
> ---
>
> Key: BEAM-10075
> URL: https://issues.apache.org/jira/browse/BEAM-10075
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The dataflow worker hard-codes the grouping table size to 100 MB.  We should 
> allow users to specify this as a pipeline parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7520) DirectRunner timers are not strictly time ordered

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7520?focusedWorklogId=437379=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437379
 ]

ASF GitHub Bot logged work on BEAM-7520:


Author: ASF GitHub Bot
Created on: 27/May/20 03:42
Start Date: 27/May/20 03:42
Worklog Time Spent: 10m 
  Work Description: je-ik commented on a change in pull request #9190:
URL: https://github.com/apache/beam/pull/9190#discussion_r430680574



##
File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java
##
@@ -3488,6 +3499,158 @@ public void onTimer(OutputReceiver r) {
   pipeline.run();
 }
 
+/** A test makes sure that an event time timers are correctly ordered. */
+@Test
+@Category({
+  ValidatesRunner.class,
+  UsesTimersInParDo.class,
+  UsesTestStream.class,
+  UsesStatefulParDo.class,
+  UsesStrictTimerOrdering.class
+})
+public void testEventTimeTimerOrdering() throws Exception {
+  final int numTestElements = 100;
+  final Instant now = new Instant(15000L);
+  TestStream.Builder> builder =
+  TestStream.create(KvCoder.of(StringUtf8Coder.of(), 
StringUtf8Coder.of()))
+  .advanceWatermarkTo(new Instant(0));
+
+  for (int i = 0; i < numTestElements; i++) {
+builder = builder.addElements(TimestampedValue.of(KV.of("dummy", "" + 
i), now.plus(i)));
+builder = builder.advanceWatermarkTo(now.plus(i / 10 * 10));
+  }
+
+  testEventTimeTimerOrderingWithInputPTransform(
+  now, numTestElements, builder.advanceWatermarkToInfinity());
+}
+
+/** A test makes sure that an event time timers are correctly ordered 
using Create transform. */
+@Test
+@Category({
+  ValidatesRunner.class,
+  UsesTimersInParDo.class,
+  UsesStatefulParDo.class,
+  UsesStrictTimerOrdering.class
+})
+public void testEventTimeTimerOrderingWithCreate() throws Exception {
+  final int numTestElements = 100;
+  final Instant now = new Instant(15000L);
+
+  List>> elements = new ArrayList<>();
+  for (int i = 0; i < numTestElements; i++) {
+elements.add(TimestampedValue.of(KV.of("dummy", "" + i), now.plus(i)));
+  }
+
+  testEventTimeTimerOrderingWithInputPTransform(

Review comment:
   The purpose of the test is to verify, that timers are fired in timestamp 
order, not that elements arrive ordered. The test already mimics some of the 
functionality of `@RequiresTimeSortedInput`, which should solve the issue you 
mention. This test should be a requirement (`UsesStrictTimerOrdering`) for a 
runer to be able to support `@RequiresTimeSortedInput`. Have you seen any 
issues with the test?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437379)
Time Spent: 21h  (was: 20h 50m)

> DirectRunner timers are not strictly time ordered
> -
>
> Key: BEAM-7520
> URL: https://issues.apache.org/jira/browse/BEAM-7520
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.13.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: P2
> Fix For: 2.17.0
>
>  Time Spent: 21h
>  Remaining Estimate: 0h
>
> Let's suppose we have the following situation:
>  - statful ParDo with two timers - timerA and timerB
>  - timerA is set for window.maxTimestamp() + 1
>  - timerB is set anywhere between  timerB.timestamp
>  - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE
> Then the order of timers is as follows (correct):
>  - timerB
>  - timerA
> But, if timerB sets another timer (say for timerB.timestamp + 1), then the 
> order of timers will be:
>  - timerB (timerB.timestamp)
>  - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE)
>  - timerB (timerB.timestamp + 1)
> Which is not ordered by timestamp. The reason for this is that when the input 
> watermark update is evaluated, the WatermarkManager,extractFiredTimers() will 
> produce both timerA and timerB. That would be correct, but when timerB sets 
> another timer, that breaks this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10077) using filename + hash instead of UUID for staging name

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10077?focusedWorklogId=437369=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437369
 ]

ASF GitHub Bot logged work on BEAM-10077:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:41
Start Date: 27/May/20 03:41
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #11813:
URL: https://github.com/apache/beam/pull/11813#issuecomment-633751018


   R: @robertwb, @chamikaramj 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437369)
Time Spent: 20m  (was: 10m)

> using filename + hash instead of UUID for staging name
> --
>
> Key: BEAM-10077
> URL: https://issues.apache.org/jira/browse/BEAM-10077
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Recent change BEAM-9383 disabled the artifact caching logic for GCS by object 
> names. Changing staging name generation from UUID to filename + hash will 
> re-enable the artifact caching so we can avoid re-uploading same artifact.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7770) Add ReadAll transform for SolrIO

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7770?focusedWorklogId=437373=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437373
 ]

ASF GitHub Bot logged work on BEAM-7770:


Author: ASF GitHub Bot
Created on: 27/May/20 03:41
Start Date: 27/May/20 03:41
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev edited a comment on pull request #11357:
URL: https://github.com/apache/beam/pull/11357#issuecomment-634050866


   Since `readAll()` is a new public API method, please, add test(s) for it. 
Apart from that, it's fine for me.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437373)
Time Spent: 0.5h  (was: 20m)

> Add ReadAll transform for SolrIO
> 
>
> Key: BEAM-7770
> URL: https://issues.apache.org/jira/browse/BEAM-7770
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-solr
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: P3
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SolrIO already uses internally a composable approach but we need to expose an 
> explicit ReadAll transform that allows user to create reads in the middle of 
> the Pipeline to improve composability (e.g. Reads in the middle of a 
> Pipeline).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9971) beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9971?focusedWorklogId=437366=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437366
 ]

ASF GitHub Bot logged work on BEAM-9971:


Author: ASF GitHub Bot
Created on: 27/May/20 03:40
Start Date: 27/May/20 03:40
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11784:
URL: https://github.com/apache/beam/pull/11784#issuecomment-634315982







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437366)
Time Spent: 40m  (was: 0.5h)

> beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)
> --
>
> Key: BEAM-9971
> URL: https://issues.apache.org/jira/browse/BEAM-9971
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: portability-spark
> Fix For: 2.22.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This happens sporadically. One time the issue affected 14 tests; another time 
> it affected 112 tests.
> It looks like the ClassLoader is sometimes contaminated with jars from 
> /tmp/spark-*, which have already been deleted.
> 20/05/21 13:54:27 ERROR org.apache.beam.runners.jobsubmission.JobInvocation: 
> Error during job invocation 
> pipelinetest0testidentitytransform-kcweaver-0521205426-f4de06c4_51aced77-c171-4842-be1f-6c79226872e5.
> java.util.ServiceConfigurationError: 
> org.apache.beam.runners.core.construction.NativeTransforms$IsNativeTransform: 
> Error reading configuration file
>   at java.util.ServiceLoader.fail(ServiceLoader.java:232)
>   at java.util.ServiceLoader.parse(ServiceLoader.java:309)
>   at java.util.ServiceLoader.access$200(ServiceLoader.java:185)
>   at 
> java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:357)
>   at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
>   at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
>   at 
> org.apache.beam.runners.core.construction.NativeTransforms.isNative(NativeTransforms.java:50)
>   at 
> org.apache.beam.runners.core.construction.graph.QueryablePipeline.isPrimitiveTransform(QueryablePipeline.java:189)
>   at 
> org.apache.beam.runners.core.construction.graph.QueryablePipeline.getPrimitiveTransformIds(QueryablePipeline.java:137)
>   at 
> org.apache.beam.runners.core.construction.graph.QueryablePipeline.forPrimitivesIn(QueryablePipeline.java:90)
>   at 
> org.apache.beam.runners.core.construction.graph.GreedyPipelineFuser.(GreedyPipelineFuser.java:67)
>   at 
> org.apache.beam.runners.core.construction.graph.GreedyPipelineFuser.fuse(GreedyPipelineFuser.java:90)
>   at 
> org.apache.beam.runners.spark.SparkPipelineRunner.run(SparkPipelineRunner.java:94)
>   at 
> org.apache.beam.runners.jobsubmission.JobInvocation.runPipeline(JobInvocation.java:83)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: 
> /tmp/spark-5e8a8a9a-22d6-48d5-b398-1a4f5582d954/userFiles-ec74cac1-21b5-4127-b764-540636b733d0/beam-runners-core-construction-java-2.22.0-SNAPSHOT-tests.jar
>  (No such file or directory)
>   at java.util.zip.ZipFile.open(Native Method)
>   at java.util.zip.ZipFile.(ZipFile.java:230)
>   at java.util.zip.ZipFile.(ZipFile.java:155)
>   at java.util.jar.JarFile.(JarFile.java:167)
>   at java.util.jar.JarFile.(JarFile.java:104)
>   at sun.net.www.protocol.jar.URLJarFile.(URLJarFile.java:93)
>   at sun.net.www.protocol.jar.URLJarFile.getJarFile(URLJarFile.java:69)
>   at sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:84)
>   at 
> 

[jira] [Work logged] (BEAM-9926) Certain code examples for programming guide are not showing

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9926?focusedWorklogId=437367=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437367
 ]

ASF GitHub Bot logged work on BEAM-9926:


Author: ASF GitHub Bot
Created on: 27/May/20 03:40
Start Date: 27/May/20 03:40
Worklog Time Spent: 10m 
  Work Description: pabloem edited a comment on pull request #11790:
URL: https://github.com/apache/beam/pull/11790#issuecomment-634198356


   retest this please
   (note this is just a command to jenkins to run tests on your changes)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437367)
Time Spent: 1h 10m  (was: 1h)

> Certain code examples for programming guide are not showing
> ---
>
> Key: BEAM-9926
> URL: https://issues.apache.org/jira/browse/BEAM-9926
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Ashwin Ramaswami
>Priority: P2
> Attachments: screenshot-1.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Seems like the code examples for the entire State section are missing. See 
> [https://beam.apache.org/documentation/programming-guide/#state-and-timers]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10003) Need two PR to submit snippets to website

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10003?focusedWorklogId=437332=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437332
 ]

ASF GitHub Bot logged work on BEAM-10003:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:35
Start Date: 27/May/20 03:35
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #11796:
URL: https://github.com/apache/beam/pull/11796#discussion_r430541307



##
File path: website/www/site/layouts/shortcodes/code_sample.html
##
@@ -0,0 +1,15 @@
+{{/*
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+   http://www.apache.org/licenses/LICENSE-2.0
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License. See accompanying LICENSE file.
+*/}}
+{{/*
+  This shortcode is used to fetch a piece of code with tags from Beam code.
+  There should be no breaklines here to make sure the string results do not 
get impacts of newlines.
+*/}}{{ $tag := .Get 1 }}{{ $path := printf "code_samples/%s" (replaceRE "/" 
"_" (.Get 0)) }}{{ $data := readFile $path }}{{ $matchRegex := printf 
"%s%s%s%s%s" "\\[START " $tag "]\n[\\s\\S]*?\n.*\\[END " $tag "]" }}{{ $match 
:= index (findRE $matchRegex $data) 0 }}{{ $lines := split $match "\n" }}{{ 
$lineCount := len $lines }}{{ $cleanedLines := $lines | first (sub $lineCount 
1) | last (sub $lineCount 2) }}{{ $firstLine := index $cleanedLines 0 }}{{ 
$numberOfWhitespaces := index (findRE "^\\s*" $firstLine) 0 | len }}{{ 
$unindentRegex := printf "%s%d%s" "^\\s{" $numberOfWhitespaces "}" }}{{ 
$unindentedLines := apply $cleanedLines "replaceRE" $unindentRegex "" "." }}{{ 
$result := delimit $unindentedLines "\n" }}{{ print $result }}

Review comment:
   What does this part do:
   
   {{ $path := printf "code_samples/%s" (replaceRE "/" "_" (.Get 0)) }}
   
   I have 2 questions. What is this path? And why there is a need to replace / 
with _





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437332)
Time Spent: 40m  (was: 0.5h)

> Need two PR to submit snippets to website
> -
>
> Key: BEAM-10003
> URL: https://issues.apache.org/jira/browse/BEAM-10003
> Project: Beam
>  Issue Type: New Feature
>  Components: website
>Reporter: Reza ardeshir rokni
>Assignee: Aizhamal Nurmamat kyzy
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Looks like build_github_samples.sh uses code already on the repo to build 
> local serving;
> do
>   fileName=$(echo "$url" | sed -e 's/\//_/g')
>   curl -o "$DIST_DIR"/"$fileName" 
> "[https://raw.githubusercontent.com|https://raw.githubusercontent.com/]$url;
> done
> So when tying to test locally, the code needs to have already be in Beam. 
> Ideally the script should make use of local code when building so :
> 1- Easier to  build & test changes.
> 2- No need to raise two PR for what is a single change
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9916) Update IOs documentation links

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9916?focusedWorklogId=437322=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437322
 ]

ASF GitHub Bot logged work on BEAM-9916:


Author: ASF GitHub Bot
Created on: 27/May/20 03:35
Start Date: 27/May/20 03:35
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #11802:
URL: https://github.com/apache/beam/pull/11802#issuecomment-634093933







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437322)
Time Spent: 50m  (was: 40m)

> Update IOs documentation links
> --
>
> Key: BEAM-9916
> URL: https://issues.apache.org/jira/browse/BEAM-9916
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Alexey Romanenko
>Assignee: Ashwin Ramaswami
>Priority: P3
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, on the webpage 
> [https://beam.apache.org/documentation/io/built-in/] , we link all IOs to 
> their code on github, which could be quite odd for users. After a 
> [discussion|https://lists.apache.org/thread.html/r2c7bb9a2e13f1b8c296930fc8a36edafe731f49b5182567989afcb98%40%3Cdev.beam.apache.org%3E]
>  on dev@, it was decided to link them to the latest doc reference (like 
> [https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileSystems.html]
>  for FileIO).
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10078) uniquify Dataflow specific jars when staging

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10078?focusedWorklogId=437330=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437330
 ]

ASF GitHub Bot logged work on BEAM-10078:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:35
Start Date: 27/May/20 03:35
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11814:
URL: https://github.com/apache/beam/pull/11814#issuecomment-634085087


   Did you mean review https://github.com/apache/beam/pull/11813 and not this 
one ? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437330)
Remaining Estimate: 0h
Time Spent: 10m

> uniquify Dataflow specific jars when staging
> 
>
> Key: BEAM-10078
> URL: https://issues.apache.org/jira/browse/BEAM-10078
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> After BEAM-9383, Dataflow specific jars (dataflow-worker.jar, windmill_main) 
> could be overwritten when two or more jobs share the same staging location. 
> Since they 1) should have specific predefined names AND 2) should have unique 
> location for avoiding collision, they need special handling when staging 
> artifacts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10054) Direct Runner execution stalls with test pipeline

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10054?focusedWorklogId=437318=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437318
 ]

ASF GitHub Bot logged work on BEAM-10054:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:34
Start Date: 27/May/20 03:34
Worklog Time Spent: 10m 
  Work Description: mxm removed a comment on pull request #11777:
URL: https://github.com/apache/beam/pull/11777#issuecomment-632572299


   Run Python PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437318)
Time Spent: 1h  (was: 50m)

> Direct Runner execution stalls with test pipeline
> -
>
> Key: BEAM-10054
> URL: https://issues.apache.org/jira/browse/BEAM-10054
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Internally, we have a test pipeline which runs with the DirectRunner. When 
> upgrading from 2.18.0 to 2.21.0 the test failed with the following exception:
> {noformat}
> tp = Exception('Monitor task detected a pipeline stall.',), value = None, tb 
> = None
> def raise_(tp, value=None, tb=None):
> """
> A function that matches the Python 2.x ``raise`` statement. This
> allows re-raising exceptions with the cls value and traceback on
> Python 2 and 3.
> """
> if value is not None and isinstance(tp, Exception):
> raise TypeError("instance exception may not have a separate 
> value")
> if value is not None:
> exc = tp(value)
> else:
> exc = tp
> if exc.__traceback__ is not tb:
> raise exc.with_traceback(tb)
> >   raise exc
> E   Exception: Monitor task detected a pipeline stall.
> {noformat}
> I was able to bisect the error. This commit introduced the failure: 
> https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731
> If the following conditions evaluates to False, the pipeline runs correctly: 
> https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731#diff-2bb845e226f3a97c0f0f737d0558c5dbR1273



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9971) beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9971?focusedWorklogId=437314=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437314
 ]

ASF GitHub Bot logged work on BEAM-9971:


Author: ASF GitHub Bot
Created on: 27/May/20 03:33
Start Date: 27/May/20 03:33
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11784:
URL: https://github.com/apache/beam/pull/11784#issuecomment-634157424







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437314)
Time Spent: 0.5h  (was: 20m)

> beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)
> --
>
> Key: BEAM-9971
> URL: https://issues.apache.org/jira/browse/BEAM-9971
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: portability-spark
> Fix For: 2.22.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This happens sporadically. One time the issue affected 14 tests; another time 
> it affected 112 tests.
> It looks like the ClassLoader is sometimes contaminated with jars from 
> /tmp/spark-*, which have already been deleted.
> 20/05/21 13:54:27 ERROR org.apache.beam.runners.jobsubmission.JobInvocation: 
> Error during job invocation 
> pipelinetest0testidentitytransform-kcweaver-0521205426-f4de06c4_51aced77-c171-4842-be1f-6c79226872e5.
> java.util.ServiceConfigurationError: 
> org.apache.beam.runners.core.construction.NativeTransforms$IsNativeTransform: 
> Error reading configuration file
>   at java.util.ServiceLoader.fail(ServiceLoader.java:232)
>   at java.util.ServiceLoader.parse(ServiceLoader.java:309)
>   at java.util.ServiceLoader.access$200(ServiceLoader.java:185)
>   at 
> java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:357)
>   at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
>   at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
>   at 
> org.apache.beam.runners.core.construction.NativeTransforms.isNative(NativeTransforms.java:50)
>   at 
> org.apache.beam.runners.core.construction.graph.QueryablePipeline.isPrimitiveTransform(QueryablePipeline.java:189)
>   at 
> org.apache.beam.runners.core.construction.graph.QueryablePipeline.getPrimitiveTransformIds(QueryablePipeline.java:137)
>   at 
> org.apache.beam.runners.core.construction.graph.QueryablePipeline.forPrimitivesIn(QueryablePipeline.java:90)
>   at 
> org.apache.beam.runners.core.construction.graph.GreedyPipelineFuser.(GreedyPipelineFuser.java:67)
>   at 
> org.apache.beam.runners.core.construction.graph.GreedyPipelineFuser.fuse(GreedyPipelineFuser.java:90)
>   at 
> org.apache.beam.runners.spark.SparkPipelineRunner.run(SparkPipelineRunner.java:94)
>   at 
> org.apache.beam.runners.jobsubmission.JobInvocation.runPipeline(JobInvocation.java:83)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: 
> /tmp/spark-5e8a8a9a-22d6-48d5-b398-1a4f5582d954/userFiles-ec74cac1-21b5-4127-b764-540636b733d0/beam-runners-core-construction-java-2.22.0-SNAPSHOT-tests.jar
>  (No such file or directory)
>   at java.util.zip.ZipFile.open(Native Method)
>   at java.util.zip.ZipFile.(ZipFile.java:230)
>   at java.util.zip.ZipFile.(ZipFile.java:155)
>   at java.util.jar.JarFile.(JarFile.java:167)
>   at java.util.jar.JarFile.(JarFile.java:104)
>   at sun.net.www.protocol.jar.URLJarFile.(URLJarFile.java:93)
>   at sun.net.www.protocol.jar.URLJarFile.getJarFile(URLJarFile.java:69)
>   at sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:84)
>   at 
> 

[jira] [Work logged] (BEAM-10093) Add ZetaSQL Nexmark variant

2020-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10093?focusedWorklogId=437317=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-437317
 ]

ASF GitHub Bot logged work on BEAM-10093:
-

Author: ASF GitHub Bot
Created on: 27/May/20 03:33
Start Date: 27/May/20 03:33
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #11820:
URL: https://github.com/apache/beam/pull/11820#issuecomment-634292920







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 437317)
Remaining Estimate: 0h
Time Spent: 10m

> Add ZetaSQL Nexmark variant
> ---
>
> Key: BEAM-10093
> URL: https://issues.apache.org/jira/browse/BEAM-10093
> Project: Beam
>  Issue Type: New Feature
>  Components: testing-nexmark
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: P2
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Most queries will be identical, but best to simply stay decoupled, so this is 
> a copy/paste/modify job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   3   >