[jira] [Updated] (BEAM-9361) NPE When putting Avro record with enum through SqlTransform

2020-05-21 Thread MAKSIM TSYGAN (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MAKSIM TSYGAN updated BEAM-9361:

Attachment: image-2020-05-21-10-36-00-375.png

> NPE When putting Avro record with enum through SqlTransform
> ---
>
> Key: BEAM-9361
> URL: https://issues.apache.org/jira/browse/BEAM-9361
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.19.0
>Reporter: Niels Basjes
>Priority: P2
> Attachments: image-2020-05-21-10-36-00-375.png
>
>
> I ran into this problem when trying to put my Avro records through the 
> SqlTransform.
> I was able to reduce the reproduction path to the code below.
> This code fails on my machine (using Beam 2.19.0) with the following 
> NullPointerException
> {code:java}
>  org.apache.beam.sdk.extensions.sql.impl.ParseException: Unable to parse 
> query SELECT name, direction FROM InputStream
>   at 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner.convertToBeamRel(CalciteQueryPlanner.java:175)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103)
>   at 
> org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:125)
>   at 
> org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:83)
>   at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:539)
>   at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:490)
>   at 
> org.apache.beam.sdk.values.PCollectionTuple.apply(PCollectionTuple.java:261)
>   at com.bol.analytics.m2.TestAvro2SQL.testAvro2SQL(TestAvro2SQL.java:99)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
>   at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
>   at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
> Caused by: 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.ValidationException:
>  java.lang.NullPointerException
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.prepare.PlannerImpl.validate(PlannerImpl.java:217)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner.convertToBeamRel(CalciteQueryPlanner.java:144)
>   ... 31 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:45)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.toRelDataType(CalciteUtils.java:280)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.toRelDataType(CalciteUtils.java:287)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.lambda$toCalciteRowType$0(CalciteUtils.java:261)
>   at 
> java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
>   at java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:581)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils

[jira] [Updated] (BEAM-10045) Reduce logging related to ratelimitExceeded error for BQ sink when performing streaming inserts

2020-05-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-10045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-10045:

Status: Open  (was: Triage Needed)

> Reduce logging related to ratelimitExceeded error for BQ sink when performing 
> streaming inserts
> ---
>
> Key: BEAM-10045
> URL: https://issues.apache.org/jira/browse/BEAM-10045
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: P2
>
> These errors are usually temporary and pipelines may recover.
>  
> We can consider not logging until we have backed off for a certain amount of time here.
> [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L792]
>  
> cc: [~reuvenlax]
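
A minimal sketch of the idea, not the actual BigQueryServicesImpl code; the class and
method names (QuietRetryLogger, onRateLimitExceeded) and the 2-minute quiet period are
assumptions: only surface the rateLimitExceeded log once the cumulative backoff has
crossed a threshold.

{code:java}
import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class QuietRetryLogger {
  private static final Logger LOG = LoggerFactory.getLogger(QuietRetryLogger.class);

  // Stay silent until we have been backing off for at least this long (assumed value).
  private static final long QUIET_PERIOD_MS = TimeUnit.MINUTES.toMillis(2);

  private long totalBackoffMs = 0;

  // Called each time a retryable rateLimitExceeded error is seen before retrying.
  void onRateLimitExceeded(Exception error, long nextBackoffMs) {
    totalBackoffMs += nextBackoffMs;
    if (totalBackoffMs >= QUIET_PERIOD_MS) {
      // Only now surface the problem; before that the error is usually transient.
      LOG.info("Retrying insert after rateLimitExceeded (backed off {} ms so far)",
          totalBackoffMs, error);
    }
  }
}
{code}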



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-10053) Timers exception on "Job Drain" while using stateful beam processing in global window

2020-05-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-10053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-10053:
---

Assignee: (was: Aizhamal Nurmamat kyzy)

> Timers exception on "Job Drain" while using stateful beam processing in 
> global window
> -
>
> Key: BEAM-10053
> URL: https://issues.apache.org/jira/browse/BEAM-10053
> Project: Beam
>  Issue Type: Bug
>  Components: beam-community, runner-dataflow, sdk-java-core
>Affects Versions: 2.19.0
>Reporter: MOHIL
>Priority: P2
>
> Hello,
>  
> I have a use case with two sets of PCollections (RecordA and RecordB) coming 
> from a real-time streaming source such as Kafka.
>  
> Both records are correlated with a common key, let's say KEY.
>  
> The purpose is to enrich RecordA with RecordB's data, for which I am using 
> CoGroupByKey. Since RecordA and RecordB for a common key can arrive within 1-2 
> minutes of each other in event time, I maintain a sliding window for both 
> records and then apply CoGroupByKey to both PCollections.
>  
> The sliding windows that see both RecordA and RecordB for a common key KEY 
> emit the enriched output. Since multiple sliding windows can emit the same 
> output, I then remove duplicate results by feeding those outputs into a global 
> window, where I keep state to check whether an output has already been 
> processed. Because it is a global window, I also set a Timer on the state (for 
> GC) so that it expires 10 minutes after the state was written.
>  
> This works perfectly fine w.r.t. the expected results. However, I am unable to 
> stop the job gracefully, i.e. drain it. I see the following exception:
>  
> java.lang.IllegalStateException: 
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn@16b089a6 received state 
> cleanup timer for window 
> org.apache.beam.sdk.transforms.windowing.GlobalWindow@29ca0210 that is before 
> the appropriate cleanup time 294248-01-24T04:00:54.776Z
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:842)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processSystemTimer(SimpleParDoFn.java:384)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.access$700(SimpleParDoFn.java:73)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn$TimerType$2.processTimer(SimpleParDoFn.java:444)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:467)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:354)
> org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:52)
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1316)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1049)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> java.lang.IllegalStateException: 
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn@59902a10 received state 
> cleanup timer for window 
> org.apache.beam.sdk.transforms.windowing.GlobalWindow@29ca0210 that is before 
> the appropriate cleanup time 294248-01-24T04:00:54.776Z
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:842)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processSystemTimer(SimpleParDoFn.java:384)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.access$700(SimpleParDoFn.java:73)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn$TimerType$2.processTimer(SimpleParDoFn.java:444)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:467)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:354)
> org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:52)
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1316)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1049)
> java.util
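
A minimal sketch of the deduplication pattern described in this issue (per-key state
plus a GC timer in the global window); the class name, element types, and the
10-minute TTL are assumptions taken from the description, not the reporter's actual code.

{code:java}
import org.apache.beam.sdk.coders.BooleanCoder;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

class DedupFn extends DoFn<KV<String, String>, String> {
  // Per-key flag: has this key's enriched output already been emitted?
  @StateId("seen")
  private final StateSpec<ValueState<Boolean>> seenSpec = StateSpecs.value(BooleanCoder.of());

  // GC timer to clear the per-key state in the global window.
  @TimerId("gc")
  private final TimerSpec gcSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

  @ProcessElement
  public void process(
      ProcessContext c,
      @StateId("seen") ValueState<Boolean> seen,
      @TimerId("gc") Timer gc) {
    if (Boolean.TRUE.equals(seen.read())) {
      return; // duplicate from another sliding window; drop it
    }
    seen.write(true);
    // Clear the per-key state 10 minutes after it was written.
    gc.offset(Duration.standardMinutes(10)).setRelative();
    c.output(c.element().getValue());
  }

  @OnTimer("gc")
  public void onGc(@StateId("seen") ValueState<Boolean> seen) {
    seen.clear();
  }
}
{code}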

[jira] [Updated] (BEAM-10053) Timers exception on "Job Drain" while using stateful beam processing in global window

2020-05-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-10053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-10053:

Component/s: (was: beam-community)

> Timers exception on "Job Drain" while using stateful beam processing in 
> global window
> -
>
> Key: BEAM-10053
> URL: https://issues.apache.org/jira/browse/BEAM-10053
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-core
>Affects Versions: 2.19.0
>Reporter: MOHIL
>Priority: P2
>
> Hello,
>  
> I have a use case with two sets of PCollections (RecordA and RecordB) coming 
> from a real-time streaming source such as Kafka.
>  
> Both records are correlated with a common key, let's say KEY.
>  
> The purpose is to enrich RecordA with RecordB's data, for which I am using 
> CoGroupByKey. Since RecordA and RecordB for a common key can arrive within 1-2 
> minutes of each other in event time, I maintain a sliding window for both 
> records and then apply CoGroupByKey to both PCollections.
>  
> The sliding windows that see both RecordA and RecordB for a common key KEY 
> emit the enriched output. Since multiple sliding windows can emit the same 
> output, I then remove duplicate results by feeding those outputs into a global 
> window, where I keep state to check whether an output has already been 
> processed. Because it is a global window, I also set a Timer on the state (for 
> GC) so that it expires 10 minutes after the state was written.
>  
> This works perfectly fine w.r.t. the expected results. However, I am unable to 
> stop the job gracefully, i.e. drain it. I see the following exception:
>  
> java.lang.IllegalStateException: 
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn@16b089a6 received state 
> cleanup timer for window 
> org.apache.beam.sdk.transforms.windowing.GlobalWindow@29ca0210 that is before 
> the appropriate cleanup time 294248-01-24T04:00:54.776Z
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:842)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processSystemTimer(SimpleParDoFn.java:384)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.access$700(SimpleParDoFn.java:73)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn$TimerType$2.processTimer(SimpleParDoFn.java:444)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:467)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:354)
> org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:52)
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1316)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1049)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> java.lang.IllegalStateException: 
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn@59902a10 received state 
> cleanup timer for window 
> org.apache.beam.sdk.transforms.windowing.GlobalWindow@29ca0210 that is before 
> the appropriate cleanup time 294248-01-24T04:00:54.776Z
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:842)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processSystemTimer(SimpleParDoFn.java:384)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.access$700(SimpleParDoFn.java:73)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn$TimerType$2.processTimer(SimpleParDoFn.java:444)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:467)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:354)
> org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:52)
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1316)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1049)
> java.util.concurrent.ThreadPoolExecut

[jira] [Commented] (BEAM-10053) Timers exception on "Job Drain" while using stateful beam processing in global window

2020-05-21 Thread Jira


[ 
https://issues.apache.org/jira/browse/BEAM-10053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112872#comment-17112872
 ] 

Ismaël Mejía commented on BEAM-10053:
-

[~lcwik] or [~reuvenlax], can you please take a look or assign this to someone 
who can?

> Timers exception on "Job Drain" while using stateful beam processing in 
> global window
> -
>
> Key: BEAM-10053
> URL: https://issues.apache.org/jira/browse/BEAM-10053
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-core
>Affects Versions: 2.19.0
>Reporter: MOHIL
>Priority: P2
>
> Hello,
>  
> I have a use case with two sets of PCollections (RecordA and RecordB) coming 
> from a real-time streaming source such as Kafka.
>  
> Both records are correlated with a common key, let's say KEY.
>  
> The purpose is to enrich RecordA with RecordB's data, for which I am using 
> CoGroupByKey. Since RecordA and RecordB for a common key can arrive within 1-2 
> minutes of each other in event time, I maintain a sliding window for both 
> records and then apply CoGroupByKey to both PCollections.
>  
> The sliding windows that see both RecordA and RecordB for a common key KEY 
> emit the enriched output. Since multiple sliding windows can emit the same 
> output, I then remove duplicate results by feeding those outputs into a global 
> window, where I keep state to check whether an output has already been 
> processed. Because it is a global window, I also set a Timer on the state (for 
> GC) so that it expires 10 minutes after the state was written.
>  
> This works perfectly fine w.r.t. the expected results. However, I am unable to 
> stop the job gracefully, i.e. drain it. I see the following exception:
>  
> java.lang.IllegalStateException: 
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn@16b089a6 received state 
> cleanup timer for window 
> org.apache.beam.sdk.transforms.windowing.GlobalWindow@29ca0210 that is before 
> the appropriate cleanup time 294248-01-24T04:00:54.776Z
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:842)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processSystemTimer(SimpleParDoFn.java:384)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.access$700(SimpleParDoFn.java:73)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn$TimerType$2.processTimer(SimpleParDoFn.java:444)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:467)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:354)
> org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:52)
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1316)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1049)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> java.lang.IllegalStateException: 
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn@59902a10 received state 
> cleanup timer for window 
> org.apache.beam.sdk.transforms.windowing.GlobalWindow@29ca0210 that is before 
> the appropriate cleanup time 294248-01-24T04:00:54.776Z
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:842)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processSystemTimer(SimpleParDoFn.java:384)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.access$700(SimpleParDoFn.java:73)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn$TimerType$2.processTimer(SimpleParDoFn.java:444)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:467)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:354)
> org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:52)
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1316)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
> org.apache.beam.runners.dataflow.worker.Stream

[jira] [Updated] (BEAM-10044) Programming Guide has curly quotes in code samples

2020-05-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-10044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-10044:

Status: Open  (was: Triage Needed)

> Programming Guide has curly quotes in code samples
> --
>
> Key: BEAM-10044
> URL: https://issues.apache.org/jira/browse/BEAM-10044
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Ashwin Ramaswami
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Curly quotes should be converted to regular quotes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10050) VideoIntelligenceIT.annotateVideoFromURINoContext is flaky

2020-05-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-10050:

Status: Open  (was: Triage Needed)

> VideoIntelligenceIT.annotateVideoFromURINoContext is flaky
> --
>
> Key: BEAM-10050
> URL: https://issues.apache.org/jira/browse/BEAM-10050
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Assignee: Michał Walenia
>Priority: P2
>
> I've seen this fail a few times in precommits. Example failure: 
> VideoIntelligenceIT.annotateVideoFromURINoContext
> {code}
> java.lang.AssertionError: Annotate 
> video/ParDo(AnnotateVideoFromURI)/ParMultiDo(AnnotateVideoFromURI).output: 
> expected: but was:
>   at 
> org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:169)
>   at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:411)
>   at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:403)
>   at 
> org.apache.beam.sdk.extensions.ml.VideoIntelligenceIT.annotateVideoFromURINoContext(VideoIntelligenceIT.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-10044) Programming Guide has curly quotes in code samples

2020-05-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-10044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-10044.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> Programming Guide has curly quotes in code samples
> --
>
> Key: BEAM-10044
> URL: https://issues.apache.org/jira/browse/BEAM-10044
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Ashwin Ramaswami
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Curly quotes should be converted to regular quotes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10044) [website] Programming Guide has curly quotes in code samples

2020-05-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-10044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-10044:

Summary: [website] Programming Guide has curly quotes in code samples  
(was: Programming Guide has curly quotes in code samples)

> [website] Programming Guide has curly quotes in code samples
> 
>
> Key: BEAM-10044
> URL: https://issues.apache.org/jira/browse/BEAM-10044
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Ashwin Ramaswami
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Curly quotes should be converted to regular quotes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9361) NPE When putting Avro record with enum through SqlTransform

2020-05-21 Thread MAKSIM TSYGAN (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112875#comment-17112875
 ] 

MAKSIM TSYGAN commented on BEAM-9361:
-

I think we need to add Enum to the SQL type mapping in the 
org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils class for it to work 
in SQL:

!image-2020-05-21-10-36-00-375.png!

 

I think we also need to add a mapping for the Enum type for zeta-sql.
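
A minimal sketch of that idea under stated assumptions (a standalone helper, not the
actual CalciteUtils patch): treat Beam's enum logical type as a Calcite VARCHAR when
deriving the SQL type.

{code:java}
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.logicaltypes.EnumerationType;
import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName;

class EnumTypeMapping {
  // Returns the Calcite type to use for a Beam field type, treating enum
  // logical types as strings; returns null for types not handled in this sketch.
  static SqlTypeName toCalciteTypeName(Schema.FieldType fieldType) {
    if (fieldType.getTypeName() == Schema.TypeName.LOGICAL_TYPE
        && fieldType.getLogicalType() instanceof EnumerationType) {
      return SqlTypeName.VARCHAR;
    }
    if (fieldType.getTypeName() == Schema.TypeName.STRING) {
      return SqlTypeName.VARCHAR;
    }
    return null; // other Beam types omitted here
  }
}
{code}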

> NPE When putting Avro record with enum through SqlTransform
> ---
>
> Key: BEAM-9361
> URL: https://issues.apache.org/jira/browse/BEAM-9361
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.19.0
>Reporter: Niels Basjes
>Priority: P2
> Attachments: image-2020-05-21-10-36-00-375.png
>
>
> I ran into this problem when trying to put my Avro records through the 
> SqlTransform.
> I was able to reduce the reproduction path to the code below.
> This code fails on my machine (using Beam 2.19.0) with the following 
> NullPointerException
> {code:java}
>  org.apache.beam.sdk.extensions.sql.impl.ParseException: Unable to parse 
> query SELECT name, direction FROM InputStream
>   at 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner.convertToBeamRel(CalciteQueryPlanner.java:175)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103)
>   at 
> org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:125)
>   at 
> org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:83)
>   at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:539)
>   at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:490)
>   at 
> org.apache.beam.sdk.values.PCollectionTuple.apply(PCollectionTuple.java:261)
>   at com.bol.analytics.m2.TestAvro2SQL.testAvro2SQL(TestAvro2SQL.java:99)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
>   at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
>   at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
> Caused by: 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.ValidationException:
>  java.lang.NullPointerException
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.prepare.PlannerImpl.validate(PlannerImpl.java:217)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner.convertToBeamRel(CalciteQueryPlanner.java:144)
>   ... 31 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:45)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.toRelDataType(CalciteUtils.java:280)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.toRelDataType(CalciteUtils.java:287)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.lambda$toCalciteRowType$0(CalciteUtils.j

[jira] [Comment Edited] (BEAM-9361) NPE When putting Avro record with enum through SqlTransform

2020-05-21 Thread MAKSIM TSYGAN (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112875#comment-17112875
 ] 

MAKSIM TSYGAN edited comment on BEAM-9361 at 5/21/20, 7:43 AM:
---

I think we need to add a SQL type mapping in the 
org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils class for it to work 
in SQL:

!image-2020-05-21-10-36-00-375.png!

 

I think we also need to add a mapping for the Enum type for zeta-sql.


was (Author: maksim.tsygan):
I think we need to add Enum to the SQL type mapping in the 
org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils class for it to work 
in SQL:

!image-2020-05-21-10-36-00-375.png!

 

I think we also need to add a mapping for the Enum type for zeta-sql.

> NPE When putting Avro record with enum through SqlTransform
> ---
>
> Key: BEAM-9361
> URL: https://issues.apache.org/jira/browse/BEAM-9361
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.19.0
>Reporter: Niels Basjes
>Priority: P2
> Attachments: image-2020-05-21-10-36-00-375.png
>
>
> I ran into this problem when trying to put my Avro records through the 
> SqlTransform.
> I was able to reduce the reproduction path to the code below.
> This code fails on my machine (using Beam 2.19.0) with the following 
> NullPointerException
> {code:java}
>  org.apache.beam.sdk.extensions.sql.impl.ParseException: Unable to parse 
> query SELECT name, direction FROM InputStream
>   at 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner.convertToBeamRel(CalciteQueryPlanner.java:175)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103)
>   at 
> org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:125)
>   at 
> org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:83)
>   at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:539)
>   at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:490)
>   at 
> org.apache.beam.sdk.values.PCollectionTuple.apply(PCollectionTuple.java:261)
>   at com.bol.analytics.m2.TestAvro2SQL.testAvro2SQL(TestAvro2SQL.java:99)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
>   at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
>   at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
> Caused by: 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.ValidationException:
>  java.lang.NullPointerException
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.prepare.PlannerImpl.validate(PlannerImpl.java:217)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner.convertToBeamRel(CalciteQueryPlanner.java:144)
>   ... 31 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:45)
>   at 
> or

[jira] [Comment Edited] (BEAM-9361) NPE When putting Avro record with enum through SqlTransform

2020-05-21 Thread MAKSIM TSYGAN (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112875#comment-17112875
 ] 

MAKSIM TSYGAN edited comment on BEAM-9361 at 5/21/20, 7:44 AM:
---

I think we need to add a SQL type mapping in the 
org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils class for it to work 
in SQL:

!image-2020-05-21-10-36-00-375.png!

 

We also need to add a mapping for zeta-sql.


was (Author: maksim.tsygan):
I think we need to add a SQL type mapping in the 
org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils class for it to work 
in SQL:

!image-2020-05-21-10-36-00-375.png!

 

I think we also need to add a mapping for the Enum type for zeta-sql.

> NPE When putting Avro record with enum through SqlTransform
> ---
>
> Key: BEAM-9361
> URL: https://issues.apache.org/jira/browse/BEAM-9361
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.19.0
>Reporter: Niels Basjes
>Priority: P2
> Attachments: image-2020-05-21-10-36-00-375.png
>
>
> I ran into this problem when trying to put my Avro records through the 
> SqlTransform.
> I was able to reduce the reproduction path to the code below.
> This code fails on my machine (using Beam 2.19.0) with the following 
> NullPointerException
> {code:java}
>  org.apache.beam.sdk.extensions.sql.impl.ParseException: Unable to parse 
> query SELECT name, direction FROM InputStream
>   at 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner.convertToBeamRel(CalciteQueryPlanner.java:175)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103)
>   at 
> org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:125)
>   at 
> org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:83)
>   at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:539)
>   at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:490)
>   at 
> org.apache.beam.sdk.values.PCollectionTuple.apply(PCollectionTuple.java:261)
>   at com.bol.analytics.m2.TestAvro2SQL.testAvro2SQL(TestAvro2SQL.java:99)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
>   at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
>   at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
> Caused by: 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.ValidationException:
>  java.lang.NullPointerException
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.prepare.PlannerImpl.validate(PlannerImpl.java:217)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner.convertToBeamRel(CalciteQueryPlanner.java:144)
>   ... 31 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:45)
>   at 
> org.apache.beam.sdk.extensions.sq

[jira] [Work logged] (BEAM-10052) check hash and avoid duplicates when uploading artifact in Python Dataflow Runner

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10052?focusedWorklogId=435875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435875
 ]

ASF GitHub Bot logged work on BEAM-10052:
-

Author: ASF GitHub Bot
Created on: 21/May/20 08:10
Start Date: 21/May/20 08:10
Worklog Time Spent: 10m 
  Work Description: ihji opened a new pull request #11771:
URL: https://github.com/apache/beam/pull/11771


   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Work logged] (BEAM-10052) check hash and avoid duplicates when uploading artifact in Python Dataflow Runner

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10052?focusedWorklogId=435876&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435876
 ]

ASF GitHub Bot logged work on BEAM-10052:
-

Author: ASF GitHub Bot
Created on: 21/May/20 08:10
Start Date: 21/May/20 08:10
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #11771:
URL: https://github.com/apache/beam/pull/11771#issuecomment-631950394


   R: @chamikaramj 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 435876)
Time Spent: 20m  (was: 10m)

> check hash and avoid duplicates when uploading artifact in Python Dataflow 
> Runner
> -
>
> Key: BEAM-10052
> URL: https://issues.apache.org/jira/browse/BEAM-10052
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> An xlang pipeline could have many duplicated jars. It would be great if we 
> checked the hash and avoided duplicate uploads.
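
A minimal sketch of the idea (the actual change would live in the Python Dataflow
runner's artifact staging; the class and method names here are assumptions): hash each
artifact and upload only content that has not been staged yet.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ArtifactDeduper {
  private final Set<String> stagedHashes = new HashSet<>();

  // Returns only the artifacts whose content has not been staged yet.
  List<Path> filterNew(List<Path> artifacts) throws IOException, NoSuchAlgorithmException {
    List<Path> toUpload = new ArrayList<>();
    for (Path artifact : artifacts) {
      MessageDigest sha = MessageDigest.getInstance("SHA-256");
      String hash = Base64.getEncoder().encodeToString(sha.digest(Files.readAllBytes(artifact)));
      if (stagedHashes.add(hash)) { // true only for hashes not seen before
        toUpload.add(artifact);
      }
    }
    return toUpload;
  }
}
{code}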



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9383) Staging Dataflow artifacts from environment

2020-05-21 Thread Heejong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112909#comment-17112909
 ] 

Heejong Lee commented on BEAM-9383:
---

[https://github.com/apache/beam/pull/11771] could mitigate the problem by 
removing duplicated artifacts from multiple environments.

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P0
> Fix For: 2.22.0
>
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10043) Grammar issues / typos in language-switch.js comments

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10043?focusedWorklogId=435883&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435883
 ]

ASF GitHub Bot logged work on BEAM-10043:
-

Author: ASF GitHub Bot
Created on: 21/May/20 08:48
Start Date: 21/May/20 08:48
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #11760:
URL: https://github.com/apache/beam/pull/11760#issuecomment-631967204


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 435883)
Remaining Estimate: 0h
Time Spent: 10m

> Grammar issues / typos in language-switch.js comments
> -
>
> Key: BEAM-10043
> URL: https://issues.apache.org/jira/browse/BEAM-10043
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Ashwin Ramaswami
>Priority: P3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Grammar issues / typos in language-switch.js comments



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9994) Cannot create a virtualenv using Python 3.8 on Jenkins machines

2020-05-21 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112986#comment-17112986
 ] 

Kamil Wasilewski commented on BEAM-9994:


Yes, I did that manually. 
If someone has instructions on how to update the master VM image, please share 
them.

> Cannot create a virtualenv using Python 3.8 on Jenkins machines
> ---
>
> Key: BEAM-9994
> URL: https://issues.apache.org/jira/browse/BEAM-9994
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: P2
>
> Command: *virtualenv --python /usr/bin/python3.8 env*
> Output:
> {noformat}
> Running virtualenv with interpreter /usr/bin/python3.8
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.5/dist-packages/virtualenv.py", line 22, in 
> 
> import distutils.spawn
> ModuleNotFoundError: No module named 'distutils.spawn'
> {noformat}
> Example test affected: 
> https://builds.apache.org/job/beam_PreCommit_PythonFormatter_Commit/1723/console



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10043) Grammar issues / typos in language-switch.js comments

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10043?focusedWorklogId=435888&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435888
 ]

ASF GitHub Bot logged work on BEAM-10043:
-

Author: ASF GitHub Bot
Created on: 21/May/20 09:09
Start Date: 21/May/20 09:09
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #11760:
URL: https://github.com/apache/beam/pull/11760#issuecomment-631976285


   Thanks @epicfaace!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 435888)
Time Spent: 20m  (was: 10m)

> Grammar issues / typos in language-switch.js comments
> -
>
> Key: BEAM-10043
> URL: https://issues.apache.org/jira/browse/BEAM-10043
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Ashwin Ramaswami
>Priority: P3
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Grammar issues / typos in language-switch.js comments



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10043) Grammar issues / typos in language-switch.js comments

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10043?focusedWorklogId=435889&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435889
 ]

ASF GitHub Bot logged work on BEAM-10043:
-

Author: ASF GitHub Bot
Created on: 21/May/20 09:10
Start Date: 21/May/20 09:10
Worklog Time Spent: 10m 
  Work Description: kamilwu merged pull request #11760:
URL: https://github.com/apache/beam/pull/11760


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 435889)
Time Spent: 0.5h  (was: 20m)

> Grammar issues / typos in language-switch.js comments
> -
>
> Key: BEAM-10043
> URL: https://issues.apache.org/jira/browse/BEAM-10043
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Ashwin Ramaswami
>Priority: P3
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Grammar issues / typos in language-switch.js comments



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10043) Grammar issues / typos in language-switch.js comments

2020-05-21 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10043:

Status: Open  (was: Triage Needed)

> Grammar issues / typos in language-switch.js comments
> -
>
> Key: BEAM-10043
> URL: https://issues.apache.org/jira/browse/BEAM-10043
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Ashwin Ramaswami
>Priority: P3
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Grammar issues / typos in language-switch.js comments



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-10043) Grammar issues / typos in language-switch.js comments

2020-05-21 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski resolved BEAM-10043.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> Grammar issues / typos in language-switch.js comments
> -
>
> Key: BEAM-10043
> URL: https://issues.apache.org/jira/browse/BEAM-10043
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Ashwin Ramaswami
>Priority: P3
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Grammar issues / typos in language-switch.js comments



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9722) Add batch SnowflakeIO.Read to Java SDK

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9722?focusedWorklogId=435898&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435898
 ]

ASF GitHub Bot logged work on BEAM-9722:


Author: ASF GitHub Bot
Created on: 21/May/20 09:41
Start Date: 21/May/20 09:41
Worklog Time Spent: 10m 
  Work Description: kkucharc commented on pull request #11360:
URL: https://github.com/apache/beam/pull/11360#issuecomment-631990753


   Run Python2_PVR_Flink PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 435898)
Time Spent: 8h  (was: 7h 50m)

> Add batch SnowflakeIO.Read to Java SDK
> --
>
> Key: BEAM-9722
> URL: https://issues.apache.org/jira/browse/BEAM-9722
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Kasia Kucharczyk
>Assignee: Dariusz Aniszewski
>Priority: P2
>  Time Spent: 8h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9930) Add announcement for Beam Summit Digital 2020 to the blog

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9930?focusedWorklogId=435923&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435923
 ]

ASF GitHub Bot logged work on BEAM-9930:


Author: ASF GitHub Bot
Created on: 21/May/20 10:26
Start Date: 21/May/20 10:26
Worklog Time Spent: 10m 
  Work Description: mxm opened a new pull request #11772:
URL: https://github.com/apache/beam/pull/11772



[jira] [Work logged] (BEAM-9722) Add batch SnowflakeIO.Read to Java SDK

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9722?focusedWorklogId=435937&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435937
 ]

ASF GitHub Bot logged work on BEAM-9722:


Author: ASF GitHub Bot
Created on: 21/May/20 10:52
Start Date: 21/May/20 10:52
Worklog Time Spent: 10m 
  Work Description: DariuszAniszewski commented on pull request #11360:
URL: https://github.com/apache/beam/pull/11360#issuecomment-632019864


   Just a small comment about the force-push above: it was done by mistake and then 
reverted. The HEAD of this branch is still on **3ba192a**, and the force-push 
comment is a leftover.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 435937)
Time Spent: 8h 10m  (was: 8h)

> Add batch SnowflakeIO.Read to Java SDK
> --
>
> Key: BEAM-9722
> URL: https://issues.apache.org/jira/browse/BEAM-9722
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Kasia Kucharczyk
>Assignee: Dariusz Aniszewski
>Priority: P2
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9723) [Java] PTransform that connects to Cloud DLP deidentification service

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9723?focusedWorklogId=435944&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435944
 ]

ASF GitHub Bot logged work on BEAM-9723:


Author: ASF GitHub Bot
Created on: 21/May/20 11:51
Start Date: 21/May/20 11:51
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on a change in pull request #11566:
URL: https://github.com/apache/beam/pull/11566#discussion_r428605560



##
File path: 
sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/BatchRequestForDLP.java
##
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import com.google.privacy.dlp.v2.Table;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.atomic.AtomicInteger;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.Counter;
+import org.apache.beam.sdk.metrics.Metrics;
+import org.apache.beam.sdk.state.BagState;
+import org.apache.beam.sdk.state.StateSpec;
+import org.apache.beam.sdk.state.StateSpecs;
+import org.apache.beam.sdk.state.TimeDomain;
+import org.apache.beam.sdk.state.Timer;
+import org.apache.beam.sdk.state.TimerSpec;
+import org.apache.beam.sdk.state.TimerSpecs;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.values.KV;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * DoFn batching the input PCollection into bigger requests in order to better 
utilize the Cloud DLP
+ * service.
+ */
+@Experimental
+class BatchRequestForDLP extends DoFn<KV<String, Table.Row>, KV<String, Iterable<Table.Row>>> {
+  public static final Logger LOG = LoggerFactory.getLogger(BatchRequestForDLP.class);
+
+  private final Counter numberOfRowsBagged =
+      Metrics.counter(BatchRequestForDLP.class, "numberOfRowsBagged");
+  private final Integer batchSize;
+
+  @StateId("elementsBag")
+  private final StateSpec<BagState<KV<String, Table.Row>>> elementsBag = StateSpecs.bag();
+
+  @TimerId("eventTimer")
+  private final TimerSpec eventTimer = TimerSpecs.timer(TimeDomain.EVENT_TIME);
+
+  public BatchRequestForDLP(Integer batchSize) {
+    this.batchSize = batchSize;
+  }
+
+  @ProcessElement
+  public void process(
+      @Element KV<String, Table.Row> element,
+      @StateId("elementsBag") BagState<KV<String, Table.Row>> elementsBag,
+      @TimerId("eventTimer") Timer eventTimer,
+      BoundedWindow w) {
+    elementsBag.add(element);
+    eventTimer.set(w.maxTimestamp());
+  }
+
+  @OnTimer("eventTimer")
+  public void onTimer(
+      @StateId("elementsBag") BagState<KV<String, Table.Row>> elementsBag,
+      OutputReceiver<KV<String, Iterable<Table.Row>>> output) {
+    String key = elementsBag.read().iterator().next().getKey();
+    AtomicInteger bufferSize = new AtomicInteger();
+    List<Table.Row> rows = new ArrayList<>();
+    elementsBag
+        .read()
+        .forEach(
+            element -> {
+              int elementSize = element.getValue().getSerializedSize();
+              boolean clearBuffer = bufferSize.intValue() + elementSize > batchSize;
+              if (clearBuffer) {
+                numberOfRowsBagged.inc(rows.size());
+                LOG.debug("Clear Buffer {} , Key {}", bufferSize.intValue(), element.getKey());

Review comment:
   Sure, I'll move it and add units. Thanks!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 435944)
Time Spent: 4h 20m  (was: 4h 10m)

> [Java] PTransform that connects to Cloud DLP deidentification service
> -
>
> Key: BEAM-9723
> URL: https://issues.apache.org/jira/browse/BEAM-9723
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Michał Walenia
>As

[jira] [Work logged] (BEAM-9723) [Java] PTransform that connects to Cloud DLP deidentification service

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9723?focusedWorklogId=435947&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435947
 ]

ASF GitHub Bot logged work on BEAM-9723:


Author: ASF GitHub Bot
Created on: 21/May/20 11:53
Start Date: 21/May/20 11:53
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on a change in pull request #11566:
URL: https://github.com/apache/beam/pull/11566#discussion_r428606587



##
File path: 
sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/DLPTextOperationsIT.java
##
@@ -0,0 +1,154 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import static org.junit.Assert.assertTrue;
+
+import com.google.privacy.dlp.v2.CharacterMaskConfig;
+import com.google.privacy.dlp.v2.DeidentifyConfig;
+import com.google.privacy.dlp.v2.DeidentifyContentResponse;
+import com.google.privacy.dlp.v2.Finding;
+import com.google.privacy.dlp.v2.InfoType;
+import com.google.privacy.dlp.v2.InfoTypeTransformations;
+import com.google.privacy.dlp.v2.InspectConfig;
+import com.google.privacy.dlp.v2.InspectContentResponse;
+import com.google.privacy.dlp.v2.Likelihood;
+import com.google.privacy.dlp.v2.PrimitiveTransformation;
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.beam.sdk.extensions.gcp.options.GcpOptions;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+@RunWith(JUnit4.class)
+public class DLPTextOperationsIT {
+  @Rule public TestPipeline testPipeline = TestPipeline.create();
+
+  private static final String IDENTIFYING_TEXT = "mary@example.com";
+  private static InfoType emailAddress = InfoType.newBuilder().setName("EMAIL_ADDRESS").build();
+  private static final InspectConfig inspectConfig =
+      InspectConfig.newBuilder()
+          .addInfoTypes(emailAddress)
+          .setMinLikelihood(Likelihood.LIKELY)
+          .build();
+
+  @Test
+  public void inspectsText() {
+    String projectId = testPipeline.getOptions().as(GcpOptions.class).getProject();
+    PCollection<KV<String, InspectContentResponse>> inspectionResult =
+        testPipeline
+            .apply(Create.of(KV.of("", IDENTIFYING_TEXT)))
+            .apply(
+                DLPInspectText.newBuilder()
+                    .setBatchSize(52400)

Review comment:
   @santhh Is it 52400 or 524000? It should be the latter, since we're 
setting the limit to 524 kb, I think.
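
   For illustration only: a small precondition along the lines of the sketch below 
(not part of this PR; the 524,000-byte limit is an assumption pending the answer to 
the question above) would make a mistyped batch size fail fast instead of producing 
oversized DLP requests.

{code:java}
// Hedged sketch, not PR code: reject batch sizes above an assumed 524 kB
// DLP payload limit so a typo such as 52400 vs. 524000 surfaces immediately.
final class DlpBatchSizeCheck {
  static final int ASSUMED_DLP_PAYLOAD_LIMIT_BYTES = 524_000; // assumed limit

  static int validate(int batchSizeBytes) {
    if (batchSizeBytes <= 0 || batchSizeBytes > ASSUMED_DLP_PAYLOAD_LIMIT_BYTES) {
      throw new IllegalArgumentException(
          "Batch size must be in (0, "
              + ASSUMED_DLP_PAYLOAD_LIMIT_BYTES
              + "] bytes, got "
              + batchSizeBytes);
    }
    return batchSizeBytes;
  }
}
{code}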





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 435947)
Time Spent: 4.5h  (was: 4h 20m)

> [Java] PTransform that connects to Cloud DLP deidentification service
> -
>
> Key: BEAM-9723
> URL: https://issues.apache.org/jira/browse/BEAM-9723
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Michał Walenia
>Assignee: Michał Walenia
>Priority: P2
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1589?focusedWorklogId=435964&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435964
 ]

ASF GitHub Bot logged work on BEAM-1589:


Author: ASF GitHub Bot
Created on: 21/May/20 12:44
Start Date: 21/May/20 12:44
Worklog Time Spent: 10m 
  Work Description: harriskistpay opened a new pull request #11773:
URL: https://github.com/apache/beam/pull/11773


   Updated CHANGES.md file
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Work logged] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1589?focusedWorklogId=435965&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435965
 ]

ASF GitHub Bot logged work on BEAM-1589:


Author: ASF GitHub Bot
Created on: 21/May/20 12:45
Start Date: 21/May/20 12:45
Worklog Time Spent: 10m 
  Work Description: harriskistpay closed pull request #11773:
URL: https://github.com/apache/beam/pull/11773


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 435965)
Time Spent: 13h  (was: 12h 50m)

> Add OnWindowExpiration method to Stateful DoFn
> --
>
> Key: BEAM-1589
> URL: https://issues.apache.org/jira/browse/BEAM-1589
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core, sdk-java-core
>Reporter: Jingsong Lee
>Assignee: Rehman Murad Ali
>Priority: P2
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> See BEAM-1517
> This allows the user to do some work before the state's garbage collection.
> It seems kind of annoying, but on the other hand forgetting to set a final 
> timer to flush state is probably data loss most of the time.
> FlinkRunner does this work very simply, but other runners, such as 
> DirectRunner, need to traverse all the states to do this, and maybe it's a 
> little hard.
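
As a rough illustration of the feature under discussion, a stateful DoFn could use an 
on-window-expiration callback to flush buffered state before it is garbage collected. 
This is a sketch only; the element and state types below are assumptions, not taken 
from the issue.

{code:java}
import org.apache.beam.sdk.state.BagState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

// Sketch: buffer values in state and emit them when the window expires,
// instead of losing them when the runner garbage-collects the state.
class FlushOnExpirationFn extends DoFn<KV<String, String>, String> {

  @StateId("buffer")
  private final StateSpec<BagState<String>> bufferSpec = StateSpecs.bag();

  @ProcessElement
  public void process(
      @Element KV<String, String> element, @StateId("buffer") BagState<String> buffer) {
    buffer.add(element.getValue());
  }

  // Invoked by the runner when the window expires, before state cleanup.
  @OnWindowExpiration
  public void onWindowExpiration(
      @StateId("buffer") BagState<String> buffer, OutputReceiver<String> out) {
    for (String value : buffer.read()) {
      out.output(value);
    }
  }
}
{code}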



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1589?focusedWorklogId=435966&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435966
 ]

ASF GitHub Bot logged work on BEAM-1589:


Author: ASF GitHub Bot
Created on: 21/May/20 12:47
Start Date: 21/May/20 12:47
Worklog Time Spent: 10m 
  Work Description: rehmanmuradali opened a new pull request #11774:
URL: https://github.com/apache/beam/pull/11774


   Updated CHANGES.md
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Work logged] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1589?focusedWorklogId=435967&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435967
 ]

ASF GitHub Bot logged work on BEAM-1589:


Author: ASF GitHub Bot
Created on: 21/May/20 12:48
Start Date: 21/May/20 12:48
Worklog Time Spent: 10m 
  Work Description: rehmanmuradali commented on pull request #11774:
URL: https://github.com/apache/beam/pull/11774#issuecomment-632067025


   R: @reuvenlax 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 435967)
Time Spent: 13h 20m  (was: 13h 10m)

> Add OnWindowExpiration method to Stateful DoFn
> --
>
> Key: BEAM-1589
> URL: https://issues.apache.org/jira/browse/BEAM-1589
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core, sdk-java-core
>Reporter: Jingsong Lee
>Assignee: Rehman Murad Ali
>Priority: P2
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> See BEAM-1517
> This allows the user to do some work before the state's garbage collection.
> It seems kind of annoying, but on the other hand forgetting to set a final 
> timer to flush state is probably data loss most of the time.
> FlinkRunner does this work very simply, but other runners, such as 
> DirectRunner, need to traverse all the states to do this, and maybe it's a 
> little hard.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10050) VideoIntelligenceIT.annotateVideoFromURINoContext is flaky

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10050?focusedWorklogId=435973&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435973
 ]

ASF GitHub Bot logged work on BEAM-10050:

Author: ASF GitHub Bot
Created on: 21/May/20 13:03
Start Date: 21/May/20 13:03
Worklog Time Spent: 10m 
  Work Description: mwalenia opened a new pull request #11775:
URL: https://github.com/apache/beam/pull/11775


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 435973)
Remaining Estimate: 0h
Time Spent: 10m

> VideoIntelligenceIT.annotateVideoFromURINoContext is flaky
> --
>
> Key: BEAM-10050
> URL: https://issues.apache.org/jira/browse/BEAM-10050
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Assignee: Michał Walenia
>Priority: P2
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've seen this fail a few times in precommits [Example 
> failure](VideoIntelligenceIT.annotateVideoFromURINoContext)
> {code}
> java.lang.AssertionError: Annotate 
> video/ParDo(AnnotateVideoFromURI)/ParMultiDo(AnnotateVideoFromURI).output: 
> expected: but was:
>   at 
> org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:169)
>   at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:411)
>   at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:403)
>   at 
> org.apache.beam.sdk.extensions.ml.VideoIntelligenceIT.annotateVideoFromURINoContext(VideoIntelligenceIT.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9723) [Java] PTransform that connects to Cloud DLP deidentification service

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9723?focusedWorklogId=435996&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435996
 ]

ASF GitHub Bot logged work on BEAM-9723:


Author: ASF GitHub Bot
Created on: 21/May/20 13:47
Start Date: 21/May/20 13:47
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on a change in pull request #11566:
URL: https://github.com/apache/beam/pull/11566#discussion_r428661780



##
File path: 
sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java
##
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import com.google.auto.value.AutoValue;
+import com.google.cloud.dlp.v2.DlpServiceClient;
+import com.google.privacy.dlp.v2.ContentItem;
+import com.google.privacy.dlp.v2.DeidentifyConfig;
+import com.google.privacy.dlp.v2.DeidentifyContentRequest;
+import com.google.privacy.dlp.v2.DeidentifyContentResponse;
+import com.google.privacy.dlp.v2.FieldId;
+import com.google.privacy.dlp.v2.InspectConfig;
+import com.google.privacy.dlp.v2.ProjectName;
+import com.google.privacy.dlp.v2.Table;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+
+/**
+ * A {@link PTransform} connecting to Cloud DLP and deidentifying text 
according to provided
+ * settings. The transform supports both CSV formatted input data and 
unstructured input.
+ *
+ * If the csvHeader property is set, csvDelimiter also should be, else the 
results will be
+ * incorrect. If csvHeader is not set, input is assumed to be unstructured.
+ *
+ * Either inspectTemplateName (String) or inspectConfig {@link 
InspectConfig} need to be set. The
+ * situation is the same with deidentifyTemplateName and deidentifyConfig 
({@link DeidentifyConfig}.
+ *
+ * Batch size defines how big are batches sent to DLP at once in bytes.
+ *
+ * The transform outputs {@link KV} of {@link String} (eg. filename) and 
{@link
+ * DeidentifyContentResponse}, which will contain {@link Table} of results for 
the user to consume.
+ */
+@Experimental
+@AutoValue
+public abstract class DLPDeidentifyText
+    extends PTransform<
+        PCollection<KV<String, String>>, PCollection<KV<String, DeidentifyContentResponse>>> {
+
+  @Nullable
+  public abstract String inspectTemplateName();
+
+  @Nullable
+  public abstract String deidentifyTemplateName();
+
+  @Nullable
+  public abstract InspectConfig inspectConfig();
+
+  @Nullable
+  public abstract DeidentifyConfig deidentifyConfig();
+
+  @Nullable
+  public abstract PCollectionView<List<String>> csvHeader();
+
+  @Nullable
+  public abstract String csvDelimiter();
+
+  public abstract Integer batchSize();
+
+  public abstract String projectId();
+
+  @AutoValue.Builder
+  public abstract static class Builder {
+public abstract Builder setInspectTemplateName(String inspectTemplateName);
+
+public abstract Builder setCsvHeader(PCollectionView<List<String>> csvHeader);
+
+public abstract Builder setCsvDelimiter(String delimiter);
+
+public abstract Builder setBatchSize(Integer batchSize);
+
+public abstract Builder setProjectId(String projectId);
+
+public abstract Builder setDeidentifyTemplateName(String 
deidentifyTemplateName);
+
+public abstract Builder setInspectConfig(InspectConfig inspectConfig);
+
+public abstract Builder setDeidentifyConfig(DeidentifyConfig 
deidentifyConfig);
+
+public abstract DLPDeidentifyText build();
+  }
+
+  public static DLPDeidentifyText.Builder newBuilder() {
+return new AutoValue_DLPDeidentifyText.Builder();
+  }
+
+  /**
+   * The transform batches the contents of input PCollection and then calls 
Cloud DLP service to
+   * perform the deidentification.
+   *
+   * @
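
The quoted hunk is cut off by the digest. Purely as an illustration of the builder 
quoted above, a caller might wire the transform roughly as follows; the project id, 
template names, and batch size are placeholders, not values from the PR, and the 
snippet assumes the imports already shown in the file.

{code:java}
// Hypothetical usage sketch of DLPDeidentifyText; every literal below is a placeholder.
static PCollection<KV<String, DeidentifyContentResponse>> deidentify(
    PCollection<KV<String, String>> keyedText) {
  return keyedText.apply(
      "DeidentifyWithDLP",
      DLPDeidentifyText.newBuilder()
          .setProjectId("my-gcp-project")
          .setBatchSize(524000)
          .setInspectTemplateName("projects/my-gcp-project/inspectTemplates/my-inspect-template")
          .setDeidentifyTemplateName(
              "projects/my-gcp-project/deidentifyTemplates/my-deidentify-template")
          .build());
}
{code}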

[jira] [Work logged] (BEAM-9723) [Java] PTransform that connects to Cloud DLP deidentification service

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9723?focusedWorklogId=436000&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436000
 ]

ASF GitHub Bot logged work on BEAM-9723:


Author: ASF GitHub Bot
Created on: 21/May/20 13:55
Start Date: 21/May/20 13:55
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on a change in pull request #11566:
URL: https://github.com/apache/beam/pull/11566#discussion_r428666183



##
File path: 
sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/MapStringToDlpRow.java
##
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import com.google.privacy.dlp.v2.Table;
+import com.google.privacy.dlp.v2.Value;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Objects;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.values.KV;
+
+class MapStringToDlpRow extends DoFn<KV<String, String>, KV<String, Table.Row>> {
+  private final String delimiter;
+
+  public MapStringToDlpRow(String delimiter) {
+    this.delimiter = delimiter;
+  }
+
+  @ProcessElement
+  public void processElement(ProcessContext context) {

Review comment:
   Ok, will do!
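
   For context, the quoted hunk stops at the processElement signature. Based only on 
the imports in this file, the body presumably splits each delimited line into string 
values of a DLP Table.Row; the sketch below is a reconstruction for illustration, not 
the actual PR code.

{code:java}
// Rough reconstruction (not the PR code): map each keyed text line to a DLP
// Table.Row, splitting on the configured delimiter when one is present.
@ProcessElement
public void processElement(ProcessContext context) {
  Table.Row.Builder rowBuilder = Table.Row.newBuilder();
  String line = Objects.requireNonNull(context.element().getValue());
  if (delimiter != null) {
    Arrays.stream(line.split(delimiter))
        .forEach(v -> rowBuilder.addValues(Value.newBuilder().setStringValue(v).build()));
  } else {
    rowBuilder.addValues(Value.newBuilder().setStringValue(line).build());
  }
  context.output(KV.of(context.element().getKey(), rowBuilder.build()));
}
{code}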





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436000)
Time Spent: 4h 50m  (was: 4h 40m)

> [Java] PTransform that connects to Cloud DLP deidentification service
> -
>
> Key: BEAM-9723
> URL: https://issues.apache.org/jira/browse/BEAM-9723
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Michał Walenia
>Assignee: Michał Walenia
>Priority: P2
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9679) Core Transforms | Go SDK Code Katas

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9679?focusedWorklogId=436008&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436008
 ]

ASF GitHub Bot logged work on BEAM-9679:


Author: ASF GitHub Bot
Created on: 21/May/20 14:04
Start Date: 21/May/20 14:04
Worklog Time Spent: 10m 
  Work Description: henryken commented on pull request #11734:
URL: https://github.com/apache/beam/pull/11734#issuecomment-632104571


   Thanks @damondouglas!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436008)
Time Spent: 2h 10m  (was: 2h)

> Core Transforms | Go SDK Code Katas
> ---
>
> Key: BEAM-9679
> URL: https://issues.apache.org/jira/browse/BEAM-9679
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: P2
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> A kata devoted to core Beam transform patterns, modeled after 
> [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms],
> where the takeaway is an individual's ability to master the following in an 
> Apache Beam pipeline using the Go SDK.
>  * Branching
>  * 
> [CoGroupByKey|[https://github.com/damondouglas/beam/tree/BEAM-9679-core-transform-groupbykey]]
>  * Combine
>  * Composite Transform
>  * DoFn Additional Parameters
>  * Flatten
>  * GroupByKey
>  * [Map|[https://github.com/apache/beam/pull/11564]]
>  * Partition
>  * Side Input



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9722) Add batch SnowflakeIO.Read to Java SDK

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9722?focusedWorklogId=436012&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436012
 ]

ASF GitHub Bot logged work on BEAM-9722:


Author: ASF GitHub Bot
Created on: 21/May/20 14:08
Start Date: 21/May/20 14:08
Worklog Time Spent: 10m 
  Work Description: kkucharc commented on pull request #11360:
URL: https://github.com/apache/beam/pull/11360#issuecomment-632106649


   I retested the failing test - the previous run was probably timing out. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436012)
Time Spent: 8h 20m  (was: 8h 10m)

> Add batch SnowflakeIO.Read to Java SDK
> --
>
> Key: BEAM-9722
> URL: https://issues.apache.org/jira/browse/BEAM-9722
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Kasia Kucharczyk
>Assignee: Dariusz Aniszewski
>Priority: P2
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9723) [Java] PTransform that connects to Cloud DLP deidentification service

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9723?focusedWorklogId=436013&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436013
 ]

ASF GitHub Bot logged work on BEAM-9723:


Author: ASF GitHub Bot
Created on: 21/May/20 14:12
Start Date: 21/May/20 14:12
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on a change in pull request #11566:
URL: https://github.com/apache/beam/pull/11566#discussion_r428676595



##
File path: 
sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPReidentifyText.java
##
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import com.google.auto.value.AutoValue;
+import com.google.cloud.dlp.v2.DlpServiceClient;
+import com.google.privacy.dlp.v2.ContentItem;
+import com.google.privacy.dlp.v2.DeidentifyConfig;
+import com.google.privacy.dlp.v2.FieldId;
+import com.google.privacy.dlp.v2.InspectConfig;
+import com.google.privacy.dlp.v2.ProjectName;
+import com.google.privacy.dlp.v2.ReidentifyContentRequest;
+import com.google.privacy.dlp.v2.ReidentifyContentResponse;
+import com.google.privacy.dlp.v2.Table;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A {@link PTransform} connecting to Cloud DLP and inspecting text for 
identifying data according
+ * to provided settings.
+ *
+ * Either inspectTemplateName (String) or inspectConfig {@link 
InspectConfig} need to be set, the
+ * same goes for reidentifyTemplateName or reidentifyConfig.
+ *
+ * Batch size defines how big are batches sent to DLP at once in bytes.
+ */
+@Experimental
+@AutoValue
+public abstract class DLPReidentifyText
+    extends PTransform<
+        PCollection<KV<String, String>>, PCollection<KV<String, ReidentifyContentResponse>>> {
+
+  public static final Logger LOG = 
LoggerFactory.getLogger(DLPInspectText.class);
+
+  public static final Integer DLP_PAYLOAD_LIMIT = 52400;
+
+  @Nullable
+  public abstract String inspectTemplateName();
+
+  @Nullable
+  public abstract String reidentifyTemplateName();
+
+  @Nullable
+  public abstract InspectConfig inspectConfig();
+
+  @Nullable
+  public abstract DeidentifyConfig reidentifyConfig();
+
+  @Nullable
+  public abstract String csvDelimiter();
+
+  @Nullable
+  public abstract PCollectionView<List<String>> csvHeaders();
+
+  public abstract Integer batchSize();
+
+  public abstract String projectId();
+
+  @AutoValue.Builder
+  public abstract static class Builder {
+public abstract Builder setInspectTemplateName(String inspectTemplateName);
+
+public abstract Builder setInspectConfig(InspectConfig inspectConfig);
+
+public abstract Builder setReidentifyConfig(DeidentifyConfig 
deidentifyConfig);
+
+public abstract Builder setReidentifyTemplateName(String 
deidentifyTemplateName);
+
+public abstract Builder setBatchSize(Integer batchSize);
+
+public abstract Builder setCsvHeaders(PCollectionView<List<String>> csvHeaders);
+
+public abstract Builder setCsvDelimiter(String delimiter);
+
+public abstract Builder setProjectId(String projectId);
+
+public abstract DLPReidentifyText build();
+  }
+
+  public static DLPReidentifyText.Builder newBuilder() {
+return new AutoValue_DLPReidentifyText.Builder();
+  }
+
+  @Override
+  public PCollection<KV<String, ReidentifyContentResponse>> expand(
+      PCollection<KV<String, String>> input) {
+    return input
+        .apply(ParDo.of(new MapStringToDlpRow(csvDelimiter())))
+        .apply("Batch Contents", ParDo.of(new BatchRequestForDLP(batchSize())))
+        .apply(
+            "DLPDeidentify",
+            ParDo.of(
+                new ReidentifyText(
+                    projectId(),
+                    inspectTemp

[jira] [Work logged] (BEAM-9723) [Java] PTransform that connects to Cloud DLP deidentification service

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9723?focusedWorklogId=436016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436016
 ]

ASF GitHub Bot logged work on BEAM-9723:


Author: ASF GitHub Bot
Created on: 21/May/20 14:17
Start Date: 21/May/20 14:17
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on a change in pull request #11566:
URL: https://github.com/apache/beam/pull/11566#discussion_r428679955



##
File path: 
sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java
##
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import com.google.auto.value.AutoValue;
+import com.google.cloud.dlp.v2.DlpServiceClient;
+import com.google.privacy.dlp.v2.ContentItem;
+import com.google.privacy.dlp.v2.DeidentifyConfig;
+import com.google.privacy.dlp.v2.DeidentifyContentRequest;
+import com.google.privacy.dlp.v2.DeidentifyContentResponse;
+import com.google.privacy.dlp.v2.FieldId;
+import com.google.privacy.dlp.v2.InspectConfig;
+import com.google.privacy.dlp.v2.ProjectName;
+import com.google.privacy.dlp.v2.Table;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+
+/**
+ * A {@link PTransform} connecting to Cloud DLP and deidentifying text 
according to provided
+ * settings. The transform supports both CSV formatted input data and 
unstructured input.
+ *
+ * If the csvHeader property is set, csvDelimiter also should be, else the 
results will be
+ * incorrect. If csvHeader is not set, input is assumed to be unstructured.
+ *
+ * Either inspectTemplateName (String) or inspectConfig {@link 
InspectConfig} need to be set. The
+ * situation is the same with deidentifyTemplateName and deidentifyConfig 
({@link DeidentifyConfig}.
+ *
+ * Batch size defines how big are batches sent to DLP at once in bytes.
+ *
+ * The transform outputs {@link KV} of {@link String} (eg. filename) and 
{@link
+ * DeidentifyContentResponse}, which will contain {@link Table} of results for 
the user to consume.
+ */
+@Experimental
+@AutoValue
+public abstract class DLPDeidentifyText
+    extends PTransform<
+        PCollection<KV<String, String>>, PCollection<KV<String, DeidentifyContentResponse>>> {
+
+  @Nullable
+  public abstract String inspectTemplateName();
+
+  @Nullable
+  public abstract String deidentifyTemplateName();
+
+  @Nullable
+  public abstract InspectConfig inspectConfig();
+
+  @Nullable
+  public abstract DeidentifyConfig deidentifyConfig();
+
+  @Nullable
+  public abstract PCollectionView<List<String>> csvHeader();
+
+  @Nullable
+  public abstract String csvDelimiter();
+
+  public abstract Integer batchSize();
+
+  public abstract String projectId();
+
+  @AutoValue.Builder
+  public abstract static class Builder {
+public abstract Builder setInspectTemplateName(String inspectTemplateName);
+
+public abstract Builder setCsvHeader(PCollectionView<List<String>> csvHeader);
+
+public abstract Builder setCsvDelimiter(String delimiter);
+
+public abstract Builder setBatchSize(Integer batchSize);
+
+public abstract Builder setProjectId(String projectId);
+
+public abstract Builder setDeidentifyTemplateName(String 
deidentifyTemplateName);
+
+public abstract Builder setInspectConfig(InspectConfig inspectConfig);
+
+public abstract Builder setDeidentifyConfig(DeidentifyConfig 
deidentifyConfig);
+
+public abstract DLPDeidentifyText build();
+  }
+

Review comment:
   Thanks!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries

[jira] [Work logged] (BEAM-9723) [Java] PTransform that connects to Cloud DLP deidentification service

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9723?focusedWorklogId=436017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436017
 ]

ASF GitHub Bot logged work on BEAM-9723:


Author: ASF GitHub Bot
Created on: 21/May/20 14:20
Start Date: 21/May/20 14:20
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #11566:
URL: https://github.com/apache/beam/pull/11566#issuecomment-632113183


   @tysonjh , @santhh Thanks for the review, I added the first batch of fixes. 
I'll take care of the Javadocs tomorrow.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436017)
Time Spent: 5h 20m  (was: 5h 10m)

> [Java] PTransform that connects to Cloud DLP deidentification service
> -
>
> Key: BEAM-9723
> URL: https://issues.apache.org/jira/browse/BEAM-9723
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Michał Walenia
>Assignee: Michał Walenia
>Priority: P2
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9421) AI Platform pipeline patterns

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9421?focusedWorklogId=436019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436019
 ]

ASF GitHub Bot logged work on BEAM-9421:


Author: ASF GitHub Bot
Created on: 21/May/20 14:36
Start Date: 21/May/20 14:36
Worklog Time Spent: 10m 
  Work Description: kamilwu opened a new pull request #11776:
URL: https://github.com/apache/beam/pull/11776


   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Assigned] (BEAM-5863) Automate Community Metrics infrastructure deployment

2020-05-21 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski reassigned BEAM-5863:
--

Assignee: Kamil Wasilewski  (was: Michał Walenia)

> Automate Community Metrics infrastructure deployment
> 
>
> Key: BEAM-5863
> URL: https://issues.apache.org/jira/browse/BEAM-5863
> Project: Beam
>  Issue Type: Sub-task
>  Components: community-metrics, project-management
>Reporter: Scott Wegner
>Assignee: Kamil Wasilewski
>Priority: P3
>  Labels: community-metrics
>
> Currently the deployment process for the production Community Metrics stack 
> is manual (documented 
> [here|https://cwiki.apache.org/confluence/display/BEAM/Community+Metrics]). 
> If we end up having to deploy more than a few times a year, it would be nice 
> to automate these steps.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10053) Timers exception on "Job Drain" while using stateful beam processing in global window

2020-05-21 Thread Reuven Lax (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113286#comment-17113286
 ] 

Reuven Lax commented on BEAM-10053:
---

Something appears off here:

*org.apache.beam.sdk.transforms.windowing.GlobalWindow@29ca0210 that is before 
the appropriate cleanup time 294248-01-24T04:00:54.776Z*

However, the end of the GlobalWindow should be 9223371950454775, which is 
294247-01-09T04:00:54+00:00.

I don't see any allowed lateness set on the global window.

Luke, do you have any idea why there is almost a 1 year difference between 
these timestamps?
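
For reference, the two boundaries involved can be printed straight from the SDK 
constants; a quick sketch follows (the worker's cleanup-time arithmetic, typically the 
window's max timestamp plus allowed lateness plus a small GC delay, is not reproduced 
here).

{code:java}
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.transforms.windowing.GlobalWindow;

// Prints the global window's maxTimestamp() and the SDK-wide TIMESTAMP_MAX_VALUE
// for comparison with the cleanup time reported in the exception above.
public class GlobalWindowBounds {
  public static void main(String[] args) {
    System.out.println("GlobalWindow.maxTimestamp():       " + GlobalWindow.INSTANCE.maxTimestamp());
    System.out.println("BoundedWindow.TIMESTAMP_MAX_VALUE: " + BoundedWindow.TIMESTAMP_MAX_VALUE);
  }
}
{code}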

> Timers exception on "Job Drain" while using stateful beam processing in 
> global window
> -
>
> Key: BEAM-10053
> URL: https://issues.apache.org/jira/browse/BEAM-10053
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-core
>Affects Versions: 2.19.0
>Reporter: MOHIL
>Priority: P2
>
> Hello,
>  
> I have a use case where I have two sets of PCollections (RecordA and RecordB) 
> coming from a real time streaming source like Kafka.
>  
> Both Records are correlated with a common key, let's say KEY.
>  
> The purpose is to enrich RecordA with RecordB's data, for which I am using 
> CoGroupByKey. Since RecordA and RecordB for a common key can arrive within 1-2 
> minutes of each other in event time, I maintain a sliding window for both 
> records and then apply CoGroupByKey to both PCollections.
> 
> The sliding windows that find both RecordA and RecordB for a common key KEY 
> will emit the enriched output. Since multiple sliding windows can emit the 
> same output, I remove duplicate results by feeding the aforementioned outputs 
> to a global window, where I maintain state to check whether an output has 
> already been processed. Since it is a global window, I set a timer on the 
> state (for GC) so that it expires 10 minutes after the state has been written.
> 
> This is working perfectly fine w.r.t. the expected results. However, I am 
> unable to stop the job gracefully, i.e. drain the job. I see the following 
> exception:
>  
> java.lang.IllegalStateException: 
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn@16b089a6 received state 
> cleanup timer for window 
> org.apache.beam.sdk.transforms.windowing.GlobalWindow@29ca0210 that is before 
> the appropriate cleanup time 294248-01-24T04:00:54.776Z
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:842)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processSystemTimer(SimpleParDoFn.java:384)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.access$700(SimpleParDoFn.java:73)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn$TimerType$2.processTimer(SimpleParDoFn.java:444)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:467)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:354)
> org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:52)
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1316)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1049)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> java.lang.IllegalStateException: 
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn@59902a10 received state 
> cleanup timer for window 
> org.apache.beam.sdk.transforms.windowing.GlobalWindow@29ca0210 that is before 
> the appropriate cleanup time 294248-01-24T04:00:54.776Z
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:842)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processSystemTimer(SimpleParDoFn.java:384)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.access$700(SimpleParDoFn.java:73)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn$TimerType$2.processTimer(SimpleParDoFn.java:444)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:467)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:354)
> org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:52)
> org.apache.beam.runners.dataflow.worker.

[jira] [Work logged] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1589?focusedWorklogId=436061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436061
 ]

ASF GitHub Bot logged work on BEAM-1589:


Author: ASF GitHub Bot
Created on: 21/May/20 15:47
Start Date: 21/May/20 15:47
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #11774:
URL: https://github.com/apache/beam/pull/11774#issuecomment-632168553


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436061)
Time Spent: 13.5h  (was: 13h 20m)

> Add OnWindowExpiration method to Stateful DoFn
> --
>
> Key: BEAM-1589
> URL: https://issues.apache.org/jira/browse/BEAM-1589
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core, sdk-java-core
>Reporter: Jingsong Lee
>Assignee: Rehman Murad Ali
>Priority: P2
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> See BEAM-1517
> This allows the user to do some work before the state's garbage collection.
> It seems kind of annoying, but on the other hand forgetting to set a final 
> timer to flush state is probably data loss most of the time.
> FlinkRunner does this work very simply, but other runners, such as 
> DirectRunner, need to traverse all the states to do this, and maybe it's a 
> little hard.
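
As a rough illustration of the feature merged above (a sketch with made-up names; 
the exact set of parameters the callback accepts is an assumption here, so check 
the SDK javadoc), a stateful DoFn could flush buffered state from an 
@OnWindowExpiration callback instead of relying on a manually-set final timer:

{code:java}
import org.apache.beam.sdk.state.BagState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

// Hypothetical example: buffer values per key and emit their sum when the
// window expires, just before the runner garbage-collects the state.
class BufferingFn extends DoFn<KV<String, Long>, Long> {

  @StateId("buffer")
  private final StateSpec<BagState<Long>> bufferSpec = StateSpecs.bag();

  @ProcessElement
  public void process(ProcessContext ctx, @StateId("buffer") BagState<Long> buffer) {
    buffer.add(ctx.element().getValue());
  }

  @OnWindowExpiration
  public void onExpiration(@StateId("buffer") BagState<Long> buffer, OutputReceiver<Long> out) {
    long sum = 0;
    for (Long value : buffer.read()) {
      sum += value;
    }
    out.output(sum);
  }
}
{code}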



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1589?focusedWorklogId=436062&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436062
 ]

ASF GitHub Bot logged work on BEAM-1589:


Author: ASF GitHub Bot
Created on: 21/May/20 15:48
Start Date: 21/May/20 15:48
Worklog Time Spent: 10m 
  Work Description: reuvenlax merged pull request #11774:
URL: https://github.com/apache/beam/pull/11774


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436062)
Time Spent: 13h 40m  (was: 13.5h)

> Add OnWindowExpiration method to Stateful DoFn
> --
>
> Key: BEAM-1589
> URL: https://issues.apache.org/jira/browse/BEAM-1589
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core, sdk-java-core
>Reporter: Jingsong Lee
>Assignee: Rehman Murad Ali
>Priority: P2
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>
> See BEAM-1517
> This allows the user to do some work before the state's garbage collection.
> It seems kind of annoying, but on the other hand forgetting to set a final 
> timer to flush state is probably data loss most of the time.
> FlinkRunner does this work very simply, but other runners, such as 
> DirectRunner, need to traverse all the states to do this, and maybe it's a 
> little hard.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10050) VideoIntelligenceIT.annotateVideoFromURINoContext is flaky

2020-05-21 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-10050:
-
Description: 
I've seen this fail a few times in precommits [Example 
failure|https://builds.apache.org/job/beam_PreCommit_Java_Commit/11515/]

{code}
java.lang.AssertionError: Annotate 
video/ParDo(AnnotateVideoFromURI)/ParMultiDo(AnnotateVideoFromURI).output: 
expected: but was:
at 
org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:169)
at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:411)
at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:403)
at 
org.apache.beam.sdk.extensions.ml.VideoIntelligenceIT.annotateVideoFromURINoContext(VideoIntelligenceIT.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
{code}

  was:
I've seen this fail a few times in precommits [Example 
failure](VideoIntelligenceIT.annotateVideoFromURINoContext)

{code}
java.lang.AssertionError: Annotate 
video/ParDo(AnnotateVideoFromURI)/ParMultiDo(AnnotateVideoFromURI).output: 
expected: but was:
at 
org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:169)
at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:411)
at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:403)
at 
org.apache.beam.sdk.extensions.ml.VideoIntelligenceIT.annotateVideoFromURINoContext(VideoIntelligenceIT.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
{code}


> VideoIntelligenceIT.annotateVideoFromURINoContext is flaky
> --
>
> Key: BEAM-10050
> URL: https://issues.apache.org/jira/browse/BEAM-10050
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Assignee: Michał Walenia
>Priority: P2
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've seen this fail a few times in precommits [Example 
> failure|https://builds.apache.org/job/beam_PreCommit_Java_Commit/11515/]
> {code}
> java.lang.AssertionError: Annotate 
> video/ParDo(AnnotateVideoFromURI)/ParMultiDo(AnnotateVideoFromURI).output: 
> expected: but was:
>   at 
> org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:169)
>   at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:411)
>   at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:403)
>   at 
> org.apache.beam.sdk.extensions.ml.VideoIntelligenceIT.annotateVideoFromURINoContext(VideoIntelligenceIT.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10050) VideoIntelligenceIT.annotateVideoFromURINoContext is flaky

2020-05-21 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113310#comment-17113310
 ] 

Brian Hulette commented on BEAM-10050:
--

Sorry about that! I meant to link one in the description but I copied the wrong 
thing. I updated it with a link to this job: 
https://builds.apache.org/job/beam_PreCommit_Java_Commit/11515/

> VideoIntelligenceIT.annotateVideoFromURINoContext is flaky
> --
>
> Key: BEAM-10050
> URL: https://issues.apache.org/jira/browse/BEAM-10050
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Assignee: Michał Walenia
>Priority: P2
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've seen this fail a few times in precommits [Example 
> failure|https://builds.apache.org/job/beam_PreCommit_Java_Commit/11515/]
> {code}
> java.lang.AssertionError: Annotate 
> video/ParDo(AnnotateVideoFromURI)/ParMultiDo(AnnotateVideoFromURI).output: 
> expected: but was:
>   at 
> org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:169)
>   at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:411)
>   at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:403)
>   at 
> org.apache.beam.sdk.extensions.ml.VideoIntelligenceIT.annotateVideoFromURINoContext(VideoIntelligenceIT.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10054) Direct Runner execution stalls with test pipeline

2020-05-21 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10054:
---
Status: Open  (was: Triage Needed)

> Direct Runner execution stalls with test pipeline
> -
>
> Key: BEAM-10054
> URL: https://issues.apache.org/jira/browse/BEAM-10054
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>
> Internally, we have a test pipeline which runs with the DirectRunner. When 
> upgrading from 2.18.0 to 2.21.0 the test failed with the following exception:
> {noformat}
> tp = Exception('Monitor task detected a pipeline stall.',), value = None, tb 
> = None
> def raise_(tp, value=None, tb=None):
> """
> A function that matches the Python 2.x ``raise`` statement. This
> allows re-raising exceptions with the cls value and traceback on
> Python 2 and 3.
> """
> if value is not None and isinstance(tp, Exception):
> raise TypeError("instance exception may not have a separate 
> value")
> if value is not None:
> exc = tp(value)
> else:
> exc = tp
> if exc.__traceback__ is not tb:
> raise exc.with_traceback(tb)
> >   raise exc
> E   Exception: Monitor task detected a pipeline stall.
> {noformat}
> I was able to bisect the error. This commit introduced the failure: 
> https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731
> If the following condition evaluates to False, the pipeline runs correctly: 
> https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731#diff-2bb845e226f3a97c0f0f737d0558c5dbR1273



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10054) Direct Runner execution stalls with test pipeline

2020-05-21 Thread Maximilian Michels (Jira)
Maximilian Michels created BEAM-10054:
-

 Summary: Direct Runner execution stalls with test pipeline
 Key: BEAM-10054
 URL: https://issues.apache.org/jira/browse/BEAM-10054
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Maximilian Michels
Assignee: Maximilian Michels


Internally, we have a test pipeline which runs with the DirectRunner. When 
upgrading from 2.18.0 to 2.21.0 the test failed with the following exception:

{noformat}
tp = Exception('Monitor task detected a pipeline stall.',), value = None, tb = 
None

def raise_(tp, value=None, tb=None):
"""
A function that matches the Python 2.x ``raise`` statement. This
allows re-raising exceptions with the cls value and traceback on
Python 2 and 3.
"""
if value is not None and isinstance(tp, Exception):
raise TypeError("instance exception may not have a separate value")
if value is not None:
exc = tp(value)
else:
exc = tp
if exc.__traceback__ is not tb:
raise exc.with_traceback(tb)
>   raise exc
E   Exception: Monitor task detected a pipeline stall.
{noformat}

I was able to bisect the error. This commit introduced the failure: 
https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731

If the following condition evaluates to False, the pipeline runs correctly: 
https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731#diff-2bb845e226f3a97c0f0f737d0558c5dbR1273



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9603) Support Dynamic Timer in Java SDK over FnApi

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9603?focusedWorklogId=436074&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436074
 ]

ASF GitHub Bot logged work on BEAM-9603:


Author: ASF GitHub Bot
Created on: 21/May/20 16:06
Start Date: 21/May/20 16:06
Worklog Time Spent: 10m 
  Work Description: y1chi commented on a change in pull request #11756:
URL: https://github.com/apache/beam/pull/11756#discussion_r428754395



##
File path: 
sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java
##
@@ -947,49 +910,213 @@ public void testTimers() throws Exception {
 timerInGlobalWindow("A", new Instant(1600L), new Instant(10012L)),
 timerInGlobalWindow("X", new Instant(1700L), new Instant(10022L)),
 timerInGlobalWindow("C", new Instant(1800L), new Instant(10022L)),
-timerInGlobalWindow("B", new Instant(1900L), new 
Instant(10022L;
+timerInGlobalWindow("B", new Instant(1900L), new Instant(10022L)),
+timerInGlobalWindow("B", new Instant(2000L), new Instant(10032L)),
+timerInGlobalWindow("Y", new Instant(2100L), new 
Instant(10042L;
+assertThat(
+fakeTimerClient.getTimers(eventFamilyTimer),
+contains(
+timerInGlobalWindow("X", "event-timer1", new Instant(1000L), new 
Instant(1003L)),
+timerInGlobalWindow("Y", "event-timer1", new Instant(1100L), new 
Instant(1103L)),
+timerInGlobalWindow("X", "event-timer1", new Instant(1200L), new 
Instant(1203L)),
+timerInGlobalWindow("Y", "event-timer1", new Instant(1300L), new 
Instant(1303L)),
+timerInGlobalWindow("A", "event-timer1", new Instant(1400L), new 
Instant(2413L)),
+timerInGlobalWindow("B", "event-timer1", new Instant(1500L), new 
Instant(2513L)),
+timerInGlobalWindow("A", "event-timer1", new Instant(1600L), new 
Instant(2613L)),
+timerInGlobalWindow("X", "event-timer1", new Instant(1700L), new 
Instant(1723L)),
+timerInGlobalWindow("C", "event-timer1", new Instant(1800L), new 
Instant(1823L)),
+timerInGlobalWindow("B", "event-timer1", new Instant(1900L), new 
Instant(1923L)),
+timerInGlobalWindow("B", "event-timer1", new Instant(2000L), new 
Instant(2033L)),
+timerInGlobalWindow("Y", "event-timer1", new Instant(2100L), new 
Instant(2143L;
+assertThat(
+fakeTimerClient.getTimers(processingFamilyTimer),
+contains(
+timerInGlobalWindow("X", "processing-timer1", new Instant(1000L), 
new Instant(10004L)),
+timerInGlobalWindow("Y", "processing-timer1", new Instant(1100L), 
new Instant(10004L)),
+timerInGlobalWindow("X", "processing-timer1", new Instant(1200L), 
new Instant(10004L)),
+timerInGlobalWindow("Y", "processing-timer1", new Instant(1300L), 
new Instant(10004L)),
+timerInGlobalWindow("A", "processing-timer1", new Instant(1400L), 
new Instant(10014L)),
+timerInGlobalWindow("B", "processing-timer1", new Instant(1500L), 
new Instant(10014L)),
+timerInGlobalWindow("A", "processing-timer1", new Instant(1600L), 
new Instant(10014L)),
+timerInGlobalWindow("X", "processing-timer1", new Instant(1700L), 
new Instant(10024L)),
+timerInGlobalWindow("C", "processing-timer1", new Instant(1800L), 
new Instant(10024L)),
+timerInGlobalWindow("B", "processing-timer1", new Instant(1900L), 
new Instant(10024L)),
+timerInGlobalWindow("B", "processing-timer1", new Instant(2000L), 
new Instant(10034L)),
+timerInGlobalWindow(
+"Y", "processing-timer1", new Instant(2100L), new 
Instant(10044L;
 mainOutputValues.clear();
 
 assertFalse(fakeTimerClient.isOutboundClosed(eventTimer));
 assertFalse(fakeTimerClient.isOutboundClosed(processingTimer));
+assertFalse(fakeTimerClient.isOutboundClosed(eventFamilyTimer));
+assertFalse(fakeTimerClient.isOutboundClosed(processingFamilyTimer));
 fakeTimerClient.closeInbound(eventTimer);
 fakeTimerClient.closeInbound(processingTimer);
+fakeTimerClient.closeInbound(eventFamilyTimer);
+fakeTimerClient.closeInbound(processingFamilyTimer);
 
 Iterables.getOnlyElement(finishFunctionRegistry.getFunctions()).run();
 assertThat(mainOutputValues, empty());
 
 assertTrue(fakeTimerClient.isOutboundClosed(eventTimer));
 assertTrue(fakeTimerClient.isOutboundClosed(processingTimer));
+assertTrue(fakeTimerClient.isOutboundClosed(eventFamilyTimer));
+assertTrue(fakeTimerClient.isOutboundClosed(processingFamilyTimer));
 
 Iterables.getOnlyElement(teardownFunctions).run();
 assertThat(mainOutputValues, empty());
 
 assertEquals(
 ImmutableMap.builder()
 .put(bagUserStateKey("bag", 

[jira] [Work logged] (BEAM-10054) Direct Runner execution stalls with test pipeline

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10054?focusedWorklogId=436073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436073
 ]

ASF GitHub Bot logged work on BEAM-10054:
-

Author: ASF GitHub Bot
Created on: 21/May/20 16:06
Start Date: 21/May/20 16:06
Worklog Time Spent: 10m 
  Work Description: mxm opened a new pull request #11777:
URL: https://github.com/apache/beam/pull/11777


   We have a test pipeline which runs with the DirectRunner. When upgrading 
from 2.18.0 to 2.21.0 the test failed with the following exception:
   
   ```
   tp = Exception('Monitor task detected a pipeline stall.',), value = None, tb 
= None
   
   def raise_(tp, value=None, tb=None):
   """
   A function that matches the Python 2.x ``raise`` statement. This
   allows re-raising exceptions with the cls value and traceback on
   Python 2 and 3.
   """
   if value is not None and isinstance(tp, Exception):
   raise TypeError("instance exception may not have a separate 
value")
   if value is not None:
   exc = tp(value)
   else:
   exc = tp
   if exc.__traceback__ is not tb:
   raise exc.with_traceback(tb)
   >   raise exc
   E   Exception: Monitor task detected a pipeline stall.
   ```
   
   I was able to bisect the error. This commit introduced the failure: 
ea9b1f350b88c2996cafb4d24351869e82857731
   
   The fix lets the pipeline run correctly.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   [Truncated rows of post-commit build status badge links for the Go and Java SDKs across these runners.]

[jira] [Updated] (BEAM-10025) Samza runner failing testOutputTimestampDefault

2020-05-21 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-10025:
-
Fix Version/s: 2.22.0

> Samza runner failing testOutputTimestampDefault
> ---
>
> Key: BEAM-10025
> URL: https://issues.apache.org/jira/browse/BEAM-10025
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Hai Lu
>Priority: P2
>  Labels: currently-failing
> Fix For: 2.22.0
>
>
> This is causing postcommit to fail
> java.lang.AssertionError: Expected 1 successful assertions, but found 0.
> Expected: is <1L>
>  but: was <0L>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10054) Direct Runner execution stalls with test pipeline

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10054?focusedWorklogId=436076&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436076
 ]

ASF GitHub Bot logged work on BEAM-10054:
-

Author: ASF GitHub Bot
Created on: 21/May/20 16:09
Start Date: 21/May/20 16:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11777:
URL: https://github.com/apache/beam/pull/11777#issuecomment-632181118


   R: @rohdesamuel



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436076)
Time Spent: 20m  (was: 10m)

> Direct Runner execution stalls with test pipeline
> -
>
> Key: BEAM-10054
> URL: https://issues.apache.org/jira/browse/BEAM-10054
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Internally, we have a test pipeline which runs with the DirectRunner. When 
> upgrading from 2.18.0 to 2.21.0 the test failed with the following exception:
> {noformat}
> tp = Exception('Monitor task detected a pipeline stall.',), value = None, tb 
> = None
> def raise_(tp, value=None, tb=None):
> """
> A function that matches the Python 2.x ``raise`` statement. This
> allows re-raising exceptions with the cls value and traceback on
> Python 2 and 3.
> """
> if value is not None and isinstance(tp, Exception):
> raise TypeError("instance exception may not have a separate 
> value")
> if value is not None:
> exc = tp(value)
> else:
> exc = tp
> if exc.__traceback__ is not tb:
> raise exc.with_traceback(tb)
> >   raise exc
> E   Exception: Monitor task detected a pipeline stall.
> {noformat}
> I was able to bisect the error. This commit introduced the failure: 
> https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731
> If the following condition evaluates to False, the pipeline runs correctly: 
> https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731#diff-2bb845e226f3a97c0f0f737d0558c5dbR1273



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10024) Spark runner failing testOutputTimestampDefault

2020-05-21 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-10024:
-
Fix Version/s: 2.22.0

> Spark runner failing testOutputTimestampDefault
> ---
>
> Key: BEAM-10024
> URL: https://issues.apache.org/jira/browse/BEAM-10024
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: currently-failing
> Fix For: 2.22.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is causing postcommit to fail
> java.lang.UnsupportedOperationException: Found TimerId annotations on 
> org.apache.beam.sdk.transforms.ParDoTest$TimerTests$12, but DoFn cannot yet 
> be used with timers in the SparkRunner.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10025) Samza runner failing testOutputTimestampDefault

2020-05-21 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113321#comment-17113321
 ] 

Brian Hulette commented on BEAM-10025:
--

This is failing on the 2.22 release branch; is it a release blocker? I assume it 
is, so I went ahead and tagged fix version 2.22.0, but I would be pleased to 
learn otherwise.

> Samza runner failing testOutputTimestampDefault
> ---
>
> Key: BEAM-10025
> URL: https://issues.apache.org/jira/browse/BEAM-10025
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Hai Lu
>Priority: P2
>  Labels: currently-failing
> Fix For: 2.22.0
>
>
> This is causing postcommit to fail
> java.lang.AssertionError: Expected 1 successful assertions, but found 0.
> Expected: is <1L>
>  but: was <0L>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10054) Direct Runner execution stalls with test pipeline

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10054?focusedWorklogId=436077&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436077
 ]

ASF GitHub Bot logged work on BEAM-10054:
-

Author: ASF GitHub Bot
Created on: 21/May/20 16:10
Start Date: 21/May/20 16:10
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11777:
URL: https://github.com/apache/beam/pull/11777#issuecomment-632181899


   Please have a look, @rohdesamuel, and see whether you consider the fix valid. 
I'm not very familiar with the Python SDK triggering code.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436077)
Time Spent: 0.5h  (was: 20m)

> Direct Runner execution stalls with test pipeline
> -
>
> Key: BEAM-10054
> URL: https://issues.apache.org/jira/browse/BEAM-10054
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Internally, we have a test pipeline which runs with the DirectRunner. When 
> upgrading from 2.18.0 to 2.21.0 the test failed with the following exception:
> {noformat}
> tp = Exception('Monitor task detected a pipeline stall.',), value = None, tb 
> = None
> def raise_(tp, value=None, tb=None):
> """
> A function that matches the Python 2.x ``raise`` statement. This
> allows re-raising exceptions with the cls value and traceback on
> Python 2 and 3.
> """
> if value is not None and isinstance(tp, Exception):
> raise TypeError("instance exception may not have a separate 
> value")
> if value is not None:
> exc = tp(value)
> else:
> exc = tp
> if exc.__traceback__ is not tb:
> raise exc.with_traceback(tb)
> >   raise exc
> E   Exception: Monitor task detected a pipeline stall.
> {noformat}
> I was able to bisect the error. This commit introduced the failure: 
> https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731
> If the following condition evaluates to False, the pipeline runs correctly: 
> https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731#diff-2bb845e226f3a97c0f0f737d0558c5dbR1273



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10024) Spark runner failing testOutputTimestampDefault

2020-05-21 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113322#comment-17113322
 ] 

Brian Hulette commented on BEAM-10024:
--

Same question here as BEAM-10025 - is this a release blocker?

> Spark runner failing testOutputTimestampDefault
> ---
>
> Key: BEAM-10024
> URL: https://issues.apache.org/jira/browse/BEAM-10024
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: currently-failing
> Fix For: 2.22.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is causing postcommit to fail
> java.lang.UnsupportedOperationException: Found TimerId annotations on 
> org.apache.beam.sdk.transforms.ParDoTest$TimerTests$12, but DoFn cannot yet 
> be used with timers in the SparkRunner.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9603) Support Dynamic Timer in Java SDK over FnApi

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9603?focusedWorklogId=436079&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436079
 ]

ASF GitHub Bot logged work on BEAM-9603:


Author: ASF GitHub Bot
Created on: 21/May/20 16:12
Start Date: 21/May/20 16:12
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #11756:
URL: https://github.com/apache/beam/pull/11756#issuecomment-632182849


   retest all please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436079)
Time Spent: 5h 10m  (was: 5h)

> Support Dynamic Timer in Java SDK over FnApi
> 
>
> Key: BEAM-9603
> URL: https://issues.apache.org/jira/browse/BEAM-9603
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness
>Reporter: Boyuan Zhang
>Assignee: Yichi Zhang
>Priority: P2
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-21 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-10016:
-
Fix Version/s: 2.22.0

> Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
> ---
>
> Key: BEAM-10016
> URL: https://issues.apache.org/jira/browse/BEAM-10016
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.22.0
>
>
> Both beam_PostCommit_Java_PVR_Flink_Batch and 
> beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.
> SEVERE: Error in task code:  CHAIN MapPartition (MapPartition at [6]{Values, 
> FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map 
> (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: 
> PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) 
> (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be 
> cast to [B
>   at 
> org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
>   at 
> org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203)
>   at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>   at java.lang.Thread.run(Thread.java:748)
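
For orientation, a minimal sketch of the shape of pipeline the failing test seems 
to exercise (reconstructed from the test name; the names, values, and exact coders 
below are assumptions, not copied from FlattenTest): two PCollections of the same 
element type but with different coders are flattened, and the flattened output 
declares its own coder.

{code:java}
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;

public class FlattenDifferentCodersSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());

    // Same element type, different coders on the two inputs.
    PCollection<String> a =
        p.apply("A", Create.of(Arrays.asList("one", "two"))).setCoder(StringUtf8Coder.of());
    PCollection<String> b =
        p.apply("B", Create.of(Arrays.asList("three"))).setCoder(SerializableCoder.of(String.class));

    // The flattened output declares its own coder as well.
    PCollectionList.of(a).and(b).apply(Flatten.pCollections()).setCoder(StringUtf8Coder.of());

    p.run().waitUntilFinish();
  }
}
{code}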



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-21 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113327#comment-17113327
 ] 

Brian Hulette commented on BEAM-10016:
--

This is failing on the 2.22.0 release branch. I'm assuming it is a release blocker?

> Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
> ---
>
> Key: BEAM-10016
> URL: https://issues.apache.org/jira/browse/BEAM-10016
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.22.0
>
>
> Both beam_PostCommit_Java_PVR_Flink_Batch and 
> beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.
> SEVERE: Error in task code:  CHAIN MapPartition (MapPartition at [6]{Values, 
> FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map 
> (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: 
> PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) 
> (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be 
> cast to [B
>   at 
> org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
>   at 
> org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203)
>   at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9603) Support Dynamic Timer in Java SDK over FnApi

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9603?focusedWorklogId=436087&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436087
 ]

ASF GitHub Bot logged work on BEAM-9603:


Author: ASF GitHub Bot
Created on: 21/May/20 16:17
Start Date: 21/May/20 16:17
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #11756:
URL: https://github.com/apache/beam/pull/11756#issuecomment-632185565


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436087)
Time Spent: 5h 20m  (was: 5h 10m)

> Support Dynamic Timer in Java SDK over FnApi
> 
>
> Key: BEAM-9603
> URL: https://issues.apache.org/jira/browse/BEAM-9603
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness
>Reporter: Boyuan Zhang
>Assignee: Yichi Zhang
>Priority: P2
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10050) VideoIntelligenceIT.annotateVideoFromURINoContext is flaky

2020-05-21 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113338#comment-17113338
 ] 

Brian Hulette commented on BEAM-10050:
--

Also looks like it happened in 2.22.0 release branch validation here: 
https://builds.apache.org/job/beam_Release_Gradle_Build/47

> VideoIntelligenceIT.annotateVideoFromURINoContext is flaky
> --
>
> Key: BEAM-10050
> URL: https://issues.apache.org/jira/browse/BEAM-10050
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Assignee: Michał Walenia
>Priority: P2
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've seen this fail a few times in precommits [Example 
> failure|https://builds.apache.org/job/beam_PreCommit_Java_Commit/11515/]
> {code}
> java.lang.AssertionError: Annotate 
> video/ParDo(AnnotateVideoFromURI)/ParMultiDo(AnnotateVideoFromURI).output: 
> expected: but was:
>   at 
> org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:169)
>   at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:411)
>   at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:403)
>   at 
> org.apache.beam.sdk.extensions.ml.VideoIntelligenceIT.annotateVideoFromURINoContext(VideoIntelligenceIT.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10048) Remove "manual steps" from release guide.

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10048?focusedWorklogId=436089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436089
 ]

ASF GitHub Bot logged work on BEAM-10048:
-

Author: ASF GitHub Bot
Created on: 21/May/20 16:32
Start Date: 21/May/20 16:32
Worklog Time Spent: 10m 
  Work Description: ibzib commented on a change in pull request #11764:
URL: https://github.com/apache/beam/pull/11764#discussion_r428769910



##
File path: website/www/site/content/en/contribute/release-guide.md
##
@@ -583,189 +582,44 @@ For this step, we recommend you using automation script 
to create a RC, but you
   1. Stage source release into dist.apache.org dev 
[repo](https://dist.apache.org/repos/dist/dev/beam/).
   1. Stage,sign and hash python binaries into dist.apache.ord dev repo python 
dir
   1. Stage SDK docker images to [docker hub Apache 
organization](https://hub.docker.com/search?q=apache%2Fbeam&type=image).
-  1. Create a PR to update beam and beam-site, changes includes:
+  1. Create a PR to update beam-site, changes includes:
  * Copy python doc into beam-site
  * Copy java doc into beam-site
- * Update release version into 
[_config.yml](https://github.com/apache/beam/blob/master/website/_config.yml).
  
  Tasks you need to do manually
-  1. Add new release into `website/src/get-started/downloads.md`.
-  1. Update last release download links in 
`website/src/get-started/downloads.md`.
-  1. Update `website/src/.htaccess` to redirect to the new version.
-  1. Build and stage python wheels.
+  1. Verify the script worked.
+  1. Verify that the source and Python binaries are present in 
[dist.apache.org](https://dist.apache.org/repos/dist/dev/beam).
+  1. Verify Docker images are published. How to find images:
+  1. Visit 
[https://hub.docker.com/u/apache](https://hub.docker.com/search?q=apache%2Fbeam&type=image)
+  2. Visit each repository and navigate to *tags* tab.
+  3. Verify images are pushed with tags: ${RELEASE}_rc{RC_NUM}
+  1. Verify that third party licenses are included in Docker containers by 
logging in to the images.
+  - For Python SDK images, there should be around 80 ~ 100 
dependencies.
+  Please note that dependencies for the SDKs with different Python 
versions vary.
+  Need to verify all Python images by replacing `${ver}` with each 
supported Python version `X.Y`.
+  ```
+  docker run -it --entrypoint=/bin/bash 
apache/beam_python${ver}_sdk:${RELEASE}_rc{RC_NUM}
+  ls -al /opt/apache/beam/third_party_licenses/ | wc -l
+  ```
+  - For Java SDK images, there should be around 1400 dependencies.
+  ```
+  docker run -it --entrypoint=/bin/bash 
apache/beam_java_sdk:${RELEASE}_rc{RC_NUM}
+  ls -al /opt/apache/beam/third_party_licenses/ | wc -l
+  ```
   1. Publish staging artifacts
-  1. Go to the staging repo to close the staging repository on [Apache 
Nexus](https://repository.apache.org/#stagingRepositories). 
+  1. Go to the staging repo to close the staging repository on [Apache 
Nexus](https://repository.apache.org/#stagingRepositories).
   1. When prompted for a description, enter “Apache Beam, version X, 
release candidate Y”.
-
-
-### (Alternative) Run all steps manually

Review comment:
   Because it's mostly an exact copy of build_release_candidate.sh. Any 
part of this block that wasn't part of build_release_candidate.sh was moved 
into the "manual steps" section.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436089)
Time Spent: 50m  (was: 40m)

> Remove "manual steps" from release guide.
> -
>
> Key: BEAM-10048
> URL: https://issues.apache.org/jira/browse/BEAM-10048
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> release-guide.md contains most of the same instructions as 
> build_release_candidate.sh ("(Alternative) Run all steps manually"). This is 
> not ideal:
> - Mirroring the instructions in release-guide.md doesn't add any value.
> - Every single change to the process requires two identical changes to each 
> file, and this makes it unnecessarily difficult to keep the two in sync.
> - All the extra instructions make release-guide.md harder to read, obscuring 
> information that the release manager actually does need to know.

[jira] [Work logged] (BEAM-10048) Remove "manual steps" from release guide.

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10048?focusedWorklogId=436090&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436090
 ]

ASF GitHub Bot logged work on BEAM-10048:
-

Author: ASF GitHub Bot
Created on: 21/May/20 16:33
Start Date: 21/May/20 16:33
Worklog Time Spent: 10m 
  Work Description: ibzib commented on a change in pull request #11764:
URL: https://github.com/apache/beam/pull/11764#discussion_r428770502



##
File path: release/src/main/scripts/build_release_candidate.sh
##
@@ -113,7 +113,7 @@ if [[ $confirmation = "y" ]]; then
   echo "-Staging Java Artifacts into Maven---"

Review comment:
   I wasn't sure of the best way to do that. Maybe we can add it in a later PR.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436090)
Time Spent: 1h  (was: 50m)

> Remove "manual steps" from release guide.
> -
>
> Key: BEAM-10048
> URL: https://issues.apache.org/jira/browse/BEAM-10048
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> release-guide.md contains most of the same instructions as 
> build_release_candidate.sh ("(Alternative) Run all steps manually"). This is 
> not ideal:
> - Mirroring the instructions in release-guide.md doesn't add any value.
> - Every single change to the process requires two identical changes to each 
> file, and this makes it unnecessarily difficult to keep the two in sync.
> - All the extra instructions make release-guide.md harder to read, obscuring 
> information that the release manager actually does need to know.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-21 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113351#comment-17113351
 ] 

Maximilian Michels commented on BEAM-10016:
---

I think yes. If someone wants to have a look, please take the issue. Otherwise 
I will have a look in the next few days.

> Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
> ---
>
> Key: BEAM-10016
> URL: https://issues.apache.org/jira/browse/BEAM-10016
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.22.0
>
>
> Both beam_PostCommit_Java_PVR_Flink_Batch and 
> beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.
> SEVERE: Error in task code:  CHAIN MapPartition (MapPartition at [6]{Values, 
> FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map 
> (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: 
> PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) 
> (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be 
> cast to [B
>   at 
> org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
>   at 
> org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203)
>   at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10048) Remove "manual steps" from release guide.

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10048?focusedWorklogId=436092&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436092
 ]

ASF GitHub Bot logged work on BEAM-10048:
-

Author: ASF GitHub Bot
Created on: 21/May/20 16:40
Start Date: 21/May/20 16:40
Worklog Time Spent: 10m 
  Work Description: ibzib commented on a change in pull request #11764:
URL: https://github.com/apache/beam/pull/11764#discussion_r428774392



##
File path: website/www/site/content/en/contribute/release-guide.md
##
@@ -583,189 +582,44 @@ For this step, we recommend you using automation script 
to create a RC, but you
   1. Stage source release into dist.apache.org dev 
[repo](https://dist.apache.org/repos/dist/dev/beam/).
   1. Stage,sign and hash python binaries into dist.apache.ord dev repo python 
dir
   1. Stage SDK docker images to [docker hub Apache 
organization](https://hub.docker.com/search?q=apache%2Fbeam&type=image).
-  1. Create a PR to update beam and beam-site, changes includes:
+  1. Create a PR to update beam-site, changes includes:
  * Copy python doc into beam-site
  * Copy java doc into beam-site
- * Update release version into 
[_config.yml](https://github.com/apache/beam/blob/master/website/_config.yml).
  
  Tasks you need to do manually
-  1. Add new release into `website/src/get-started/downloads.md`.
-  1. Update last release download links in 
`website/src/get-started/downloads.md`.
-  1. Update `website/src/.htaccess` to redirect to the new version.
-  1. Build and stage python wheels.
+  1. Verify the script worked.
+  1. Verify that the source and Python binaries are present in 
[dist.apache.org](https://dist.apache.org/repos/dist/dev/beam).
+  1. Verify Docker images are published. How to find images:
+  1. Visit 
[https://hub.docker.com/u/apache](https://hub.docker.com/search?q=apache%2Fbeam&type=image)
+  2. Visit each repository and navigate to *tags* tab.
+  3. Verify images are pushed with tags: ${RELEASE}_rc{RC_NUM}
+  1. Verify that third party licenses are included in Docker containers by 
logging in to the images.
+  - For Python SDK images, there should be around 80 ~ 100 
dependencies.
+  Please note that dependencies for the SDKs with different Python 
versions vary.
+  Need to verify all Python images by replacing `${ver}` with each 
supported Python version `X.Y`.
+  ```
+  docker run -it --entrypoint=/bin/bash 
apache/beam_python${ver}_sdk:${RELEASE}_rc{RC_NUM}
+  ls -al /opt/apache/beam/third_party_licenses/ | wc -l
+  ```
+  - For Java SDK images, there should be around 1400 dependencies.
+  ```
+  docker run -it --entrypoint=/bin/bash 
apache/beam_java_sdk:${RELEASE}_rc{RC_NUM}
+  ls -al /opt/apache/beam/third_party_licenses/ | wc -l
+  ```
   1. Publish staging artifacts
-  1. Go to the staging repo to close the staging repository on [Apache 
Nexus](https://repository.apache.org/#stagingRepositories). 
+  1. Go to the staging repo to close the staging repository on [Apache 
Nexus](https://repository.apache.org/#stagingRepositories).

Review comment:
   I added more detailed instructions.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436092)
Time Spent: 1h 10m  (was: 1h)

> Remove "manual steps" from release guide.
> -
>
> Key: BEAM-10048
> URL: https://issues.apache.org/jira/browse/BEAM-10048
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> release-guide.md contains most of the same instructions as 
> build_release_candidate.sh ("(Alternative) Run all steps manually"). This is 
> not ideal:
> - Mirroring the instructions in release-guide.md doesn't add any value.
> - Every single change to the process requires an identical change in each 
> file, which makes it unnecessarily difficult to keep the two in sync.
> - All the extra instructions make release-guide.md harder to read, obscuring 
> information that the release manager actually does need to know.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9946) Enhance Partition transform to provide partitionfn with SideInputs

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9946?focusedWorklogId=436093&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436093
 ]

ASF GitHub Bot logged work on BEAM-9946:


Author: ASF GitHub Bot
Created on: 21/May/20 16:47
Start Date: 21/May/20 16:47
Worklog Time Spent: 10m 
  Work Description: apilloud commented on a change in pull request #11682:
URL: https://github.com/apache/beam/pull/11682#discussion_r428778641



##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Partition.java
##
@@ -85,7 +141,14 @@
* @throws IllegalArgumentException if {@code numPartitions <= 0}
*/
   public static <T> Partition<T> of(int numPartitions, PartitionFn<? super T> partitionFn) {
-return new Partition<>(new PartitionDoFn(numPartitions, partitionFn));
+
+Contextful ctfFn =
+Contextful.fn(
+(T element, Contextful.Fn.Context c) ->
+partitionFn.partitionFor(element, numPartitions),
+Requirements.empty());
+Object aClass = partitionFn;

Review comment:
   The statement `Object aClass = partitionFn;` has no effect. You can just 
pass `partitionFn` directly into the function.
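   For illustration, a hedged sketch of how the body might look with that suggestion applied; the final constructor call is assumed for illustration and is not taken from this PR:
   ```java
   // Sketch only: drop the dead `Object aClass = partitionFn;` assignment and
   // let the lambda capture partitionFn directly.
   public static <T> Partition<T> of(int numPartitions, PartitionFn<? super T> partitionFn) {
     Contextful<Contextful.Fn<T, Integer>> ctfFn =
         Contextful.fn(
             (T element, Contextful.Fn.Context c) ->
                 partitionFn.partitionFor(element, numPartitions),
             Requirements.empty());
     // Constructor arguments below are assumed; the PR's actual PartitionDoFn
     // signature may differ.
     return new Partition<>(new PartitionDoFn<T>(numPartitions, ctfFn, partitionFn));
   }
   ```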





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436093)
Remaining Estimate: 94h 20m  (was: 94.5h)
Time Spent: 1h 40m  (was: 1.5h)

> Enhance Partition transform to provide partitionfn with SideInputs
> --
>
> Key: BEAM-9946
> URL: https://issues.apache.org/jira/browse/BEAM-9946
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: P2
>   Original Estimate: 96h
>  Time Spent: 1h 40m
>  Remaining Estimate: 94h 20m
>
> Currently the _Partition_ transform can partition a collection into n collections 
> based only on the _element_ value in _PartitionFn_ when deciding which partition a 
> particular element belongs to.
> {code:java}
> public interface PartitionFn<T> extends Serializable {
>   int partitionFor(T elem, int numPartitions);
> }
>
> public static <T> Partition<T> of(int numPartitions, PartitionFn<? super T> partitionFn) {
>   return new Partition<>(new PartitionDoFn<T>(numPartitions, partitionFn));
> }
> {code}
> It would be useful to introduce a new API that additionally provides _sideInputs_ 
> to the partition function. Users would then be able to write logic that uses both 
> the _element_ value and the _sideInputs_ to decide which partition a particular 
> element belongs to.
> Option-1: Proposed new API:
> {code:java}
>   public interface PartitionWithSideInputsFn<T> extends Serializable {
>     int partitionFor(T elem, int numPartitions, Context c);
>   }
>
>   public static <T> Partition<T> of(int numPartitions, 
>       PartitionWithSideInputsFn<? super T> partitionFn, Requirements requirements) {
>     ...
>   }
> {code}
> Users can use either of the two APIs, depending on their partitioning function logic.
> Option-2: Redesign the old API with a builder pattern that can optionally provide 
> a _Requirements_ with _sideInputs_. Deprecate the old API.
> {code:java}
> // using sideviews
> Partition.into(numberOfPartitions).via(
> fn(
>   (input,c) ->  {
> // use c.sideInput(view)
> // use input
> // return partitionnumber
>  },requiresSideInputs(view))
> )
> // without using sideviews
> Partition.into(numberOfPartitions).via(
> fn((input,c) ->  {
> // use input
> // return partitionnumber
>  })
> )
> {code}
>  
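> For illustration, a hedged usage sketch of how Option-2 might look with concrete Beam 
> types, as referenced above. Only the Partition.into(...).via(...) builder is the proposed 
> (not yet existing) API; Contextful, Requirements and View.asSingleton() are existing SDK 
> pieces assumed to be reusable here, and the element/threshold logic is made up for the example.
> {code:java}
> import org.apache.beam.sdk.Pipeline;
> import org.apache.beam.sdk.transforms.Contextful;
> import org.apache.beam.sdk.transforms.Create;
> import org.apache.beam.sdk.transforms.Partition;
> import org.apache.beam.sdk.transforms.Requirements;
> import org.apache.beam.sdk.transforms.View;
> import org.apache.beam.sdk.values.PCollection;
> import org.apache.beam.sdk.values.PCollectionList;
> import org.apache.beam.sdk.values.PCollectionView;
>
> public class PartitionWithSideInputExample {
>   public static void main(String[] args) {
>     Pipeline p = Pipeline.create();
>
>     PCollection<String> words = p.apply(Create.of("a", "bb", "ccc"));
>     // Side input: a length threshold used by the partition function.
>     PCollectionView<Integer> threshold =
>         p.apply("Threshold", Create.of(2)).apply(View.asSingleton());
>
>     // Proposed builder API (Option-2); hypothetical at the time of writing.
>     PCollectionList<String> parts =
>         words.apply(
>             Partition.into(2)
>                 .via(
>                     Contextful.fn(
>                         (String w, Contextful.Fn.Context c) ->
>                             w.length() > c.sideInput(threshold) ? 0 : 1,
>                         Requirements.requiresSideInputs(threshold))));
>
>     p.run().waitUntilFinish();
>   }
> }
> {code}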



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10054) Direct Runner execution stalls with test pipeline

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10054?focusedWorklogId=436094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436094
 ]

ASF GitHub Bot logged work on BEAM-10054:
-

Author: ASF GitHub Bot
Created on: 21/May/20 16:47
Start Date: 21/May/20 16:47
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #11777:
URL: https://github.com/apache/beam/pull/11777#discussion_r428778698



##
File path: sdks/python/apache_beam/transforms/trigger.py
##
@@ -1368,7 +1368,7 @@ def _output(
 if timestamp is None:
   # If no watermark hold was set, output at end of window.
   timestamp = window.max_timestamp()
-elif input_watermark < window.end and self.trigger_fn.has_ontime_pane():
+elif output_watermark < window.end and self.trigger_fn.has_ontime_pane():

Review comment:
   Note that this is the fix in question. Please check @rohdesamuel if that 
makes sense.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436094)
Time Spent: 40m  (was: 0.5h)

> Direct Runner execution stalls with test pipeline
> -
>
> Key: BEAM-10054
> URL: https://issues.apache.org/jira/browse/BEAM-10054
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Internally, we have a test pipeline which runs with the DirectRunner. When 
> upgrading from 2.18.0 to 2.21.0 the test failed with the following exception:
> {noformat}
> tp = Exception('Monitor task detected a pipeline stall.',), value = None, tb 
> = None
> def raise_(tp, value=None, tb=None):
> """
> A function that matches the Python 2.x ``raise`` statement. This
> allows re-raising exceptions with the cls value and traceback on
> Python 2 and 3.
> """
> if value is not None and isinstance(tp, Exception):
> raise TypeError("instance exception may not have a separate 
> value")
> if value is not None:
> exc = tp(value)
> else:
> exc = tp
> if exc.__traceback__ is not tb:
> raise exc.with_traceback(tb)
> >   raise exc
> E   Exception: Monitor task detected a pipeline stall.
> {noformat}
> I was able to bisect the error. This commit introduced the failure: 
> https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731
> If the following condition evaluates to False, the pipeline runs correctly: 
> https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731#diff-2bb845e226f3a97c0f0f737d0558c5dbR1273



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9946) Enhance Partition transform to provide partitionfn with SideInputs

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9946?focusedWorklogId=436095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436095
 ]

ASF GitHub Bot logged work on BEAM-9946:


Author: ASF GitHub Bot
Created on: 21/May/20 16:48
Start Date: 21/May/20 16:48
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #11682:
URL: https://github.com/apache/beam/pull/11682#issuecomment-632218345


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436095)
Remaining Estimate: 94h 10m  (was: 94h 20m)
Time Spent: 1h 50m  (was: 1h 40m)

> Enhance Partition transform to provide partitionfn with SideInputs
> --
>
> Key: BEAM-9946
> URL: https://issues.apache.org/jira/browse/BEAM-9946
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: P2
>   Original Estimate: 96h
>  Time Spent: 1h 50m
>  Remaining Estimate: 94h 10m
>
> Currently the _Partition_ transform can partition a collection into n collections 
> based only on the _element_ value in _PartitionFn_ when deciding which partition a 
> particular element belongs to.
> {code:java}
> public interface PartitionFn<T> extends Serializable {
>   int partitionFor(T elem, int numPartitions);
> }
>
> public static <T> Partition<T> of(int numPartitions, PartitionFn<? super T> partitionFn) {
>   return new Partition<>(new PartitionDoFn<T>(numPartitions, partitionFn));
> }
> {code}
> It would be useful to introduce a new API that additionally provides _sideInputs_ 
> to the partition function. Users would then be able to write logic that uses both 
> the _element_ value and the _sideInputs_ to decide which partition a particular 
> element belongs to.
> Option-1: Proposed new API:
> {code:java}
>   public interface PartitionWithSideInputsFn<T> extends Serializable {
>     int partitionFor(T elem, int numPartitions, Context c);
>   }
>
>   public static <T> Partition<T> of(int numPartitions, 
>       PartitionWithSideInputsFn<? super T> partitionFn, Requirements requirements) {
>     ...
>   }
> {code}
> Users can use either of the two APIs, depending on their partitioning function logic.
> Option-2: Redesign the old API with a builder pattern that can optionally provide 
> a _Requirements_ with _sideInputs_. Deprecate the old API.
> {code:java}
> // using sideviews
> Partition.into(numberOfPartitions).via(
> fn(
>   (input,c) ->  {
> // use c.sideInput(view)
> // use input
> // return partitionnumber
>  },requiresSideInputs(view))
> )
> // without using sideviews
> Partition.into(numberOfPartitions).via(
> fn((input,c) ->  {
> // use input
> // return partitionnumber
>  })
> )
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10052) check hash and avoid duplicates when uploading artifact in Python Dataflow Runner

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10052?focusedWorklogId=436098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436098
 ]

ASF GitHub Bot logged work on BEAM-10052:
-

Author: ASF GitHub Bot
Created on: 21/May/20 16:50
Start Date: 21/May/20 16:50
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11771:
URL: https://github.com/apache/beam/pull/11771#issuecomment-632219614


   WDYT about https://github.com/ihji/beam/pull/1 ?
   (only change is to Environments.java other changes should go away if you 
rebase)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436098)
Time Spent: 0.5h  (was: 20m)

> check hash and avoid duplicates when uploading artifact in Python Dataflow 
> Runner
> -
>
> Key: BEAM-10052
> URL: https://issues.apache.org/jira/browse/BEAM-10052
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> An xlang pipeline can contain many duplicated jars. It would be great if we 
> checked each artifact's hash and avoided duplicate uploads.
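> As a rough illustration of the idea only (the class below is hypothetical, not the 
> actual Dataflow runner code): keep a map keyed by content hash and upload each 
> distinct artifact once.
> {code:java}
> import java.io.IOException;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.security.MessageDigest;
> import java.security.NoSuchAlgorithmException;
> import java.util.Base64;
> import java.util.HashMap;
> import java.util.Map;
> import java.util.function.Function;
>
> /** Hypothetical helper: stages each distinct artifact (by SHA-256) only once. */
> class ArtifactDeduper {
>   private final Map<String, String> stagedByHash = new HashMap<>();
>
>   /** Returns the staged location, calling upload only for content not seen before. */
>   String stageOnce(Path jar, Function<Path, String> upload) throws IOException {
>     try {
>       MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
>       String key = Base64.getEncoder().encodeToString(sha256.digest(Files.readAllBytes(jar)));
>       // Upload only if this exact content has not been staged before;
>       // otherwise reuse the previously returned staged location.
>       return stagedByHash.computeIfAbsent(key, k -> upload.apply(jar));
>     } catch (NoSuchAlgorithmException e) {
>       throw new IllegalStateException("SHA-256 should always be available", e);
>     }
>   }
> }
> {code}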



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10055) Add --region to 3 of the python examples

2020-05-21 Thread Ted Romer (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Romer updated BEAM-10055:
-
Priority: P3  (was: P2)

> Add --region to 3 of the python examples
> 
>
> Key: BEAM-10055
> URL: https://issues.apache.org/jira/browse/BEAM-10055
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ted Romer
>Priority: P3
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Proposed fix: 
> [https://github.com/tedromer/beam/compare/tedromer:ef811fe...tedromer:1f39865]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10055) Add --region to 3 of the python examples

2020-05-21 Thread Ted Romer (Jira)
Ted Romer created BEAM-10055:


 Summary: Add --region to 3 of the python examples
 Key: BEAM-10055
 URL: https://issues.apache.org/jira/browse/BEAM-10055
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Ted Romer


Proposed fix: 
[https://github.com/tedromer/beam/compare/tedromer:ef811fe...tedromer:1f39865]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-21 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113372#comment-17113372
 ] 

Kyle Weaver commented on BEAM-10016:


This test was newly added, so this is probably not a regression. It does seem 
like an important issue we should look into, but I'm not sure whether it should 
be considered a blocker.

> Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
> ---
>
> Key: BEAM-10016
> URL: https://issues.apache.org/jira/browse/BEAM-10016
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.22.0
>
>
> Both beam_PostCommit_Java_PVR_Flink_Batch and 
> beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.
> SEVERE: Error in task code:  CHAIN MapPartition (MapPartition at [6]{Values, 
> FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map 
> (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: 
> PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) 
> (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be 
> cast to [B
>   at 
> org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
>   at 
> org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203)
>   at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10024) Spark runner failing testOutputTimestampDefault

2020-05-21 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113375#comment-17113375
 ] 

Kyle Weaver commented on BEAM-10024:


No. A new test was added without being properly annotated, causing it to run 
where it should have been skipped.

> Spark runner failing testOutputTimestampDefault
> ---
>
> Key: BEAM-10024
> URL: https://issues.apache.org/jira/browse/BEAM-10024
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: currently-failing
> Fix For: 2.22.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is causing postcommit to fail
> java.lang.UnsupportedOperationException: Found TimerId annotations on 
> org.apache.beam.sdk.transforms.ParDoTest$TimerTests$12, but DoFn cannot yet 
> be used with timers in the SparkRunner.
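> For context, a minimal sketch (assumed for illustration, not the actual ParDoTest 
> code) of the kind of DoFn with a TimerId annotation that the SparkRunner rejects:
> {code:java}
> import org.apache.beam.sdk.state.TimeDomain;
> import org.apache.beam.sdk.state.Timer;
> import org.apache.beam.sdk.state.TimerSpec;
> import org.apache.beam.sdk.state.TimerSpecs;
> import org.apache.beam.sdk.transforms.DoFn;
> import org.apache.beam.sdk.values.KV;
> import org.joda.time.Duration;
>
> class TimerDoFn extends DoFn<KV<String, Long>, Long> {
>   // Declaring a timer is what trips the "Found TimerId annotations ... but DoFn
>   // cannot yet be used with timers in the SparkRunner" exception.
>   @TimerId("expiry")
>   private final TimerSpec expirySpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);
>
>   @ProcessElement
>   public void process(ProcessContext c, @TimerId("expiry") Timer expiry) {
>     expiry.offset(Duration.standardMinutes(1)).setRelative();
>   }
>
>   @OnTimer("expiry")
>   public void onExpiry(OnTimerContext c) {
>     c.output(0L);
>   }
> }
> {code}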



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10024) Spark runner failing testOutputTimestampDefault

2020-05-21 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-10024:
---
Fix Version/s: (was: 2.22.0)

> Spark runner failing testOutputTimestampDefault
> ---
>
> Key: BEAM-10024
> URL: https://issues.apache.org/jira/browse/BEAM-10024
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: currently-failing
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is causing postcommit to fail
> java.lang.UnsupportedOperationException: Found TimerId annotations on 
> org.apache.beam.sdk.transforms.ParDoTest$TimerTests$12, but DoFn cannot yet 
> be used with timers in the SparkRunner.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10056) Side Input Validation too tight, doesn't allow CoGBK

2020-05-21 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10056:
---
Status: Open  (was: Triage Needed)

> Side Input Validation too tight, doesn't allow CoGBK
> 
>
> Key: BEAM-10056
> URL: https://issues.apache.org/jira/browse/BEAM-10056
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: P1
>
> The following doesn't pass validation, though it should, as it's a valid 
> signature for a ParDo accepting a PCollection<CoGBK<string, *clientHistory, *clientHistory>>
> func (fn *writer) StartBundle(ctx context.Context) error
> func (fn *writer) ProcessElement(
> ctx context.Context,
> key string,
> iter1, iter2 func(**clientHistory) bool)
> func (fn *writer) FinishBundle(ctx context.Context)
> It returns an error:
> Missing side inputs in the StartBundle method of a DoFn. If side inputs are 
> present in ProcessElement those side inputs must also be present in 
> StartBundle.
> Full error:
> inserting ParDo in scope root:
> graph.AsDoFn: for Fn named <...pii...>/userpackage.writer:
> side inputs expected in method StartBundle [recovered]
> panic: Missing side inputs in the StartBundle method of a DoFn. If 
> side inputs are present in ProcessElement those side inputs must also be 
> present in StartBundle.
> Full error:
> inserting ParDo in scope root:
> graph.AsDoFn: for Fn named <...pii...>/userpackage.writer:
> side inputs expected in method StartBundle
> This is happening in the input-unaware validation, which means it needs to be 
> loosened there and validated elsewhere.
> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/graph/fn.go#L527
> There are "sibling" cases for the DoFn  signature
> func (fn *writer) StartBundle(context.Context, side func(**clientHistory) 
> bool) error
> func (fn *writer) ProcessElement(
> ctx context.Context,
> key string,
> iter, side func(**clientHistory) bool)
> func (fn *writer) FinishBundle( context.Context, side, func(**clientHistory) 
> bool)
> and
> func (fn *writer) StartBundle(context.Context, side1, side2 
> func(**clientHistory) bool) error
> func (fn *writer) ProcessElement(
> ctx context.Context,
> key string,
> side1, side2 func(**clientHistory) bool)
> func (fn *writer) FinishBundle( context.Context, side1, side2 
> func(**clientHistory) bool)
> Would be for  > with <*clientHistory> on the 
> side, and
>   with <*clientHistory> and <*clientHistory> on the side 
> respectively.
> Which would only be fully determinable with the input, and should provide a 
> clear error when PCollection binding is occurring.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10056) Side Input Validation too tight, doesn't allow CoGBK

2020-05-21 Thread Robert Burke (Jira)
Robert Burke created BEAM-10056:
---

 Summary: Side Input Validation too tight, doesn't allow CoGBK
 Key: BEAM-10056
 URL: https://issues.apache.org/jira/browse/BEAM-10056
 Project: Beam
  Issue Type: Bug
  Components: sdk-go
Reporter: Robert Burke
Assignee: Robert Burke


The following doesn't pass validation, though it should, as it's a valid 
signature for a ParDo accepting a PCollection<CoGBK<string, *clientHistory, *clientHistory>>

func (fn *writer) StartBundle(ctx context.Context) error

func (fn *writer) ProcessElement(
ctx context.Context,
key string,
iter1, iter2 func(**clientHistory) bool)

func (fn *writer) FinishBundle(ctx context.Context)

It returns an error:

Missing side inputs in the StartBundle method of a DoFn. If side inputs are 
present in ProcessElement those side inputs must also be present in StartBundle.
Full error:
inserting ParDo in scope root:
graph.AsDoFn: for Fn named <...pii...>/userpackage.writer:
side inputs expected in method StartBundle [recovered]
panic: Missing side inputs in the StartBundle method of a DoFn. If side 
inputs are present in ProcessElement those side inputs must also be present in 
StartBundle.
Full error:
inserting ParDo in scope root:
graph.AsDoFn: for Fn named <...pii...>/userpackage.writer:
side inputs expected in method StartBundle


This is happening in the input-unaware validation, which means it needs to be 
loosened there and validated elsewhere.
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/graph/fn.go#L527

There are "sibling" cases for the DoFn  signature

func (fn *writer) StartBundle(context.Context, side func(**clientHistory) bool) 
error

func (fn *writer) ProcessElement(
ctx context.Context,
key string,
iter, side func(**clientHistory) bool)

func (fn *writer) FinishBundle( context.Context, side, func(**clientHistory) 
bool)

and

func (fn *writer) StartBundle(context.Context, side1, side2 
func(**clientHistory) bool) error

func (fn *writer) ProcessElement(
ctx context.Context,
key string,
side1, side2 func(**clientHistory) bool)

func (fn *writer) FinishBundle( context.Context, side1, side2 
func(**clientHistory) bool)

Would be for  > with <*clientHistory> on the 
side, and
  with <*clientHistory> and <*clientHistory> on the side respectively.

Which would only be fully determinable with the input, and should provide a 
clear error when PCollection binding is occurring.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10025) Samza runner failing testOutputTimestampDefault

2020-05-21 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113376#comment-17113376
 ] 

Kyle Weaver commented on BEAM-10025:


I'm guessing the cause is similar to BEAM-10024, so no, it should not be a 
release blocker.

> Samza runner failing testOutputTimestampDefault
> ---
>
> Key: BEAM-10025
> URL: https://issues.apache.org/jira/browse/BEAM-10025
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Hai Lu
>Priority: P2
>  Labels: currently-failing
> Fix For: 2.22.0
>
>
> This is causing postcommit to fail
> java.lang.AssertionError: Expected 1 successful assertions, but found 0.
> Expected: is <1L>
>  but: was <0L>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10025) Samza runner failing testOutputTimestampDefault

2020-05-21 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-10025:
---
Fix Version/s: (was: 2.22.0)

> Samza runner failing testOutputTimestampDefault
> ---
>
> Key: BEAM-10025
> URL: https://issues.apache.org/jira/browse/BEAM-10025
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Hai Lu
>Priority: P2
>  Labels: currently-failing
>
> This is causing postcommit to fail
> java.lang.AssertionError: Expected 1 successful assertions, but found 0.
> Expected: is <1L>
>  but: was <0L>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=436100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436100
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 21/May/20 16:56
Start Date: 21/May/20 16:56
Worklog Time Spent: 10m 
  Work Description: chamikaramj merged pull request #11757:
URL: https://github.com/apache/beam/pull/11757


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436100)
Time Spent: 22h  (was: 21h 50m)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: P1
>  Time Spent: 22h
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-21 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113380#comment-17113380
 ] 

Maximilian Michels commented on BEAM-10016:
---

Just to clarify, since this is only failing for Beam 2.22.0, we don't have to 
block the 2.21.0 release.

> Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
> ---
>
> Key: BEAM-10016
> URL: https://issues.apache.org/jira/browse/BEAM-10016
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.22.0
>
>
> Both beam_PostCommit_Java_PVR_Flink_Batch and 
> beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.
> SEVERE: Error in task code:  CHAIN MapPartition (MapPartition at [6]{Values, 
> FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map 
> (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: 
> PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) 
> (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be 
> cast to [B
>   at 
> org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
>   at 
> org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203)
>   at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9692) Clean Python DataflowRunner to use portable pipelines

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9692?focusedWorklogId=436101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436101
 ]

ASF GitHub Bot logged work on BEAM-9692:


Author: ASF GitHub Bot
Created on: 21/May/20 16:57
Start Date: 21/May/20 16:57
Worklog Time Spent: 10m 
  Work Description: robertwb merged pull request #11503:
URL: https://github.com/apache/beam/pull/11503


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436101)
Time Spent: 4.5h  (was: 4h 20m)

> Clean Python DataflowRunner to use portable pipelines
> -
>
> Key: BEAM-9692
> URL: https://issues.apache.org/jira/browse/BEAM-9692
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: P2
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-21 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113383#comment-17113383
 ] 

Kyle Weaver commented on BEAM-10016:


I'm pretty sure it's only failing for Beam 2.22.0 because it was added after 
the 2.21.0 release cut. I haven't tried it but I suspect the test would fail on 
previous Beam releases if backported.

> Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
> ---
>
> Key: BEAM-10016
> URL: https://issues.apache.org/jira/browse/BEAM-10016
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.22.0
>
>
> Both beam_PostCommit_Java_PVR_Flink_Batch and 
> beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.
> SEVERE: Error in task code:  CHAIN MapPartition (MapPartition at [6]{Values, 
> FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map 
> (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: 
> PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) 
> (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be 
> cast to [B
>   at 
> org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
>   at 
> org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203)
>   at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9577?focusedWorklogId=436110&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436110
 ]

ASF GitHub Bot logged work on BEAM-9577:


Author: ASF GitHub Bot
Created on: 21/May/20 17:24
Start Date: 21/May/20 17:24
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #11521:
URL: https://github.com/apache/beam/pull/11521#issuecomment-632237798


   @robertwb I think this broke Java PVR Spark Batch. First failure is here: 
https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/2887/ Not 
sure if there is a jira already



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436110)
Time Spent: 23h 10m  (was: 23h)

> Update artifact staging and retrieval protocols to be dependency aware.
> ---
>
> Key: BEAM-9577
> URL: https://issues.apache.org/jira/browse/BEAM-9577
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
>  Time Spent: 23h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9971) beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)

2020-05-21 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113410#comment-17113410
 ] 

Brian Hulette commented on BEAM-9971:
-

Pretty sure the culprit is https://github.com/apache/beam/pull/11521. This was 
part of the first failure: 
https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/2887/

cc: [~robertwb]

> beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)
> --
>
> Key: BEAM-9971
> URL: https://issues.apache.org/jira/browse/BEAM-9971
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: portability-spark
>
> This happens sporadically. One time the issue affected 14 tests; another time 
> it affected 112 tests.
> java.lang.RuntimeException: The Runner experienced the following error during 
> execution:
> java.io.FileNotFoundException: 
> /tmp/spark-0812a463-8d6b-4c97-be4b-de43baf67108/userFiles-b90ca2e1-2041-442d-ae78-c8e9c30bff49/beam-runners-spark-2.22.0-SNAPSHOT.jar
>  (No such file or directory)
>   at 
> org.apache.beam.runners.portability.JobServicePipelineResult.propagateErrors(JobServicePipelineResult.java:165)
>   at 
> org.apache.beam.runners.portability.JobServicePipelineResult.waitUntilFinish(JobServicePipelineResult.java:110)
>   at 
> org.apache.beam.runners.portability.testing.TestPortableRunner.run(TestPortableRunner.java:83)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
>   at 
> org.apache.beam.runners.core.metrics.MetricsPusherTest.pushesUserMetrics(MetricsPusherTest.java:70)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.GeneratedMethodAccessor161.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDis

[jira] [Commented] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

2020-05-21 Thread Jacob Ferriero (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113411#comment-17113411
 ] 

Jacob Ferriero commented on BEAM-9468:
--

Yes!

> Add Google Cloud Healthcare API IO Connectors
> -
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: P3
>  Time Spent: 53.5h
>  Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud 
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

2020-05-21 Thread Jacob Ferriero (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Ferriero updated BEAM-9468:
-
Fix Version/s: 2.22.0

> Add Google Cloud Healthcare API IO Connectors
> -
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: P3
> Fix For: 2.22.0
>
>  Time Spent: 53.5h
>  Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud 
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9971) beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)

2020-05-21 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113412#comment-17113412
 ] 

Brian Hulette commented on BEAM-9971:
-

This is happening on the release branch and really harms the signal of this 
test suite. Latest attempt had 134/204 tests fail: 
https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch_PR/79/testReport/

> beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)
> --
>
> Key: BEAM-9971
> URL: https://issues.apache.org/jira/browse/BEAM-9971
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: portability-spark
>
> This happens sporadically. One time the issue affected 14 tests; another time 
> it affected 112 tests.
> java.lang.RuntimeException: The Runner experienced the following error during 
> execution:
> java.io.FileNotFoundException: 
> /tmp/spark-0812a463-8d6b-4c97-be4b-de43baf67108/userFiles-b90ca2e1-2041-442d-ae78-c8e9c30bff49/beam-runners-spark-2.22.0-SNAPSHOT.jar
>  (No such file or directory)
>   at 
> org.apache.beam.runners.portability.JobServicePipelineResult.propagateErrors(JobServicePipelineResult.java:165)
>   at 
> org.apache.beam.runners.portability.JobServicePipelineResult.waitUntilFinish(JobServicePipelineResult.java:110)
>   at 
> org.apache.beam.runners.portability.testing.TestPortableRunner.run(TestPortableRunner.java:83)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
>   at 
> org.apache.beam.runners.core.metrics.MetricsPusherTest.pushesUserMetrics(MetricsPusherTest.java:70)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.GeneratedMethodAccessor161.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.Conte

[jira] [Resolved] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

2020-05-21 Thread Jacob Ferriero (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Ferriero resolved BEAM-9468.
--
Resolution: Fixed

> Add Google Cloud Healthcare API IO Connectors
> -
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: P3
> Fix For: 2.22.0
>
>  Time Spent: 53.5h
>  Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud 
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9831) HL7v2IO Improvements

2020-05-21 Thread Jacob Ferriero (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Ferriero resolved BEAM-9831.
--
Fix Version/s: 2.22.0
   Resolution: Fixed

> HL7v2IO Improvements
> 
>
> Key: BEAM-9831
> URL: https://issues.apache.org/jira/browse/BEAM-9831
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> # The HL7v2MessageCoder constructor should be public for use by end users.
>  # Currently HL7v2IO.ListHL7v2Messages blocks on paginating through the full list of 
> message results before emitting any output data elements (due to the high fan-out 
> from a single input element). We should add early firings so that 
> downstream processing can proceed on early pages while later pages are still 
> being scrolled through.
>  # We should drop all output-only fields of HL7v2Message and keep only data 
> and labels when calling ingestMessages, rather than expecting the user to do 
> this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization

2020-05-21 Thread Jacob Ferriero (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Ferriero resolved BEAM-9856.
--
Fix Version/s: 2.22.0
   Resolution: Fixed

> HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
> --
>
> Key: BEAM-9856
> URL: https://issues.apache.org/jira/browse/BEAM-9856
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: P3
> Fix For: 2.22.0
>
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> Currently the List Messages API paginates through results in a single 
> ProcessElement call.
> However, we could derive a restriction based on createTime using the 
> Messages.List filter and orderBy parameters.
>  
> This is in line with the future roadmap: if an HL7v2 bulk export API becomes 
> available, it should allow splitting on, e.g., the createTime dimension. 
> Leveraging that bulk export might be a future optimization to explore.
>  
> This could take one of two forms (see the sketch below):
> 1. Dynamically splittable via a splittable DoFn (sexy and Beam-idiomatic: it 
> makes optimization the runner's problem, but is potentially unnecessarily 
> complex for this use case).
> 2. Static splitting on some time partition, e.g. finding the earliest 
> createTime, emitting a PCollection of 1-hour partitions, and paginating 
> through each hour of data within the time frame that the store spans in a 
> separate ProcessElement (easy to implement, but will likely have hot keys / 
> stragglers based on "busy hours").
>  
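> For illustration, a hedged sketch of what option 2 might look like. The pager 
> interface below is hypothetical (it is not the real HL7v2IO client API); only the 
> shape matters: expand into hourly partitions, then paginate each hour in its own 
> ProcessElement.
> {code:java}
> import java.io.Serializable;
> import java.util.List;
> import org.apache.beam.sdk.transforms.DoFn;
> import org.apache.beam.sdk.values.KV;
> import org.joda.time.Duration;
> import org.joda.time.Instant;
>
> /** Hypothetical page of list results; not the real HL7v2 API. */
> class MessagePage implements Serializable {
>   final List<String> messageIds;
>   final String nextPageToken; // null when this is the last page
>
>   MessagePage(List<String> messageIds, String nextPageToken) {
>     this.messageIds = messageIds;
>     this.nextPageToken = nextPageToken;
>   }
> }
>
> /** Hypothetical pager over messages created in [start, end). */
> interface HourPager extends Serializable {
>   MessagePage listCreatedBetween(Instant start, Instant end, String pageToken);
> }
>
> /** Emits one (start, end) pair per hour from the store's earliest createTime to now. */
> class ExpandIntoHourPartitions extends DoFn<Instant, KV<Instant, Instant>> {
>   @ProcessElement
>   public void process(@Element Instant earliestCreateTime, OutputReceiver<KV<Instant, Instant>> out) {
>     for (Instant start = earliestCreateTime;
>         start.isBefore(Instant.now());
>         start = start.plus(Duration.standardHours(1))) {
>       out.output(KV.of(start, start.plus(Duration.standardHours(1))));
>     }
>   }
> }
>
> /** Paginates through one hour of data per element, so hours can proceed in parallel. */
> class ListHourOfMessages extends DoFn<KV<Instant, Instant>, String> {
>   private final HourPager pager;
>
>   ListHourOfMessages(HourPager pager) {
>     this.pager = pager;
>   }
>
>   @ProcessElement
>   public void process(@Element KV<Instant, Instant> hour, OutputReceiver<String> out) {
>     String token = null;
>     do {
>       MessagePage page = pager.listCreatedBetween(hour.getKey(), hour.getValue(), token);
>       for (String messageId : page.messageIds) {
>         out.output(messageId);
>       }
>       token = page.nextPageToken;
>     } while (token != null);
>   }
> }
> {code}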



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9577?focusedWorklogId=436113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436113
 ]

ASF GitHub Bot logged work on BEAM-9577:


Author: ASF GitHub Bot
Created on: 21/May/20 17:31
Start Date: 21/May/20 17:31
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #11521:
URL: https://github.com/apache/beam/pull/11521#issuecomment-632241122


   [BEAM-9971](https://issues.apache.org/jira/browse/BEAM-9971) was filed to 
track this.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436113)
Time Spent: 23h 20m  (was: 23h 10m)

> Update artifact staging and retrieval protocols to be dependency aware.
> ---
>
> Key: BEAM-9577
> URL: https://issues.apache.org/jira/browse/BEAM-9577
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
>  Time Spent: 23h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9971) beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)

2020-05-21 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-9971:

Fix Version/s: 2.22.0

> beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)
> --
>
> Key: BEAM-9971
> URL: https://issues.apache.org/jira/browse/BEAM-9971
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: portability-spark
> Fix For: 2.22.0
>
>
> This happens sporadically. One time the issue affected 14 tests; another time 
> it affected 112 tests.
> java.lang.RuntimeException: The Runner experienced the following error during 
> execution:
> java.io.FileNotFoundException: 
> /tmp/spark-0812a463-8d6b-4c97-be4b-de43baf67108/userFiles-b90ca2e1-2041-442d-ae78-c8e9c30bff49/beam-runners-spark-2.22.0-SNAPSHOT.jar
>  (No such file or directory)
>   at 
> org.apache.beam.runners.portability.JobServicePipelineResult.propagateErrors(JobServicePipelineResult.java:165)
>   at 
> org.apache.beam.runners.portability.JobServicePipelineResult.waitUntilFinish(JobServicePipelineResult.java:110)
>   at 
> org.apache.beam.runners.portability.testing.TestPortableRunner.run(TestPortableRunner.java:83)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
>   at 
> org.apache.beam.runners.core.metrics.MetricsPusherTest.pushesUserMetrics(MetricsPusherTest.java:70)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.GeneratedMethodAccessor161.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at com.su

[jira] [Assigned] (BEAM-10055) Add --region to 3 of the python examples

2020-05-21 Thread Ahmet Altay (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay reassigned BEAM-10055:
--

Assignee: Ted Romer

> Add --region to 3 of the python examples
> 
>
> Key: BEAM-10055
> URL: https://issues.apache.org/jira/browse/BEAM-10055
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ted Romer
>Assignee: Ted Romer
>Priority: P3
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Proposed fix: 
> [https://github.com/tedromer/beam/compare/tedromer:ef811fe...tedromer:1f39865]
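
As a rough illustration only (the actual proposed fix above edits three Python examples), the idea is simply to pass --region as an explicit pipeline option instead of relying on a default Dataflow region. The sketch below shows the equivalent wiring in the Java SDK; it is not the code from the linked compare, and it assumes the Dataflow runner dependency is on the classpath so that DataflowPipelineOptions is available.

{code:java}
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class RegionOptionSketch {
  public static void main(String[] args) {
    // e.g. args = {"--runner=DataflowRunner", "--project=my-project",
    //              "--region=us-central1", "--tempLocation=gs://my-bucket/tmp"}
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args)
            .withValidation()
            .as(DataflowPipelineOptions.class);

    Pipeline pipeline = Pipeline.create(options);
    // ... the example's transforms would be applied here ...
    pipeline.run().waitUntilFinish();
  }
}
{code}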



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10055) Add --region to 3 of the python examples

2020-05-21 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113421#comment-17113421
 ] 

Kyle Weaver commented on BEAM-10055:


Sorry I missed those. Please feel free to add me as a reviewer when you have a 
PR ready.

> Add --region to 3 of the python examples
> 
>
> Key: BEAM-10055
> URL: https://issues.apache.org/jira/browse/BEAM-10055
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ted Romer
>Assignee: Ted Romer
>Priority: P3
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Proposed fix: 
> [https://github.com/tedromer/beam/compare/tedromer:ef811fe...tedromer:1f39865]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9971) beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)

2020-05-21 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113420#comment-17113420
 ] 

Brian Hulette commented on BEAM-9971:
-

Marking this as a 2.22 release blocker since it causes around 50% of the VR 
tests to flake.

> beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)
> --
>
> Key: BEAM-9971
> URL: https://issues.apache.org/jira/browse/BEAM-9971
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: portability-spark
> Fix For: 2.22.0
>
>
> This happens sporadically. One time the issue affected 14 tests; another time 
> it affected 112 tests.
> java.lang.RuntimeException: The Runner experienced the following error during 
> execution:
> java.io.FileNotFoundException: 
> /tmp/spark-0812a463-8d6b-4c97-be4b-de43baf67108/userFiles-b90ca2e1-2041-442d-ae78-c8e9c30bff49/beam-runners-spark-2.22.0-SNAPSHOT.jar
>  (No such file or directory)
>   at 
> org.apache.beam.runners.portability.JobServicePipelineResult.propagateErrors(JobServicePipelineResult.java:165)
>   at 
> org.apache.beam.runners.portability.JobServicePipelineResult.waitUntilFinish(JobServicePipelineResult.java:110)
>   at 
> org.apache.beam.runners.portability.testing.TestPortableRunner.run(TestPortableRunner.java:83)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
>   at 
> org.apache.beam.runners.core.metrics.MetricsPusherTest.pushesUserMetrics(MetricsPusherTest.java:70)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.GeneratedMethodAccessor161.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.grad

[jira] [Commented] (BEAM-9971) beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)

2020-05-21 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113425#comment-17113425
 ] 

Kyle Weaver commented on BEAM-9971:
---

Thanks for pointing out the Java dependencies change, Brian -- I somehow 
completely missed that connection. I'll fix it as soon as I can.

> beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)
> --
>
> Key: BEAM-9971
> URL: https://issues.apache.org/jira/browse/BEAM-9971
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: portability-spark
> Fix For: 2.22.0
>
>
> This happens sporadically. One time the issue affected 14 tests; another time 
> it affected 112 tests.
> java.lang.RuntimeException: The Runner experienced the following error during 
> execution:
> java.io.FileNotFoundException: 
> /tmp/spark-0812a463-8d6b-4c97-be4b-de43baf67108/userFiles-b90ca2e1-2041-442d-ae78-c8e9c30bff49/beam-runners-spark-2.22.0-SNAPSHOT.jar
>  (No such file or directory)
>   at 
> org.apache.beam.runners.portability.JobServicePipelineResult.propagateErrors(JobServicePipelineResult.java:165)
>   at 
> org.apache.beam.runners.portability.JobServicePipelineResult.waitUntilFinish(JobServicePipelineResult.java:110)
>   at 
> org.apache.beam.runners.portability.testing.TestPortableRunner.run(TestPortableRunner.java:83)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
>   at 
> org.apache.beam.runners.core.metrics.MetricsPusherTest.pushesUserMetrics(MetricsPusherTest.java:70)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.GeneratedMethodAccessor161.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoade

[jira] [Updated] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-05-21 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-8889:

Fix Version/s: 2.22.0

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: P2
>  Labels: gcs
> Fix For: 2.22.0
>
>   Original Estimate: 168h
>  Time Spent: 36h 20m
>  Remaining Estimate: 131h 40m
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is the primary class for accessing Google Cloud Storage in Apache Beam. The 
> current implementation creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel directly to read and write GCS data, rather 
> than going through 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java],
>  an abstraction that provides the basic IO capabilities and ultimately 
> creates the channel objects. This request is about updating GcsUtil to 
> create its read and write channels through GoogleCloudStorage, which should 
> be more flexible because new behavior (e.g. a new channel implementation 
> using a new protocol) can be picked up without code changes.
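
A minimal sketch of the proposed direction, not the final implementation: instead of constructing GoogleCloudStorageReadChannel/GoogleCloudStorageWriteChannel itself, GcsUtil would ask a GoogleCloudStorage instance for its channels. The open(StorageResourceId)/create(StorageResourceId) calls and the StorageResourceId(bucket, object) constructor reflect the gcsio API as assumed here, and the wrapper class name is hypothetical.

{code:java}
import java.io.IOException;
import java.nio.channels.SeekableByteChannel;
import java.nio.channels.WritableByteChannel;

import com.google.cloud.hadoop.gcsio.GoogleCloudStorage;
import com.google.cloud.hadoop.gcsio.StorageResourceId;

class GcsChannelSketch {
  private final GoogleCloudStorage gcs;

  GcsChannelSketch(GoogleCloudStorage gcs) {
    // Any GoogleCloudStorage implementation can be injected here, which is
    // what makes this more flexible than constructing channels directly.
    this.gcs = gcs;
  }

  // Read channel obtained from the abstraction instead of
  // new GoogleCloudStorageReadChannel(...).
  SeekableByteChannel openRead(String bucket, String object) throws IOException {
    return gcs.open(new StorageResourceId(bucket, object));
  }

  // Write channel obtained the same way.
  WritableByteChannel openWrite(String bucket, String object) throws IOException {
    return gcs.create(new StorageResourceId(bucket, object));
  }
}
{code}

A different GoogleCloudStorage implementation (for example one speaking a newer protocol) could then be swapped in without touching this call-site code.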



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9825) Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9825?focusedWorklogId=436120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436120
 ]

ASF GitHub Bot logged work on BEAM-9825:


Author: ASF GitHub Bot
Created on: 21/May/20 17:46
Start Date: 21/May/20 17:46
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #11610:
URL: https://github.com/apache/beam/pull/11610#issuecomment-632248973


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436120)
Remaining Estimate: 86h  (was: 86h 10m)
Time Spent: 10h  (was: 9h 50m)

> Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll
> --
>
> Key: BEAM-9825
> URL: https://issues.apache.org/jira/browse/BEAM-9825
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: P2
>   Original Estimate: 96h
>  Time Spent: 10h
>  Remaining Estimate: 86h
>
> I'd like to propose the following new high-level transforms.
>  * Intersect
> Compute the intersection of the elements of two PCollections.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> collection containing the elements that are common to both _leftCollection_ 
> and _rightCollection_.
>  
>  * Except
> Compute the difference of the elements of two PCollections.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> collection containing the elements that are in _leftCollection_ but not in 
> _rightCollection_.
>  * Union
> Find the elements that are in either of the two PCollections.
> Also implement the IntersectAll, ExceptAll and UnionAll variants of these 
> transforms.
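
To make the set semantics above concrete, here is a minimal sketch of the distinct Intersect variant expressed with existing Beam Java primitives (CoGroupByKey). The transforms proposed in this ticket may well be implemented differently; the class and method names below are hypothetical.

{code:java}
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.join.CoGbkResult;
import org.apache.beam.sdk.transforms.join.CoGroupByKey;
import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TypeDescriptors;

class IntersectSketch {
  // Returns the distinct elements that appear in both collections.
  static PCollection<String> intersect(
      PCollection<String> left, PCollection<String> right) {
    final TupleTag<Void> leftTag = new TupleTag<>();
    final TupleTag<Void> rightTag = new TupleTag<>();

    // Key each element by itself so the two sides can be co-grouped.
    PCollection<KV<String, Void>> leftKeyed =
        left.apply("KeyLeft",
            MapElements
                .into(TypeDescriptors.kvs(
                    TypeDescriptors.strings(), TypeDescriptors.voids()))
                .via(e -> KV.of(e, (Void) null)));
    PCollection<KV<String, Void>> rightKeyed =
        right.apply("KeyRight",
            MapElements
                .into(TypeDescriptors.kvs(
                    TypeDescriptors.strings(), TypeDescriptors.voids()))
                .via(e -> KV.of(e, (Void) null)));

    return KeyedPCollectionTuple.of(leftTag, leftKeyed)
        .and(rightTag, rightKeyed)
        .apply(CoGroupByKey.create())
        .apply("KeepCommon",
            ParDo.of(new DoFn<KV<String, CoGbkResult>, String>() {
              @ProcessElement
              public void process(ProcessContext c) {
                CoGbkResult result = c.element().getValue();
                // Emit the key once if it appeared on both sides.
                if (result.getAll(leftTag).iterator().hasNext()
                    && result.getAll(rightTag).iterator().hasNext()) {
                  c.output(c.element().getKey());
                }
              }
            }));
  }
}
{code}

Except and Union follow the same pattern, differing only in which sides must (or must not) contain the key; the *All variants would preserve multiplicities instead of emitting each key once.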



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9825) Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9825?focusedWorklogId=436121&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436121
 ]

ASF GitHub Bot logged work on BEAM-9825:


Author: ASF GitHub Bot
Created on: 21/May/20 17:47
Start Date: 21/May/20 17:47
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #11610:
URL: https://github.com/apache/beam/pull/11610#issuecomment-632249476


   run Java Precommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436121)
Remaining Estimate: 85h 50m  (was: 86h)
Time Spent: 10h 10m  (was: 10h)

> Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll
> --
>
> Key: BEAM-9825
> URL: https://issues.apache.org/jira/browse/BEAM-9825
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: P2
>   Original Estimate: 96h
>  Time Spent: 10h 10m
>  Remaining Estimate: 85h 50m
>
> I'd like to propose the following new high-level transforms.
>  * Intersect
> Compute the intersection of the elements of two PCollections.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> collection containing the elements that are common to both _leftCollection_ 
> and _rightCollection_.
>  
>  * Except
> Compute the difference of the elements of two PCollections.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> collection containing the elements that are in _leftCollection_ but not in 
> _rightCollection_.
>  * Union
> Find the elements that are in either of the two PCollections.
> Also implement the IntersectAll, ExceptAll and UnionAll variants of these 
> transforms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-05-21 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113432#comment-17113432
 ] 

Brian Hulette commented on BEAM-8889:
-

Is this complete with https://github.com/apache/beam/pull/11651?

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: P2
>  Labels: gcs
> Fix For: 2.22.0
>
>   Original Estimate: 168h
>  Time Spent: 36h 20m
>  Remaining Estimate: 131h 40m
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is the primary class for accessing Google Cloud Storage in Apache Beam. The 
> current implementation creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel directly to read and write GCS data, rather 
> than going through 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java],
>  an abstraction that provides the basic IO capabilities and ultimately 
> creates the channel objects. This request is about updating GcsUtil to 
> create its read and write channels through GoogleCloudStorage, which should 
> be more flexible because new behavior (e.g. a new channel implementation 
> using a new protocol) can be picked up without code changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9825) Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9825?focusedWorklogId=436122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436122
 ]

ASF GitHub Bot logged work on BEAM-9825:


Author: ASF GitHub Bot
Created on: 21/May/20 17:48
Start Date: 21/May/20 17:48
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #11610:
URL: https://github.com/apache/beam/pull/11610#issuecomment-632249853


   Tests retriggered. 
   
   And it has become a really big PR :-) Nice work!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 436122)
Remaining Estimate: 85h 40m  (was: 85h 50m)
Time Spent: 10h 20m  (was: 10h 10m)

> Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll
> --
>
> Key: BEAM-9825
> URL: https://issues.apache.org/jira/browse/BEAM-9825
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: P2
>   Original Estimate: 96h
>  Time Spent: 10h 20m
>  Remaining Estimate: 85h 40m
>
> I'd like to propose the following new high-level transforms.
>  * Intersect
> Compute the intersection of the elements of two PCollections.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> collection containing the elements that are common to both _leftCollection_ 
> and _rightCollection_.
>  
>  * Except
> Compute the difference of the elements of two PCollections.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> collection containing the elements that are in _leftCollection_ but not in 
> _rightCollection_.
>  * Union
> Find the elements that are in either of the two PCollections.
> Also implement the IntersectAll, ExceptAll and UnionAll variants of these 
> transforms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

