[jira] [Updated] (BEAM-9361) NPE When putting Avro record with enum through SqlTransform
[ https://issues.apache.org/jira/browse/BEAM-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MAKSIM TSYGAN updated BEAM-9361: Attachment: image-2020-05-21-10-36-00-375.png > NPE When putting Avro record with enum through SqlTransform > --- > > Key: BEAM-9361 > URL: https://issues.apache.org/jira/browse/BEAM-9361 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.19.0 >Reporter: Niels Basjes >Priority: P2 > Attachments: image-2020-05-21-10-36-00-375.png > > > I ran into this problem when trying to put my Avro records through the > SqlTransform. > I was able to reduce the reproduction path to the code below. > This code fails on my machine (using Beam 2.19.0) with the following > NullPointerException > {code:java} > org.apache.beam.sdk.extensions.sql.impl.ParseException: Unable to parse > query SELECT name, direction FROM InputStreamat > org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner.convertToBeamRel(CalciteQueryPlanner.java:175) > at > org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103) > at > org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:125) > at > org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:83) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:539) > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:490) > at > org.apache.beam.sdk.values.PCollectionTuple.apply(PCollectionTuple.java:261) > at com.bol.analytics.m2.TestAvro2SQL.testAvro2SQL(TestAvro2SQL.java:99) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33) > at > com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230) > at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58) > Caused by: > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.ValidationException: > java.lang.NullPointerException > at > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.prepare.PlannerImpl.validate(PlannerImpl.java:217) > at > org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner.convertToBeamRel(CalciteQueryPlanner.java:144) > ... 
31 more > Caused by: java.lang.NullPointerException > at > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:45) > at > org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.toRelDataType(CalciteUtils.java:280) > at > org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.toRelDataType(CalciteUtils.java:287) > at > org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.lambda$toCalciteRowType$0(CalciteUtils.java:261) > at > java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110) > at java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:581) > at > org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils
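The trace bottoms out where CalciteUtils tries to map the enum field to a Calcite type, finds no entry, and hands the resulting null to SqlTypeFactoryImpl.createSqlType. The failure mode, a lookup table with a missing entry, can be illustrated in plain Java; the table below is a sketch, not the actual CalciteUtils mapping:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative only: a field-type -> SQL-type lookup with no entry for ENUM.
// The lookup returns null, and the null explodes one frame lower, mirroring
// how CalciteUtils.toRelDataType feeds SqlTypeFactoryImpl.createSqlType.
public class TypeMappingSketch {
    private static final Map<String, String> BEAM_TO_SQL = new HashMap<>();
    static {
        BEAM_TO_SQL.put("STRING", "VARCHAR");
        BEAM_TO_SQL.put("INT32", "INTEGER");
        // No entry for "ENUM" -- the gap this issue is about.
    }

    public static String toSqlType(String beamFieldType) {
        String sqlType = BEAM_TO_SQL.get(beamFieldType); // null for ENUM
        // Stand-in for createSqlType dereferencing its argument.
        return Objects.requireNonNull(sqlType, "no SQL mapping").toLowerCase();
    }
}
```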
[jira] [Updated] (BEAM-10045) Reduce logging related to ratelimitExceeded error for BQ sink when performing streaming inserts
[ https://issues.apache.org/jira/browse/BEAM-10045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-10045: Status: Open (was: Triage Needed) > Reduce logging related to ratelimitExceeded error for BQ sink when performing > streaming inserts > --- > > Key: BEAM-10045 > URL: https://issues.apache.org/jira/browse/BEAM-10045 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: P2 > > These errors are usually temporary and pipelines may recover. > > We could consider not logging until we have backed off for a certain amount of time here. > [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L792] > > cc: [~reuvenlax] -- This message was sent by Atlassian Jira (v8.3.4#803005)
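The change suggested above, suppressing the rateLimitExceeded log line until the sink has already been backing off for a while, can be sketched outside Beam as follows. BackoffAwareLogger and its quiet-period threshold are illustrative names, not Beam API:

```java
import java.time.Duration;

// Hypothetical sketch (not Beam code): only surface a repeated, usually
// transient error once the total time spent backing off crosses a threshold.
public class BackoffAwareLogger {
    private final Duration quietPeriod;
    private Duration totalBackoff = Duration.ZERO;

    public BackoffAwareLogger(Duration quietPeriod) {
        this.quietPeriod = quietPeriod;
    }

    /** Records one backoff interval; returns true once the error should be logged. */
    public boolean recordBackoff(Duration backoff) {
        totalBackoff = totalBackoff.plus(backoff);
        return totalBackoff.compareTo(quietPeriod) >= 0;
    }
}
```

A caller would invoke recordBackoff on every retry and emit the warning only when it returns true, so a pipeline that recovers quickly never logs at all.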
[jira] [Assigned] (BEAM-10053) Timers exception on "Job Drain" while using stateful beam processing in global window
[ https://issues.apache.org/jira/browse/BEAM-10053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía reassigned BEAM-10053: --- Assignee: (was: Aizhamal Nurmamat kyzy) > Timers exception on "Job Drain" while using stateful beam processing in > global window > - > > Key: BEAM-10053 > URL: https://issues.apache.org/jira/browse/BEAM-10053 > Project: Beam > Issue Type: Bug > Components: beam-community, runner-dataflow, sdk-java-core >Affects Versions: 2.19.0 >Reporter: MOHIL >Priority: P2 > > Hello, > > I have a use case with two sets of PCollections (RecordA and RecordB) > coming from a real-time streaming source like Kafka. > > Both records are correlated by a common key, let's say KEY. > > The purpose is to enrich RecordA with RecordB's data, for which I am using > CoGroupByKey. Since RecordA and RecordB for a common key can arrive within 1-2 > minutes of event time, I maintain a sliding window over both records and > then apply CoGroupByKey to both PCollections. > > The sliding windows that find both RecordA and RecordB for a common key > KEY emit the enriched output. Since multiple sliding windows can emit > the same output, I finally remove duplicate results by feeding the aforementioned > outputs into a global window, where I maintain state to check whether an output > has already been processed. Since it is a global window, I set a > timer on the state (for GC) to let it expire 10 minutes after the state > has been written. > > This works perfectly w.r.t. the expected results. However, I am > unable to stop the job gracefully, i.e. drain the job. 
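The duplicate-suppression scheme described above (a per-key flag in the global window, garbage-collected by a timer 10 minutes after the state is written) can be sketched outside Beam roughly as follows; DedupState and its TTL bookkeeping are illustrative stand-ins, not Beam's state/timer API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not Beam code) of the described pattern: remember which
// keys were already emitted, and expire each entry a fixed time after it was
// first written, standing in for the state-cleanup timer in the global window.
public class DedupState {
    private final long ttlMillis;
    private final Map<String, Long> firstSeenAt = new HashMap<>();

    public DedupState(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns true only the first time a key is seen within its TTL. */
    public boolean shouldEmit(String key, long nowMillis) {
        Long first = firstSeenAt.get(key);
        // Expire the entry (the "GC timer" firing) once the TTL has elapsed.
        if (first != null && nowMillis - first >= ttlMillis) {
            firstSeenAt.remove(key);
            first = null;
        }
        if (first == null) {
            firstSeenAt.put(key, nowMillis);
            return true;
        }
        return false;
    }
}
```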
I see following > exception: > > java.lang.IllegalStateException: > org.apache.beam.runners.dataflow.worker.SimpleParDoFn@16b089a6 received state > cleanup timer for window > org.apache.beam.sdk.transforms.windowing.GlobalWindow@29ca0210 that is before > the appropriate cleanup time 294248-01-24T04:00:54.776Z > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:842) > org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processSystemTimer(SimpleParDoFn.java:384) > org.apache.beam.runners.dataflow.worker.SimpleParDoFn.access$700(SimpleParDoFn.java:73) > org.apache.beam.runners.dataflow.worker.SimpleParDoFn$TimerType$2.processTimer(SimpleParDoFn.java:444) > org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:467) > org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:354) > org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:52) > org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85) > org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1316) > org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149) > org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1049) > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > java.lang.Thread.run(Thread.java:745) > java.lang.IllegalStateException: > org.apache.beam.runners.dataflow.worker.SimpleParDoFn@59902a10 received state > cleanup timer for window > org.apache.beam.sdk.transforms.windowing.GlobalWindow@29ca0210 that is before > the appropriate cleanup time 294248-01-24T04:00:54.776Z > 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:842) > org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processSystemTimer(SimpleParDoFn.java:384) > org.apache.beam.runners.dataflow.worker.SimpleParDoFn.access$700(SimpleParDoFn.java:73) > org.apache.beam.runners.dataflow.worker.SimpleParDoFn$TimerType$2.processTimer(SimpleParDoFn.java:444) > org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:467) > org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:354) > org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:52) > org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85) > org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1316) > org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149) > org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1049) > java.util
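The strange cleanup time in the stack trace above (year 294248) is not corruption: for the global window, the state-cleanup timer is derived from the maximum timestamp Beam can represent, roughly Long.MAX_VALUE microseconds since the epoch. A stdlib-only demonstration (the exact constant Beam uses may differ by a small offset):

```java
import java.time.Instant;

// The end of the global window sits near the largest representable timestamp.
// Long.MAX_VALUE microseconds, truncated to milliseconds, lands in year
// ~294247, which is why drain-time cleanup timers print dates like the
// 294248-01-24 value in the exception above.
public class EndOfTime {
    public static String endOfGlobalWindowApprox() {
        long maxMicrosAsMillis = Long.MAX_VALUE / 1000; // microseconds -> milliseconds
        return Instant.ofEpochMilli(maxMicrosAsMillis).toString();
    }

    public static void main(String[] args) {
        System.out.println(endOfGlobalWindowApprox());
    }
}
```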
[jira] [Updated] (BEAM-10053) Timers exception on "Job Drain" while using stateful beam processing in global window
[ https://issues.apache.org/jira/browse/BEAM-10053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-10053: Component/s: (was: beam-community) > Timers exception on "Job Drain" while using stateful beam processing in > global window
[jira] [Commented] (BEAM-10053) Timers exception on "Job Drain" while using stateful beam processing in global window
[ https://issues.apache.org/jira/browse/BEAM-10053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112872#comment-17112872 ] Ismaël Mejía commented on BEAM-10053: - [~lcwik] or [~reuvenlax] can you please take a look, or assign this to someone who can? > Timers exception on "Job Drain" while using stateful beam processing in > global window
[jira] [Updated] (BEAM-10044) Programming Guide has curly quotes in code samples
[ https://issues.apache.org/jira/browse/BEAM-10044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-10044: Status: Open (was: Triage Needed) > Programming Guide has curly quotes in code samples > -- > > Key: BEAM-10044 > URL: https://issues.apache.org/jira/browse/BEAM-10044 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Ashwin Ramaswami >Priority: P2 > Time Spent: 40m > Remaining Estimate: 0h > > Curly quotes should be converted to regular quotes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
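The fix is mechanical: map the common curly-quote code points back to their straight ASCII equivalents. A minimal sketch (QuoteFixer is a hypothetical name, not the actual website tooling):

```java
// Replace the common curly-quote code points with straight ASCII quotes,
// so copy-pasted code samples from the Programming Guide compile as-is.
public class QuoteFixer {
    public static String straighten(String s) {
        return s
            .replace('\u2018', '\'')  // left single quotation mark
            .replace('\u2019', '\'')  // right single quotation mark
            .replace('\u201C', '"')   // left double quotation mark
            .replace('\u201D', '"');  // right double quotation mark
    }
}
```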
[jira] [Updated] (BEAM-10050) VideoIntelligenceIT.annotateVideoFromURINoContext is flaky
[ https://issues.apache.org/jira/browse/BEAM-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-10050: Status: Open (was: Triage Needed) > VideoIntelligenceIT.annotateVideoFromURINoContext is flaky > -- > > Key: BEAM-10050 > URL: https://issues.apache.org/jira/browse/BEAM-10050 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Brian Hulette >Assignee: Michał Walenia >Priority: P2 > > I've seen this fail a few times in precommits [Example > failure](VideoIntelligenceIT.annotateVideoFromURINoContext) > {code} > java.lang.AssertionError: Annotate > video/ParDo(AnnotateVideoFromURI)/ParMultiDo(AnnotateVideoFromURI).output: > expected: but was: > at > org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:169) > at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:411) > at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:403) > at > org.apache.beam.sdk.extensions.ml.VideoIntelligenceIT.annotateVideoFromURINoContext(VideoIntelligenceIT.java:51) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-10044) Programming Guide has curly quotes in code samples
[ https://issues.apache.org/jira/browse/BEAM-10044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía resolved BEAM-10044. - Fix Version/s: Not applicable Resolution: Fixed > Programming Guide has curly quotes in code samples
[jira] [Updated] (BEAM-10044) [website] Programming Guide has curly quotes in code samples
[ https://issues.apache.org/jira/browse/BEAM-10044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-10044: Summary: [website] Programming Guide has curly quotes in code samples (was: Programming Guide has curly quotes in code samples) > [website] Programming Guide has curly quotes in code samples
[jira] [Commented] (BEAM-9361) NPE When putting Avro record with enum through SqlTransform
[ https://issues.apache.org/jira/browse/BEAM-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112875#comment-17112875 ] MAKSIM TSYGAN commented on BEAM-9361: - I think we need to add an Enum entry to the SQL type mapping in the org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils class so that it works in SQL: !image-2020-05-21-10-36-00-375.png! I think we also need to add a mapping for the Enum type for ZetaSQL. > NPE When putting Avro record with enum through SqlTransform
[jira] [Comment Edited] (BEAM-9361) NPE When putting Avro record with enum through SqlTransform
[ https://issues.apache.org/jira/browse/BEAM-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112875#comment-17112875 ] MAKSIM TSYGAN edited comment on BEAM-9361 at 5/21/20, 7:43 AM: --- I think we need to add an entry to the SQL type mapping in the org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils class so that it works in SQL: !image-2020-05-21-10-36-00-375.png! I think we also need to add a mapping for the Enum type for ZetaSQL. was (Author: maksim.tsygan): I think we need to add an Enum entry to the SQL type mapping in the org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils class so that it works in SQL: !image-2020-05-21-10-36-00-375.png! I think we also need to add a mapping for the Enum type for ZetaSQL. > NPE When putting Avro record with enum through SqlTransform
[jira] [Comment Edited] (BEAM-9361) NPE When putting Avro record with enum through SqlTransform
[ https://issues.apache.org/jira/browse/BEAM-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112875#comment-17112875 ] MAKSIM TSYGAN edited comment on BEAM-9361 at 5/21/20, 7:44 AM: --- I think we need to add an SQL type mapping in the org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils class so that this works in SQL: !image-2020-05-21-10-36-00-375.png! We also need to add a mapping for ZetaSQL. was (Author: maksim.tsygan): I think we need to add an SQL type mapping in the org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils class so that this works in SQL: !image-2020-05-21-10-36-00-375.png! I think we also need to add a mapping for the Enum type for ZetaSQL. > NPE When putting Avro record with enum through SqlTransform > --- > > Key: BEAM-9361 > URL: https://issues.apache.org/jira/browse/BEAM-9361 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.19.0 >Reporter: Niels Basjes >Priority: P2 > Attachments: image-2020-05-21-10-36-00-375.png > > > I ran into this problem when trying to put my Avro records through the > SqlTransform. > I was able to reduce the reproduction path to the code below.
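The NPE above surfaces in Calcite's SqlTypeFactoryImpl.createSqlType, which is consistent with the comment's suggestion that the CalciteUtils mapping table simply has no entry for Avro enum fields: the lookup returns null and the planner dereferences it. A simplified, self-contained sketch of that failure mode (plain Java; TypeMappingSketch and all its names are illustrative, not the actual Beam or Calcite API):

```java
import java.util.EnumMap;
import java.util.Map;

// Simplified model of a field-type -> SQL-type lookup table, analogous in
// spirit to the mapping in CalciteUtils. Everything here is hypothetical.
public class TypeMappingSketch {
  enum FieldKind { STRING, INT64, ENUM }

  static final Map<FieldKind, String> SQL_TYPE_MAP = new EnumMap<>(FieldKind.class);

  static {
    SQL_TYPE_MAP.put(FieldKind.STRING, "VARCHAR");
    SQL_TYPE_MAP.put(FieldKind.INT64, "BIGINT");
    // No entry for ENUM: a lookup returns null, and passing that null on to
    // the planner's createSqlType(...) is what surfaces as the NPE above.
  }

  static String toSqlType(FieldKind kind) {
    String sqlType = SQL_TYPE_MAP.get(kind);
    if (sqlType == null) {
      // Adding a mapping entry (e.g. ENUM -> VARCHAR, as the comment
      // proposes) would avoid this branch entirely; failing fast with a
      // message is still better than a bare NullPointerException.
      throw new IllegalArgumentException("No SQL mapping for field kind: " + kind);
    }
    return sqlType;
  }

  public static void main(String[] args) {
    System.out.println(toSqlType(FieldKind.STRING)); // prints VARCHAR
    try {
      toSqlType(FieldKind.ENUM);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The fix discussed in the thread amounts to adding the missing map entry (and the equivalent one for ZetaSQL) rather than rejecting the type.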
[jira] [Work logged] (BEAM-10052) check hash and avoid duplicates when uploading artifact in Python Dataflow Runner
[ https://issues.apache.org/jira/browse/BEAM-10052?focusedWorklogId=435875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435875 ] ASF GitHub Bot logged work on BEAM-10052: - Author: ASF GitHub Bot Created on: 21/May/20 08:10 Start Date: 21/May/20 08:10 Worklog Time Spent: 10m Work Description: ihji opened a new pull request #11771: URL: https://github.com/apache/beam/pull/11771 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
Post-Commit Tests Status (on master branch): Jenkins build-status badge table for the Go and Java SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners (badges link to builds.apache.org).
[jira] [Work logged] (BEAM-10052) check hash and avoid duplicates when uploading artifact in Python Dataflow Runner
[ https://issues.apache.org/jira/browse/BEAM-10052?focusedWorklogId=435876&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435876 ] ASF GitHub Bot logged work on BEAM-10052: - Author: ASF GitHub Bot Created on: 21/May/20 08:10 Start Date: 21/May/20 08:10 Worklog Time Spent: 10m Work Description: ihji commented on pull request #11771: URL: https://github.com/apache/beam/pull/11771#issuecomment-631950394 R: @chamikaramj This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 435876) Time Spent: 20m (was: 10m) > check hash and avoid duplicates when uploading artifact in Python Dataflow > Runner > - > > Key: BEAM-10052 > URL: https://issues.apache.org/jira/browse/BEAM-10052 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: P2 > Time Spent: 20m > Remaining Estimate: 0h > > An xlang pipeline could have many duplicated jars. It would be great if we checked > hashes and avoided duplicate uploads. -- This message was sent by Atlassian Jira (v8.3.4#803005)
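The deduplication idea in this ticket can be sketched with nothing more than a content digest per artifact: hash each staged file and upload only the first occurrence of each digest. A minimal stand-alone sketch (plain Java; ArtifactDedupSketch is hypothetical and works on in-memory byte arrays rather than the runner's actual staging client):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ArtifactDedupSketch {
  // Hex-encoded SHA-256 of the artifact's bytes; identical jars staged by
  // multiple environments produce identical digests.
  static String sha256Hex(byte[] bytes) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      StringBuilder sb = new StringBuilder();
      for (byte b : md.digest(bytes)) {
        sb.append(String.format("%02x", b));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e); // SHA-256 is guaranteed on the JVM
    }
  }

  // Returns only the artifacts that actually need uploading: the first
  // occurrence of each distinct content hash.
  static List<byte[]> dedupe(List<byte[]> artifacts) {
    Set<String> seen = new HashSet<>();
    List<byte[]> toUpload = new ArrayList<>();
    for (byte[] artifact : artifacts) {
      if (seen.add(sha256Hex(artifact))) {
        toUpload.add(artifact);
      }
    }
    return toUpload;
  }

  public static void main(String[] args) {
    byte[] jarA = "guava".getBytes(StandardCharsets.UTF_8);
    byte[] jarB = "protobuf".getBytes(StandardCharsets.UTF_8);
    // jarA is listed twice, as a duplicated dependency would be.
    System.out.println(dedupe(List.of(jarA, jarB, jarA)).size()); // prints 2
  }
}
```

In a real staging client the digest would be computed streaming from disk, but the skip-if-seen logic is the same.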
[jira] [Commented] (BEAM-9383) Staging Dataflow artifacts from environment
[ https://issues.apache.org/jira/browse/BEAM-9383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112909#comment-17112909 ] Heejong Lee commented on BEAM-9383: --- [https://github.com/apache/beam/pull/11771] could mitigate the problem by removing duplicated artifacts from multiple environments. > Staging Dataflow artifacts from environment > --- > > Key: BEAM-9383 > URL: https://issues.apache.org/jira/browse/BEAM-9383 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: P0 > Fix For: 2.22.0 > > Time Spent: 12h > Remaining Estimate: 0h > > Staging Dataflow artifacts from environment -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10043) Grammar issues / typos in language-switch.js comments
[ https://issues.apache.org/jira/browse/BEAM-10043?focusedWorklogId=435883&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435883 ] ASF GitHub Bot logged work on BEAM-10043: - Author: ASF GitHub Bot Created on: 21/May/20 08:48 Start Date: 21/May/20 08:48 Worklog Time Spent: 10m Work Description: kamilwu commented on pull request #11760: URL: https://github.com/apache/beam/pull/11760#issuecomment-631967204 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 435883) Remaining Estimate: 0h Time Spent: 10m > Grammar issues / typos in language-switch.js comments > - > > Key: BEAM-10043 > URL: https://issues.apache.org/jira/browse/BEAM-10043 > Project: Beam > Issue Type: Bug > Components: beam-model >Reporter: Ashwin Ramaswami >Priority: P3 > Time Spent: 10m > Remaining Estimate: 0h > > Grammar issues / typos in language-switch.js comments -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9994) Cannot create a virtualenv using Python 3.8 on Jenkins machines
[ https://issues.apache.org/jira/browse/BEAM-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112986#comment-17112986 ] Kamil Wasilewski commented on BEAM-9994: Yes, I did that manually. If someone has instructions on how to update the master VM image, please share it. > Cannot create a virtualenv using Python 3.8 on Jenkins machines > --- > > Key: BEAM-9994 > URL: https://issues.apache.org/jira/browse/BEAM-9994 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: P2 > > Command: *virtualenv --python /usr/bin/python3.8 env* > Output: > {noformat} > Running virtualenv with interpreter /usr/bin/python3.8 > Traceback (most recent call last): > File "/usr/local/lib/python3.5/dist-packages/virtualenv.py", line 22, in > > import distutils.spawn > ModuleNotFoundError: No module named 'distutils.spawn' > {noformat} > Example test affected: > https://builds.apache.org/job/beam_PreCommit_PythonFormatter_Commit/1723/console -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10043) Grammar issues / typos in language-switch.js comments
[ https://issues.apache.org/jira/browse/BEAM-10043?focusedWorklogId=435888&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435888 ] ASF GitHub Bot logged work on BEAM-10043: - Author: ASF GitHub Bot Created on: 21/May/20 09:09 Start Date: 21/May/20 09:09 Worklog Time Spent: 10m Work Description: kamilwu commented on pull request #11760: URL: https://github.com/apache/beam/pull/11760#issuecomment-631976285 Thanks @epicfaace! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 435888) Time Spent: 20m (was: 10m) > Grammar issues / typos in language-switch.js comments > - > > Key: BEAM-10043 > URL: https://issues.apache.org/jira/browse/BEAM-10043 > Project: Beam > Issue Type: Bug > Components: beam-model >Reporter: Ashwin Ramaswami >Priority: P3 > Time Spent: 20m > Remaining Estimate: 0h > > Grammar issues / typos in language-switch.js comments -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10043) Grammar issues / typos in language-switch.js comments
[ https://issues.apache.org/jira/browse/BEAM-10043?focusedWorklogId=435889&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435889 ] ASF GitHub Bot logged work on BEAM-10043: - Author: ASF GitHub Bot Created on: 21/May/20 09:10 Start Date: 21/May/20 09:10 Worklog Time Spent: 10m Work Description: kamilwu merged pull request #11760: URL: https://github.com/apache/beam/pull/11760 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 435889) Time Spent: 0.5h (was: 20m) > Grammar issues / typos in language-switch.js comments > - > > Key: BEAM-10043 > URL: https://issues.apache.org/jira/browse/BEAM-10043 > Project: Beam > Issue Type: Bug > Components: beam-model >Reporter: Ashwin Ramaswami >Priority: P3 > Time Spent: 0.5h > Remaining Estimate: 0h > > Grammar issues / typos in language-switch.js comments -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10043) Grammar issues / typos in language-switch.js comments
[ https://issues.apache.org/jira/browse/BEAM-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamil Wasilewski updated BEAM-10043: Status: Open (was: Triage Needed) > Grammar issues / typos in language-switch.js comments > - > > Key: BEAM-10043 > URL: https://issues.apache.org/jira/browse/BEAM-10043 > Project: Beam > Issue Type: Bug > Components: beam-model >Reporter: Ashwin Ramaswami >Priority: P3 > Time Spent: 0.5h > Remaining Estimate: 0h > > Grammar issues / typos in language-switch.js comments -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-10043) Grammar issues / typos in language-switch.js comments
[ https://issues.apache.org/jira/browse/BEAM-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamil Wasilewski resolved BEAM-10043. - Fix Version/s: Not applicable Resolution: Fixed > Grammar issues / typos in language-switch.js comments > - > > Key: BEAM-10043 > URL: https://issues.apache.org/jira/browse/BEAM-10043 > Project: Beam > Issue Type: Bug > Components: beam-model >Reporter: Ashwin Ramaswami >Priority: P3 > Fix For: Not applicable > > Time Spent: 0.5h > Remaining Estimate: 0h > > Grammar issues / typos in language-switch.js comments -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9722) Add batch SnowflakeIO.Read to Java SDK
[ https://issues.apache.org/jira/browse/BEAM-9722?focusedWorklogId=435898&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435898 ] ASF GitHub Bot logged work on BEAM-9722: Author: ASF GitHub Bot Created on: 21/May/20 09:41 Start Date: 21/May/20 09:41 Worklog Time Spent: 10m Work Description: kkucharc commented on pull request #11360: URL: https://github.com/apache/beam/pull/11360#issuecomment-631990753 Run Python2_PVR_Flink PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 435898) Time Spent: 8h (was: 7h 50m) > Add batch SnowflakeIO.Read to Java SDK > -- > > Key: BEAM-9722 > URL: https://issues.apache.org/jira/browse/BEAM-9722 > Project: Beam > Issue Type: New Feature > Components: io-ideas >Reporter: Kasia Kucharczyk >Assignee: Dariusz Aniszewski >Priority: P2 > Time Spent: 8h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9930) Add announcement for Beam Summit Digital 2020 to the blog
[ https://issues.apache.org/jira/browse/BEAM-9930?focusedWorklogId=435923&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435923 ] ASF GitHub Bot logged work on BEAM-9930: Author: ASF GitHub Bot Created on: 21/May/20 10:26 Start Date: 21/May/20 10:26 Worklog Time Spent: 10m Work Description: mxm opened a new pull request #11772: URL: https://github.com/apache/beam/pull/11772 Post-Commit Tests Status (on master branch): Jenkins build-status badge table for the Go, Java, and Python SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners (badges link to builds.apache.org).
[jira] [Work logged] (BEAM-9722) Add batch SnowflakeIO.Read to Java SDK
[ https://issues.apache.org/jira/browse/BEAM-9722?focusedWorklogId=435937&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435937 ] ASF GitHub Bot logged work on BEAM-9722: Author: ASF GitHub Bot Created on: 21/May/20 10:52 Start Date: 21/May/20 10:52 Worklog Time Spent: 10m Work Description: DariuszAniszewski commented on pull request #11360: URL: https://github.com/apache/beam/pull/11360#issuecomment-632019864 Just a small comment about the force-push from above - it was mistakenly done, then reverted. HEAD of this branch is still on **3ba192a** and comment is a leftover. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 435937) Time Spent: 8h 10m (was: 8h) > Add batch SnowflakeIO.Read to Java SDK > -- > > Key: BEAM-9722 > URL: https://issues.apache.org/jira/browse/BEAM-9722 > Project: Beam > Issue Type: New Feature > Components: io-ideas >Reporter: Kasia Kucharczyk >Assignee: Dariusz Aniszewski >Priority: P2 > Time Spent: 8h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9723) [Java] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9723?focusedWorklogId=435944&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435944 ] ASF GitHub Bot logged work on BEAM-9723: Author: ASF GitHub Bot Created on: 21/May/20 11:51 Start Date: 21/May/20 11:51 Worklog Time Spent: 10m Work Description: mwalenia commented on a change in pull request #11566: URL: https://github.com/apache/beam/pull/11566#discussion_r428605560 ## File path: sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/BatchRequestForDLP.java ## @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import com.google.privacy.dlp.v2.Table;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.atomic.AtomicInteger;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.Counter;
+import org.apache.beam.sdk.metrics.Metrics;
+import org.apache.beam.sdk.state.BagState;
+import org.apache.beam.sdk.state.StateSpec;
+import org.apache.beam.sdk.state.StateSpecs;
+import org.apache.beam.sdk.state.TimeDomain;
+import org.apache.beam.sdk.state.Timer;
+import org.apache.beam.sdk.state.TimerSpec;
+import org.apache.beam.sdk.state.TimerSpecs;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.values.KV;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * DoFn batching the input PCollection into bigger requests in order to better utilize the Cloud DLP
+ * service.
+ */
+@Experimental
+class BatchRequestForDLP extends DoFn<KV<String, Table>, KV<String, Iterable<Table>>> {
+  public static final Logger LOG = LoggerFactory.getLogger(BatchRequestForDLP.class);
+
+  private final Counter numberOfRowsBagged =
+      Metrics.counter(BatchRequestForDLP.class, "numberOfRowsBagged");
+  private final Integer batchSize;
+
+  @StateId("elementsBag")
+  private final StateSpec<BagState<KV<String, Table>>> elementsBag = StateSpecs.bag();
+
+  @TimerId("eventTimer")
+  private final TimerSpec eventTimer = TimerSpecs.timer(TimeDomain.EVENT_TIME);
+
+  public BatchRequestForDLP(Integer batchSize) {
+    this.batchSize = batchSize;
+  }
+
+  @ProcessElement
+  public void process(
+      @Element KV<String, Table> element,
+      @StateId("elementsBag") BagState<KV<String, Table>> elementsBag,
+      @TimerId("eventTimer") Timer eventTimer,
+      BoundedWindow w) {
+    elementsBag.add(element);
+    eventTimer.set(w.maxTimestamp());
+  }
+
+  @OnTimer("eventTimer")
+  public void onTimer(
+      @StateId("elementsBag") BagState<KV<String, Table>> elementsBag,
+      OutputReceiver<KV<String, Iterable<Table>>> output) {
+    String key = elementsBag.read().iterator().next().getKey();
+    AtomicInteger bufferSize = new AtomicInteger();
+    List<Table> rows = new ArrayList<>();
+    elementsBag
+        .read()
+        .forEach(
+            element -> {
+              int elementSize = element.getValue().getSerializedSize();
+              boolean clearBuffer = bufferSize.intValue() + elementSize > batchSize;
+              if (clearBuffer) {
+                numberOfRowsBagged.inc(rows.size());
+                LOG.debug("Clear Buffer {} , Key {}", bufferSize.intValue(), element.getKey());
Review comment: Sure, I'll move it and add units. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 435944) Time Spent: 4h 20m (was: 4h 10m) > [Java] PTransform that connects to Cloud DLP deidentification service > - > > Key: BEAM-9723 > URL: https://issues.apache.org/jira/browse/BEAM-9723 > Project: Beam > Issue Type: Sub-task > Components: io-java-gcp >Reporter: Michał Walenia >As
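The review thread above concerns how BatchRequestForDLP flushes its element bag once the accumulated serialized size would exceed batchSize. That flushing rule can be shown in isolation (plain Java; SizeBatcherSketch is a simplified model that uses plain element sizes in place of Table.getSerializedSize(), not the Beam state/timer API):

```java
import java.util.ArrayList;
import java.util.List;

// Size-based batching in the spirit of BatchRequestForDLP: accumulate
// elements until adding the next one would exceed batchSize bytes, then
// flush the buffer as one batch. Everything here is a simplified model.
public class SizeBatcherSketch {
  static List<List<Integer>> batchBySize(List<Integer> elementSizes, int batchSize) {
    List<List<Integer>> batches = new ArrayList<>();
    List<Integer> buffer = new ArrayList<>();
    int bufferedBytes = 0;
    for (int elementSize : elementSizes) {
      if (bufferedBytes + elementSize > batchSize && !buffer.isEmpty()) {
        // Flush: corresponds to the clearBuffer branch in the DoFn above.
        batches.add(buffer);
        buffer = new ArrayList<>();
        bufferedBytes = 0;
      }
      buffer.add(elementSize);
      bufferedBytes += elementSize;
    }
    if (!buffer.isEmpty()) {
      batches.add(buffer); // final partial batch
    }
    return batches;
  }

  public static void main(String[] args) {
    // Three 200-byte elements with a 512-byte budget yield two batches.
    System.out.println(batchBySize(List.of(200, 200, 200), 512)); // prints [[200, 200], [200]]
  }
}
```

The choice of budget matters, which is exactly what the 52400 vs. 524000 question in the review is probing.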
[jira] [Work logged] (BEAM-9723) [Java] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9723?focusedWorklogId=435947&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435947 ] ASF GitHub Bot logged work on BEAM-9723: Author: ASF GitHub Bot Created on: 21/May/20 11:53 Start Date: 21/May/20 11:53 Worklog Time Spent: 10m Work Description: mwalenia commented on a change in pull request #11566: URL: https://github.com/apache/beam/pull/11566#discussion_r428606587 ## File path: sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/DLPTextOperationsIT.java ## @@ -0,0 +1,154 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import static org.junit.Assert.assertTrue;
+
+import com.google.privacy.dlp.v2.CharacterMaskConfig;
+import com.google.privacy.dlp.v2.DeidentifyConfig;
+import com.google.privacy.dlp.v2.DeidentifyContentResponse;
+import com.google.privacy.dlp.v2.Finding;
+import com.google.privacy.dlp.v2.InfoType;
+import com.google.privacy.dlp.v2.InfoTypeTransformations;
+import com.google.privacy.dlp.v2.InspectConfig;
+import com.google.privacy.dlp.v2.InspectContentResponse;
+import com.google.privacy.dlp.v2.Likelihood;
+import com.google.privacy.dlp.v2.PrimitiveTransformation;
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.beam.sdk.extensions.gcp.options.GcpOptions;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+@RunWith(JUnit4.class)
+public class DLPTextOperationsIT {
+  @Rule public TestPipeline testPipeline = TestPipeline.create();
+
+  private static final String IDENTIFYING_TEXT = "mary@example.com";
+  private static InfoType emailAddress = InfoType.newBuilder().setName("EMAIL_ADDRESS").build();
+  private static final InspectConfig inspectConfig =
+      InspectConfig.newBuilder()
+          .addInfoTypes(emailAddress)
+          .setMinLikelihood(Likelihood.LIKELY)
+          .build();
+
+  @Test
+  public void inspectsText() {
+    String projectId = testPipeline.getOptions().as(GcpOptions.class).getProject();
+    PCollection<KV<String, InspectContentResponse>> inspectionResult =
+        testPipeline
+            .apply(Create.of(KV.of("", IDENTIFYING_TEXT)))
+            .apply(
+                DLPInspectText.newBuilder()
+                    .setBatchSize(52400)
Review comment: @santhh Is it 52400 or 524000?
It should be the latter, since we're setting the limit to 524 kb, I think. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 435947) Time Spent: 4.5h (was: 4h 20m) > [Java] PTransform that connects to Cloud DLP deidentification service > - > > Key: BEAM-9723 > URL: https://issues.apache.org/jira/browse/BEAM-9723 > Project: Beam > Issue Type: Sub-task > Components: io-java-gcp >Reporter: Michał Walenia >Assignee: Michał Walenia >Priority: P2 > Time Spent: 4.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
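The batch-size question in the review above (52400 vs. 524000) comes down to DLP's per-request payload limit of roughly 524 kB. A dependency-free sketch of byte-bounded batching — `batchByBytes` is a hypothetical helper for illustration, not Beam's actual `BatchRequestForDLP` implementation:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Main {
    // Group input strings into batches whose total UTF-8 size stays at or
    // under maxBytes; a single oversized string still gets its own batch.
    static List<List<String>> batchByBytes(List<String> inputs, int maxBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentBytes = 0;
        for (String s : inputs) {
            int size = s.getBytes(StandardCharsets.UTF_8).length;
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(s);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("aaaa", "bbbb", "cccc"); // 4 bytes each
        // With a 10-byte limit, the third row overflows into a second batch.
        System.out.println(batchByBytes(rows, 10).size()); // 2
    }
}
```

With a limit of 52400 instead of 524000, each batch would carry about a tenth of the allowed payload, so the transform would make roughly ten times more DLP requests than necessary — which is why the reviewer flags the constant.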
[jira] [Work logged] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn
[ https://issues.apache.org/jira/browse/BEAM-1589?focusedWorklogId=435964&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435964 ] ASF GitHub Bot logged work on BEAM-1589: Author: ASF GitHub Bot Created on: 21/May/20 12:44 Start Date: 21/May/20 12:44 Worklog Time Spent: 10m Work Description: harriskistpay opened a new pull request #11773: URL: https://github.com/apache/beam/pull/11773 Updated CHANGES.md file Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit
[jira] [Work logged] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn
[ https://issues.apache.org/jira/browse/BEAM-1589?focusedWorklogId=435965&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435965 ] ASF GitHub Bot logged work on BEAM-1589: Author: ASF GitHub Bot Created on: 21/May/20 12:45 Start Date: 21/May/20 12:45 Worklog Time Spent: 10m Work Description: harriskistpay closed pull request #11773: URL: https://github.com/apache/beam/pull/11773 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 435965) Time Spent: 13h (was: 12h 50m) > Add OnWindowExpiration method to Stateful DoFn > -- > > Key: BEAM-1589 > URL: https://issues.apache.org/jira/browse/BEAM-1589 > Project: Beam > Issue Type: New Feature > Components: runner-core, sdk-java-core >Reporter: Jingsong Lee >Assignee: Rehman Murad Ali >Priority: P2 > Time Spent: 13h > Remaining Estimate: 0h > > See BEAM-1517 > This allows the user to do some work before the state's garbage collection. > It seems kind of annoying, but on the other hand forgetting to set a final > timer to flush state is probably data loss most of the time. > FlinkRunner does this work very simply, but other runners, such as > DirectRunner, need to traverse all the states to do this, and maybe it's a > little hard. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn
[ https://issues.apache.org/jira/browse/BEAM-1589?focusedWorklogId=435966&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435966 ] ASF GitHub Bot logged work on BEAM-1589: Author: ASF GitHub Bot Created on: 21/May/20 12:47 Start Date: 21/May/20 12:47 Worklog Time Spent: 10m Work Description: rehmanmuradali opened a new pull request #11774: URL: https://github.com/apache/beam/pull/11774 Updated CHANGES.md
[jira] [Work logged] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn
[ https://issues.apache.org/jira/browse/BEAM-1589?focusedWorklogId=435967&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435967 ] ASF GitHub Bot logged work on BEAM-1589: Author: ASF GitHub Bot Created on: 21/May/20 12:48 Start Date: 21/May/20 12:48 Worklog Time Spent: 10m Work Description: rehmanmuradali commented on pull request #11774: URL: https://github.com/apache/beam/pull/11774#issuecomment-632067025 R: @reuvenlax This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 435967) Time Spent: 13h 20m (was: 13h 10m) > Add OnWindowExpiration method to Stateful DoFn > -- > > Key: BEAM-1589 > URL: https://issues.apache.org/jira/browse/BEAM-1589 > Project: Beam > Issue Type: New Feature > Components: runner-core, sdk-java-core >Reporter: Jingsong Lee >Assignee: Rehman Murad Ali >Priority: P2 > Time Spent: 13h 20m > Remaining Estimate: 0h > > See BEAM-1517 > This allows the user to do some work before the state's garbage collection. > It seems kind of annoying, but on the other hand forgetting to set a final > timer to flush state is probably data loss most of the time. > FlinkRunner does this work very simply, but other runners, such as > DirectRunner, need to traverse all the states to do this, and maybe it's a > little hard. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10050) VideoIntelligenceIT.annotateVideoFromURINoContext is flaky
[ https://issues.apache.org/jira/browse/BEAM-10050?focusedWorklogId=435973&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435973 ] ASF GitHub Bot logged work on BEAM-10050: - Author: ASF GitHub Bot Created on: 21/May/20 13:03 Start Date: 21/May/20 13:03 Worklog Time Spent: 10m Work Description: mwalenia opened a new pull request #11775: URL: https://github.com/apache/beam/pull/11775 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 435973) Remaining Estimate: 0h Time Spent: 10m > VideoIntelligenceIT.annotateVideoFromURINoContext is flaky > -- > > Key: BEAM-10050 > URL: https://issues.apache.org/jira/browse/BEAM-10050 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Brian Hulette >Assignee: Michał Walenia >Priority: P2 > Time Spent: 10m > Remaining Estimate: 0h > > I've seen this fail a few times in precommits [Example > failure](VideoIntelligenceIT.annotateVideoFromURINoContext) > {code} > java.lang.AssertionError: Annotate > video/ParDo(AnnotateVideoFromURI)/ParMultiDo(AnnotateVideoFromURI).output: > expected: but was: > at > org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:169) > at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:411) > at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:403) > at > org.apache.beam.sdk.extensions.ml.VideoIntelligenceIT.annotateVideoFromURINoContext(VideoIntelligenceIT.java:51) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9723) [Java] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9723?focusedWorklogId=435996&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435996 ] ASF GitHub Bot logged work on BEAM-9723: Author: ASF GitHub Bot Created on: 21/May/20 13:47 Start Date: 21/May/20 13:47 Worklog Time Spent: 10m Work Description: mwalenia commented on a change in pull request #11566: URL: https://github.com/apache/beam/pull/11566#discussion_r428661780 ## File path: sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java ## @@ -0,0 +1,215 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.ml; + +import com.google.auto.value.AutoValue; +import com.google.cloud.dlp.v2.DlpServiceClient; +import com.google.privacy.dlp.v2.ContentItem; +import com.google.privacy.dlp.v2.DeidentifyConfig; +import com.google.privacy.dlp.v2.DeidentifyContentRequest; +import com.google.privacy.dlp.v2.DeidentifyContentResponse; +import com.google.privacy.dlp.v2.FieldId; +import com.google.privacy.dlp.v2.InspectConfig; +import com.google.privacy.dlp.v2.ProjectName; +import com.google.privacy.dlp.v2.Table; +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.stream.Collectors; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionView; + +/** + * A {@link PTransform} connecting to Cloud DLP and deidentifying text according to provided + * settings. The transform supports both CSV formatted input data and unstructured input. + * + * If the csvHeader property is set, csvDelimiter should also be set; otherwise the results will be + * incorrect. If csvHeader is not set, input is assumed to be unstructured. + * + * Either inspectTemplateName (String) or inspectConfig ({@link InspectConfig}) needs to be set. The + * same applies to deidentifyTemplateName and deidentifyConfig ({@link DeidentifyConfig}). + * + * Batch size defines the size, in bytes, of each batch sent to DLP. + * + * The transform outputs a {@link KV} of {@link String} (e.g. filename) and {@link + * DeidentifyContentResponse}, which will contain a {@link Table} of results for the user to consume. 
+ */ +@Experimental +@AutoValue +public abstract class DLPDeidentifyText +extends PTransform< +PCollection>, PCollection>> { + + @Nullable + public abstract String inspectTemplateName(); + + @Nullable + public abstract String deidentifyTemplateName(); + + @Nullable + public abstract InspectConfig inspectConfig(); + + @Nullable + public abstract DeidentifyConfig deidentifyConfig(); + + @Nullable + public abstract PCollectionView> csvHeader(); + + @Nullable + public abstract String csvDelimiter(); + + public abstract Integer batchSize(); + + public abstract String projectId(); + + @AutoValue.Builder + public abstract static class Builder { +public abstract Builder setInspectTemplateName(String inspectTemplateName); + +public abstract Builder setCsvHeader(PCollectionView> csvHeader); + +public abstract Builder setCsvDelimiter(String delimiter); + +public abstract Builder setBatchSize(Integer batchSize); + +public abstract Builder setProjectId(String projectId); + +public abstract Builder setDeidentifyTemplateName(String deidentifyTemplateName); + +public abstract Builder setInspectConfig(InspectConfig inspectConfig); + +public abstract Builder setDeidentifyConfig(DeidentifyConfig deidentifyConfig); + +public abstract DLPDeidentifyText build(); + } + + public static DLPDeidentifyText.Builder newBuilder() { +return new AutoValue_DLPDeidentifyText.Builder(); + } + + /** + * The transform batches the contents of input PCollection and then calls Cloud DLP service to + * perform the deidentification. + * + * @
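The quoted `DLPDeidentifyText` uses an AutoValue-generated builder, and its javadoc states that an inspect template name or config must be set. A self-contained sketch of the same builder-with-validation pattern — `TextConfig` and its fields are illustrative stand-ins for this discussion, not Beam classes:

```java
public class Main {
    // Hand-rolled stand-in for an AutoValue config class; field names only
    // mirror DLPDeidentifyText's config surface for illustration.
    static final class TextConfig {
        final String inspectTemplateName;
        final String projectId;
        final int batchSize;

        private TextConfig(String inspectTemplateName, String projectId, int batchSize) {
            this.inspectTemplateName = inspectTemplateName;
            this.projectId = projectId;
            this.batchSize = batchSize;
        }

        static Builder newBuilder() { return new Builder(); }

        static final class Builder {
            private String inspectTemplateName;
            private String projectId;
            private int batchSize = 524000; // assumed default, just below the DLP payload limit

            Builder setInspectTemplateName(String v) { inspectTemplateName = v; return this; }
            Builder setProjectId(String v) { projectId = v; return this; }
            Builder setBatchSize(int v) { batchSize = v; return this; }

            // Enforce the javadoc's rule at build time rather than at pipeline run time.
            TextConfig build() {
                if (inspectTemplateName == null) {
                    throw new IllegalStateException("inspectTemplateName or inspectConfig must be set");
                }
                if (projectId == null) {
                    throw new IllegalStateException("projectId must be set");
                }
                return new TextConfig(inspectTemplateName, projectId, batchSize);
            }
        }
    }

    public static void main(String[] args) {
        TextConfig cfg = TextConfig.newBuilder()
            .setInspectTemplateName("projects/p/inspectTemplates/t")
            .setProjectId("my-project")
            .build();
        System.out.println(cfg.batchSize); // 524000
    }
}
```

Failing fast in `build()` surfaces a missing required setting when the transform is constructed, instead of as an opaque error once the pipeline is already running.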
[jira] [Work logged] (BEAM-9723) [Java] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9723?focusedWorklogId=436000&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436000 ] ASF GitHub Bot logged work on BEAM-9723: Author: ASF GitHub Bot Created on: 21/May/20 13:55 Start Date: 21/May/20 13:55 Worklog Time Spent: 10m Work Description: mwalenia commented on a change in pull request #11566: URL: https://github.com/apache/beam/pull/11566#discussion_r428666183 ## File path: sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/MapStringToDlpRow.java ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.ml; + +import com.google.privacy.dlp.v2.Table; +import com.google.privacy.dlp.v2.Value; +import java.util.Arrays; +import java.util.List; +import java.util.Objects; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.values.KV; + +class MapStringToDlpRow extends DoFn<KV<String, String>, KV<String, Table.Row>> { + private final String delimiter; + + public MapStringToDlpRow(String delimiter) { +this.delimiter = delimiter; + } + + @ProcessElement + public void processElement(ProcessContext context) { Review comment: Ok, will do! 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436000) Time Spent: 4h 50m (was: 4h 40m) > [Java] PTransform that connects to Cloud DLP deidentification service > - > > Key: BEAM-9723 > URL: https://issues.apache.org/jira/browse/BEAM-9723 > Project: Beam > Issue Type: Sub-task > Components: io-java-gcp >Reporter: Michał Walenia >Assignee: Michał Walenia >Priority: P2 > Time Spent: 4h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
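The `MapStringToDlpRow` DoFn discussed above turns each input line into a DLP `Table.Row` by splitting on the configured delimiter, with unstructured input becoming a single-cell row. A minimal sketch of that mapping with plain `List<String>` standing in for `Table.Row` — `toRow` is a hypothetical helper, not the actual Beam code:

```java
import java.util.Arrays;
import java.util.List;

public class Main {
    // Split one delimited line into cell values; a null delimiter marks
    // unstructured input, which maps to a single-cell row.
    static List<String> toRow(String line, String delimiter) {
        if (delimiter == null) {
            return List.of(line);
        }
        return Arrays.asList(line.split(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(toRow("mary@example.com,LIKELY", ","));
        System.out.println(toRow("free-form text", null));
    }
}
```

Note that `String.split` treats the delimiter as a regex, so a delimiter such as `|` would need escaping in the real transform as well.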
[jira] [Work logged] (BEAM-9679) Core Transforms | Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9679?focusedWorklogId=436008&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436008 ] ASF GitHub Bot logged work on BEAM-9679: Author: ASF GitHub Bot Created on: 21/May/20 14:04 Start Date: 21/May/20 14:04 Worklog Time Spent: 10m Work Description: henryken commented on pull request #11734: URL: https://github.com/apache/beam/pull/11734#issuecomment-632104571 Thanks @damondouglas! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436008) Time Spent: 2h 10m (was: 2h) > Core Transforms | Go SDK Code Katas > --- > > Key: BEAM-9679 > URL: https://issues.apache.org/jira/browse/BEAM-9679 > Project: Beam > Issue Type: Sub-task > Components: katas, sdk-go >Reporter: Damon Douglas >Assignee: Damon Douglas >Priority: P2 > Time Spent: 2h 10m > Remaining Estimate: 0h > > A kata devoted to core beam transforms patterns after > [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms] > where the take away is an individual's ability to master the following using > an Apache Beam pipeline using the Golang SDK. > * Branching > * > [CoGroupByKey|[https://github.com/damondouglas/beam/tree/BEAM-9679-core-transform-groupbykey]] > * Combine > * Composite Transform > * DoFn Additional Parameters > * Flatten > * GroupByKey > * [Map|[https://github.com/apache/beam/pull/11564]] > * Partition > * Side Input -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9722) Add batch SnowflakeIO.Read to Java SDK
[ https://issues.apache.org/jira/browse/BEAM-9722?focusedWorklogId=436012&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436012 ] ASF GitHub Bot logged work on BEAM-9722: Author: ASF GitHub Bot Created on: 21/May/20 14:08 Start Date: 21/May/20 14:08 Worklog Time Spent: 10m Work Description: kkucharc commented on pull request #11360: URL: https://github.com/apache/beam/pull/11360#issuecomment-632106649 I retested failing test - probably the previous one was timeouting. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436012) Time Spent: 8h 20m (was: 8h 10m) > Add batch SnowflakeIO.Read to Java SDK > -- > > Key: BEAM-9722 > URL: https://issues.apache.org/jira/browse/BEAM-9722 > Project: Beam > Issue Type: New Feature > Components: io-ideas >Reporter: Kasia Kucharczyk >Assignee: Dariusz Aniszewski >Priority: P2 > Time Spent: 8h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9723) [Java] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9723?focusedWorklogId=436013&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436013 ] ASF GitHub Bot logged work on BEAM-9723: Author: ASF GitHub Bot Created on: 21/May/20 14:12 Start Date: 21/May/20 14:12 Worklog Time Spent: 10m Work Description: mwalenia commented on a change in pull request #11566: URL: https://github.com/apache/beam/pull/11566#discussion_r428676595 ## File path: sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPReidentifyText.java ## @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.ml; + +import com.google.auto.value.AutoValue; +import com.google.cloud.dlp.v2.DlpServiceClient; +import com.google.privacy.dlp.v2.ContentItem; +import com.google.privacy.dlp.v2.DeidentifyConfig; +import com.google.privacy.dlp.v2.FieldId; +import com.google.privacy.dlp.v2.InspectConfig; +import com.google.privacy.dlp.v2.ProjectName; +import com.google.privacy.dlp.v2.ReidentifyContentRequest; +import com.google.privacy.dlp.v2.ReidentifyContentResponse; +import com.google.privacy.dlp.v2.Table; +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.stream.Collectors; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionView; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * A {@link PTransform} connecting to Cloud DLP and inspecting text for identifying data according + * to provided settings. + * + * Either inspectTemplateName (String) or inspectConfig ({@link InspectConfig}) needs to be set; + * the same goes for reidentifyTemplateName or reidentifyConfig. + * + * Batch size defines the size, in bytes, of each batch sent to DLP. 
+ */ +@Experimental +@AutoValue +public abstract class DLPReidentifyText +extends PTransform< +PCollection>, PCollection>> { + + public static final Logger LOG = LoggerFactory.getLogger(DLPInspectText.class); + + public static final Integer DLP_PAYLOAD_LIMIT = 52400; + + @Nullable + public abstract String inspectTemplateName(); + + @Nullable + public abstract String reidentifyTemplateName(); + + @Nullable + public abstract InspectConfig inspectConfig(); + + @Nullable + public abstract DeidentifyConfig reidentifyConfig(); + + @Nullable + public abstract String csvDelimiter(); + + @Nullable + public abstract PCollectionView> csvHeaders(); + + public abstract Integer batchSize(); + + public abstract String projectId(); + + @AutoValue.Builder + public abstract static class Builder { +public abstract Builder setInspectTemplateName(String inspectTemplateName); + +public abstract Builder setInspectConfig(InspectConfig inspectConfig); + +public abstract Builder setReidentifyConfig(DeidentifyConfig deidentifyConfig); + +public abstract Builder setReidentifyTemplateName(String deidentifyTemplateName); + +public abstract Builder setBatchSize(Integer batchSize); + +public abstract Builder setCsvHeaders(PCollectionView> csvHeaders); + +public abstract Builder setCsvDelimiter(String delimiter); + +public abstract Builder setProjectId(String projectId); + +public abstract DLPReidentifyText build(); + } + + public static DLPReidentifyText.Builder newBuilder() { +return new AutoValue_DLPReidentifyText.Builder(); + } + + @Override + public PCollection> expand( + PCollection> input) { +return input +.apply(ParDo.of(new MapStringToDlpRow(csvDelimiter( +.apply("Batch Contents", ParDo.of(new BatchRequestForDLP(batchSize( +.apply( +"DLPDeidentify", +ParDo.of( +new ReidentifyText( +projectId(), +inspectTemp
[jira] [Work logged] (BEAM-9723) [Java] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9723?focusedWorklogId=436016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436016 ] ASF GitHub Bot logged work on BEAM-9723: Author: ASF GitHub Bot Created on: 21/May/20 14:17 Start Date: 21/May/20 14:17 Worklog Time Spent: 10m Work Description: mwalenia commented on a change in pull request #11566: URL: https://github.com/apache/beam/pull/11566#discussion_r428679955 ## File path: sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java ## @@ -0,0 +1,215 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import com.google.auto.value.AutoValue;
+import com.google.cloud.dlp.v2.DlpServiceClient;
+import com.google.privacy.dlp.v2.ContentItem;
+import com.google.privacy.dlp.v2.DeidentifyConfig;
+import com.google.privacy.dlp.v2.DeidentifyContentRequest;
+import com.google.privacy.dlp.v2.DeidentifyContentResponse;
+import com.google.privacy.dlp.v2.FieldId;
+import com.google.privacy.dlp.v2.InspectConfig;
+import com.google.privacy.dlp.v2.ProjectName;
+import com.google.privacy.dlp.v2.Table;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+
+/**
+ * A {@link PTransform} connecting to Cloud DLP and deidentifying text according to the provided
+ * settings. The transform supports both CSV formatted input data and unstructured input.
+ *
+ * <p>If the csvHeader property is set, csvDelimiter should also be set, else the results will be
+ * incorrect. If csvHeader is not set, input is assumed to be unstructured.
+ *
+ * <p>Either inspectTemplateName (String) or inspectConfig ({@link InspectConfig}) needs to be set.
+ * The same applies to deidentifyTemplateName and deidentifyConfig ({@link DeidentifyConfig}).
+ *
+ * <p>batchSize defines the size, in bytes, of a single batch of requests sent to DLP.
+ *
+ * <p>The transform outputs {@link KV} of {@link String} (e.g. filename) and {@link
+ * DeidentifyContentResponse}, which will contain a {@link Table} of results for the user to consume.
+ */
+@Experimental
+@AutoValue
+public abstract class DLPDeidentifyText
+    extends PTransform<
+        PCollection<KV<String, String>>, PCollection<KV<String, DeidentifyContentResponse>>> {
+
+  @Nullable
+  public abstract String inspectTemplateName();
+
+  @Nullable
+  public abstract String deidentifyTemplateName();
+
+  @Nullable
+  public abstract InspectConfig inspectConfig();
+
+  @Nullable
+  public abstract DeidentifyConfig deidentifyConfig();
+
+  @Nullable
+  public abstract PCollectionView<List<String>> csvHeader();
+
+  @Nullable
+  public abstract String csvDelimiter();
+
+  public abstract Integer batchSize();
+
+  public abstract String projectId();
+
+  @AutoValue.Builder
+  public abstract static class Builder {
+    public abstract Builder setInspectTemplateName(String inspectTemplateName);
+
+    public abstract Builder setCsvHeader(PCollectionView<List<String>> csvHeader);
+
+    public abstract Builder setCsvDelimiter(String delimiter);
+
+    public abstract Builder setBatchSize(Integer batchSize);
+
+    public abstract Builder setProjectId(String projectId);
+
+    public abstract Builder setDeidentifyTemplateName(String deidentifyTemplateName);
+
+    public abstract Builder setInspectConfig(InspectConfig inspectConfig);
+
+    public abstract Builder setDeidentifyConfig(DeidentifyConfig deidentifyConfig);
+
+    public abstract DLPDeidentifyText build();
+  }
+
Review comment: Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries
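The DLPDeidentifyText Javadoc above states two configuration rules: either a template name or an inline config must be present, and a CSV header only makes sense together with a delimiter. These can be written down as boolean invariants. The sketch below is illustrative only; the class and method names are hypothetical, not part of the Beam API:

```java
import java.util.List;

// Hand-rolled sketch of the configuration invariants described in the Javadoc.
public class DlpConfigCheck {

    // Rule 1: either a template name or an inline config must be supplied.
    static boolean templateOrConfigPresent(String templateName, Object inlineConfig) {
        return templateName != null || inlineConfig != null;
    }

    // Rule 2: a CSV header requires a delimiter to accompany it.
    static boolean csvSettingsConsistent(List<String> csvHeader, String csvDelimiter) {
        return csvHeader == null || csvDelimiter != null;
    }

    public static void main(String[] args) {
        // A template name alone satisfies rule 1.
        System.out.println(templateOrConfigPresent("projects/p/deidentifyTemplates/t", null));
        // A header without a delimiter violates rule 2.
        System.out.println(csvSettingsConsistent(List.of("name", "email"), null));
    }
}
```

A transform could run such checks in its builder's `build()` or at expansion time to fail fast instead of producing incorrect results.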
[jira] [Work logged] (BEAM-9723) [Java] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9723?focusedWorklogId=436017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436017 ] ASF GitHub Bot logged work on BEAM-9723: Author: ASF GitHub Bot Created on: 21/May/20 14:20 Start Date: 21/May/20 14:20 Worklog Time Spent: 10m Work Description: mwalenia commented on pull request #11566: URL: https://github.com/apache/beam/pull/11566#issuecomment-632113183 @tysonjh , @santhh Thanks for the review, I added the first batch of fixes. I'll take care of the Javadocs tomorrow. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436017) Time Spent: 5h 20m (was: 5h 10m) > [Java] PTransform that connects to Cloud DLP deidentification service > - > > Key: BEAM-9723 > URL: https://issues.apache.org/jira/browse/BEAM-9723 > Project: Beam > Issue Type: Sub-task > Components: io-java-gcp >Reporter: Michał Walenia >Assignee: Michał Walenia >Priority: P2 > Time Spent: 5h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9421) AI Platform pipeline patterns
[ https://issues.apache.org/jira/browse/BEAM-9421?focusedWorklogId=436019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436019 ] ASF GitHub Bot logged work on BEAM-9421: Author: ASF GitHub Bot Created on: 21/May/20 14:36 Start Date: 21/May/20 14:36 Worklog Time Spent: 10m Work Description: kamilwu opened a new pull request #11776: URL: https://github.com/apache/beam/pull/11776 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Assigned] (BEAM-5863) Automate Community Metrics infrastructure deployment
[ https://issues.apache.org/jira/browse/BEAM-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamil Wasilewski reassigned BEAM-5863: -- Assignee: Kamil Wasilewski (was: Michał Walenia) > Automate Community Metrics infrastructure deployment > > > Key: BEAM-5863 > URL: https://issues.apache.org/jira/browse/BEAM-5863 > Project: Beam > Issue Type: Sub-task > Components: community-metrics, project-management >Reporter: Scott Wegner >Assignee: Kamil Wasilewski >Priority: P3 > Labels: community-metrics > > Currently the deployment process for the production Community Metrics stack > is manual (documented > [here|https://cwiki.apache.org/confluence/display/BEAM/Community+Metrics]). > If we end up having to deploy more than a few times a year, it would be nice > to automate these steps. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10053) Timers exception on "Job Drain" while using stateful beam processing in global window
[ https://issues.apache.org/jira/browse/BEAM-10053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113286#comment-17113286 ] Reuven Lax commented on BEAM-10053:
-----------------------------------

Something appears off here: *org.apache.beam.sdk.transforms.windowing.GlobalWindow@29ca0210 that is before the appropriate cleanup time 294248-01-24T04:00:54.776Z*. However, the end of the GlobalWindow should be 9223371950454775, which is 294247-01-09T04:00:54+00:00. I don't see any allowed lateness set on the global window. Luke, do you have any idea why there is almost a one-year difference between these timestamps?

> Timers exception on "Job Drain" while using stateful beam processing in
> global window
> -----------------------------------------------------------------------
>
> Key: BEAM-10053
> URL: https://issues.apache.org/jira/browse/BEAM-10053
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow, sdk-java-core
> Affects Versions: 2.19.0
> Reporter: MOHIL
> Priority: P2
>
> Hello,
>
> I have a use case where I have two sets of PCollections (RecordA and RecordB)
> coming from a real-time streaming source like Kafka.
>
> Both records are correlated with a common key, let's say KEY.
>
> The purpose is to enrich RecordA with RecordB's data, for which I am using
> CoGroupByKey. Since RecordA and RecordB for a common key can come within 1-2
> minutes of event time, I am maintaining a sliding window for both records and
> then do CoGroupByKey for both PCollections.
>
> The sliding windows that find both RecordA and RecordB for a common key
> KEY will emit enriched output. Now, since multiple sliding windows can emit
> the same output, I finally remove duplicate results by feeding the aforementioned
> outputs to a global window where I maintain a state to check whether an output
> has already been processed or not. Since it is a global window, I maintain a
> timer on the state (for GC) to let it expire after 10 minutes have elapsed since
> the state was written.
>
> This is working perfectly fine w.r.t. the expected results.
> However, I am unable to stop the job gracefully, i.e. drain the job. I see the following
> exception:
>
> java.lang.IllegalStateException:
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn@16b089a6 received state
> cleanup timer for window
> org.apache.beam.sdk.transforms.windowing.GlobalWindow@29ca0210 that is before
> the appropriate cleanup time 294248-01-24T04:00:54.776Z
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:842)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processSystemTimer(SimpleParDoFn.java:384)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.access$700(SimpleParDoFn.java:73)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn$TimerType$2.processTimer(SimpleParDoFn.java:444)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:467)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:354)
> org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:52)
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1316)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1049)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> java.lang.IllegalStateException:
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn@59902a10 received state
> cleanup timer for window
> org.apache.beam.sdk.transforms.windowing.GlobalWindow@29ca0210 that is before
> the appropriate cleanup time 294248-01-24T04:00:54.776Z
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:842)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processSystemTimer(SimpleParDoFn.java:384)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.access$700(SimpleParDoFn.java:73)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn$TimerType$2.processTimer(SimpleParDoFn.java:444)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:467)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:354)
> org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:52)
> org.apache.beam.runners.dataflow.worker.
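The discrepancy asked about in the comment above can be reproduced with plain java.time, using only the two values quoted in the thread: the millisecond value given for the end of the GlobalWindow and the cleanup time printed in the exception. This is a standalone sketch, not Beam code; it takes both numbers from the report rather than computing them from Beam internals.

```java
import java.time.Duration;
import java.time.Instant;

public class WindowTimestampGap {
    // Millisecond value quoted for the end of the GlobalWindow.
    static final long WINDOW_END_MILLIS = 9223371950454775L;
    // Cleanup time printed in the IllegalStateException.
    static final Instant CLEANUP_TIME = Instant.parse("+294248-01-24T04:00:54.776Z");

    public static void main(String[] args) {
        Instant windowEnd = Instant.ofEpochMilli(WINDOW_END_MILLIS);
        // Confirms the conversion quoted in the comment (year +294247, Jan 9).
        System.out.println(windowEnd);
        // The gap between the window end and the reported cleanup time, in days.
        System.out.println(Duration.between(windowEnd, CLEANUP_TIME).toDays());
    }
}
```

The printed gap comes out to roughly a year plus two weeks, which matches the "almost a 1 year difference" observation in the comment.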
[jira] [Work logged] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn
[ https://issues.apache.org/jira/browse/BEAM-1589?focusedWorklogId=436061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436061 ] ASF GitHub Bot logged work on BEAM-1589: Author: ASF GitHub Bot Created on: 21/May/20 15:47 Start Date: 21/May/20 15:47 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #11774: URL: https://github.com/apache/beam/pull/11774#issuecomment-632168553 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436061) Time Spent: 13.5h (was: 13h 20m) > Add OnWindowExpiration method to Stateful DoFn > -- > > Key: BEAM-1589 > URL: https://issues.apache.org/jira/browse/BEAM-1589 > Project: Beam > Issue Type: New Feature > Components: runner-core, sdk-java-core >Reporter: Jingsong Lee >Assignee: Rehman Murad Ali >Priority: P2 > Time Spent: 13.5h > Remaining Estimate: 0h > > See BEAM-1517 > This allows the user to do some work before the state's garbage collection. > It seems kind of annoying, but on the other hand forgetting to set a final > timer to flush state is probably data loss most of the time. > FlinkRunner does this work very simply, but other runners, such as > DirectRunner, need to traverse all the states to do this, and maybe it's a > little hard. -- This message was sent by Atlassian Jira (v8.3.4#803005)
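Independent of the Beam API, the pattern requested in this issue (run a user callback to flush buffered state just before the runner garbage-collects it) can be sketched in plain Java. All names below are hypothetical and only illustrate the idea:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal model of per-window state with an on-expiration hook.
public class ExpiringState<T> {
    private final List<T> buffer = new ArrayList<>();
    private final Consumer<List<T>> onWindowExpiration; // user-supplied flush hook

    public ExpiringState(Consumer<List<T>> onWindowExpiration) {
        this.onWindowExpiration = onWindowExpiration;
    }

    public void add(T value) {
        buffer.add(value);
    }

    // What a runner would do when the window expires: invoke the hook first,
    // then clear the state. Without the hook, clearing would silently drop
    // whatever was still buffered, which is the data-loss scenario the issue
    // description warns about.
    public void expire() {
        onWindowExpiration.accept(List.copyOf(buffer));
        buffer.clear();
    }
}
```

A user would register a hook that emits the buffered elements downstream; the runner's contract is that the hook fires before the state is cleared.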
[jira] [Work logged] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn
[ https://issues.apache.org/jira/browse/BEAM-1589?focusedWorklogId=436062&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436062 ] ASF GitHub Bot logged work on BEAM-1589: Author: ASF GitHub Bot Created on: 21/May/20 15:48 Start Date: 21/May/20 15:48 Worklog Time Spent: 10m Work Description: reuvenlax merged pull request #11774: URL: https://github.com/apache/beam/pull/11774 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436062) Time Spent: 13h 40m (was: 13.5h) > Add OnWindowExpiration method to Stateful DoFn > -- > > Key: BEAM-1589 > URL: https://issues.apache.org/jira/browse/BEAM-1589 > Project: Beam > Issue Type: New Feature > Components: runner-core, sdk-java-core >Reporter: Jingsong Lee >Assignee: Rehman Murad Ali >Priority: P2 > Time Spent: 13h 40m > Remaining Estimate: 0h > > See BEAM-1517 > This allows the user to do some work before the state's garbage collection. > It seems kind of annoying, but on the other hand forgetting to set a final > timer to flush state is probably data loss most of the time. > FlinkRunner does this work very simply, but other runners, such as > DirectRunner, need to traverse all the states to do this, and maybe it's a > little hard. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10050) VideoIntelligenceIT.annotateVideoFromURINoContext is flaky
[ https://issues.apache.org/jira/browse/BEAM-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette updated BEAM-10050:
---------------------------------
    Description: 
I've seen this fail a few times in precommits [Example failure|https://builds.apache.org/job/beam_PreCommit_Java_Commit/11515/]

{code}
java.lang.AssertionError: Annotate video/ParDo(AnnotateVideoFromURI)/ParMultiDo(AnnotateVideoFromURI).output: expected: but was: 
	at org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:169)
	at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:411)
	at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:403)
	at org.apache.beam.sdk.extensions.ml.VideoIntelligenceIT.annotateVideoFromURINoContext(VideoIntelligenceIT.java:51)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
{code}

  was:
I've seen this fail a few times in precommits [Example failure](VideoIntelligenceIT.annotateVideoFromURINoContext)

{code}
java.lang.AssertionError: Annotate video/ParDo(AnnotateVideoFromURI)/ParMultiDo(AnnotateVideoFromURI).output: expected: but was: 
	at org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:169)
	at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:411)
	at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:403)
	at org.apache.beam.sdk.extensions.ml.VideoIntelligenceIT.annotateVideoFromURINoContext(VideoIntelligenceIT.java:51)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
{code}

> VideoIntelligenceIT.annotateVideoFromURINoContext is flaky
> ----------------------------------------------------------
>
> Key: BEAM-10050
> URL: https://issues.apache.org/jira/browse/BEAM-10050
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Brian Hulette
> Assignee: Michał Walenia
> Priority: P2
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I've seen this fail a few times in precommits [Example
> failure|https://builds.apache.org/job/beam_PreCommit_Java_Commit/11515/]
> {code}
> java.lang.AssertionError: Annotate
> video/ParDo(AnnotateVideoFromURI)/ParMultiDo(AnnotateVideoFromURI).output:
> expected: but was:
> at
> org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:169)
> at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:411)
> at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:403)
> at
> org.apache.beam.sdk.extensions.ml.VideoIntelligenceIT.annotateVideoFromURINoContext(VideoIntelligenceIT.java:51)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10050) VideoIntelligenceIT.annotateVideoFromURINoContext is flaky
[ https://issues.apache.org/jira/browse/BEAM-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113310#comment-17113310 ] Brian Hulette commented on BEAM-10050: -- Sorry about that! I meant to link one in the description but I copied the wrong thing. I updated it with a link to this job: https://builds.apache.org/job/beam_PreCommit_Java_Commit/11515/ > VideoIntelligenceIT.annotateVideoFromURINoContext is flaky > -- > > Key: BEAM-10050 > URL: https://issues.apache.org/jira/browse/BEAM-10050 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Brian Hulette >Assignee: Michał Walenia >Priority: P2 > Time Spent: 10m > Remaining Estimate: 0h > > I've seen this fail a few times in precommits [Example > failure|https://builds.apache.org/job/beam_PreCommit_Java_Commit/11515/] > {code} > java.lang.AssertionError: Annotate > video/ParDo(AnnotateVideoFromURI)/ParMultiDo(AnnotateVideoFromURI).output: > expected: but was: > at > org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:169) > at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:411) > at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:403) > at > org.apache.beam.sdk.extensions.ml.VideoIntelligenceIT.annotateVideoFromURINoContext(VideoIntelligenceIT.java:51) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10054) Direct Runner execution stalls with test pipeline
[ https://issues.apache.org/jira/browse/BEAM-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-10054: --- Status: Open (was: Triage Needed) > Direct Runner execution stalls with test pipeline > - > > Key: BEAM-10054 > URL: https://issues.apache.org/jira/browse/BEAM-10054 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: P2 > > Internally, we have a test pipeline which runs with the DirectRunner. When > upgrading from 2.18.0 to 2.21.0 the test failed with the following exception: > {noformat} > tp = Exception('Monitor task detected a pipeline stall.',), value = None, tb > = None > def raise_(tp, value=None, tb=None): > """ > A function that matches the Python 2.x ``raise`` statement. This > allows re-raising exceptions with the cls value and traceback on > Python 2 and 3. > """ > if value is not None and isinstance(tp, Exception): > raise TypeError("instance exception may not have a separate > value") > if value is not None: > exc = tp(value) > else: > exc = tp > if exc.__traceback__ is not tb: > raise exc.with_traceback(tb) > > raise exc > E Exception: Monitor task detected a pipeline stall. > {noformat} > I was able to bisect the error. This commit introduced the failure: > https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731 > If the following conditions evaluates to False, the pipeline runs correctly: > https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731#diff-2bb845e226f3a97c0f0f737d0558c5dbR1273 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-10054) Direct Runner execution stalls with test pipeline
Maximilian Michels created BEAM-10054: ------------------------------------- Summary: Direct Runner execution stalls with test pipeline Key: BEAM-10054 URL: https://issues.apache.org/jira/browse/BEAM-10054 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Maximilian Michels Assignee: Maximilian Michels

Internally, we have a test pipeline which runs with the DirectRunner. When upgrading from 2.18.0 to 2.21.0 the test failed with the following exception:

{noformat}
tp = Exception('Monitor task detected a pipeline stall.',), value = None, tb = None

    def raise_(tp, value=None, tb=None):
        """
        A function that matches the Python 2.x ``raise`` statement. This
        allows re-raising exceptions with the cls value and traceback on
        Python 2 and 3.
        """
        if value is not None and isinstance(tp, Exception):
            raise TypeError("instance exception may not have a separate value")
        if value is not None:
            exc = tp(value)
        else:
            exc = tp
        if exc.__traceback__ is not tb:
            raise exc.with_traceback(tb)
>       raise exc
E       Exception: Monitor task detected a pipeline stall.
{noformat}

I was able to bisect the error. This commit introduced the failure: https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731

If the following condition evaluates to False, the pipeline runs correctly: https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731#diff-2bb845e226f3a97c0f0f737d0558c5dbR1273

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9603) Support Dynamic Timer in Java SDK over FnApi
[ https://issues.apache.org/jira/browse/BEAM-9603?focusedWorklogId=436074&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436074 ] ASF GitHub Bot logged work on BEAM-9603: Author: ASF GitHub Bot Created on: 21/May/20 16:06 Start Date: 21/May/20 16:06 Worklog Time Spent: 10m Work Description: y1chi commented on a change in pull request #11756: URL: https://github.com/apache/beam/pull/11756#discussion_r428754395

## File path: sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java

## @@ -947,49 +910,213 @@ public void testTimers() throws Exception {
             timerInGlobalWindow("A", new Instant(1600L), new Instant(10012L)),
             timerInGlobalWindow("X", new Instant(1700L), new Instant(10022L)),
             timerInGlobalWindow("C", new Instant(1800L), new Instant(10022L)),
-            timerInGlobalWindow("B", new Instant(1900L), new Instant(10022L))));
+            timerInGlobalWindow("B", new Instant(1900L), new Instant(10022L)),
+            timerInGlobalWindow("B", new Instant(2000L), new Instant(10032L)),
+            timerInGlobalWindow("Y", new Instant(2100L), new Instant(10042L))));
+    assertThat(
+        fakeTimerClient.getTimers(eventFamilyTimer),
+        contains(
+            timerInGlobalWindow("X", "event-timer1", new Instant(1000L), new Instant(1003L)),
+            timerInGlobalWindow("Y", "event-timer1", new Instant(1100L), new Instant(1103L)),
+            timerInGlobalWindow("X", "event-timer1", new Instant(1200L), new Instant(1203L)),
+            timerInGlobalWindow("Y", "event-timer1", new Instant(1300L), new Instant(1303L)),
+            timerInGlobalWindow("A", "event-timer1", new Instant(1400L), new Instant(2413L)),
+            timerInGlobalWindow("B", "event-timer1", new Instant(1500L), new Instant(2513L)),
+            timerInGlobalWindow("A", "event-timer1", new Instant(1600L), new Instant(2613L)),
+            timerInGlobalWindow("X", "event-timer1", new Instant(1700L), new Instant(1723L)),
+            timerInGlobalWindow("C", "event-timer1", new Instant(1800L), new Instant(1823L)),
+            timerInGlobalWindow("B", "event-timer1", new Instant(1900L), new Instant(1923L)),
+            timerInGlobalWindow("B", "event-timer1", new Instant(2000L), new Instant(2033L)),
+            timerInGlobalWindow("Y", "event-timer1", new Instant(2100L), new Instant(2143L))));
+    assertThat(
+        fakeTimerClient.getTimers(processingFamilyTimer),
+        contains(
+            timerInGlobalWindow("X", "processing-timer1", new Instant(1000L), new Instant(10004L)),
+            timerInGlobalWindow("Y", "processing-timer1", new Instant(1100L), new Instant(10004L)),
+            timerInGlobalWindow("X", "processing-timer1", new Instant(1200L), new Instant(10004L)),
+            timerInGlobalWindow("Y", "processing-timer1", new Instant(1300L), new Instant(10004L)),
+            timerInGlobalWindow("A", "processing-timer1", new Instant(1400L), new Instant(10014L)),
+            timerInGlobalWindow("B", "processing-timer1", new Instant(1500L), new Instant(10014L)),
+            timerInGlobalWindow("A", "processing-timer1", new Instant(1600L), new Instant(10014L)),
+            timerInGlobalWindow("X", "processing-timer1", new Instant(1700L), new Instant(10024L)),
+            timerInGlobalWindow("C", "processing-timer1", new Instant(1800L), new Instant(10024L)),
+            timerInGlobalWindow("B", "processing-timer1", new Instant(1900L), new Instant(10024L)),
+            timerInGlobalWindow("B", "processing-timer1", new Instant(2000L), new Instant(10034L)),
+            timerInGlobalWindow(
+                "Y", "processing-timer1", new Instant(2100L), new Instant(10044L))));
     mainOutputValues.clear();

     assertFalse(fakeTimerClient.isOutboundClosed(eventTimer));
     assertFalse(fakeTimerClient.isOutboundClosed(processingTimer));
+    assertFalse(fakeTimerClient.isOutboundClosed(eventFamilyTimer));
+    assertFalse(fakeTimerClient.isOutboundClosed(processingFamilyTimer));
     fakeTimerClient.closeInbound(eventTimer);
     fakeTimerClient.closeInbound(processingTimer);
+    fakeTimerClient.closeInbound(eventFamilyTimer);
+    fakeTimerClient.closeInbound(processingFamilyTimer);

     Iterables.getOnlyElement(finishFunctionRegistry.getFunctions()).run();
     assertThat(mainOutputValues, empty());

     assertTrue(fakeTimerClient.isOutboundClosed(eventTimer));
     assertTrue(fakeTimerClient.isOutboundClosed(processingTimer));
+    assertTrue(fakeTimerClient.isOutboundClosed(eventFamilyTimer));
+    assertTrue(fakeTimerClient.isOutboundClosed(processingFamilyTimer));

     Iterables.getOnlyElement(teardownFunctions).run();
     assertThat(mainOutputValues, empty());

     assertEquals(
         ImmutableMap.builder()
             .put(bagUserStateKey("bag",
[jira] [Work logged] (BEAM-10054) Direct Runner execution stalls with test pipeline
[ https://issues.apache.org/jira/browse/BEAM-10054?focusedWorklogId=436073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436073 ] ASF GitHub Bot logged work on BEAM-10054: Author: ASF GitHub Bot Created on: 21/May/20 16:06 Start Date: 21/May/20 16:06 Worklog Time Spent: 10m Work Description: mxm opened a new pull request #11777: URL: https://github.com/apache/beam/pull/11777

We have a test pipeline which runs with the DirectRunner. When upgrading from 2.18.0 to 2.21.0 the test failed with the following exception:

```
tp = Exception('Monitor task detected a pipeline stall.',), value = None, tb = None

    def raise_(tp, value=None, tb=None):
        """
        A function that matches the Python 2.x ``raise`` statement. This
        allows re-raising exceptions with the cls value and traceback on
        Python 2 and 3.
        """
        if value is not None and isinstance(tp, Exception):
            raise TypeError("instance exception may not have a separate value")
        if value is not None:
            exc = tp(value)
        else:
            exc = tp
        if exc.__traceback__ is not tb:
            raise exc.with_traceback(tb)
>       raise exc
E       Exception: Monitor task detected a pipeline stall.
```

I was able to bisect the error. This commit introduced the failure: ea9b1f350b88c2996cafb4d24351869e82857731

With the fix, the pipeline runs correctly.
[jira] [Updated] (BEAM-10025) Samza runner failing testOutputTimestampDefault
[ https://issues.apache.org/jira/browse/BEAM-10025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette updated BEAM-10025: - Fix Version/s: 2.22.0 > Samza runner failing testOutputTimestampDefault > --- > > Key: BEAM-10025 > URL: https://issues.apache.org/jira/browse/BEAM-10025 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Hai Lu >Priority: P2 > Labels: currently-failing > Fix For: 2.22.0 > > > This is causing postcommit to fail > java.lang.AssertionError: Expected 1 successful assertions, but found 0. > Expected: is <1L> > but: was <0L> -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10054) Direct Runner execution stalls with test pipeline
[ https://issues.apache.org/jira/browse/BEAM-10054?focusedWorklogId=436076&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436076 ] ASF GitHub Bot logged work on BEAM-10054: - Author: ASF GitHub Bot Created on: 21/May/20 16:09 Start Date: 21/May/20 16:09 Worklog Time Spent: 10m Work Description: mxm commented on pull request #11777: URL: https://github.com/apache/beam/pull/11777#issuecomment-632181118 R: @rohdesamuel This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436076) Time Spent: 20m (was: 10m) > Direct Runner execution stalls with test pipeline > - > > Key: BEAM-10054 > URL: https://issues.apache.org/jira/browse/BEAM-10054 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: P2 > Time Spent: 20m > Remaining Estimate: 0h > > Internally, we have a test pipeline which runs with the DirectRunner. When > upgrading from 2.18.0 to 2.21.0 the test failed with the following exception: > {noformat} > tp = Exception('Monitor task detected a pipeline stall.',), value = None, tb > = None > def raise_(tp, value=None, tb=None): > """ > A function that matches the Python 2.x ``raise`` statement. This > allows re-raising exceptions with the cls value and traceback on > Python 2 and 3. > """ > if value is not None and isinstance(tp, Exception): > raise TypeError("instance exception may not have a separate > value") > if value is not None: > exc = tp(value) > else: > exc = tp > if exc.__traceback__ is not tb: > raise exc.with_traceback(tb) > > raise exc > E Exception: Monitor task detected a pipeline stall. > {noformat} > I was able to bisect the error. 
This commit introduced the failure: > https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731 > If the following condition evaluates to False, the pipeline runs correctly: > https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731#diff-2bb845e226f3a97c0f0f737d0558c5dbR1273 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10024) Spark runner failing testOutputTimestampDefault
[ https://issues.apache.org/jira/browse/BEAM-10024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette updated BEAM-10024: - Fix Version/s: 2.22.0 > Spark runner failing testOutputTimestampDefault > --- > > Key: BEAM-10024 > URL: https://issues.apache.org/jira/browse/BEAM-10024 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: P2 > Labels: currently-failing > Fix For: 2.22.0 > > Time Spent: 50m > Remaining Estimate: 0h > > This is causing postcommit to fail > java.lang.UnsupportedOperationException: Found TimerId annotations on > org.apache.beam.sdk.transforms.ParDoTest$TimerTests$12, but DoFn cannot yet > be used with timers in the SparkRunner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10025) Samza runner failing testOutputTimestampDefault
[ https://issues.apache.org/jira/browse/BEAM-10025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113321#comment-17113321 ] Brian Hulette commented on BEAM-10025: -- This is failing on the 2.22 release branch; is it a release blocker? I assume it is, so I went ahead and tagged fix version 2.22.0, but I would be pleased to learn otherwise. > Samza runner failing testOutputTimestampDefault > --- > > Key: BEAM-10025 > URL: https://issues.apache.org/jira/browse/BEAM-10025 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Hai Lu >Priority: P2 > Labels: currently-failing > Fix For: 2.22.0 > > > This is causing postcommit to fail > java.lang.AssertionError: Expected 1 successful assertions, but found 0. > Expected: is <1L> > but: was <0L> -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10054) Direct Runner execution stalls with test pipeline
[ https://issues.apache.org/jira/browse/BEAM-10054?focusedWorklogId=436077&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436077 ] ASF GitHub Bot logged work on BEAM-10054: - Author: ASF GitHub Bot Created on: 21/May/20 16:10 Start Date: 21/May/20 16:10 Worklog Time Spent: 10m Work Description: mxm commented on pull request #11777: URL: https://github.com/apache/beam/pull/11777#issuecomment-632181899 Please have a look @rohdesamuel if you consider the fix valid. I'm not very familiar with the Python SDK triggering code. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436077) Time Spent: 0.5h (was: 20m) > Direct Runner execution stalls with test pipeline > - > > Key: BEAM-10054 > URL: https://issues.apache.org/jira/browse/BEAM-10054 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: P2 > Time Spent: 0.5h > Remaining Estimate: 0h > > Internally, we have a test pipeline which runs with the DirectRunner. When > upgrading from 2.18.0 to 2.21.0 the test failed with the following exception: > {noformat} > tp = Exception('Monitor task detected a pipeline stall.',), value = None, tb > = None > def raise_(tp, value=None, tb=None): > """ > A function that matches the Python 2.x ``raise`` statement. This > allows re-raising exceptions with the cls value and traceback on > Python 2 and 3. > """ > if value is not None and isinstance(tp, Exception): > raise TypeError("instance exception may not have a separate > value") > if value is not None: > exc = tp(value) > else: > exc = tp > if exc.__traceback__ is not tb: > raise exc.with_traceback(tb) > > raise exc > E Exception: Monitor task detected a pipeline stall. 
> {noformat} > I was able to bisect the error. This commit introduced the failure: > https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731 > If the following condition evaluates to False, the pipeline runs correctly: > https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731#diff-2bb845e226f3a97c0f0f737d0558c5dbR1273 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10024) Spark runner failing testOutputTimestampDefault
[ https://issues.apache.org/jira/browse/BEAM-10024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113322#comment-17113322 ] Brian Hulette commented on BEAM-10024: -- Same question here as BEAM-10025 - is this a release blocker? > Spark runner failing testOutputTimestampDefault > --- > > Key: BEAM-10024 > URL: https://issues.apache.org/jira/browse/BEAM-10024 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: P2 > Labels: currently-failing > Fix For: 2.22.0 > > Time Spent: 50m > Remaining Estimate: 0h > > This is causing postcommit to fail > java.lang.UnsupportedOperationException: Found TimerId annotations on > org.apache.beam.sdk.transforms.ParDoTest$TimerTests$12, but DoFn cannot yet > be used with timers in the SparkRunner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9603) Support Dynamic Timer in Java SDK over FnApi
[ https://issues.apache.org/jira/browse/BEAM-9603?focusedWorklogId=436079&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436079 ] ASF GitHub Bot logged work on BEAM-9603: Author: ASF GitHub Bot Created on: 21/May/20 16:12 Start Date: 21/May/20 16:12 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #11756: URL: https://github.com/apache/beam/pull/11756#issuecomment-632182849 retest all please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436079) Time Spent: 5h 10m (was: 5h) > Support Dynamic Timer in Java SDK over FnApi > > > Key: BEAM-9603 > URL: https://issues.apache.org/jira/browse/BEAM-9603 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness >Reporter: Boyuan Zhang >Assignee: Yichi Zhang >Priority: P2 > Time Spent: 5h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
[ https://issues.apache.org/jira/browse/BEAM-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette updated BEAM-10016: - Fix Version/s: 2.22.0 > Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2 > --- > > Key: BEAM-10016 > URL: https://issues.apache.org/jira/browse/BEAM-10016 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Maximilian Michels >Priority: P2 > Fix For: 2.22.0 > > > Both beam_PostCommit_Java_PVR_Flink_Batch and > beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test > org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2. > SEVERE: Error in task code: CHAIN MapPartition (MapPartition at [6]{Values, > FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map > (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: > PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) > (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be > cast to [B > at > org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41) > at > org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56) > at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541) > at > org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109) > at > org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667) > at > 
org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271) > at > org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203) > at > org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103) > at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504) > at > org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530) > at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
[ https://issues.apache.org/jira/browse/BEAM-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113327#comment-17113327 ] Brian Hulette commented on BEAM-10016: -- This is failing on 2.22.0 release branch. Assuming it is a release blocker? > Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2 > --- > > Key: BEAM-10016 > URL: https://issues.apache.org/jira/browse/BEAM-10016 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Maximilian Michels >Priority: P2 > Fix For: 2.22.0 > > > Both beam_PostCommit_Java_PVR_Flink_Batch and > beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test > org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2. > SEVERE: Error in task code: CHAIN MapPartition (MapPartition at [6]{Values, > FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map > (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: > PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) > (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be > cast to [B > at > org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41) > at > org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56) > at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541) > at > org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109) > at > org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667) > at > 
org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271) > at > org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203) > at > org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103) > at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504) > at > org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530) > at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9603) Support Dynamic Timer in Java SDK over FnApi
[ https://issues.apache.org/jira/browse/BEAM-9603?focusedWorklogId=436087&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436087 ] ASF GitHub Bot logged work on BEAM-9603: Author: ASF GitHub Bot Created on: 21/May/20 16:17 Start Date: 21/May/20 16:17 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #11756: URL: https://github.com/apache/beam/pull/11756#issuecomment-632185565 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436087) Time Spent: 5h 20m (was: 5h 10m) > Support Dynamic Timer in Java SDK over FnApi > > > Key: BEAM-9603 > URL: https://issues.apache.org/jira/browse/BEAM-9603 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness >Reporter: Boyuan Zhang >Assignee: Yichi Zhang >Priority: P2 > Time Spent: 5h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10050) VideoIntelligenceIT.annotateVideoFromURINoContext is flaky
[ https://issues.apache.org/jira/browse/BEAM-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113338#comment-17113338 ] Brian Hulette commented on BEAM-10050: -- Also looks like it happened in 2.22.0 release branch validation here: https://builds.apache.org/job/beam_Release_Gradle_Build/47 > VideoIntelligenceIT.annotateVideoFromURINoContext is flaky > -- > > Key: BEAM-10050 > URL: https://issues.apache.org/jira/browse/BEAM-10050 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Brian Hulette >Assignee: Michał Walenia >Priority: P2 > Time Spent: 10m > Remaining Estimate: 0h > > I've seen this fail a few times in precommits [Example > failure|https://builds.apache.org/job/beam_PreCommit_Java_Commit/11515/] > {code} > java.lang.AssertionError: Annotate > video/ParDo(AnnotateVideoFromURI)/ParMultiDo(AnnotateVideoFromURI).output: > expected: but was: > at > org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:169) > at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:411) > at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:403) > at > org.apache.beam.sdk.extensions.ml.VideoIntelligenceIT.annotateVideoFromURINoContext(VideoIntelligenceIT.java:51) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10048) Remove "manual steps" from release guide.
[ https://issues.apache.org/jira/browse/BEAM-10048?focusedWorklogId=436089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436089 ] ASF GitHub Bot logged work on BEAM-10048: - Author: ASF GitHub Bot Created on: 21/May/20 16:32 Start Date: 21/May/20 16:32 Worklog Time Spent: 10m Work Description: ibzib commented on a change in pull request #11764: URL: https://github.com/apache/beam/pull/11764#discussion_r428769910 ## File path: website/www/site/content/en/contribute/release-guide.md ## @@ -583,189 +582,44 @@ For this step, we recommend you using automation script to create a RC, but you 1. Stage source release into dist.apache.org dev [repo](https://dist.apache.org/repos/dist/dev/beam/). 1. Stage,sign and hash python binaries into dist.apache.ord dev repo python dir 1. Stage SDK docker images to [docker hub Apache organization](https://hub.docker.com/search?q=apache%2Fbeam&type=image). - 1. Create a PR to update beam and beam-site, changes includes: + 1. Create a PR to update beam-site, changes includes: * Copy python doc into beam-site * Copy java doc into beam-site - * Update release version into [_config.yml](https://github.com/apache/beam/blob/master/website/_config.yml). Tasks you need to do manually - 1. Add new release into `website/src/get-started/downloads.md`. - 1. Update last release download links in `website/src/get-started/downloads.md`. - 1. Update `website/src/.htaccess` to redirect to the new version. - 1. Build and stage python wheels. + 1. Verify the script worked. + 1. Verify that the source and Python binaries are present in [dist.apache.org](https://dist.apache.org/repos/dist/dev/beam). + 1. Verify Docker images are published. How to find images: + 1. Visit [https://hub.docker.com/u/apache](https://hub.docker.com/search?q=apache%2Fbeam&type=image) + 2. Visit each repository and navigate to *tags* tab. + 3. Verify images are pushed with tags: ${RELEASE}_rc{RC_NUM} + 1. 
Verify that third party licenses are included in Docker containers by logging in to the images. + - For Python SDK images, there should be around 80 ~ 100 dependencies. + Please note that dependencies for the SDKs with different Python versions vary. + Need to verify all Python images by replacing `${ver}` with each supported Python version `X.Y`. + ``` + docker run -it --entrypoint=/bin/bash apache/beam_python${ver}_sdk:${RELEASE}_rc{RC_NUM} + ls -al /opt/apache/beam/third_party_licenses/ | wc -l + ``` + - For Java SDK images, there should be around 1400 dependencies. + ``` + docker run -it --entrypoint=/bin/bash apache/beam_java_sdk:${RELEASE}_rc{RC_NUM} + ls -al /opt/apache/beam/third_party_licenses/ | wc -l + ``` 1. Publish staging artifacts - 1. Go to the staging repo to close the staging repository on [Apache Nexus](https://repository.apache.org/#stagingRepositories). + 1. Go to the staging repo to close the staging repository on [Apache Nexus](https://repository.apache.org/#stagingRepositories). 1. When prompted for a description, enter “Apache Beam, version X, release candidate Y”. - - -### (Alternative) Run all steps manually Review comment: Because it's mostly an exact copy of build_release_candidate.sh. Any part of this block that wasn't part of build_release_candidate.sh was moved into the "manual steps" section. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436089) Time Spent: 50m (was: 40m) > Remove "manual steps" from release guide. 
> - > > Key: BEAM-10048 > URL: https://issues.apache.org/jira/browse/BEAM-10048 > Project: Beam > Issue Type: Improvement > Components: build-system, website >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: P2 > Time Spent: 50m > Remaining Estimate: 0h > > release-guide.md contains most of the same instructions as > build_release_candidate.sh ("(Alternative) Run all steps manually"). This is > not ideal: > - Mirroring the instructions in release-guide.md doesn't add any value. > - Every single change to the process requires two identical changes to each > file, and this makes it unnecessarily difficult to keep the two in sync. > - All the extra instructions make release-guide.md harder to rea
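The per-Python-version license check quoted in the diff above asks the release manager to re-run the same `docker run` command once per supported interpreter version. A small loop can generate those invocations instead of editing `${ver}` by hand. This is a sketch, not part of the guide: the version list and release values below are placeholder assumptions, and the docker commands are only printed here rather than executed.

```shell
# Placeholder values; substitute the actual release candidate being verified.
RELEASE="2.22.0"
RC_NUM="1"

# Assumed set of supported Python versions; adjust to match the release.
for ver in 2.7 3.5 3.6 3.7; do
  TAG="apache/beam_python${ver}_sdk:${RELEASE}_rc${RC_NUM}"
  # Print the verification command for each image instead of running it.
  echo "docker run -it --entrypoint=/bin/bash ${TAG}"
done
```

Inside each container, `ls /opt/apache/beam/third_party_licenses/ | wc -l` gives the dependency count the guide says to sanity-check.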
[jira] [Work logged] (BEAM-10048) Remove "manual steps" from release guide.
[ https://issues.apache.org/jira/browse/BEAM-10048?focusedWorklogId=436090&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436090 ] ASF GitHub Bot logged work on BEAM-10048: - Author: ASF GitHub Bot Created on: 21/May/20 16:33 Start Date: 21/May/20 16:33 Worklog Time Spent: 10m Work Description: ibzib commented on a change in pull request #11764: URL: https://github.com/apache/beam/pull/11764#discussion_r428770502 ## File path: release/src/main/scripts/build_release_candidate.sh ## @@ -113,7 +113,7 @@ if [[ $confirmation = "y" ]]; then echo "-Staging Java Artifacts into Maven---" Review comment: I wasn't sure the best way to do that. Maybe we can add it in a later PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436090) Time Spent: 1h (was: 50m) > Remove "manual steps" from release guide. > - > > Key: BEAM-10048 > URL: https://issues.apache.org/jira/browse/BEAM-10048 > Project: Beam > Issue Type: Improvement > Components: build-system, website >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: P2 > Time Spent: 1h > Remaining Estimate: 0h > > release-guide.md contains most of the same instructions as > build_release_candidate.sh ("(Alternative) Run all steps manually"). This is > not ideal: > - Mirroring the instructions in release-guide.md doesn't add any value. > - Every single change to the process requires two identical changes to each > file, and this makes it unnecessarily difficult to keep the two in sync. > - All the extra instructions make release-guide.md harder to read, obscuring > information that the release manager actually does need to know. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
[ https://issues.apache.org/jira/browse/BEAM-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113351#comment-17113351 ] Maximilian Michels commented on BEAM-10016: --- I think yes. If someone wants to have a look, please take the issue. Otherwise I'll have a look in the next few days. > Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2 > --- > > Key: BEAM-10016 > URL: https://issues.apache.org/jira/browse/BEAM-10016 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Maximilian Michels >Priority: P2 > Fix For: 2.22.0 > > > Both beam_PostCommit_Java_PVR_Flink_Batch and > beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test > org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2. > SEVERE: Error in task code: CHAIN MapPartition (MapPartition at [6]{Values, > FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map > (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: > PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) > (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be > cast to [B > at > org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41) > at > org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56) > at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541) > at > org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109) > at > 
org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667) > at > org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271) > at > org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203) > at > org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103) > at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504) > at > org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530) > at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10048) Remove "manual steps" from release guide.
[ https://issues.apache.org/jira/browse/BEAM-10048?focusedWorklogId=436092&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436092 ] ASF GitHub Bot logged work on BEAM-10048: - Author: ASF GitHub Bot Created on: 21/May/20 16:40 Start Date: 21/May/20 16:40 Worklog Time Spent: 10m Work Description: ibzib commented on a change in pull request #11764: URL: https://github.com/apache/beam/pull/11764#discussion_r428774392 ## File path: website/www/site/content/en/contribute/release-guide.md ## @@ -583,189 +582,44 @@ For this step, we recommend you using automation script to create a RC, but you 1. Stage source release into dist.apache.org dev [repo](https://dist.apache.org/repos/dist/dev/beam/). 1. Stage,sign and hash python binaries into dist.apache.ord dev repo python dir 1. Stage SDK docker images to [docker hub Apache organization](https://hub.docker.com/search?q=apache%2Fbeam&type=image). - 1. Create a PR to update beam and beam-site, changes includes: + 1. Create a PR to update beam-site, changes includes: * Copy python doc into beam-site * Copy java doc into beam-site - * Update release version into [_config.yml](https://github.com/apache/beam/blob/master/website/_config.yml). Tasks you need to do manually - 1. Add new release into `website/src/get-started/downloads.md`. - 1. Update last release download links in `website/src/get-started/downloads.md`. - 1. Update `website/src/.htaccess` to redirect to the new version. - 1. Build and stage python wheels. + 1. Verify the script worked. + 1. Verify that the source and Python binaries are present in [dist.apache.org](https://dist.apache.org/repos/dist/dev/beam). + 1. Verify Docker images are published. How to find images: + 1. Visit [https://hub.docker.com/u/apache](https://hub.docker.com/search?q=apache%2Fbeam&type=image) + 2. Visit each repository and navigate to *tags* tab. + 3. Verify images are pushed with tags: ${RELEASE}_rc{RC_NUM} + 1. 
Verify that third party licenses are included in Docker containers by logging in to the images. + - For Python SDK images, there should be around 80 ~ 100 dependencies. + Please note that dependencies for the SDKs with different Python versions vary. + Need to verify all Python images by replacing `${ver}` with each supported Python version `X.Y`. + ``` + docker run -it --entrypoint=/bin/bash apache/beam_python${ver}_sdk:${RELEASE}_rc{RC_NUM} + ls -al /opt/apache/beam/third_party_licenses/ | wc -l + ``` + - For Java SDK images, there should be around 1400 dependencies. + ``` + docker run -it --entrypoint=/bin/bash apache/beam_java_sdk:${RELEASE}_rc{RC_NUM} + ls -al /opt/apache/beam/third_party_licenses/ | wc -l + ``` 1. Publish staging artifacts - 1. Go to the staging repo to close the staging repository on [Apache Nexus](https://repository.apache.org/#stagingRepositories). + 1. Go to the staging repo to close the staging repository on [Apache Nexus](https://repository.apache.org/#stagingRepositories). Review comment: I added more detailed instructions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436092) Time Spent: 1h 10m (was: 1h) > Remove "manual steps" from release guide. > - > > Key: BEAM-10048 > URL: https://issues.apache.org/jira/browse/BEAM-10048 > Project: Beam > Issue Type: Improvement > Components: build-system, website >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: P2 > Time Spent: 1h 10m > Remaining Estimate: 0h > > release-guide.md contains most of the same instructions as > build_release_candidate.sh ("(Alternative) Run all steps manually"). This is > not ideal: > - Mirroring the instructions in release-guide.md doesn't add any value. 
> - Every single change to the process requires identical changes to both > files, which makes it unnecessarily difficult to keep the two in sync. > - All the extra instructions make release-guide.md harder to read, obscuring > information that the release manager actually does need to know. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9946) Enhance Partition transform to provide partitionfn with SideInputs
[ https://issues.apache.org/jira/browse/BEAM-9946?focusedWorklogId=436093&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436093 ] ASF GitHub Bot logged work on BEAM-9946: Author: ASF GitHub Bot Created on: 21/May/20 16:47 Start Date: 21/May/20 16:47 Worklog Time Spent: 10m Work Description: apilloud commented on a change in pull request #11682: URL: https://github.com/apache/beam/pull/11682#discussion_r428778641 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Partition.java ## @@ -85,7 +141,14 @@ * @throws IllegalArgumentException if {@code numPartitions <= 0} */ public static Partition of(int numPartitions, PartitionFn partitionFn) { -return new Partition<>(new PartitionDoFn(numPartitions, partitionFn)); + +Contextful ctfFn = +Contextful.fn( +(T element, Contextful.Fn.Context c) -> +partitionFn.partitionFor(element, numPartitions), +Requirements.empty()); +Object aClass = partitionFn; Review comment: The statement `Object aClass = partitionFn;` has no effect. You can just pass `partitionFn` directly into the function. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436093) Remaining Estimate: 94h 20m (was: 94.5h) Time Spent: 1h 40m (was: 1.5h) > Enhance Partition transform to provide partitionfn with SideInputs > -- > > Key: BEAM-9946 > URL: https://issues.apache.org/jira/browse/BEAM-9946 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Darshan Jani >Assignee: Darshan Jani >Priority: P2 > Original Estimate: 96h > Time Spent: 1h 40m > Remaining Estimate: 94h 20m > > Currently _Partition_ transform can partition a collection into n collections > based on only _element_ value in _PartitionFn_ to decide on which partition a > particular element belongs to. > {code:java} > public interface PartitionFn extends Serializable { > int partitionFor(T elem, int numPartitions); > } > public static Partition of(int numPartitions, PartitionFn > partitionFn) { > return new Partition<>(new PartitionDoFn(numPartitions, partitionFn)); > } > {code} > It will be useful to introduce a new API with additional _sideInputs_ provided > to the partition function. User will be able to write logic to use both _element_ > value and _sideInputs_ to decide on which partition a particular element > belongs to. > Option-1: Proposed new API: > {code:java} > public interface PartitionWithSideInputsFn extends Serializable { > int partitionFor(T elem, int numPartitions, Context c); > } > public static Partition of(int numPartitions, > PartitionWithSideInputsFn partitionFn, Requirements requirements) { > ... > } > {code} > User can use either of the two APIs as per their partitioning function logic. > Option-2: Redesign the old API with a Builder Pattern which can optionally provide > a _Requirements_ with _sideInputs._ Deprecate the old API. 
> {code:java} > // using sideviews > Partition.into(numberOfPartitions).via( > fn( > (input,c) -> { > // use c.sideInput(view) > // use input > // return partitionnumber > },requiresSideInputs(view)) > ) > // without using sideviews > Partition.into(numberOfPartitions).via( > fn((input,c) -> { > // use input > // return partitionnumber > }) > ) > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
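The Option-2 proposal quoted above routes each element to one of N partitions using both the element and any side inputs. As a plain-Python sketch of that idea (the `partition` function and its arguments are illustrative names for this toy model, not the Beam Java API):

```python
def partition(elements, num_partitions, partition_fn, *side_inputs):
    """Route each element into one of num_partitions buckets.

    partition_fn sees the element, the partition count, and any side
    inputs -- mirroring the proposed partitionFor(elem, numPartitions,
    Context) signature, where Context exposes the side inputs."""
    buckets = [[] for _ in range(num_partitions)]
    for elem in elements:
        i = partition_fn(elem, num_partitions, *side_inputs)
        if not 0 <= i < num_partitions:
            raise ValueError("partition index out of range: %d" % i)
        buckets[i].append(elem)
    return buckets

# Example: split numbers around a threshold supplied as a side input.
below, at_or_above = partition(
    [1, 5, 2, 8], 2,
    lambda e, n, threshold: 0 if e < threshold else 1,
    4)
# below == [1, 2], at_or_above == [5, 8]
```

The point of the proposed API is exactly this extra argument: without side inputs, the partition function can only inspect the element itself.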
[jira] [Work logged] (BEAM-10054) Direct Runner execution stalls with test pipeline
[ https://issues.apache.org/jira/browse/BEAM-10054?focusedWorklogId=436094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436094 ] ASF GitHub Bot logged work on BEAM-10054: - Author: ASF GitHub Bot Created on: 21/May/20 16:47 Start Date: 21/May/20 16:47 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #11777: URL: https://github.com/apache/beam/pull/11777#discussion_r428778698 ## File path: sdks/python/apache_beam/transforms/trigger.py ## @@ -1368,7 +1368,7 @@ def _output( if timestamp is None: # If no watermark hold was set, output at end of window. timestamp = window.max_timestamp() -elif input_watermark < window.end and self.trigger_fn.has_ontime_pane(): +elif output_watermark < window.end and self.trigger_fn.has_ontime_pane(): Review comment: Note that this is the fix in question. Please check @rohdesamuel if that makes sense. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436094) Time Spent: 40m (was: 0.5h) > Direct Runner execution stalls with test pipeline > - > > Key: BEAM-10054 > URL: https://issues.apache.org/jira/browse/BEAM-10054 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: P2 > Time Spent: 40m > Remaining Estimate: 0h > > Internally, we have a test pipeline which runs with the DirectRunner. When > upgrading from 2.18.0 to 2.21.0 the test failed with the following exception: > {noformat} > tp = Exception('Monitor task detected a pipeline stall.',), value = None, tb > = None > def raise_(tp, value=None, tb=None): > """ > A function that matches the Python 2.x ``raise`` statement. 
This > allows re-raising exceptions with the cls value and traceback on > Python 2 and 3. """ > if value is not None and isinstance(tp, Exception): > raise TypeError("instance exception may not have a separate > value") > if value is not None: > exc = tp(value) > else: > exc = tp > if exc.__traceback__ is not tb: > raise exc.with_traceback(tb) > > raise exc > E Exception: Monitor task detected a pipeline stall. > {noformat} > I was able to bisect the error. This commit introduced the failure: > https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731 > If the following condition evaluates to False, the pipeline runs correctly: > https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731#diff-2bb845e226f3a97c0f0f737d0558c5dbR1273 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9946) Enhance Partition transform to provide partitionfn with SideInputs
[ https://issues.apache.org/jira/browse/BEAM-9946?focusedWorklogId=436095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436095 ] ASF GitHub Bot logged work on BEAM-9946: Author: ASF GitHub Bot Created on: 21/May/20 16:48 Start Date: 21/May/20 16:48 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #11682: URL: https://github.com/apache/beam/pull/11682#issuecomment-632218345 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436095) Remaining Estimate: 94h 10m (was: 94h 20m) Time Spent: 1h 50m (was: 1h 40m) > Enhance Partition transform to provide partitionfn with SideInputs > -- > > Key: BEAM-9946 > URL: https://issues.apache.org/jira/browse/BEAM-9946 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Darshan Jani >Assignee: Darshan Jani >Priority: P2 > Original Estimate: 96h > Time Spent: 1h 50m > Remaining Estimate: 94h 10m > > Currently _Partition_ transform can partition a collection into n collections > based on only _element_ value in _PartitionFn_ to decide on which partition a > particular element belongs to. > {code:java} > public interface PartitionFn extends Serializable { > int partitionFor(T elem, int numPartitions); > } > public static Partition of(int numPartitions, PartitionFn > partitionFn) { > return new Partition<>(new PartitionDoFn(numPartitions, partitionFn)); > } > {code} > It will be useful to introduce a new API with additional _sideInputs_ provided > to the partition function. User will be able to write logic to use both _element_ > value and _sideInputs_ to decide on which partition a particular element > belongs to. 
> Option-1: Proposed new API: > {code:java} > public interface PartitionWithSideInputsFn extends Serializable { > int partitionFor(T elem, int numPartitions, Context c); > } > public static Partition of(int numPartitions, > PartitionWithSideInputsFn partitionFn, Requirements requirements) { > ... > } > {code} > User can use either of the two APIs as per their partitioning function logic. > Option-2: Redesign the old API with a Builder Pattern which can optionally provide > a _Requirements_ with _sideInputs._ Deprecate the old API. > {code:java} > // using sideviews > Partition.into(numberOfPartitions).via( > fn( > (input,c) -> { > // use c.sideInput(view) > // use input > // return partitionnumber > },requiresSideInputs(view)) > ) > // without using sideviews > Partition.into(numberOfPartitions).via( > fn((input,c) -> { > // use input > // return partitionnumber > }) > ) > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10052) check hash and avoid duplicates when uploading artifact in Python Dataflow Runner
[ https://issues.apache.org/jira/browse/BEAM-10052?focusedWorklogId=436098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436098 ] ASF GitHub Bot logged work on BEAM-10052: - Author: ASF GitHub Bot Created on: 21/May/20 16:50 Start Date: 21/May/20 16:50 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11771: URL: https://github.com/apache/beam/pull/11771#issuecomment-632219614 WDYT about https://github.com/ihji/beam/pull/1 ? (the only change is to Environments.java; other changes should go away if you rebase) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436098) Time Spent: 0.5h (was: 20m) > check hash and avoid duplicates when uploading artifact in Python Dataflow > Runner > - > > Key: BEAM-10052 > URL: https://issues.apache.org/jira/browse/BEAM-10052 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: P2 > Time Spent: 0.5h > Remaining Estimate: 0h > > An xlang pipeline could have many duplicated jars. It would be great if we checked > hashes and avoided duplicate uploads. -- This message was sent by Atlassian Jira (v8.3.4#803005)
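The hash-based dedup BEAM-10052 asks for can be sketched in a few lines of Python. This is a hypothetical standalone helper, not the actual change to the Dataflow runner's artifact-staging path; `artifacts_to_upload` and its dict-based input are illustrative names:

```python
import hashlib

def artifacts_to_upload(artifacts):
    """Given {staged_name: content_bytes}, return the names worth
    uploading: the first artifact seen for each distinct SHA-256
    digest. Duplicate jars pulled in by an xlang pipeline are skipped."""
    seen = set()
    keep = []
    for name, content in artifacts.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            continue  # identical content already staged under another name
        seen.add(digest)
        keep.append(name)
    return keep
```

Comparing content digests rather than file names is what makes this work for xlang pipelines, where the same jar can be staged under several different names.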
[jira] [Updated] (BEAM-10055) Add --region to 3 of the python examples
[ https://issues.apache.org/jira/browse/BEAM-10055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Romer updated BEAM-10055: - Priority: P3 (was: P2) > Add --region to 3 of the python examples > > > Key: BEAM-10055 > URL: https://issues.apache.org/jira/browse/BEAM-10055 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ted Romer >Priority: P3 > Original Estimate: 1h > Remaining Estimate: 1h > > Proposed fix: > [https://github.com/tedromer/beam/compare/tedromer:ef811fe...tedromer:1f39865] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-10055) Add --region to 3 of the python examples
Ted Romer created BEAM-10055: Summary: Add --region to 3 of the python examples Key: BEAM-10055 URL: https://issues.apache.org/jira/browse/BEAM-10055 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Ted Romer Proposed fix: [https://github.com/tedromer/beam/compare/tedromer:ef811fe...tedromer:1f39865] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
[ https://issues.apache.org/jira/browse/BEAM-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113372#comment-17113372 ] Kyle Weaver commented on BEAM-10016: This test was newly added, so this is probably not a regression. It does seem like an important issue we should look into, but not sure if it should be considered a blocker. > Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2 > --- > > Key: BEAM-10016 > URL: https://issues.apache.org/jira/browse/BEAM-10016 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Maximilian Michels >Priority: P2 > Fix For: 2.22.0 > > > Both beam_PostCommit_Java_PVR_Flink_Batch and > beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test > org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2. > SEVERE: Error in task code: CHAIN MapPartition (MapPartition at [6]{Values, > FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map > (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: > PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) > (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be > cast to [B > at > org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41) > at > org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56) > at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541) > at > org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109) > at > 
org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667) > at > org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271) > at > org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203) > at > org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103) > at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504) > at > org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530) > at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10024) Spark runner failing testOutputTimestampDefault
[ https://issues.apache.org/jira/browse/BEAM-10024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113375#comment-17113375 ] Kyle Weaver commented on BEAM-10024: No. A new test was added without being properly annotated, causing it to run where it should have been skipped. > Spark runner failing testOutputTimestampDefault > --- > > Key: BEAM-10024 > URL: https://issues.apache.org/jira/browse/BEAM-10024 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: P2 > Labels: currently-failing > Fix For: 2.22.0 > > Time Spent: 50m > Remaining Estimate: 0h > > This is causing postcommit to fail > java.lang.UnsupportedOperationException: Found TimerId annotations on > org.apache.beam.sdk.transforms.ParDoTest$TimerTests$12, but DoFn cannot yet > be used with timers in the SparkRunner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10024) Spark runner failing testOutputTimestampDefault
[ https://issues.apache.org/jira/browse/BEAM-10024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-10024: --- Fix Version/s: (was: 2.22.0) > Spark runner failing testOutputTimestampDefault > --- > > Key: BEAM-10024 > URL: https://issues.apache.org/jira/browse/BEAM-10024 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: P2 > Labels: currently-failing > Time Spent: 50m > Remaining Estimate: 0h > > This is causing postcommit to fail > java.lang.UnsupportedOperationException: Found TimerId annotations on > org.apache.beam.sdk.transforms.ParDoTest$TimerTests$12, but DoFn cannot yet > be used with timers in the SparkRunner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10056) Side Input Validation too tight, doesn't allow CoGBK
[ https://issues.apache.org/jira/browse/BEAM-10056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-10056: --- Status: Open (was: Triage Needed) > Side Input Validation too tight, doesn't allow CoGBK > > > Key: BEAM-10056 > URL: https://issues.apache.org/jira/browse/BEAM-10056 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: P1 > > The following doesn't pass validation, though it should as it's a valid > signature for ParDo accepting a PCollection *clientHistory>> > func (fn *writer) StartBundle(ctx context.Context) error > func (fn *writer) ProcessElement( > ctx context.Context, > key string, > iter1, iter2 func(**clientHistory) bool) > func (fn *writer) FinishBundle(ctx context.Context) > It returns an error: > Missing side inputs in the StartBundle method of a DoFn. If side inputs are > present in ProcessElement those side inputs must also be present in > StartBundle. > Full error: > inserting ParDo in scope root: > graph.AsDoFn: for Fn named <...pii...>/userpackage.writer: > side inputs expected in method StartBundle [recovered] > panic: Missing side inputs in the StartBundle method of a DoFn. If > side inputs are present in ProcessElement those side inputs must also be > present in StartBundle. > Full error: > inserting ParDo in scope root: > graph.AsDoFn: for Fn named <...pii...>/userpackage.writer: > side inputs expected in method StartBundle > This is happening in the input unaware validation, which means it needs to be > loosened, and validated elsewhere. 
> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/graph/fn.go#L527 > There are "sibling" cases for the DoFn signature > func (fn *writer) StartBundle(context.Context, side func(**clientHistory) > bool) error > func (fn *writer) ProcessElement( > ctx context.Context, > key string, > iter, side func(**clientHistory) bool) > func (fn *writer) FinishBundle( context.Context, side, func(**clientHistory) > bool) > and > func (fn *writer) StartBundle(context.Context, side1, side2 > func(**clientHistory) bool) error > func (fn *writer) ProcessElement( > ctx context.Context, > key string, > side1, side2 func(**clientHistory) bool) > func (fn *writer) FinishBundle( context.Context, side1, side2 > func(**clientHistory) bool) > Would be for > with <*clientHistory> on the > side, and > with <*clientHistory> and <*clientHistory> on the side > respectively. > Which would only be determinable fully with the input, and should provide a > clear error when PCollection binding is occurring. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-10056) Side Input Validation too tight, doesn't allow CoGBK
Robert Burke created BEAM-10056: --- Summary: Side Input Validation too tight, doesn't allow CoGBK Key: BEAM-10056 URL: https://issues.apache.org/jira/browse/BEAM-10056 Project: Beam Issue Type: Bug Components: sdk-go Reporter: Robert Burke Assignee: Robert Burke The following doesn't pass validation, though it should as it's a valid signature for ParDo accepting a PCollection> func (fn *writer) StartBundle(ctx context.Context) error func (fn *writer) ProcessElement( ctx context.Context, key string, iter1, iter2 func(**clientHistory) bool) func (fn *writer) FinishBundle(ctx context.Context) It returns an error: Missing side inputs in the StartBundle method of a DoFn. If side inputs are present in ProcessElement those side inputs must also be present in StartBundle. Full error: inserting ParDo in scope root: graph.AsDoFn: for Fn named <...pii...>/userpackage.writer: side inputs expected in method StartBundle [recovered] panic: Missing side inputs in the StartBundle method of a DoFn. If side inputs are present in ProcessElement those side inputs must also be present in StartBundle. Full error: inserting ParDo in scope root: graph.AsDoFn: for Fn named <...pii...>/userpackage.writer: side inputs expected in method StartBundle This is happening in the input unaware validation, which means it needs to be loosened, and validated elsewhere. 
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/graph/fn.go#L527 There are "sibling" cases for the DoFn signature func (fn *writer) StartBundle(context.Context, side func(**clientHistory) bool) error func (fn *writer) ProcessElement( ctx context.Context, key string, iter, side func(**clientHistory) bool) func (fn *writer) FinishBundle( context.Context, side, func(**clientHistory) bool) and func (fn *writer) StartBundle(context.Context, side1, side2 func(**clientHistory) bool) error func (fn *writer) ProcessElement( ctx context.Context, key string, side1, side2 func(**clientHistory) bool) func (fn *writer) FinishBundle( context.Context, side1, side2 func(**clientHistory) bool) Would be for > with <*clientHistory> on the side, and with <*clientHistory> and <*clientHistory> on the side respectively. Which would only be determinable fully with the input, and should provide a clear error when PCollection binding is occurring. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10025) Samza runner failing testOutputTimestampDefault
[ https://issues.apache.org/jira/browse/BEAM-10025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113376#comment-17113376 ] Kyle Weaver commented on BEAM-10025: I'm guessing the cause is similar to BEAM-10024, so no, it should not be a release blocker. > Samza runner failing testOutputTimestampDefault > --- > > Key: BEAM-10025 > URL: https://issues.apache.org/jira/browse/BEAM-10025 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Hai Lu >Priority: P2 > Labels: currently-failing > Fix For: 2.22.0 > > > This is causing postcommit to fail > java.lang.AssertionError: Expected 1 successful assertions, but found 0. > Expected: is <1L> > but: was <0L> -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10025) Samza runner failing testOutputTimestampDefault
[ https://issues.apache.org/jira/browse/BEAM-10025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-10025: --- Fix Version/s: (was: 2.22.0) > Samza runner failing testOutputTimestampDefault > --- > > Key: BEAM-10025 > URL: https://issues.apache.org/jira/browse/BEAM-10025 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Hai Lu >Priority: P2 > Labels: currently-failing > > This is causing postcommit to fail > java.lang.AssertionError: Expected 1 successful assertions, but found 0. > Expected: is <1L> > but: was <0L> -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=436100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436100 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 21/May/20 16:56 Start Date: 21/May/20 16:56 Worklog Time Spent: 10m Work Description: chamikaramj merged pull request #11757: URL: https://github.com/apache/beam/pull/11757 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436100) Time Spent: 22h (was: 21h 50m) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: P1 > Time Spent: 22h > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
[ https://issues.apache.org/jira/browse/BEAM-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113380#comment-17113380 ] Maximilian Michels commented on BEAM-10016: --- Just to clarify, since this is only failing for Beam 2.22.0, we don't have to block the 2.21.0 release. > Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2 > --- > > Key: BEAM-10016 > URL: https://issues.apache.org/jira/browse/BEAM-10016 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Maximilian Michels >Priority: P2 > Fix For: 2.22.0 > > > Both beam_PostCommit_Java_PVR_Flink_Batch and > beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test > org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2. > SEVERE: Error in task code: CHAIN MapPartition (MapPartition at [6]{Values, > FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map > (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: > PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) > (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be > cast to [B > at > org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41) > at > org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56) > at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541) > at > org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109) > at > 
org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667) > at > org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271) > at > org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203) > at > org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103) > at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504) > at > org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530) > at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9692) Clean Python DataflowRunner to use portable pipelines
[ https://issues.apache.org/jira/browse/BEAM-9692?focusedWorklogId=436101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436101 ] ASF GitHub Bot logged work on BEAM-9692: Author: ASF GitHub Bot Created on: 21/May/20 16:57 Start Date: 21/May/20 16:57 Worklog Time Spent: 10m Work Description: robertwb merged pull request #11503: URL: https://github.com/apache/beam/pull/11503 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436101) Time Spent: 4.5h (was: 4h 20m) > Clean Python DataflowRunner to use portable pipelines > - > > Key: BEAM-9692 > URL: https://issues.apache.org/jira/browse/BEAM-9692 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: P2 > Time Spent: 4.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
[ https://issues.apache.org/jira/browse/BEAM-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113383#comment-17113383 ] Kyle Weaver commented on BEAM-10016: I'm pretty sure it's only failing for Beam 2.22.0 because it was added after the 2.21.0 release cut. I haven't tried it but I suspect the test would fail on previous Beam releases if backported. > Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2 > --- > > Key: BEAM-10016 > URL: https://issues.apache.org/jira/browse/BEAM-10016 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Maximilian Michels >Priority: P2 > Fix For: 2.22.0 > > > Both beam_PostCommit_Java_PVR_Flink_Batch and > beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test > org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2. > SEVERE: Error in task code: CHAIN MapPartition (MapPartition at [6]{Values, > FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map > (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: > PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) > (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be > cast to [B > at > org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41) > at > org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56) > at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541) > at > org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109) > at > 
org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667) > at > org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271) > at > org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203) > at > org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103) > at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504) > at > org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530) > at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.
[ https://issues.apache.org/jira/browse/BEAM-9577?focusedWorklogId=436110&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436110 ] ASF GitHub Bot logged work on BEAM-9577: Author: ASF GitHub Bot Created on: 21/May/20 17:24 Start Date: 21/May/20 17:24 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #11521: URL: https://github.com/apache/beam/pull/11521#issuecomment-632237798 @robertwb I think this broke Java PVR Spark Batch. First failure is here: https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/2887/ Not sure if there is a jira already This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436110) Time Spent: 23h 10m (was: 23h) > Update artifact staging and retrieval protocols to be dependency aware. > --- > > Key: BEAM-9577 > URL: https://issues.apache.org/jira/browse/BEAM-9577 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: P2 > Time Spent: 23h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9971) beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)
[ https://issues.apache.org/jira/browse/BEAM-9971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113410#comment-17113410 ] Brian Hulette commented on BEAM-9971: - Pretty sure the culprit is https://github.com/apache/beam/pull/11521. This was part of the first failure: https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/2887/ cc: [~robertwb] > beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file) > -- > > Key: BEAM-9971 > URL: https://issues.apache.org/jira/browse/BEAM-9971 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: P2 > Labels: portability-spark > > This happens sporadically. One time the issue affected 14 tests; another time > it affected 112 tests. > java.lang.RuntimeException: The Runner experienced the following error during > execution: > java.io.FileNotFoundException: > /tmp/spark-0812a463-8d6b-4c97-be4b-de43baf67108/userFiles-b90ca2e1-2041-442d-ae78-c8e9c30bff49/beam-runners-spark-2.22.0-SNAPSHOT.jar > (No such file or directory) > at > org.apache.beam.runners.portability.JobServicePipelineResult.propagateErrors(JobServicePipelineResult.java:165) > at > org.apache.beam.runners.portability.JobServicePipelineResult.waitUntilFinish(JobServicePipelineResult.java:110) > at > org.apache.beam.runners.portability.testing.TestPortableRunner.run(TestPortableRunner.java:83) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317) > at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350) > at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331) > at > org.apache.beam.runners.core.metrics.MetricsPusherTest.pushesUserMetrics(MetricsPusherTest.java:70) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305) > at org.junit.runners.ParentRunner.run(ParentRunner.java:412) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) > at > org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62) > at > 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) > at sun.reflect.GeneratedMethodAccessor161.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.dispatch.ContextClassLoaderDis
[jira] [Commented] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113411#comment-17113411 ] Jacob Ferriero commented on BEAM-9468: -- Yes! > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: P3 > Time Spent: 53.5h > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacob Ferriero updated BEAM-9468: - Fix Version/s: 2.22.0 > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: P3 > Fix For: 2.22.0 > > Time Spent: 53.5h > Remaining Estimate: 0h
[jira] [Commented] (BEAM-9971) beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)
[ https://issues.apache.org/jira/browse/BEAM-9971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113412#comment-17113412 ] Brian Hulette commented on BEAM-9971: - This is happening on the release branch and really harms the signal of this test suite. Latest attempt had 134/204 tests fail: https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch_PR/79/testReport/ > beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file) > -- > > Key: BEAM-9971 > URL: https://issues.apache.org/jira/browse/BEAM-9971 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: P2 > Labels: portability-spark
[jira] [Resolved] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacob Ferriero resolved BEAM-9468. -- Resolution: Fixed > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: P3 > Fix For: 2.22.0 > > Time Spent: 53.5h > Remaining Estimate: 0h
[jira] [Resolved] (BEAM-9831) HL7v2IO Improvements
[ https://issues.apache.org/jira/browse/BEAM-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacob Ferriero resolved BEAM-9831. -- Fix Version/s: 2.22.0 Resolution: Fixed > HL7v2IO Improvements > > > Key: BEAM-9831 > URL: https://issues.apache.org/jira/browse/BEAM-9831 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: P2 > Fix For: 2.22.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > # The HL7v2MessageCoder constructor should be public for use by end users. > # Currently HL7v2IO.ListHL7v2Messages blocks on paginating through the list messages results before emitting any output data elements (due to high fan-out from a single input element). We should add early firings so that downstream processing can proceed on early pages while later pages are still being scrolled through. > # We should drop all output-only fields of HL7v2Message and keep only data and labels when calling ingestMessages, rather than expecting the user to do this.
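The "early firings" idea in item 2 above can be sketched in plain Java. This is an illustrative sketch, not the real connector: PageSource is a hypothetical stand-in for the HL7v2 Messages.List pager, and the actual HL7v2IO pages inside a ProcessElement call rather than through a Consumer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of early firings for a paginated list: hand each page's elements to
// the downstream consumer as soon as the page is fetched, instead of buffering
// every page before emitting anything. PageSource is hypothetical.
public class EarlyFiringLister {
    /** Hypothetical pager: empty token means the first page, null means done. */
    public interface PageSource {
        List<String> fetchPage(String pageToken);
        String nextToken(String pageToken);
    }

    /** Blocking variant: downstream sees nothing until every page is read. */
    public static List<String> listBuffered(PageSource source) {
        List<String> all = new ArrayList<>();
        String token = "";
        while (token != null) {
            all.addAll(source.fetchPage(token));
            token = source.nextToken(token);
        }
        return all;
    }

    /** Early-firing variant: each page is emitted before the next is fetched. */
    public static void listWithEarlyFiring(PageSource source, Consumer<String> downstream) {
        String token = "";
        while (token != null) {
            for (String msg : source.fetchPage(token)) {
                downstream.accept(msg); // fires while later pages are still pending
            }
            token = source.nextToken(token);
        }
    }
}
```

Both variants produce the same elements; the difference is only when downstream processing can start, which matters when the fan-out from one input element is high.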
[jira] [Resolved] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
[ https://issues.apache.org/jira/browse/BEAM-9856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacob Ferriero resolved BEAM-9856. -- Fix Version/s: 2.22.0 Resolution: Fixed > HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization > -- > > Key: BEAM-9856 > URL: https://issues.apache.org/jira/browse/BEAM-9856 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: P3 > Fix For: 2.22.0 > > Time Spent: 10.5h > Remaining Estimate: 0h > > Currently the List Messages API paginates through all results in a single ProcessElement > call. > However, we could derive a restriction based on createTime using the Messages.List > filter and orderBy parameters. > > This is in line with the future roadmap, where an HL7v2 bulk export API becomes > available that should allow splitting (e.g. on the create-time dimension). > Leveraging this bulk export might be a future optimization to explore. > > This could take one of two forms: > 1. dynamically splittable via a splittable DoFn (elegant and Beam-idiomatic: it makes > optimization the runner's problem, but is potentially unnecessarily complex for this > use case) > 2. static splitting on some time partition, e.g. finding the earliest > createTime and emitting a PCollection of 1-hour partitions, then paginating > through each hour of data within the time frame that the store spans, in a > separate ProcessElement (easy to implement, but will likely have hot keys / > stragglers based on "busy hours") >
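The static-splitting option (form 2 above) boils down to slicing the store's createTime span into fixed-width windows, one per downstream ProcessElement call. A minimal sketch of just that slicing step, with TimeRange as an illustrative value class rather than a Beam or Healthcare API type:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Sketch of static time partitioning: split [earliest, latest) into
// fixed-width createTime windows so each one can be paginated through
// independently. TimeRange is illustrative, not a real API type.
public class CreateTimePartitioner {
    public static final class TimeRange {
        public final Instant start; // inclusive
        public final Instant end;   // exclusive
        public TimeRange(Instant start, Instant end) { this.start = start; this.end = end; }
        @Override public String toString() { return start + "/" + end; }
    }

    /** Splits [earliest, latest) into width-sized slices; the last may be shorter. */
    public static List<TimeRange> partition(Instant earliest, Instant latest, Duration width) {
        List<TimeRange> ranges = new ArrayList<>();
        Instant cursor = earliest;
        while (cursor.isBefore(latest)) {
            Instant next = cursor.plus(width);
            ranges.add(new TimeRange(cursor, next.isBefore(latest) ? next : latest));
            cursor = next;
        }
        return ranges;
    }
}
```

The hot-key caveat in the issue shows up here directly: every partition has the same width in time, not the same number of messages, so "busy hours" produce oversized partitions unless the width is adapted.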
[jira] [Work logged] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.
[ https://issues.apache.org/jira/browse/BEAM-9577?focusedWorklogId=436113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436113 ] ASF GitHub Bot logged work on BEAM-9577: Author: ASF GitHub Bot Created on: 21/May/20 17:31 Start Date: 21/May/20 17:31 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #11521: URL: https://github.com/apache/beam/pull/11521#issuecomment-632241122 [BEAM-9971](https://issues.apache.org/jira/browse/BEAM-9971) was filed to track This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436113) Time Spent: 23h 20m (was: 23h 10m) > Update artifact staging and retrieval protocols to be dependency aware. > --- > > Key: BEAM-9577 > URL: https://issues.apache.org/jira/browse/BEAM-9577 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: P2 > Time Spent: 23h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9971) beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)
[ https://issues.apache.org/jira/browse/BEAM-9971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette updated BEAM-9971: Fix Version/s: 2.22.0 > beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file) > -- > > Key: BEAM-9971 > URL: https://issues.apache.org/jira/browse/BEAM-9971 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: P2 > Labels: portability-spark > Fix For: 2.22.0
[jira] [Assigned] (BEAM-10055) Add --region to 3 of the python examples
[ https://issues.apache.org/jira/browse/BEAM-10055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Altay reassigned BEAM-10055: -- Assignee: Ted Romer > Add --region to 3 of the python examples > > > Key: BEAM-10055 > URL: https://issues.apache.org/jira/browse/BEAM-10055 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ted Romer >Assignee: Ted Romer >Priority: P3 > Original Estimate: 1h > Remaining Estimate: 1h > > Proposed fix: > {color:#FF}[https://github.com/tedromer/beam/compare/tedromer:ef811fe...tedromer:1f39865]{color} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10055) Add --region to 3 of the python examples
[ https://issues.apache.org/jira/browse/BEAM-10055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113421#comment-17113421 ] Kyle Weaver commented on BEAM-10055: Sorry I missed those. Please feel free to add me as a reviewer when you have a PR ready. > Add --region to 3 of the python examples > > > Key: BEAM-10055 > URL: https://issues.apache.org/jira/browse/BEAM-10055 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ted Romer >Assignee: Ted Romer >Priority: P3 > Original Estimate: 1h > Remaining Estimate: 1h
[jira] [Commented] (BEAM-9971) beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)
[ https://issues.apache.org/jira/browse/BEAM-9971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113420#comment-17113420 ] Brian Hulette commented on BEAM-9971: - Marking this as a 2.22 release blocker since it causes around 50% of the VR tests to flake. > beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file) > -- > > Key: BEAM-9971 > URL: https://issues.apache.org/jira/browse/BEAM-9971 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: P2 > Labels: portability-spark > Fix For: 2.22.0
[jira] [Commented] (BEAM-9971) beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file)
[ https://issues.apache.org/jira/browse/BEAM-9971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113425#comment-17113425 ] Kyle Weaver commented on BEAM-9971: --- Thanks for pointing out the Java dependencies change Brian -- I somehow completely missed that connection. I'll fix it as soon as I can. > beam_PostCommit_Java_PVR_Spark_Batch flakes (no such file) > -- > > Key: BEAM-9971 > URL: https://issues.apache.org/jira/browse/BEAM-9971 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: P2 > Labels: portability-spark > Fix For: 2.22.0
[jira] [Updated] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette updated BEAM-8889: Fix Version/s: 2.22.0 > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: P2 > Labels: gcs > Fix For: 2.22.0 > > Original Estimate: 168h > Time Spent: 36h 20m > Remaining Estimate: 131h 40m > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is the primary class for accessing Google Cloud Storage in Apache Beam. The current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel itself to read and write GCS data, rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java], > an abstract class that provides basic IO capability and ultimately creates the channel objects. This request is about updating GcsUtil to create read and write channels through > GoogleCloudStorage, which is expected to be more flexible because it can easily pick up new > changes; e.g. a new channel > implementation using a new protocol, without code changes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
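The benefit the issue describes is the classic dependency-inversion move: a utility that constructs concrete channels itself is tied to those implementations, while a utility that asks an injected abstraction for channels can pick up a new implementation with no code change. Below is a minimal, hypothetical sketch of that idea using plain `java.nio` types — the interface and class names (`StorageChannels`, `InMemoryStorage`, `StorageUtil`) are illustrative stand-ins, not Beam's or gcsio's actual API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical abstraction playing the role of GoogleCloudStorage:
// it hands out channels, so the utility never constructs concrete
// channel classes itself.
interface StorageChannels {
  ReadableByteChannel openRead(String path);
}

// One swappable implementation; a real one might speak a new protocol.
// Here, for the sketch, every object simply "contains" its own path.
class InMemoryStorage implements StorageChannels {
  @Override
  public ReadableByteChannel openRead(String path) {
    return Channels.newChannel(
        new ByteArrayInputStream(path.getBytes(StandardCharsets.UTF_8)));
  }
}

// Hypothetical analogue of GcsUtil: it depends only on the abstraction.
class StorageUtil {
  private final StorageChannels storage; // injected, like GoogleCloudStorage would be

  StorageUtil(StorageChannels storage) {
    this.storage = storage;
  }

  // Reads the whole object at 'path' as a UTF-8 string via the channel.
  String readAll(String path) {
    try (ReadableByteChannel ch = storage.openRead(path)) {
      ByteBuffer buf = ByteBuffer.allocate(1024);
      StringBuilder sb = new StringBuilder();
      while (ch.read(buf) > 0) { // read() returns -1 at end of stream
        buf.flip();
        sb.append(StandardCharsets.UTF_8.decode(buf));
        buf.clear();
      }
      return sb.toString();
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }
}
```

Swapping `InMemoryStorage` for a different `StorageChannels` implementation changes the transport without touching `StorageUtil` — which is the flexibility the issue asks for in `GcsUtil`.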
[jira] [Work logged] (BEAM-9825) Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll
[ https://issues.apache.org/jira/browse/BEAM-9825?focusedWorklogId=436120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436120 ] ASF GitHub Bot logged work on BEAM-9825: Author: ASF GitHub Bot Created on: 21/May/20 17:46 Start Date: 21/May/20 17:46 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #11610: URL: https://github.com/apache/beam/pull/11610#issuecomment-632248973 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436120) Remaining Estimate: 86h (was: 86h 10m) Time Spent: 10h (was: 9h 50m) > Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll > -- > > Key: BEAM-9825 > URL: https://issues.apache.org/jira/browse/BEAM-9825 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Darshan Jani >Assignee: Darshan Jani >Priority: P2 > Original Estimate: 96h > Time Spent: 10h > Remaining Estimate: 86h > > I'd like to propose the following new high-level transforms. > * Intersect > Compute the intersection of the elements of two PCollections. > Given _leftCollection_ and _rightCollection_, this transform returns a > collection containing the elements that are common to both _leftCollection_ and > _rightCollection_. > > * Except > Compute the difference between the elements of two PCollections. > Given _leftCollection_ and _rightCollection_, this transform returns a > collection containing the elements that are in _leftCollection_ but not in > _rightCollection_. > * Union > Find the elements that are in either of two PCollections. > Also implement the IntersectAll, ExceptAll and UnionAll variants of these transforms. -- This message was sent by Atlassian Jira (v8.3.4#803005)
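For readers unfamiliar with the proposed semantics, here is a small sketch of the deduplicating variants using plain `java.util` collections — this is not the Beam PTransform API from PR #11610, just an illustration of what Intersect, Except, and Union are expected to return (the "All" variants would additionally preserve duplicates, mirroring SQL's INTERSECT/EXCEPT/UNION vs. their ALL forms).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative set-operation semantics on in-memory lists, not PCollections.
class SetOps {
  // Elements common to both collections, deduplicated.
  static <T> List<T> intersect(List<T> left, List<T> right) {
    Set<T> result = new LinkedHashSet<>(left);
    result.retainAll(new LinkedHashSet<>(right));
    return new ArrayList<>(result);
  }

  // Elements in 'left' but not in 'right', deduplicated.
  static <T> List<T> except(List<T> left, List<T> right) {
    Set<T> result = new LinkedHashSet<>(left);
    result.removeAll(new LinkedHashSet<>(right));
    return new ArrayList<>(result);
  }

  // Elements in either collection, deduplicated.
  static <T> List<T> union(List<T> left, List<T> right) {
    Set<T> result = new LinkedHashSet<>(left);
    result.addAll(right);
    return new ArrayList<>(result);
  }
}
```

For example, `intersect([1, 2, 3], [2, 3, 4])` yields `[2, 3]`, `except([1, 2, 3], [2])` yields `[1, 3]`, and `union([1, 2], [2, 3])` yields `[1, 2, 3]`.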
[jira] [Work logged] (BEAM-9825) Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll
[ https://issues.apache.org/jira/browse/BEAM-9825?focusedWorklogId=436121&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436121 ] ASF GitHub Bot logged work on BEAM-9825: Author: ASF GitHub Bot Created on: 21/May/20 17:47 Start Date: 21/May/20 17:47 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #11610: URL: https://github.com/apache/beam/pull/11610#issuecomment-632249476 run Java Precommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436121) Remaining Estimate: 85h 50m (was: 86h) Time Spent: 10h 10m (was: 10h) > Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll > -- > > Key: BEAM-9825 > URL: https://issues.apache.org/jira/browse/BEAM-9825 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Darshan Jani >Assignee: Darshan Jani >Priority: P2 > Original Estimate: 96h > Time Spent: 10h 10m > Remaining Estimate: 85h 50m > > I'd like to propose the following new high-level transforms. > * Intersect > Compute the intersection of the elements of two PCollections. > Given _leftCollection_ and _rightCollection_, this transform returns a > collection containing the elements that are common to both _leftCollection_ and > _rightCollection_. > > * Except > Compute the difference between the elements of two PCollections. > Given _leftCollection_ and _rightCollection_, this transform returns a > collection containing the elements that are in _leftCollection_ but not in > _rightCollection_. > * Union > Find the elements that are in either of two PCollections. > Also implement the IntersectAll, ExceptAll and UnionAll variants of these transforms. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113432#comment-17113432 ] Brian Hulette commented on BEAM-8889: - Is this complete with https://github.com/apache/beam/pull/11651? > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: P2 > Labels: gcs > Fix For: 2.22.0 > > Original Estimate: 168h > Time Spent: 36h 20m > Remaining Estimate: 131h 40m > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is the primary class for accessing Google Cloud Storage in Apache Beam. The current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel itself to read and write GCS data, rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java], > an abstract class that provides basic IO capability and ultimately creates the channel objects. This request is about updating GcsUtil to create read and write channels through > GoogleCloudStorage, which is expected to be more flexible because it can easily pick up new > changes; e.g. a new channel > implementation using a new protocol, without code changes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9825) Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll
[ https://issues.apache.org/jira/browse/BEAM-9825?focusedWorklogId=436122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436122 ] ASF GitHub Bot logged work on BEAM-9825: Author: ASF GitHub Bot Created on: 21/May/20 17:48 Start Date: 21/May/20 17:48 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #11610: URL: https://github.com/apache/beam/pull/11610#issuecomment-632249853 Tests retriggered. And it has become a really big PR :-) Nice work! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 436122) Remaining Estimate: 85h 40m (was: 85h 50m) Time Spent: 10h 20m (was: 10h 10m) > Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll > -- > > Key: BEAM-9825 > URL: https://issues.apache.org/jira/browse/BEAM-9825 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Darshan Jani >Assignee: Darshan Jani >Priority: P2 > Original Estimate: 96h > Time Spent: 10h 20m > Remaining Estimate: 85h 40m > > I'd like to propose the following new high-level transforms. > * Intersect > Compute the intersection of the elements of two PCollections. > Given _leftCollection_ and _rightCollection_, this transform returns a > collection containing the elements that are common to both _leftCollection_ and > _rightCollection_. > > * Except > Compute the difference between the elements of two PCollections. > Given _leftCollection_ and _rightCollection_, this transform returns a > collection containing the elements that are in _leftCollection_ but not in > _rightCollection_. > * Union > Find the elements that are in either of two PCollections. > Also implement the IntersectAll, ExceptAll and UnionAll variants of these transforms. -- This message was sent by Atlassian Jira (v8.3.4#803005)