[jira] [Reopened] (BEAM-5709) Tests in BeamFnControlServiceTest are flaky.
[ https://issues.apache.org/jira/browse/BEAM-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reopened BEAM-5709: --- > Tests in BeamFnControlServiceTest are flaky. > > > Key: BEAM-5709 > URL: https://issues.apache.org/jira/browse/BEAM-5709 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Minor > Fix For: Not applicable > > Time Spent: 1.5h > Remaining Estimate: 0h > > https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlServiceTest.java > Tests for BeamFnControlService are currently flaky. The test attempts to > verify that onCompleted was called on the mock streams, but that function > gets called on a separate thread, so occasionally the function will not have > been called yet, despite the server being shut down. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
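The race described in this report — asserting that a callback ran when it actually fires on another thread — is a generic pattern, and so is the fix: block on a synchronization point instead of asserting immediately. A minimal plain-JDK sketch (illustrative only, not the Beam test code; the name `verifyCompleted` is hypothetical):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class CompletionRace {
    // Robust check: wait (with a timeout) until the callback thread has
    // actually run, instead of asserting immediately after shutdown.
    public static boolean verifyCompleted() throws InterruptedException {
        AtomicBoolean onCompletedCalled = new AtomicBoolean(false);
        CountDownLatch latch = new CountDownLatch(1);

        // Simulates onCompleted() being invoked from a server callback thread.
        Thread callback = new Thread(() -> {
            onCompletedCalled.set(true);
            latch.countDown();
        });
        callback.start();

        // The flaky pattern would be: return onCompletedCalled.get();
        // which can observe 'false' if the callback thread has not run yet.
        boolean fired = latch.await(5, TimeUnit.SECONDS);
        return fired && onCompletedCalled.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(verifyCompleted());  // prints "true"
    }
}
```

For a Mockito-based test like the one linked above, the analogous fix is a verification mode with a timeout, e.g. `verify(mockStream, timeout(5000)).onCompleted()`, which polls until the invocation happens rather than checking once.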
[jira] [Resolved] (BEAM-5709) Tests in BeamFnControlServiceTest are flaky.
[ https://issues.apache.org/jira/browse/BEAM-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles resolved BEAM-5709. --- Resolution: Fixed Fix Version/s: Not applicable > Tests in BeamFnControlServiceTest are flaky. > Key: BEAM-5709 > URL: https://issues.apache.org/jira/browse/BEAM-5709 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5098) Combine.Globally::asSingletonView clears side inputs
[ https://issues.apache.org/jira/browse/BEAM-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646829#comment-16646829 ] Kenneth Knowles commented on BEAM-5098: --- [~mpedersen] this one might be a quick fix. Would you be interested in contributing, perhaps?
> Combine.Globally::asSingletonView clears side inputs
>
> Key: BEAM-5098
> URL: https://issues.apache.org/jira/browse/BEAM-5098
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.5.0
> Reporter: Mike Pedersen
> Assignee: Kenneth Knowles
> Priority: Critical
> Labels: beginner, starter
>
> It seems like calling .asSingletonView on Combine.Globally clears all side inputs. Take this code for example:
>
> {code:java}
> public class Main {
>     public static void main(String[] args) {
>         PipelineOptions options = PipelineOptionsFactory.create();
>         Pipeline p = Pipeline.create(options);
>         PCollection<Integer> a = p.apply(Create.of(1, 2, 3));
>         PCollectionView<Integer> b = p.apply(Create.of(10)).apply(View.asSingleton());
>         a.apply(Combine.globally(
>             new CombineWithContext.CombineFnWithContext<Integer, Integer, Integer>() {
>                 @Override
>                 public Integer createAccumulator(CombineWithContext.Context c) {
>                     return c.sideInput(b);
>                 }
>                 @Override
>                 public Integer addInput(Integer accumulator, Integer input, CombineWithContext.Context c) {
>                     return accumulator + input;
>                 }
>                 @Override
>                 public Integer mergeAccumulators(Iterable<Integer> accumulators, CombineWithContext.Context c) {
>                     int sum = 0;
>                     for (int i : accumulators) {
>                         sum += i;
>                     }
>                     return sum;
>                 }
>                 @Override
>                 public Integer extractOutput(Integer accumulator, CombineWithContext.Context c) {
>                     return accumulator;
>                 }
>                 @Override
>                 public Integer defaultValue() {
>                     return 0;
>                 }
>             }).withSideInputs(b).asSingletonView());
>         p.run().waitUntilFinish();
>     }
> }{code}
> This fails with the following exception:
> {code:java}
> Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.IllegalArgumentException: calling sideInput() with unknown view
>     at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:349)
>     at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:319)
>     at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:210)
>     at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:66)
>     at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
>     at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
>     at Main.main(Main.java:287)
> Caused by: java.lang.IllegalArgumentException: calling sideInput() with unknown view
>     at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner.sideInput(SimpleDoFnRunner.java:212)
>     at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner.access$900(SimpleDoFnRunner.java:69)
>     at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner$DoFnProcessContext.sideInput(SimpleDoFnRunner.java:489)
>     at org.apache.beam.sdk.transforms.Combine$GroupedValues$1$1.sideInput(Combine.java:2137)
>     at Main$1.createAccumulator(Main.java:258)
>     at Main$1.createAccumulator(Main.java:255)
>     at org.apache.beam.sdk.transforms.CombineWithContext$CombineFnWithContext.apply(CombineWithContext.java:120)
>     at org.apache.beam.sdk.transforms.Combine$GroupedValues$1.processElement(Combine.java:2129){code}
> But if you change
> {code:java}
> .withSideInputs(b).asSingletonView()){code}
> to
> {code:java}
> .withSideInputs(b)).apply(View.asSingleton()){code}
> then it works just fine.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5098) Combine.Globally::asSingletonView clears side inputs
[ https://issues.apache.org/jira/browse/BEAM-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-5098: -- Component/s: (was: beam-model) sdk-java-core > Combine.Globally::asSingletonView clears side inputs > Key: BEAM-5098 > URL: https://issues.apache.org/jira/browse/BEAM-5098 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5098) Combine.Globally::asSingletonView clears side inputs
[ https://issues.apache.org/jira/browse/BEAM-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-5098: -- Labels: beginner starter (was: ) > Combine.Globally::asSingletonView clears side inputs > Key: BEAM-5098 > URL: https://issues.apache.org/jira/browse/BEAM-5098 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-3656) Port DatastoreV1Test off DoFnTester
[ https://issues.apache.org/jira/browse/BEAM-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-3656: - Assignee: Sergei Puzanov > Port DatastoreV1Test off DoFnTester > --- > > Key: BEAM-3656 > URL: https://issues.apache.org/jira/browse/BEAM-3656 > Project: Beam > Issue Type: Sub-task > Components: io-java-gcp >Reporter: Kenneth Knowles >Assignee: Sergei Puzanov >Priority: Major > Labels: beginner, newbie, starter > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3656) Port DatastoreV1Test off DoFnTester
[ https://issues.apache.org/jira/browse/BEAM-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646698#comment-16646698 ] Kenneth Knowles commented on BEAM-3656: --- Yes. Thanks! > Port DatastoreV1Test off DoFnTester > --- > > Key: BEAM-3656 > URL: https://issues.apache.org/jira/browse/BEAM-3656 > Project: Beam > Issue Type: Sub-task > Components: io-java-gcp >Reporter: Kenneth Knowles >Assignee: Sergei Puzanov >Priority: Major > Labels: beginner, newbie, starter > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5156) Apache Beam on dataflow runner can't find Tensorflow for workers
[ https://issues.apache.org/jira/browse/BEAM-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-5156: - Assignee: Ankur Goenka (was: Robert Bradshaw) > Apache Beam on dataflow runner can't find Tensorflow for workers > > > Key: BEAM-5156 > URL: https://issues.apache.org/jira/browse/BEAM-5156 > Project: Beam > Issue Type: Bug > Components: sdk-py-core > Environment: google cloud compute instance running linux >Reporter: Robert Bradshaw >Assignee: Ankur Goenka >Priority: Major > Fix For: 2.5.0, 2.6.0 > > > I'm adding a serialized TensorFlow model to an Apache Beam pipeline with the Python SDK, but no version of TensorFlow can be found when the pipeline runs on the Dataflow runner, although it is not a problem locally. I tried various versions of TensorFlow from 1.6 to 1.10. I thought it might be a conflicting package somewhere, so I removed all other packages and tried to install just TensorFlow; same problem. > Could not find a version that satisfies the requirement tensorflow==1.6.0 > (from -r reqtest.txt (line 59)) (from versions: ) No matching distribution > found for tensorflow==1.6.0 (from -r reqtest.txt (line 59)) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5156) Apache Beam on dataflow runner can't find Tensorflow for workers
[ https://issues.apache.org/jira/browse/BEAM-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-5156: -- Reporter: Thomas Johns (was: Robert Bradshaw) > Apache Beam on dataflow runner can't find Tensorflow for workers > Key: BEAM-5156 > URL: https://issues.apache.org/jira/browse/BEAM-5156 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5156) Apache Beam on dataflow runner can't find Tensorflow for workers
[ https://issues.apache.org/jira/browse/BEAM-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-5156: - Assignee: Robert Bradshaw (was: Kenneth Knowles) > Apache Beam on dataflow runner can't find Tensorflow for workers > Key: BEAM-5156 > URL: https://issues.apache.org/jira/browse/BEAM-5156 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5162) Document Metrics API for users
[ https://issues.apache.org/jira/browse/BEAM-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-5162: -- Priority: Major (was: Minor) > Document Metrics API for users > -- > > Key: BEAM-5162 > URL: https://issues.apache.org/jira/browse/BEAM-5162 > Project: Beam > Issue Type: Task > Components: sdk-java-core, website >Reporter: Maximilian Michels >Assignee: Kenneth Knowles >Priority: Major > > The Metrics API is currently only documented in Beam's JavaDocs. A > complementary user documentation with examples would be desirable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5336) Support for a Dask runner
[ https://issues.apache.org/jira/browse/BEAM-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-5336: - Assignee: (was: Kenneth Knowles) > Support for a Dask runner > - > > Key: BEAM-5336 > URL: https://issues.apache.org/jira/browse/BEAM-5336 > Project: Beam > Issue Type: Wish > Components: runner-ideas > Environment: Python >Reporter: Georvic Tur >Priority: Trivial > > Is adding support for a Dask runner currently under consideration? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5336) Support for a Dask runner
[ https://issues.apache.org/jira/browse/BEAM-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645941#comment-16645941 ] Kenneth Knowles commented on BEAM-5336: --- Contributions welcome :) > Support for a Dask runner > - > > Key: BEAM-5336 > URL: https://issues.apache.org/jira/browse/BEAM-5336 > Project: Beam > Issue Type: Wish > Components: runner-ideas > Environment: Python >Reporter: Georvic Tur >Assignee: Kenneth Knowles >Priority: Trivial > > Is adding support for a Dask runner currently under consideration? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5162) Document Metrics API for users
[ https://issues.apache.org/jira/browse/BEAM-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645940#comment-16645940 ] Kenneth Knowles commented on BEAM-5162: --- I'm not sure of a good owner, but I bumped the priority. > Document Metrics API for users > -- > > Key: BEAM-5162 > URL: https://issues.apache.org/jira/browse/BEAM-5162 > Project: Beam > Issue Type: Task > Components: sdk-java-core, website >Reporter: Maximilian Michels >Priority: Major > > The Metrics API is currently only documented in Beam's JavaDocs. A > complementary user documentation with examples would be desirable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5162) Document Metrics API for users
[ https://issues.apache.org/jira/browse/BEAM-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-5162: - Assignee: (was: Kenneth Knowles) > Document Metrics API for users > -- > > Key: BEAM-5162 > URL: https://issues.apache.org/jira/browse/BEAM-5162 > Project: Beam > Issue Type: Task > Components: sdk-java-core, website >Reporter: Maximilian Michels >Priority: Major > > The Metrics API is currently only documented in Beam's JavaDocs. A > complementary user documentation with examples would be desirable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5098) Combine.Globally::asSingletonView clears side inputs
[ https://issues.apache.org/jira/browse/BEAM-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-5098: -- Priority: Critical (was: Major) > Combine.Globally::asSingletonView clears side inputs > Key: BEAM-5098 > URL: https://issues.apache.org/jira/browse/BEAM-5098 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5156) Apache Beam on dataflow runner can't find Tensorflow for workers
[ https://issues.apache.org/jira/browse/BEAM-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-5156: -- Reporter: Robert Bradshaw (was: Thomas Johns) > Apache Beam on dataflow runner can't find Tensorflow for workers > Key: BEAM-5156 > URL: https://issues.apache.org/jira/browse/BEAM-5156 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5156) Apache Beam on dataflow runner can't find Tensorflow for workers
[ https://issues.apache.org/jira/browse/BEAM-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-5156: -- Component/s: (was: beam-model) sdk-py-core > Apache Beam on dataflow runner can't find Tensorflow for workers > Key: BEAM-5156 > URL: https://issues.apache.org/jira/browse/BEAM-5156 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-4845) Nexmark suites do not compile
[ https://issues.apache.org/jira/browse/BEAM-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles resolved BEAM-4845. --- Resolution: Fixed Fix Version/s: Not applicable > Nexmark suites do not compile > - > > Key: BEAM-4845 > URL: https://issues.apache.org/jira/browse/BEAM-4845 > Project: Beam > Issue Type: Bug > Components: examples-nexmark >Reporter: Lukasz Gajowy >Assignee: Kenneth Knowles >Priority: Major > Fix For: Not applicable > > Time Spent: 1h 40m > Remaining Estimate: 0h > > This is due to this PR: [https://github.com/apache/beam/pull/5341.] Some > interfaces/classes had their visibility changed and this caused nexmark to > fail at compilation phase. > [https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_Nexmark_Direct/112/console] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-2567) Port triggers design doc to a contributor technical reference
[ https://issues.apache.org/jira/browse/BEAM-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles closed BEAM-2567. - Resolution: Won't Fix Fix Version/s: Not applicable Sink triggers are the way to go - no value in proselytizing this. > Port triggers design doc to a contributor technical reference > - > > Key: BEAM-2567 > URL: https://issues.apache.org/jira/browse/BEAM-2567 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > Fix For: Not applicable > > > There is a fairly old doc at https://s.apache.org/beam-triggers that could be > a useful reference doc for contributors. Since we don't catalog these docs > anywhere, it should be surfaced in a useful form. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5176) FailOnWarnings behave differently between CLI and Intellij build
[ https://issues.apache.org/jira/browse/BEAM-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645909#comment-16645909 ] Kenneth Knowles commented on BEAM-5176: --- This is blocking me today so I will take it for a bit and report if I make any progress or if I have to give up. > FailOnWarnings behave differently between CLI and Intellij build > - > > Key: BEAM-5176 > URL: https://issues.apache.org/jira/browse/BEAM-5176 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Etienne Chauchot >Assignee: Kenneth Knowles >Priority: Major > > From the command line the build passes, but it fails in the IDE because of warnings. > To make it pass I had to set failOnWarnings to false in ApplyJavaNature -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5176) FailOnWarnings behave differently between CLI and Intellij build
[ https://issues.apache.org/jira/browse/BEAM-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-5176: - Assignee: Kenneth Knowles (was: Luke Cwik) > FailOnWarnings behave differently between CLI and Intellij build > - > > Key: BEAM-5176 > URL: https://issues.apache.org/jira/browse/BEAM-5176 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Etienne Chauchot >Assignee: Kenneth Knowles >Priority: Major > > From the command line the build passes, but it fails in the IDE because of warnings. > To make it pass I had to set failOnWarnings to false in ApplyJavaNature -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner
[ https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643951#comment-16643951 ] Kenneth Knowles edited comment on BEAM-5690 at 10/9/18 8:10 PM: Yes, that is the issue. Here is the thread: https://lists.apache.org/thread.html/5a0a4317e5bbb66bc3012704ae1b17cd6dd5b9cac51fb95365b95153@%3Cuser.beam.apache.org%3E was (Author: kenn): Yes, that is the issue. Here is the thread: lhttps://lists.apache.org/thread.html/5a0a4317e5bbb66bc3012704ae1b17cd6dd5b9cac51fb95365b95153@%3Cuser.beam.apache.org%3E > Issue with GroupByKey in BeamSql using SparkRunner > -- > > Key: BEAM-5690 > URL: https://issues.apache.org/jira/browse/BEAM-5690 > Project: Beam > Issue Type: Task > Components: runner-spark >Reporter: Kenneth Knowles >Priority: Major > > Reported on user@ > {quote}We are trying to setup a pipeline with using BeamSql and the trigger > used is default (AfterWatermark crosses the window). > Below is the pipeline: > >KafkaSource (KafkaIO) >---> Windowing (FixedWindow 1min) >---> BeamSql >---> KafkaSink (KafkaIO) > > We are using Spark Runner for this. > The BeamSql query is: > {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code} > We are grouping by Col3 which is a string. It can hold values string[0-9]. > > The records are getting emitted out at 1 min to kafka sink, but the output > record in kafka is not as expected. 
> Below is the output observed: (WST and WET are indicators for window start > time and window end time) > {code} > {"count_col1":1,"Col3":"string5","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":3,"Col3":"string7","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":2,"Col3":"string8","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":1,"Col3":"string2","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":1,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 0} > {code} > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner
[ https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643951#comment-16643951 ] Kenneth Knowles commented on BEAM-5690: --- Yes, that is the issue. Here is the thread: https://lists.apache.org/thread.html/5a0a4317e5bbb66bc3012704ae1b17cd6dd5b9cac51fb95365b95153@%3Cuser.beam.apache.org%3E > Issue with GroupByKey in BeamSql using SparkRunner > -- > > Key: BEAM-5690 > URL: https://issues.apache.org/jira/browse/BEAM-5690 > Project: Beam > Issue Type: Task > Components: runner-spark >Reporter: Kenneth Knowles >Priority: Major > > Reported on user@ > {quote}We are trying to setup a pipeline with using BeamSql and the trigger > used is default (AfterWatermark crosses the window). > Below is the pipeline: > >KafkaSource (KafkaIO) >---> Windowing (FixedWindow 1min) >---> BeamSql >---> KafkaSink (KafkaIO) > > We are using Spark Runner for this. > The BeamSql query is: > {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code} > We are grouping by Col3 which is a string. It can hold values string[0-9]. > > The records are getting emitted out at 1 min to kafka sink, but the output > record in kafka is not as expected. 
> Below is the output observed: (WST and WET are indicators for window start > time and window end time) > {code} > {"count_col1":1,"Col3":"string5","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":3,"Col3":"string7","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":2,"Col3":"string8","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":1,"Col3":"string2","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":1,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 0} > {code} > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner
[ https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643682#comment-16643682 ] Kenneth Knowles commented on BEAM-5690: --- CC [~kedin] [~apilloud] [~xumingming] [~mingmxu] [~amaliujia] Since it is not reproduced in the Flink runner or Direct runner, the SQL implementation of GROUP BY is probably triggering some latent bug in the Spark runner's streaming mode. > Issue with GroupByKey in BeamSql using SparkRunner > -- > > Key: BEAM-5690 > URL: https://issues.apache.org/jira/browse/BEAM-5690 > Project: Beam > Issue Type: Task > Components: runner-spark >Reporter: Kenneth Knowles >Priority: Major > > Reported on user@ > {quote}We are trying to setup a pipeline with using BeamSql and the trigger > used is default (AfterWatermark crosses the window). > Below is the pipeline: > >KafkaSource (KafkaIO) >---> Windowing (FixedWindow 1min) >---> BeamSql >---> KafkaSink (KafkaIO) > > We are using Spark Runner for this. > The BeamSql query is: > {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code} > We are grouping by Col3 which is a string. It can hold values string[0-9]. > > The records are getting emitted out at 1 min to kafka sink, but the output > record in kafka is not as expected. 
> Below is the output observed: (WST and WET are indicators for window start > time and window end time) > {code} > {"count_col1":1,"Col3":"string5","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":3,"Col3":"string7","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":2,"Col3":"string8","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":1,"Col3":"string2","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":1,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 0} > {code} > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner
[ https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-5690: - Assignee: (was: Amit Sela) > Issue with GroupByKey in BeamSql using SparkRunner > -- > > Key: BEAM-5690 > URL: https://issues.apache.org/jira/browse/BEAM-5690 > Project: Beam > Issue Type: Task > Components: runner-spark >Reporter: Kenneth Knowles >Priority: Major > > Reported on user@ > {quote}We are trying to setup a pipeline with using BeamSql and the trigger > used is default (AfterWatermark crosses the window). > Below is the pipeline: > >KafkaSource (KafkaIO) >---> Windowing (FixedWindow 1min) >---> BeamSql >---> KafkaSink (KafkaIO) > > We are using Spark Runner for this. > The BeamSql query is: > {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code} > We are grouping by Col3 which is a string. It can hold values string[0-9]. > > The records are getting emitted out at 1 min to kafka sink, but the output > record in kafka is not as expected. 
> Below is the output observed: (WST and WET are indicators for window start > time and window end time) > {code} > {"count_col1":1,"Col3":"string5","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":3,"Col3":"string7","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":2,"Col3":"string8","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":1,"Col3":"string2","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":1,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 0} > {code} > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner
Kenneth Knowles created BEAM-5690: - Summary: Issue with GroupByKey in BeamSql using SparkRunner Key: BEAM-5690 URL: https://issues.apache.org/jira/browse/BEAM-5690 Project: Beam Issue Type: Task Components: runner-spark Reporter: Kenneth Knowles Assignee: Amit Sela Reported on user@ {quote}We are trying to setup a pipeline with using BeamSql and the trigger used is default (AfterWatermark crosses the window). Below is the pipeline: KafkaSource (KafkaIO) ---> Windowing (FixedWindow 1min) ---> BeamSql ---> KafkaSink (KafkaIO) We are using Spark Runner for this. The BeamSql query is: {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code} We are grouping by Col3 which is a string. It can hold values string[0-9]. The records are getting emitted out at 1 min to kafka sink, but the output record in kafka is not as expected. Below is the output observed: (WST and WET are indicators for window start time and window end time) {code} {"count_col1":1,"Col3":"string5","WST":"2018-10-09 09-55-00 +","WET":"2018-10-09 09-56-00 +"} {"count_col1":3,"Col3":"string7","WST":"2018-10-09 09-55-00 +","WET":"2018-10-09 09-56-00 +"} {"count_col1":2,"Col3":"string8","WST":"2018-10-09 09-55-00 +","WET":"2018-10-09 09-56-00 +"} {"count_col1":1,"Col3":"string2","WST":"2018-10-09 09-55-00 +","WET":"2018-10-09 09-56-00 +"} {"count_col1":1,"Col3":"string6","WST":"2018-10-09 09-55-00 +","WET":"2018-10-09 09-56-00 +"} {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 +","WET":"2018-10-09 09-56-00 +"} {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 +","WET":"2018-10-09 09-56-00 +"} {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 +","WET":"2018-10-09 09-56-00 +"} {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 +","WET":"2018-10-09 09-56-00 +"} {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 +","WET":"2018-10-09 09-56-00 +"} {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 +","WET":"2018-10-09 09-56-00 0} 
{code} {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
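The 1-minute fixed windows in the pipeline above can be modeled outside Beam as plain timestamp bucketing. The following is an illustrative sketch of FixedWindows-style assignment, not Beam runner code; the class and method names are invented for the example:

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative sketch (not Beam code): how a fixed-size window is assigned to
// an element timestamp. Each timestamp falls into exactly one window whose
// start is the timestamp rounded down to a multiple of the window size.
public class FixedWindowSketch {
    static Instant windowStart(Instant ts, Duration size) {
        long millis = ts.toEpochMilli();
        long sizeMillis = size.toMillis();
        // Round down to the nearest window boundary.
        return Instant.ofEpochMilli(millis - Math.floorMod(millis, sizeMillis));
    }

    public static void main(String[] args) {
        // A record at 09:55:42 lands in the [09:55:00, 09:56:00) window,
        // matching the WST/WET values in the report above.
        Instant ts = Instant.parse("2018-10-09T09:55:42Z");
        Instant start = windowStart(ts, Duration.ofMinutes(1));
        System.out.println(start + " .. " + start.plus(Duration.ofMinutes(1)));
        // 2018-10-09T09:55:00Z .. 2018-10-09T09:56:00Z
    }
}
```

With the default AfterWatermark trigger, each (Col3, window) pair should emit exactly one count when the watermark passes the window end, which is why the repeated zero-count rows for the same key and window in the report are unexpected.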
[jira] [Assigned] (BEAM-5639) Add exception handling to single message transforms in Python SDK
[ https://issues.apache.org/jira/browse/BEAM-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-5639: - Assignee: Jeff Klukas > Add exception handling to single message transforms in Python SDK > - > > Key: BEAM-5639 > URL: https://issues.apache.org/jira/browse/BEAM-5639 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Jeff Klukas >Assignee: Jeff Klukas >Priority: Minor > > Add methods to python SDK Map and FlatMap that allow users to specify > expected exceptions and tuple tags to associate with the with collections of > the successfully and unsuccessfully processed elements. > See discussion on dev list: > [https://lists.apache.org/thread.html/936ed2a5f2c01be066fd903abf70130625e0b8cf4028c11b89b8b23f@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5638) Add exception handling to single message transforms in Java SDK
[ https://issues.apache.org/jira/browse/BEAM-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-5638: - Assignee: Jeff Klukas (was: Kenneth Knowles) > Add exception handling to single message transforms in Java SDK > --- > > Key: BEAM-5638 > URL: https://issues.apache.org/jira/browse/BEAM-5638 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Jeff Klukas >Assignee: Jeff Klukas >Priority: Minor > Original Estimate: 168h > Remaining Estimate: 168h > > Add methods to MapElements, FlatMapElements, and Filter that allow users to > specify expected exceptions and tuple tags to associate with the with > collections of the successfully and unsuccessfully processed elements. > See discussion on dev list: > https://lists.apache.org/thread.html/936ed2a5f2c01be066fd903abf70130625e0b8cf4028c11b89b8b23f@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
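The behavior proposed in BEAM-5638/BEAM-5639 can be sketched in plain Java, independent of the Beam API. This is a hedged illustration of the idea (routing expected exceptions to a failure collection rather than failing the bundle); the method and class names are invented and do not reflect the API that was eventually designed on the dev list:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hedged sketch of the proposal's core idea (not the Beam API): apply a
// function per element, collecting successes and expected failures separately
// instead of letting one bad element fail the whole batch.
public class ExceptionHandlingSketch {
    static <In, Out> Map.Entry<List<Out>, List<In>> mapWithFailures(
            List<In> input, Function<In, Out> fn, Class<? extends Exception> expected) {
        List<Out> successes = new ArrayList<>();
        List<In> failures = new ArrayList<>();
        for (In element : input) {
            try {
                successes.add(fn.apply(element));
            } catch (Exception e) {
                if (expected.isInstance(e)) {
                    failures.add(element); // routed to the failure output
                } else {
                    throw e; // unexpected exceptions still propagate
                }
            }
        }
        return new AbstractMap.SimpleEntry<>(successes, failures);
    }

    public static void main(String[] args) {
        Map.Entry<List<Integer>, List<String>> result = mapWithFailures(
                Arrays.asList("1", "oops", "3"), Integer::parseInt, NumberFormatException.class);
        System.out.println(result.getKey());   // [1, 3]
        System.out.println(result.getValue()); // [oops]
    }
}
```

In Beam terms, the two lists correspond to the tuple-tagged success and failure PCollections the tickets describe.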
[jira] [Commented] (BEAM-5526) Make ByteBuddyDoFnInvokerFactory injection strategy configurable + drop the singleton
[ https://issues.apache.org/jira/browse/BEAM-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634737#comment-16634737 ] Kenneth Knowles commented on BEAM-5526: --- Do we have an umbrella ticket for Java 11? Is there a dev thread about tracking it? > Make ByteBuddyDoFnInvokerFactory injection strategy configurable + drop the > singleton > - > > Key: BEAM-5526 > URL: https://issues.apache.org/jira/browse/BEAM-5526 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Romain Manni-Bucau >Priority: Blocker > > org.apache.beam.sdk.transforms.reflect.DoFnInvokers + DoFnInvokerFactory > design is to be an SPI to let users plug their own bytecode manipulation > library, however in practice Beam uses ByteBuddyDoFnInvokerFactory as a > singleton which makes all this design useless. > ByteBuddyDoFnInvokerFactory is also not configurable at all - typically the > injection strategy so it assumes it runs in an environment and on a JVM where > it will work - it does not on Java 11 for instance. > This ticket is about fixing all these small inconsistencies and blockers to run > on Java 11. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5526) Make ByteBuddyDoFnInvokerFactory injection strategy configurable + drop the singleton
[ https://issues.apache.org/jira/browse/BEAM-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634733#comment-16634733 ] Kenneth Knowles commented on BEAM-5526: --- [~romain.manni-bucau] you've marked this as a "blocker" priority. It is an implementation detail. {{DoFnInvoker}} and {{DoFnInvokerFactory}} are not user-facing APIs. The reason there is an interface is due to good practice (see Parnas https://www.win.tue.nl/~wstomv/edu/2ip30/references/criteria_for_modularization.pdf). You should put an interface in place when a significant decision is made, even if there is only one implementation. Here, the interface is around the decision of what library to use, just like you say. It is kind of a cool idea to implement some other DoFnInvoker factories for users, but then if the pipeline fails it is a user bug, not a Beam bug. That might get confusing. Have you opened this idea on the dev list? > Make ByteBuddyDoFnInvokerFactory injection strategy configurable + drop the > singleton > - > > Key: BEAM-5526 > URL: https://issues.apache.org/jira/browse/BEAM-5526 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Romain Manni-Bucau >Assignee: Kenneth Knowles >Priority: Blocker > > org.apache.beam.sdk.transforms.reflect.DoFnInvokers + DoFnInvokerFactory > design is to be an SPI to let users plug their own bytecode manipulation > library, however in practice Beam uses ByteBuddyDoFnInvokerFactory as a > singleton which makes all this design useless. > ByteBuddyDoFnInvokerFactory is also not configurable at all - typically the > injection strategy so it assumes it runs in an environment and on a JVM where > it will work - it does not on Java 11 for instance. > This ticket is about fixing all these small inconsistencies and blockers to run > on Java 11. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5526) Make ByteBuddyDoFnInvokerFactory injection strategy configurable + drop the singleton
[ https://issues.apache.org/jira/browse/BEAM-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-5526: - Assignee: (was: Kenneth Knowles) > Make ByteBuddyDoFnInvokerFactory injection strategy configurable + drop the > singleton > - > > Key: BEAM-5526 > URL: https://issues.apache.org/jira/browse/BEAM-5526 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Romain Manni-Bucau >Priority: Blocker > > org.apache.beam.sdk.transforms.reflect.DoFnInvokers + DoFnInvokerFactory > design is to be an SPI to let users plug their own bytecode manipulation > library, however in practice Beam uses ByteBuddyDoFnInvokerFactory as a > singleton which makes all this design useless. > ByteBuddyDoFnInvokerFactory is also not configurable at all - typically the > injection strategy so it assumes it runs in an environment and on a JVM where > it will work - it does not on Java 11 for instance. > This ticket is about fixing all these small inconsistencies and blockers to run > on Java 11. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5526) Make ByteBuddyDoFnInvokerFactory injection strategy configurable + drop the singleton
[ https://issues.apache.org/jira/browse/BEAM-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-5526: -- Issue Type: New Feature (was: Task) > Make ByteBuddyDoFnInvokerFactory injection strategy configurable + drop the > singleton > - > > Key: BEAM-5526 > URL: https://issues.apache.org/jira/browse/BEAM-5526 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Romain Manni-Bucau >Assignee: Kenneth Knowles >Priority: Blocker > > org.apache.beam.sdk.transforms.reflect.DoFnInvokers + DoFnInvokerFactory > design is to be an SPI to let users plug their own bytecode manipulation > library, however in practice Beam uses ByteBuddyDoFnInvokerFactory as a > singleton which makes all this design useless. > ByteBuddyDoFnInvokerFactory is also not configurable at all - typically the > injection strategy so it assumes it runs in an environment and on a JVM where > it will work - it does not on Java 11 for instance. > This ticket is about fixing all these small inconsistencies and blockers to run > on Java 11. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5526) Make ByteBuddyDoFnInvokerFactory injection strategy configurable + drop the singleton
[ https://issues.apache.org/jira/browse/BEAM-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-5526: -- Issue Type: Improvement (was: New Feature) > Make ByteBuddyDoFnInvokerFactory injection strategy configurable + drop the > singleton > - > > Key: BEAM-5526 > URL: https://issues.apache.org/jira/browse/BEAM-5526 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Romain Manni-Bucau >Assignee: Kenneth Knowles >Priority: Blocker > > org.apache.beam.sdk.transforms.reflect.DoFnInvokers + DoFnInvokerFactory > design is to be an SPI to let users plug their own bytecode manipulation > library, however in practice Beam uses ByteBuddyDoFnInvokerFactory as a > singleton which makes all this design useless. > ByteBuddyDoFnInvokerFactory is also not configurable at all - typically the > injection strategy so it assumes it runs in an environment and on a JVM where > it will work - it does not on Java 11 for instance. > This ticket is about fixing all these small inconsistencies and blockers to run > on Java 11. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4670) Re-enable checkstyle JavadocParagraph
[ https://issues.apache.org/jira/browse/BEAM-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-4670: -- Labels: easyfix newbie starter (was: ) > Re-enable checkstyle JavadocParagraph > - > > Key: BEAM-4670 > URL: https://issues.apache.org/jira/browse/BEAM-4670 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Kenneth Knowles >Priority: Major > Labels: easyfix, newbie, starter > > When {{spotless}} reformats javadoc, many of our javadoc comments are > revealed to be broken, missing paragraph tags. Rather than force them all to > be fixed in a single PR, we could disable the check and do it a few at a > time, since it is easy "idle hands" work. This JIRA tracks that (the actual > disabling is not yet committed). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4439) Do not convert a join that we cannot support
[ https://issues.apache.org/jira/browse/BEAM-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-4439: -- Labels: newbie starter (was: ) > Do not convert a join that we cannot support > > > Key: BEAM-4439 > URL: https://issues.apache.org/jira/browse/BEAM-4439 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kenneth Knowles >Priority: Major > Labels: starter > > The BeamJoinRule matches all LogicalJoin operators. If we make it not match a > full join then Calcite should look for a different plan when it converts to > BeamLogicalConvention. That is a good step to make sure we can support things > and also force the logical optimizer to choose non-full joins. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-4670) Re-enable checkstyle JavadocParagraph
[ https://issues.apache.org/jira/browse/BEAM-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-4670: - Assignee: (was: Kenneth Knowles) > Re-enable checkstyle JavadocParagraph > - > > Key: BEAM-4670 > URL: https://issues.apache.org/jira/browse/BEAM-4670 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Kenneth Knowles >Priority: Major > Labels: easyfix, newbie, starter > > When {{spotless}} reformats javadoc, many of our javadoc comments are > revealed to be broken, missing paragraph tags. Rather than force them all to > be fixed in a single PR, we could disable the check and do it a few at a > time, since it is easy "idle hands" work. This JIRA tracks that (the actual > disabling is not yet committed). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4439) Do not convert a join that we cannot support
[ https://issues.apache.org/jira/browse/BEAM-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-4439: -- Labels: starter (was: newbie starter) > Do not convert a join that we cannot support > > > Key: BEAM-4439 > URL: https://issues.apache.org/jira/browse/BEAM-4439 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kenneth Knowles >Priority: Major > Labels: starter > > The BeamJoinRule matches all LogicalJoin operators. If we make it not match a > full join then Calcite should look for a different plan when it converts to > BeamLogicalConvention. That is a good step to make sure we can support things > and also force the logical optimizer to choose non-full joins. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-4439) Do not convert a join that we cannot support
[ https://issues.apache.org/jira/browse/BEAM-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-4439: - Assignee: (was: Kenneth Knowles) > Do not convert a join that we cannot support > > > Key: BEAM-4439 > URL: https://issues.apache.org/jira/browse/BEAM-4439 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kenneth Knowles >Priority: Major > > The BeamJoinRule matches all LogicalJoin operators. If we make it not match a > full join then Calcite should look for a different plan when it converts to > BeamLogicalConvention. That is a good step to make sure we can support things > and also force the logical optimizer to choose non-full joins. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-4664) Website Staging is timing out at 30 minutes
[ https://issues.apache.org/jira/browse/BEAM-4664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles resolved BEAM-4664. --- Resolution: Fixed Fix Version/s: Not applicable > Website Staging is timing out at 30 minutes > --- > > Key: BEAM-4664 > URL: https://issues.apache.org/jira/browse/BEAM-4664 > Project: Beam > Issue Type: Bug > Components: build-system, website >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > Fix For: Not applicable > > Time Spent: 40m > Remaining Estimate: 0h > > We shouldn't be cutting it close. Let's double it and more while we figure > out how to stage less or more efficiently. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-3608) Pre-shade Guava for things we want to keep using
[ https://issues.apache.org/jira/browse/BEAM-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-3608: - Assignee: (was: Kenneth Knowles) > Pre-shade Guava for things we want to keep using > > > Key: BEAM-3608 > URL: https://issues.apache.org/jira/browse/BEAM-3608 > Project: Beam > Issue Type: Sub-task > Components: runner-core, sdk-java-core >Reporter: Kenneth Knowles >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > Instead of shading as part of our build, we can shade before build so that it > is apparent when reading code, and in IDEs, that a particular class resides > in a hidden namespace. > {{import com.google.common.reflect.TypeToken}} > becomes something like > {{import org.apache.beam.private.guava21.com.google.common.reflect.TypeToken}} > So we can very trivially ban `org.apache.beam.private` from public APIs > unless they are annotated {{@Internal}}, and it makes sharing between our own > modules never get broken by shading again. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
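The pre-shading described in BEAM-3608 can be sketched as a build-script fragment. This is a hypothetical Gradle Shadow plugin configuration, not the actual Beam build; the plugin version and the `org.apache.beam.private.guava21` namespace are illustrative assumptions taken from the ticket text:

```groovy
// Hypothetical pre-shade build: publish a jar in which Guava is relocated into
// a Beam-private namespace, so Beam modules compile against the relocated
// classes directly instead of shading at the end of the build.
plugins {
    id 'java'
    id 'com.github.johnrengelman.shadow' version '2.0.4' // version is illustrative
}

dependencies {
    compile 'com.google.guava:guava:21.0'
}

shadowJar {
    // After relocation, consumers write:
    //   import org.apache.beam.private.guava21.com.google.common.reflect.TypeToken;
    // which makes the hidden namespace visible in source and in IDEs.
    relocate 'com.google.common', 'org.apache.beam.private.guava21.com.google.common'
}
```

Because the relocated package name appears literally in imports, a checkstyle or enforcer rule can then ban `org.apache.beam.private` from public API signatures, as the ticket suggests.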
[jira] [Assigned] (BEAM-4702) After SQL GROUP BY the result should be globally windowed
[ https://issues.apache.org/jira/browse/BEAM-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-4702: - Assignee: (was: Kenneth Knowles) > After SQL GROUP BY the result should be globally windowed > - > > Key: BEAM-4702 > URL: https://issues.apache.org/jira/browse/BEAM-4702 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kenneth Knowles >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Beam SQL runs in two contexts: > 1. As a PTransform in a pipeline. A PTransform operates on a PCollection, > which is always implicitly windowed, and a PTransform should operate per-window > so it automatically works on bounded and unbounded data. This only works if > the query has no windowing operators, in which case the GROUP BY stuff should operate per-window. > 2. As a top-level shell that starts and ends with SQL. In the relational > model there are no implicit windows. Calcite has some extensions for > windowing, but they manifest (IMO correctly) as just items in the GROUP BY > list. The output of the aggregation is "just rows" again. So it should be > globally windowed. > The problem is that this semantic fix makes it so we cannot join windowed > stream subqueries. Because we don't have retractions, we only support > GroupByKey-based equijoins over windowed streams, with the default trigger. > _These joins implicitly also join windows_. For example: > {code} > JOIN(left.id = right.id) > SELECT ... GROUP BY id, TUMBLE(1 hour) > SELECT ... GROUP BY id, TUMBLE(1 hour) > {code} > Semantically, there may be a joined row for 1:00pm on the left and 10:00pm on > the right. But by the time the right-hand row for 10:00pm shows up, the left > one may be GC'd. So this is implicitly, but nondeterministically, joining on > the window as well. Before this PR, we left the windowing strategies for left > and right in place, and asserted that they matched. 
> If we re-window into the global window always, there _are no windowed > streams_ so you just can't do stream joins. The solution is probably to track > which field of a stream is the window and allow joins which also explicitly > express the equijoin over the window field. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
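The "joins implicitly also join windows" point above can be sketched concretely. This is an illustrative model, not Beam internals; a GroupByKey-based equijoin over windowed streams effectively keys each row on (id, window), so rows with matching ids but different windows never meet:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch (not Beam internals): a GroupByKey-based equijoin over
// windowed streams. Each row is {id, windowStart, payload}; the effective join
// key is id + window, so a left row in the 1:00pm window never joins a right
// row in the 10:00pm window even when the ids match.
public class WindowedJoinSketch {
    static List<String> join(List<String[]> left, List<String[]> right) {
        // Index the right side by the implicit compound key (id, window).
        Map<String, List<String>> rightIndex = new HashMap<>();
        for (String[] r : right) {
            rightIndex.computeIfAbsent(r[0] + "@" + r[1], k -> new ArrayList<>()).add(r[2]);
        }
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            for (String payload : rightIndex.getOrDefault(l[0] + "@" + l[1], Collections.emptyList())) {
                out.add(l[2] + "|" + payload);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> left = Collections.singletonList(new String[] {"a", "13:00", "L"});
        List<String[]> right = Collections.singletonList(new String[] {"a", "22:00", "R"});
        // Same id, different windows: the join produces nothing.
        System.out.println(join(left, right)); // []
    }
}
```

Re-windowing aggregation output into the global window removes the window component of this key entirely, which is why the ticket proposes tracking the window as an explicit field that joins can mention.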
[jira] [Assigned] (BEAM-4598) Test datetime operators at the DSL level
[ https://issues.apache.org/jira/browse/BEAM-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-4598: - Assignee: (was: Kenneth Knowles) > Test datetime operators at the DSL level > > > Key: BEAM-4598 > URL: https://issues.apache.org/jira/browse/BEAM-4598 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kenneth Knowles >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-4704) String operations yield incorrect results when executed through SQL shell
[ https://issues.apache.org/jira/browse/BEAM-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531674#comment-16531674 ] Kenneth Knowles commented on BEAM-4704: --- I'm going to leave this open, as we should either force our convention somehow and/or fix Calcite's implementation. > String operations yield incorrect results when executed through SQL shell > - > > Key: BEAM-4704 > URL: https://issues.apache.org/jira/browse/BEAM-4704 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > > {{TRIM}} is defined to trim _all_ the characters in the first string from the > string-to-be-trimmed. Calcite has an incorrect implementation of this. We use > our own fixed implementation. But when executed through the SQL shell, the > results do not match what we get from the PTransform path. Here are three test > cases that pass on {{master}} but are incorrect in the shell: > {code:sql} > BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe'); > ++ > | EXPR$0 | > ++ > | hehe__hehe | > ++ > {code} > {code:sql} > BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe'); > ++ > | EXPR$0 | > ++ > | hehe__heh | > ++ > {code} > {code:sql} > BeamSQL> select TRIM(BOTH 'eh' FROM 'hehe__hehe'); > ++ > | EXPR$0 | > ++ > | hehe__heh | > ++ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-4704) String operations yield incorrect results when executed through SQL shell
[ https://issues.apache.org/jira/browse/BEAM-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-4704: - Assignee: (was: Kenneth Knowles) > String operations yield incorrect results when executed through SQL shell > - > > Key: BEAM-4704 > URL: https://issues.apache.org/jira/browse/BEAM-4704 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kenneth Knowles >Priority: Major > > {{TRIM}} is defined to trim _all_ the characters in the first string from the > string-to-be-trimmed. Calcite has an incorrect implementation of this. We use > our own fixed implementation. But when executed through the SQL shell, the > results do not match what we get from the PTransform path. Here are three test > cases that pass on {{master}} but are incorrect in the shell: > {code:sql} > BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe'); > ++ > | EXPR$0 | > ++ > | hehe__hehe | > ++ > {code} > {code:sql} > BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe'); > ++ > | EXPR$0 | > ++ > | hehe__heh | > ++ > {code} > {code:sql} > BeamSQL> select TRIM(BOTH 'eh' FROM 'hehe__hehe'); > ++ > | EXPR$0 | > ++ > | hehe__heh | > ++ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
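The TRIM semantics that BEAM-4704 says should hold can be sketched in plain Java. This is a hedged illustration of the SQL-standard behavior (every character in the trim set is stripped, not the literal substring), not the Beam or Calcite implementation; the class and method names are invented:

```java
// Hedged sketch of SQL-standard TRIM semantics (not Beam/Calcite code):
// TRIM removes every character that appears in the trim set from the ends of
// the string, which is why TRIM(LEADING 'eh' FROM 'hehe__hehe') is '__hehe'.
public class TrimSketch {
    static String trimLeading(String trimChars, String s) {
        int i = 0;
        while (i < s.length() && trimChars.indexOf(s.charAt(i)) >= 0) i++;
        return s.substring(i);
    }

    static String trimTrailing(String trimChars, String s) {
        int j = s.length();
        while (j > 0 && trimChars.indexOf(s.charAt(j - 1)) >= 0) j--;
        return s.substring(0, j);
    }

    static String trimBoth(String trimChars, String s) {
        return trimTrailing(trimChars, trimLeading(trimChars, s));
    }

    public static void main(String[] args) {
        System.out.println(trimLeading("eh", "hehe__hehe"));  // __hehe
        System.out.println(trimTrailing("eh", "hehe__hehe")); // hehe__
        System.out.println(trimBoth("eh", "hehe__hehe"));     // __
    }
}
```

Comparing these results with the shell output quoted in the ticket shows the shell path is trimming at most one character (or none), rather than the full trim set.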
[jira] [Commented] (BEAM-2888) Runner Comparison / Capability Matrix revamp
[ https://issues.apache.org/jira/browse/BEAM-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531671#comment-16531671 ] Kenneth Knowles commented on BEAM-2888: --- [~griscz] I will not be working on this for a while, so I wanted to find it a good owner. It looks closely related to something you have assigned. > Runner Comparison / Capability Matrix revamp > > > Key: BEAM-2888 > URL: https://issues.apache.org/jira/browse/BEAM-2888 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Kenneth Knowles >Assignee: Griselda Cuevas Zambrano >Priority: Major > > Discussion: > https://lists.apache.org/thread.html/8aff7d70c254356f2dae3109fb605e0b60763602225a877d3dadf8b7@%3Cdev.beam.apache.org%3E > Summarizing that discussion, we have a lot of issues/wishes. Some can be > addressed as one-off and some need a unified reorganization of the runner > comparison. > Basic corrections: > - Remove rows that are impossible to not support (ParDo) > - Remove rows where "support" doesn't really make sense (Composite > transforms) > - Deduplicate rows that are actually the same model feature (all non-merging > windowing / all merging windowing) > - Clearly separate rows that represent optimizations (Combine) > - Correct rows in the wrong place (Timers are actually a "what...?" row) > - Separate or remove rows that have not been designed ([Meta]Data driven > triggers, retractions) > - Rename rows with names that appear nowhere else (Timestamp control, which > is called a TimestampCombiner in Java) > - Switch to a more distinct color scheme for full/partial support (currently > just solid/faded colors) > - Switch to something clearer than "~" for partial support, versus ✘ and ✓ > for none and full. 
> - Correct Gearpump support for merging windows (see BEAM-2759) > - Correct Spark support for non-merging and merging windows (see BEAM-2499) > Minor rewrites: > - Lump all the basic stuff (ParDo, GroupByKey, Read, Window) into one row > - Make sections as users see them, like "ParDo" / "side Inputs" not "What?" > / "side inputs" > - Add rows for non-model things, like portability framework support, metrics > backends, etc > Bigger rewrites: > - Add versioning to the comparison, as in BEAM-166 > - Find a way to fit in a plain English summary of runner's support in Beam. > It should come first, as it is what new users need before getting to details. > - Find a way to describe production readiness of runners and/or testimonials > of who is using it in production. > - Have a place to compare non-model differences between runners > Changes requiring engineering efforts: > - Gather and add quantitative runner metrics, perhaps Nexmark results for > mid-level, smaller benchmarks for measuring aspects of specific features, and > larger end-to-end benchmarks to get an idea how it might actually perform on > a use case > - Tighter coupling of the matrix portion of the comparison with tags on > ValidatesRunner tests > If you care to address some aspect of this, please reach out and/or just file > a subtask and address it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-2888) Runner Comparison / Capability Matrix revamp
[ https://issues.apache.org/jira/browse/BEAM-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531672#comment-16531672 ] Kenneth Knowles commented on BEAM-2888: --- Some of this can be distributed into subtasks, or [~melap] might have opinions about it. > Runner Comparison / Capability Matrix revamp > > > Key: BEAM-2888 > URL: https://issues.apache.org/jira/browse/BEAM-2888 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Kenneth Knowles >Assignee: Griselda Cuevas Zambrano >Priority: Major > > Discussion: > https://lists.apache.org/thread.html/8aff7d70c254356f2dae3109fb605e0b60763602225a877d3dadf8b7@%3Cdev.beam.apache.org%3E > Summarizing that discussion, we have a lot of issues/wishes. Some can be > addressed as one-off and some need a unified reorganization of the runner > comparison. > Basic corrections: > - Remove rows that are impossible to not support (ParDo) > - Remove rows where "support" doesn't really make sense (Composite > transforms) > - Deduplicate rows that are actually the same model feature (all non-merging > windowing / all merging windowing) > - Clearly separate rows that represent optimizations (Combine) > - Correct rows in the wrong place (Timers are actually a "what...?" row) > - Separate or remove rows that have not been designed ([Meta]Data driven > triggers, retractions) > - Rename rows with names that appear nowhere else (Timestamp control, which > is called a TimestampCombiner in Java) > - Switch to a more distinct color scheme for full/partial support (currently > just solid/faded colors) > - Switch to something clearer than "~" for partial support, versus ✘ and ✓ > for none and full. > - Correct Gearpump support for merging windows (see BEAM-2759) > - Correct Spark support for non-merging and merging windows (see BEAM-2499) > Minor rewrites: > - Lump all the basic stuff (ParDo, GroupByKey, Read, Window) into one row > - Make sections as users see them, like "ParDo" / "side Inputs" not "What?" 
> / "side inputs" > - Add rows for non-model things, like portability framework support, metrics > backends, etc > Bigger rewrites: > - Add versioning to the comparison, as in BEAM-166 > - Find a way to fit in a plain English summary of runner's support in Beam. > It should come first, as it is what new users need before getting to details. > - Find a way to describe production readiness of runners and/or testimonials > of who is using it in production. > - Have a place to compare non-model differences between runners > Changes requiring engineering efforts: > - Gather and add quantitative runner metrics, perhaps Nexmark results for > mid-level, smaller benchmarks for measuring aspects of specific features, and > larger end-to-end benchmarks to get an idea how it might actually perform on > a use case > - Tighter coupling of the matrix portion of the comparison with tags on > ValidatesRunner tests > If you care to address some aspect of this, please reach out and/or just file > a subtask and address it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-2888) Runner Comparison / Capability Matrix revamp
[ https://issues.apache.org/jira/browse/BEAM-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-2888: - Assignee: Griselda Cuevas Zambrano (was: Kenneth Knowles) > Runner Comparison / Capability Matrix revamp > > > Key: BEAM-2888 > URL: https://issues.apache.org/jira/browse/BEAM-2888 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Kenneth Knowles >Assignee: Griselda Cuevas Zambrano >Priority: Major > > Discussion: > https://lists.apache.org/thread.html/8aff7d70c254356f2dae3109fb605e0b60763602225a877d3dadf8b7@%3Cdev.beam.apache.org%3E > Summarizing that discussion, we have a lot of issues/wishes. Some can be > addressed as one-off and some need a unified reorganization of the runner > comparison. > Basic corrections: > - Remove rows that are impossible to not support (ParDo) > - Remove rows where "support" doesn't really make sense (Composite > transforms) > - Deduplicate rows that are actually the same model feature (all non-merging > windowing / all merging windowing) > - Clearly separate rows that represent optimizations (Combine) > - Correct rows in the wrong place (Timers are actually a "what...?" row) > - Separate or remove rows that have not been designed ([Meta]Data driven > triggers, retractions) > - Rename rows with names that appear nowhere else (Timestamp control, which > is called a TimestampCombiner in Java) > - Switch to a more distinct color scheme for full/partial support (currently > just solid/faded colors) > - Switch to something clearer than "~" for partial support, versus ✘ and ✓ > for none and full. > - Correct Gearpump support for merging windows (see BEAM-2759) > - Correct Spark support for non-merging and merging windows (see BEAM-2499) > Minor rewrites: > - Lump all the basic stuff (ParDo, GroupByKey, Read, Window) into one row > - Make sections as users see them, like "ParDo" / "side Inputs" not "What?" 
> / "side inputs" > - Add rows for non-model things, like portability framework support, metrics > backends, etc > Bigger rewrites: > - Add versioning to the comparison, as in BEAM-166 > - Find a way to fit in a plain English summary of runner's support in Beam. > It should come first, as it is what new users need before getting to details. > - Find a way to describe production readiness of runners and/or testimonials > of who is using it in production. > - Have a place to compare non-model differences between runners > Changes requiring engineering efforts: > - Gather and add quantitative runner metrics, perhaps Nexmark results for > mid-level, smaller benchmarks for measuring aspects of specific features, and > larger end-to-end benchmarks to get an idea how it might actually perform on > a use case > - Tighter coupling of the matrix portion of the comparison with tags on > ValidatesRunner tests > If you care to address some aspect of this, please reach out and/or just file > a subtask and address it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-4642) Allow setting PipelineOptions for JDBC connections
[ https://issues.apache.org/jira/browse/BEAM-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles resolved BEAM-4642. --- Resolution: Fixed Fix Version/s: 2.6.0 > Allow setting PipelineOptions for JDBC connections > -- > > Key: BEAM-4642 > URL: https://issues.apache.org/jira/browse/BEAM-4642 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > Fix For: 2.6.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Currently you can set pipeline options only through {{SET}} commands. It > would be convenient to set defaults in the JDBC connection string. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-4667) Potential issue with QuantileStateCoder
[ https://issues.apache.org/jira/browse/BEAM-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-4667: - Assignee: Reuven Lax (was: Kenneth Knowles) > Potential issue with QuantileStateCoder > --- > > Key: BEAM-4667 > URL: https://issues.apache.org/jira/browse/BEAM-4667 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Zhiheng Huang >Assignee: Reuven Lax >Priority: Minor > > [https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java#L687] > The line above encodes the QuantileState buffers.size() as if it's > numBuffers. This seems wrong since before buffers are full, buffers.size() is > not equal to numBuffers. One thing I suspect will happen is that, if we > serialize before buffer is full, it will effectively reduce the number of > buffers we maintain after deserialization. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
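The suspected round-trip bug can be illustrated with a simplified stand-in for QuantileState. This is a hypothetical sketch of the pattern the report describes, not the real coder in ApproximateQuantiles: the encoder writes the *current* buffer count where the configured capacity belongs, so a decode before the buffers fill up silently shrinks the capacity.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class QuantileRoundTrip {
    // Simplified stand-in for QuantileState: a configured capacity plus
    // the buffers currently in use.
    static class State {
        final int numBuffers;        // configured capacity
        final Deque<int[]> buffers;  // buffers currently held (<= numBuffers)
        State(int numBuffers, Deque<int[]> buffers) {
            this.numBuffers = numBuffers;
            this.buffers = buffers;
        }
    }

    // Mimics the suspected encode path: buffers.size() is written where the
    // configured numBuffers should be.
    static int encodeNumBuffers(State s) {
        return s.buffers.size();
    }

    // Decode reads that value back and treats it as the capacity.
    static State decode(int encodedNumBuffers, Deque<int[]> buffers) {
        return new State(encodedNumBuffers, buffers);
    }

    public static void main(String[] args) {
        Deque<int[]> bufs = new ArrayDeque<>();
        bufs.add(new int[] {1});
        bufs.add(new int[] {2});
        State before = new State(10, bufs);            // capacity 10, 2 in use
        State after = decode(encodeNumBuffers(before), bufs);
        System.out.println(after.numBuffers);          // 2, not 10
    }
}
```

If the state is serialized while only 2 of a configured 10 buffers are populated, the decoded state believes its capacity is 2, effectively reducing the number of buffers maintained after deserialization, which would degrade the quantile approximation rather than fail loudly.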
[jira] [Assigned] (BEAM-4425) JDBC driver load errors :jdbc:calcite and :jdbc:beam
[ https://issues.apache.org/jira/browse/BEAM-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-4425: - Assignee: (was: Kenneth Knowles) > JDBC driver load errors :jdbc:calcite and :jdbc:beam > > > Key: BEAM-4425 > URL: https://issues.apache.org/jira/browse/BEAM-4425 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kenneth Knowles >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > These have different causes but the same fix - force load the JDBC driver > before it is used or otherwise factor code so it is not possible to hit this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
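The "force load the JDBC driver before it is used" fix mentioned above typically looks like the sketch below. The pattern is standard JDBC; the driver class name passed in is whatever Beam's or Calcite's actual driver class is, which this sketch does not assume:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class DriverPreload {
    // Force-load a JDBC driver class so its static initializer registers it
    // with DriverManager before the first getConnection call. Without this,
    // a connection attempt can fail if the driver was never class-loaded.
    public static void preload(String driverClassName) {
        try {
            Class.forName(driverClassName);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(
                "JDBC driver not on classpath: " + driverClassName, e);
        }
    }

    // Factoring the preload into the connection helper makes it impossible
    // to hit the "no suitable driver" error from this code path.
    public static Connection connect(String driverClassName, String url)
            throws SQLException {
        preload(driverClassName);
        return DriverManager.getConnection(url);
    }
}
```

Routing every connection through a helper like this is one way to "factor code so it is not possible to hit this," as the issue suggests.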
[jira] [Commented] (BEAM-4425) JDBC driver load errors :jdbc:calcite and :jdbc:beam
[ https://issues.apache.org/jira/browse/BEAM-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531665#comment-16531665 ] Kenneth Knowles commented on BEAM-4425: --- Is there a more broad fix or is just fixing it when it comes up what you have to do? > JDBC driver load errors :jdbc:calcite and :jdbc:beam > > > Key: BEAM-4425 > URL: https://issues.apache.org/jira/browse/BEAM-4425 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kenneth Knowles >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > These have different causes but the same fix - force load the JDBC driver > before it is used or otherwise factor code so it is not possible to hit this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-4667) Potential issue with QuantileStateCoder
[ https://issues.apache.org/jira/browse/BEAM-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531666#comment-16531666 ] Kenneth Knowles commented on BEAM-4667: --- [~reuvenlax] any thoughts? > Potential issue with QuantileStateCoder > --- > > Key: BEAM-4667 > URL: https://issues.apache.org/jira/browse/BEAM-4667 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Zhiheng Huang >Assignee: Reuven Lax >Priority: Minor > > [https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java#L687] > The line above encodes the QuantileState buffers.size() as if it's > numBuffers. This seems wrong since before buffers are full, buffers.size() is > not equal to numBuffers. One thing I suspect will happen is that, if we > serialize before buffer is full, it will effectively reduce the number of > buffers we maintain after deserialization. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-4670) Re-enable checkstyle JavadocParagraph
[ https://issues.apache.org/jira/browse/BEAM-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-4670: - Assignee: (was: Kenneth Knowles) > Re-enable checkstyle JavadocParagraph > - > > Key: BEAM-4670 > URL: https://issues.apache.org/jira/browse/BEAM-4670 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Kenneth Knowles >Priority: Major > > When {{spotless}} reformats javadoc, many of our javadoc comments are > revealed to be broken, missing paragraph tags. Rather than force them all to > be fixed in a single PR, we could disable the check and do it a few at a > time, since it is easy "idle hands" work. This JIRA tracks that (the actual > disabling is not yet committed). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-4670) Re-enable checkstyle JavadocParagraph
[ https://issues.apache.org/jira/browse/BEAM-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-4670: - Assignee: Kenneth Knowles > Re-enable checkstyle JavadocParagraph > - > > Key: BEAM-4670 > URL: https://issues.apache.org/jira/browse/BEAM-4670 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > > When {{spotless}} reformats javadoc, many of our javadoc comments are > revealed to be broken, missing paragraph tags. Rather than force them all to > be fixed in a single PR, we could disable the check and do it a few at a > time, since it is easy "idle hands" work. This JIRA tracks that (the actual > disabling is not yet committed). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-4697) Build broken
[ https://issues.apache.org/jira/browse/BEAM-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles resolved BEAM-4697. --- Resolution: Fixed Fix Version/s: Not applicable > Build broken > > > Key: BEAM-4697 > URL: https://issues.apache.org/jira/browse/BEAM-4697 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Fix For: Not applicable > > Time Spent: 0.5h > Remaining Estimate: 0h > > Caused by a race between two separate PRs being merged. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-4669) Re-enable checkstyle LineLength
[ https://issues.apache.org/jira/browse/BEAM-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-4669: - Assignee: (was: Kenneth Knowles) > Re-enable checkstyle LineLength > --- > > Key: BEAM-4669 > URL: https://issues.apache.org/jira/browse/BEAM-4669 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Kenneth Knowles >Priority: Major > Labels: easyfix, newbie, starter > > When {{spotless}} re-indents, many strings that are broken across lines end > up exceeding 100 characters. Rather than force them all to be fixed in a > single PR, we could disable the check and do it a few at a time, since it is > easy "idle hands" work. This JIRA tracks that (the actual disabling is not > yet committed). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-4697) Build broken
[ https://issues.apache.org/jira/browse/BEAM-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-4697: - Assignee: Reuven Lax (was: Kenneth Knowles) > Build broken > > > Key: BEAM-4697 > URL: https://issues.apache.org/jira/browse/BEAM-4697 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Fix For: Not applicable > > Time Spent: 0.5h > Remaining Estimate: 0h > > Caused by a race between two separate PRs being merged. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4669) Re-enable checkstyle LineLength
[ https://issues.apache.org/jira/browse/BEAM-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-4669: -- Labels: easyfix newbie starter (was: ) > Re-enable checkstyle LineLength > --- > > Key: BEAM-4669 > URL: https://issues.apache.org/jira/browse/BEAM-4669 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Kenneth Knowles >Priority: Major > Labels: easyfix, newbie, starter > > When {{spotless}} re-indents, many strings that are broken across lines end > up exceeding 100 characters. Rather than force them all to be fixed in a > single PR, we could disable the check and do it a few at a time, since it is > easy "idle hands" work. This JIRA tracks that (the actual disabling is not > yet committed). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-4701) Run SQL DSL-level tests through JDBC driver
[ https://issues.apache.org/jira/browse/BEAM-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles resolved BEAM-4701. --- Resolution: Fixed Fix Version/s: 2.6.0 > Run SQL DSL-level tests through JDBC driver > --- > > Key: BEAM-4701 > URL: https://issues.apache.org/jira/browse/BEAM-4701 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > Fix For: 2.6.0 > > > Currently there's a testing gap where tests at the DSL level mostly go > through the QueryTransform. So we can hit issues where we have a mismatch in > Calcite Avatica, which provides the JDBC layer. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-4700) JDBC driver cannot support TIMESTAMP data type
[ https://issues.apache.org/jira/browse/BEAM-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531663#comment-16531663 ] Kenneth Knowles commented on BEAM-4700: --- I think you know the most about Avatica and how we might work around this. > JDBC driver cannot support TIMESTAMP data type > -- > > Key: BEAM-4700 > URL: https://issues.apache.org/jira/browse/BEAM-4700 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Andrew Pilloud >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Avatica allows column representation to be customized, so a timestamp can be > stored as a variety of types. Joda ReadableInstant is none of these types: > https://github.com/apache/calcite-avatica/blob/acb675de97b9b0743c09368820a770e2ceda05f8/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L162 > By default, it seems to be configured to store {{TIMESTAMP}} columns as > {{long}} values. If you run the SQL shell and select a {{TIMESTAMP}} column, > you get: > {code} > java.lang.ClassCastException: org.joda.time.Instant cannot be cast to > java.lang.Number > at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$NumberAccessor.getNumber(AbstractCursor.java:726) > at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$TimestampFromNumberAccessor.getString(AbstractCursor.java:1026) > at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.AvaticaResultSet.getString(AvaticaResultSet.java:225) > at sqlline.Rows$Row.<init>(Rows.java:183) > {code} > So, essentially, Beam SQL Shell does not support timestamps. 
> We may be able to: > - override how the accessor for our existing storage is created > - configure what the column representation is (this doesn't really help, > since none of the choices are ours) > - convert timestamps to longs in BeamEnumerableConverter; not sure how many > conversions will be required here -- This message was sent by Atlassian JIRA (v7.6.3#76005)
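The third option above, converting timestamps to longs before they reach Avatica's numeric accessor, can be sketched as follows. This example uses java.time.Instant so it is self-contained; Beam's rows actually carry Joda-Time's Instant, whose getMillis() plays the same role, and the exact conversion point in BeamEnumerableConverter is not assumed here:

```java
import java.time.Instant;

public class TimestampToJdbc {
    // Map a row field to a JDBC-friendly representation: timestamps become
    // epoch millis (a Number), which Avatica's TimestampFromNumberAccessor
    // can render; everything else passes through unchanged.
    public static Object toJdbcValue(Object fieldValue) {
        if (fieldValue instanceof Instant) {
            return ((Instant) fieldValue).toEpochMilli();
        }
        return fieldValue;
    }
}
```

Applying a conversion like this to every timestamp field at the boundary would avoid the ClassCastException above, at the cost of auditing how many such conversion sites exist, which is the open question the issue notes.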
[jira] [Assigned] (BEAM-4700) JDBC driver cannot support TIMESTAMP data type
[ https://issues.apache.org/jira/browse/BEAM-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-4700: - Assignee: Andrew Pilloud (was: Kenneth Knowles) > JDBC driver cannot support TIMESTAMP data type > -- > > Key: BEAM-4700 > URL: https://issues.apache.org/jira/browse/BEAM-4700 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Andrew Pilloud >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Avatica allows column representation to be customized, so a timestamp can be > stored as a variety of types. Joda ReadableInstant is none of these types: > https://github.com/apache/calcite-avatica/blob/acb675de97b9b0743c09368820a770e2ceda05f8/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L162 > By default, it seems to be configured to store {{TIMESTAMP}} columns as > {{long}} values. If you run the SQL shell and select a {{TIMESTAMP}} column, > you get: > {code} > java.lang.ClassCastException: org.joda.time.Instant cannot be cast to > java.lang.Number > at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$NumberAccessor.getNumber(AbstractCursor.java:726) > at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$TimestampFromNumberAccessor.getString(AbstractCursor.java:1026) > at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.AvaticaResultSet.getString(AvaticaResultSet.java:225) > at sqlline.Rows$Row.<init>(Rows.java:183) > {code} > So, essentially, Beam SQL Shell does not support timestamps. 
> We may be able to: > - override how the accessor for our existing storage is created > - configure what the column representation is (this doesn't really help, > since none of the choices are ours) > - convert timestamps to longs in BeamEnumerableConverter; not sure how many > conversions will be required here -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-4714) Some DATETIME PLUS operators end up as ordinary PLUS and crash in accept()
[ https://issues.apache.org/jira/browse/BEAM-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles resolved BEAM-4714. --- Resolution: Fixed Fix Version/s: 2.6.0 > Some DATETIME PLUS operators end up as ordinary PLUS and crash in accept() > -- > > Key: BEAM-4714 > URL: https://issues.apache.org/jira/browse/BEAM-4714 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > Fix For: 2.6.0 > > Time Spent: 1h > Remaining Estimate: 0h > > In this case it was in HOP_END (but not HOP_START, interestingly enough). > Unclear why this manifests, but the "+" operator should just work for > datetime + interval for this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-4684) Support @RequiresStableInput on Dataflow runner in Java SDK
[ https://issues.apache.org/jira/browse/BEAM-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-4684: - Assignee: Yueyang Qiu (was: Kenneth Knowles) > Support @RequiresStableInput on Dataflow runner in Java SDK > --- > > Key: BEAM-4684 > URL: https://issues.apache.org/jira/browse/BEAM-4684 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > https://docs.google.com/document/d/117yRKbbcEdm3eIKB_26BHOJGmHSZl1YNoF0RqWGtqAM -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-4547) implement sum0 aggregation function
[ https://issues.apache.org/jira/browse/BEAM-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531662#comment-16531662 ] Kenneth Knowles commented on BEAM-4547: --- I'm sure that either {{SUM}} or {{SUM0}} is incorrect, since we use the same implementation for both... I'm not sure it matters for Beam, but we should get testing for it all, including the edge cases. That can be separate. > implement sum0 aggregation function > --- > > Key: BEAM-4547 > URL: https://issues.apache.org/jira/browse/BEAM-4547 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kai Jiang >Assignee: Kenneth Knowles >Priority: Major > Fix For: 2.6.0 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-4547) implement sum0 aggregation function
[ https://issues.apache.org/jira/browse/BEAM-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles resolved BEAM-4547. --- Resolution: Fixed Fix Version/s: 2.6.0 > implement sum0 aggregation function > --- > > Key: BEAM-4547 > URL: https://issues.apache.org/jira/browse/BEAM-4547 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kai Jiang >Assignee: Kenneth Knowles >Priority: Major > Fix For: 2.6.0 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4721) WindowingStrategy, Window fields in the project as a trait of a Rel
Kenneth Knowles created BEAM-4721: - Summary: WindowingStrategy, Window fields in the project as a trait of a Rel Key: BEAM-4721 URL: https://issues.apache.org/jira/browse/BEAM-4721 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Kenneth Knowles Currently, WindowingStrategy is just a property of a PCollection but is unknown during planning. As with boundedness / unboundedness, this should be available statically so it can affect the plan. It may not be literally a WindowingStrategy object, but a SQL-level equivalent. One windowing-related thing that is interesting is which fields represent the window. If we know this, then we know we can support per-window equijoins as long as a join condition also does an equijoin on the window. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4720) Boundedness and Unboundedness as a trait of a Rel so we know during planning
Kenneth Knowles created BEAM-4720: - Summary: Boundedness and Unboundedness as a trait of a Rel so we know during planning Key: BEAM-4720 URL: https://issues.apache.org/jira/browse/BEAM-4720 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Kenneth Knowles Currently we do not know the boundedness or unboundedness of a collection until after planning is complete, so it cannot influence the plan. Instead, BeamJoinRel makes post-planning decisions about what sort of join to implement. We should move this into the planning phase. I think Calcite traits are the intended way to do this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4719) Enhanced LIMIT support
Kenneth Knowles created BEAM-4719: - Summary: Enhanced LIMIT support Key: BEAM-4719 URL: https://issues.apache.org/jira/browse/BEAM-4719 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Kenneth Knowles Currently, Beam SQL supports LIMIT in two ways: 1. Within a query, the results are subject to LIMIT. This works. 2. The shell knows to cancel a pipeline when the limit is reached, even if there is unfinished unbounded data. The canceling of a pipeline works via a basic pattern match against the query execution plan, checking a few child nodes of the BeamEnumerableConverter for a BeamSortRel without a collation. If it can figure out what the limit is for the outermost query, then it will cancel the pipeline. A more robust approach might be to use traits (or some other thorough analysis) to see if there is a known size for the outermost query. This would, for example, be unaffected by any number of layers of non-size-changing transformations. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4714) Some DATETIME PLUS operators end up as ordinary PLUS and crash in accept()
[ https://issues.apache.org/jira/browse/BEAM-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-4714: -- Description: In this case it was in HOP_END (but not HOP_START, interestingly enough). Unclear why this manifests, but the "+" operator should just work for datetime + interval for this. > Some DATETIME PLUS operators end up as ordinary PLUS and crash in accept() > -- > > Key: BEAM-4714 > URL: https://issues.apache.org/jira/browse/BEAM-4714 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > > In this case it was in HOP_END (but not HOP_START, interestingly enough). > Unclear why this manifests, but the "+" operator should just work for > datetime + interval for this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4714) Some DATETIME PLUS operators end up as ordinary PLUS and crash in accept()
Kenneth Knowles created BEAM-4714: - Summary: Some DATETIME PLUS operators end up as ordinary PLUS and crash in accept() Key: BEAM-4714 URL: https://issues.apache.org/jira/browse/BEAM-4714 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Kenneth Knowles Assignee: Kenneth Knowles -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-4547) implement sum0 aggregation function
[ https://issues.apache.org/jira/browse/BEAM-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-4547: - Assignee: Kenneth Knowles (was: Kai Jiang) > implement sum0 aggregation function > --- > > Key: BEAM-4547 > URL: https://issues.apache.org/jira/browse/BEAM-4547 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kai Jiang >Assignee: Kenneth Knowles >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-4700) JDBC driver cannot support TIMESTAMP data type
[ https://issues.apache.org/jira/browse/BEAM-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530355#comment-16530355 ] Kenneth Knowles commented on BEAM-4700: --- Discussion on CALCITE-2394 implies that it is necessary for us to always store as {{TIMESTAMP WITH TIME ZONE}} set to 0 since Avatica's SQL-to-JDBC treats it as "millis since time zone offset". > JDBC driver cannot support TIMESTAMP data type > -- > > Key: BEAM-4700 > URL: https://issues.apache.org/jira/browse/BEAM-4700 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Avatica allows column representation to be customized, so a timestamp can be > stored as a variety of types. Joda ReadableInstant is none of these types: > https://github.com/apache/calcite-avatica/blob/acb675de97b9b0743c09368820a770e2ceda05f8/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L162 > By default, it seems to be configured to store {{TIMESTAMP}} columns as > {{long}} values. If you run the SQL shell and select a {{TIMESTAMP}} column, > you get: > {code} > java.lang.ClassCastException: org.joda.time.Instant cannot be cast to > java.lang.Number > at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$NumberAccessor.getNumber(AbstractCursor.java:726) > at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$TimestampFromNumberAccessor.getString(AbstractCursor.java:1026) > at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.AvaticaResultSet.getString(AvaticaResultSet.java:225) > at sqlline.Rows$Row.<init>(Rows.java:183) > {code} > So, essentially, Beam SQL Shell does not support timestamps. 
> We may be able to: > - override how the accessor for our existing storage is created > - configure what the column representation is (this doesn't really help, > since none of the choices are ours) > - convert timestamps to longs in BeamEnumerableConverter; not sure how many > conversions will be required here -- This message was sent by Atlassian JIRA (v7.6.3#76005)
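The third option above can be sketched in plain Java. This is a hedged illustration only, not Beam's implementation: Beam rows actually hold Joda {{ReadableInstant}} values, but {{java.time.Instant}} stands in here so the sketch is self-contained, and {{toJdbcValue}} is a hypothetical helper name.

```java
import java.time.Instant;

// Sketch of "convert timestamps to longs" before values reach Avatica, whose
// default NumberAccessor casts the stored value to java.lang.Number.
public class TimestampToLong {
  public static Object toJdbcValue(Object fieldValue) {
    if (fieldValue instanceof Instant) {
      // Widen the timestamp to epoch millis: a Long, which is a Number.
      return ((Instant) fieldValue).toEpochMilli();
    }
    // Non-timestamp fields pass through unchanged.
    return fieldValue;
  }

  public static void main(String[] args) {
    Object v = toJdbcValue(Instant.ofEpochMilli(1_530_000_000_000L));
    System.out.println(v instanceof Number); // safe for NumberAccessor
  }
}
```

A real change would apply this per-field in the row-to-JDBC conversion path, and the ticket notes it is unclear how many such conversions would be needed.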
[jira] [Resolved] (BEAM-4626) Support text table format with a single column of the lines of the files
[ https://issues.apache.org/jira/browse/BEAM-4626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles resolved BEAM-4626. --- Resolution: Fixed Fix Version/s: 2.6.0 > Support text table format with a single column of the lines of the files > > > Key: BEAM-4626 > URL: https://issues.apache.org/jira/browse/BEAM-4626 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > Fix For: 2.6.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Today, SQL can read CSV and allows a {{format}} flag to control what CSV > variant is used. But to do easy things and write pure SQL jobs it would be > nice to just read the text file as a one-column table and do transformations > in SQL. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4713) DECIMAL support in RowJsonDeserializer
Kenneth Knowles created BEAM-4713: - Summary: DECIMAL support in RowJsonDeserializer Key: BEAM-4713 URL: https://issues.apache.org/jira/browse/BEAM-4713 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Kenneth Knowles Assignee: Rui Wang -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4702) After SQL GROUP BY the result should be globally windowed
[ https://issues.apache.org/jira/browse/BEAM-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-4702: -- Description: Beam SQL runs in two contexts: 1. As a PTransform in a pipeline. A PTransform operates on a PCollection, which is always implicitly windowed, and a PTransform should operate per-window so it automatically works on bounded and unbounded data. This only works if the query has no windowing operators, in which case the GROUP BY should operate per-window. 2. As a top-level shell that starts and ends with SQL. In the relational model there are no implicit windows. Calcite has some extensions for windowing, but they manifest (IMO correctly) as just items in the GROUP BY list. The output of the aggregation is "just rows" again. So it should be globally windowed. The problem is that this semantic fix makes it so we cannot join windowed stream subqueries. Because we don't have retractions, we only support GroupByKey-based equijoins over windowed streams, with the default trigger. _These joins implicitly also join windows_. For example: {code} JOIN(left.id = right.id) SELECT ... GROUP BY id, TUMBLE(1 hour) SELECT ... GROUP BY id, TUMBLE(1 hour) {code} Semantically, there may be a joined row for 1:00pm on the left and 10:00pm on the right. But by the time the right-hand row for 10:00pm shows up, the left one may be GC'd. So this is implicitly, but nondeterministically, joining on the window as well. Before this PR, we left the windowing strategies for left and right in place, and asserted that they matched. If we re-window into the global window always, there _are no windowed streams_ so you just can't do stream joins. The solution is probably to track which field of a stream is the window and allow joins which also explicitly express the equijoin over the window field. 
> After SQL GROUP BY the result should be globally windowed > - > > Key: BEAM-4702 > URL: https://issues.apache.org/jira/browse/BEAM-4702 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Beam SQL runs in two contexts: > 1. As a PTransform in a pipeline. A PTransform operates on a PCollection, > which is always implicitly windowed, and a PTransform should operate per-window > so it automatically works on bounded and unbounded data. This only works if > the query has no windowing operators, in which case the GROUP BY > should operate per-window. > 2. As a top-level shell that starts and ends with SQL. In the relational > model there are no implicit windows. Calcite has some extensions for > windowing, but they manifest (IMO correctly) as just items in the GROUP BY > list. The output of the aggregation is "just rows" again. So it should be > globally windowed. > The problem is that this semantic fix makes it so we cannot join windowed > stream subqueries. Because we don't have retractions, we only support > GroupByKey-based equijoins over windowed streams, with the default trigger. > _These joins implicitly also join windows_. For example: > {code} > JOIN(left.id = right.id) > SELECT ... GROUP BY id, TUMBLE(1 hour) > SELECT ... GROUP BY id, TUMBLE(1 hour) > {code} > Semantically, there may be a joined row for 1:00pm on the left and 10:00pm on > the right. But by the time the right-hand row for 10:00pm shows up, the left > one may be GC'd. So this is implicitly, but nondeterministically, joining on > the window as well. Before this PR, we left the windowing strategies for left > and right in place, and asserted that they matched. > If we re-window into the global window always, there _are no windowed > streams_ so you just can't do stream joins. 
The solution is probably to track > which field of a stream is the window and allow joins which also explicitly > express the equijoin over the window field. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
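The "joins implicitly also join windows" point can be shown with a toy model. This is plain Java, not Beam code: a GroupByKey-based equijoin effectively keys on (id, window), so rows with the same id in different windows never meet.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the implicit window join: the effective join key is
// (id, window), not just id, so the 1:00pm left row and the 10:00pm right
// row share an id yet produce no joined output.
public class WindowedJoinToy {
  record Row(String id, int windowHour) {}

  static List<String> equijoin(List<Row> left, List<Row> right) {
    List<String> out = new ArrayList<>();
    for (Row l : left) {
      for (Row r : right) {
        // Match requires both the declared key (id) and the implicit window.
        if (l.id().equals(r.id()) && l.windowHour() == r.windowHour()) {
          out.add(l.id() + "@" + l.windowHour());
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Row> left = List.of(new Row("a", 13));  // 1:00pm window
    List<Row> right = List.of(new Row("a", 22)); // 10:00pm window
    System.out.println(equijoin(left, right));   // empty: windows differ
  }
}
```

Re-windowing aggregation output into the global window removes the window component of this key entirely, which is why the ticket concludes stream joins would then need an explicit equijoin over a tracked window field.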
[jira] [Resolved] (BEAM-4708) Runners-core javadoc broken, blocked snapshot release
[ https://issues.apache.org/jira/browse/BEAM-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles resolved BEAM-4708. --- Resolution: Fixed Fix Version/s: Not applicable > Runners-core javadoc broken, blocked snapshot release > - > > Key: BEAM-4708 > URL: https://issues.apache.org/jira/browse/BEAM-4708 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Blocker > Fix For: Not applicable > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4709) Javadoc build only tested during release
Kenneth Knowles created BEAM-4709: - Summary: Javadoc build only tested during release Key: BEAM-4709 URL: https://issues.apache.org/jira/browse/BEAM-4709 Project: Beam Issue Type: Bug Components: build-system Reporter: Kenneth Knowles As seen in breakage like BEAM-4708, the javadoc is only built as part of the publish process. This seems like something that could comfortably fit in a postcommit test. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4708) Runners-core javadoc broken, blocked snapshot release
Kenneth Knowles created BEAM-4708: - Summary: Runners-core javadoc broken, blocked snapshot release Key: BEAM-4708 URL: https://issues.apache.org/jira/browse/BEAM-4708 Project: Beam Issue Type: Bug Components: runner-core Reporter: Kenneth Knowles Assignee: Kenneth Knowles -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-4704) String operations yield incorrect results when executed through SQL shell
[ https://issues.apache.org/jira/browse/BEAM-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529370#comment-16529370 ] Kenneth Knowles commented on BEAM-4704: --- [~apilloud] wild guess here - the shell only invokes our code when the calling convention insists that it do so? > String operations yield incorrect results when executed through SQL shell > - > > Key: BEAM-4704 > URL: https://issues.apache.org/jira/browse/BEAM-4704 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > > {{TRIM}} is defined to trim _all_ the characters in the first string from the > string-to-be-trimmed. Calcite has an incorrect implementation of this. We use > our own fixed implementation. But when executed through the SQL shell, the > results do not match what we get from the PTransform path. Here are three test > cases that pass on {{master}} but are incorrect in the shell: > {code:sql} > BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe'); > ++ > | EXPR$0 | > ++ > | hehe__hehe | > ++ > {code} > {code:sql} > BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe'); > ++ > | EXPR$0 | > ++ > | hehe__heh | > ++ > {code} > {code:sql} > BeamSQL> select TRIM(BOTH 'eh' FROM 'hehe__hehe'); > ++ > | EXPR$0 | > ++ > | hehe__heh | > ++ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
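The semantics the ticket describes — the first string is a *set* of characters, all of which are trimmed — can be sketched in a few lines. This is an illustrative reference implementation, not Beam's actual fixed code; class and method names are chosen for the sketch.

```java
// Sketch of SQL-standard TRIM: remove every leading/trailing character of s
// that belongs to the character set given by chars.
public class SqlTrim {
  public static String trimLeading(String chars, String s) {
    int i = 0;
    while (i < s.length() && chars.indexOf(s.charAt(i)) >= 0) {
      i++;
    }
    return s.substring(i);
  }

  public static String trimTrailing(String chars, String s) {
    int j = s.length();
    while (j > 0 && chars.indexOf(s.charAt(j - 1)) >= 0) {
      j--;
    }
    return s.substring(0, j);
  }

  public static String trimBoth(String chars, String s) {
    return trimTrailing(chars, trimLeading(chars, s));
  }

  public static void main(String[] args) {
    // Spec-correct results for the shell queries quoted in the ticket:
    System.out.println(trimLeading("eh", "hehe__hehe"));  // __hehe
    System.out.println(trimTrailing("eh", "hehe__hehe")); // hehe__
    System.out.println(trimBoth("eh", "hehe__hehe"));     // __
  }
}
```

Compare these with the shell output quoted above, which leaves the leading/trailing {{h}} and {{e}} characters in place.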
[jira] [Comment Edited] (BEAM-4704) String operations yield incorrect results when executed through SQL shell
[ https://issues.apache.org/jira/browse/BEAM-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529370#comment-16529370 ] Kenneth Knowles edited comment on BEAM-4704 at 7/2/18 4:14 AM: --- [~apilloud] -wild- confident guess here - the shell only invokes our code when the calling convention insists that it do so. was (Author: kenn): [~apilloud] wild guess here - the shell only invokes our code when the calling convention insists that it do so? > String operations yield incorrect results when executed through SQL shell > - > > Key: BEAM-4704 > URL: https://issues.apache.org/jira/browse/BEAM-4704 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > > {{TRIM}} is defined to trim _all_ the characters in the first string from the > string-to-be-trimmed. Calcite has an incorrect implementation of this. We use > our own fixed implementation. But when executed through the SQL shell, the > results do not match what we get from the PTransform path. Here are three test > cases that pass on {{master}} but are incorrect in the shell: > {code:sql} > BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe'); > ++ > | EXPR$0 | > ++ > | hehe__hehe | > ++ > {code} > {code:sql} > BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe'); > ++ > | EXPR$0 | > ++ > | hehe__heh | > ++ > {code} > {code:sql} > BeamSQL> select TRIM(BOTH 'eh' FROM 'hehe__hehe'); > ++ > | EXPR$0 | > ++ > | hehe__heh | > ++ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4704) String operations yield incorrect results when executed through SQL shell
[ https://issues.apache.org/jira/browse/BEAM-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-4704: -- Description: {{TRIM}} is defined to trim _all_ the characters in the first string from the string-to-be-trimmed. Calcite has an incorrect implementation of this. We use our own fixed implementation. But when executed through the SQL shell, the results do not match what we get from the PTransform path. Here are three test cases that pass on {{master}} but are incorrect in the shell: {code:sql} BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe'); ++ | EXPR$0 | ++ | hehe__hehe | ++ {code} {code:sql} BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe'); ++ | EXPR$0 | ++ | hehe__heh | ++ {code} {code:sql} BeamSQL> select TRIM(BOTH 'eh' FROM 'hehe__hehe'); ++ | EXPR$0 | ++ | hehe__heh | ++ {code} was: {{TRIM}} is defined to trim _all_ the characters in the first string from the string-to-be-trimmed. Calcite has an incorrect implementation of this. We use our own fixed implementation. But when executed through the SQL shell, the results do not match what we get from the PTransform path. Here are two test cases that pass on {{master}} but are incorrect in the shell: {code:java} BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe'); ++ | EXPR$0 | ++ | hehe__hehe | ++ {code} {code:java} BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe'); ++ | EXPR$0 | ++ | hehe__heh | ++ {code} > String operations yield incorrect results when executed through SQL shell > - > > Key: BEAM-4704 > URL: https://issues.apache.org/jira/browse/BEAM-4704 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > > {{TRIM}} is defined to trim _all_ the characters in the first string from the > string-to-be-trimmed. Calcite has an incorrect implementation of this. We use > our own fixed implementation. 
But when executed through the SQL shell, the > results do not match what we get from the PTransform path. Here are three test > cases that pass on {{master}} but are incorrect in the shell: > {code:sql} > BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe'); > ++ > | EXPR$0 | > ++ > | hehe__hehe | > ++ > {code} > {code:sql} > BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe'); > ++ > | EXPR$0 | > ++ > | hehe__heh | > ++ > {code} > {code:sql} > BeamSQL> select TRIM(BOTH 'eh' FROM 'hehe__hehe'); > ++ > | EXPR$0 | > ++ > | hehe__heh | > ++ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4704) String operations yield incorrect results when executed through SQL shell
[ https://issues.apache.org/jira/browse/BEAM-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-4704: -- Description: {{TRIM}} is defined to trim _all_ the characters in the first string from the string-to-be-trimmed. Calcite has an incorrect implementation of this. We use our own fixed implementation. But when executed through the SQL shell, the results do not match what we get from the PTransform path. Here are two test cases that pass on {{master}} but are incorrect in the shell: {code:java} BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe'); ++ | EXPR$0 | ++ | hehe__hehe | ++ {code} {code:java} BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe'); ++ | EXPR$0 | ++ | hehe__heh | ++ {code} was: {{TRIM}} is defined to trim _all_ the characters in a string. Calcite has an incorrect implementation of this. We use our own fixed implementation. But when executed through the SQL shell, the results do not match what we get from the PTransform path. Here are two test cases that pass on {{master}} but are incorrect in the shell: {code} BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe'); ++ | EXPR$0 | ++ | hehe__hehe | ++ {code} {code} BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe'); ++ | EXPR$0 | ++ | hehe__heh | ++ {code} > String operations yield incorrect results when executed through SQL shell > - > > Key: BEAM-4704 > URL: https://issues.apache.org/jira/browse/BEAM-4704 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > > {{TRIM}} is defined to trim _all_ the characters in the first string from the > string-to-be-trimmed. Calcite has an incorrect implementation of this. We use > our own fixed implementation. But when executed through the SQL shell, the > results do not match what we get from the PTransform path. 
Here are two test > cases that pass on {{master}} but are incorrect in the shell: > {code:java} > BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe'); > ++ > | EXPR$0 | > ++ > | hehe__hehe | > ++ > {code} > {code:java} > BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe'); > ++ > | EXPR$0 | > ++ > | hehe__heh | > ++ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4704) String operations yield incorrect results when executed through SQL shell
[ https://issues.apache.org/jira/browse/BEAM-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-4704: -- Issue Type: Bug (was: New Feature) > String operations yield incorrect results when executed through SQL shell > - > > Key: BEAM-4704 > URL: https://issues.apache.org/jira/browse/BEAM-4704 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > > {{TRIM}} is defined to trim _all_ the characters in a string. Calcite has an > incorrect implementation of this. We use our own fixed implementation. But > when executed through the SQL shell, the results do not match what we get > from the PTransform path. Here are two test cases that pass on {{master}} but > are incorrect in the shell: > {code} > BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe'); > ++ > | EXPR$0 | > ++ > | hehe__hehe | > ++ > {code} > {code} > BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe'); > ++ > | EXPR$0 | > ++ > | hehe__heh | > ++ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4704) String operations yield incorrect results when executed through SQL shell
Kenneth Knowles created BEAM-4704: - Summary: String operations yield incorrect results when executed through SQL shell Key: BEAM-4704 URL: https://issues.apache.org/jira/browse/BEAM-4704 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Kenneth Knowles Assignee: Kenneth Knowles {{TRIM}} is defined to trim _all_ the characters in a string. Calcite has an incorrect implementation of this. We use our own fixed implementation. But when executed through the SQL shell, the results do not match what we get from the PTransform path. Here are two test cases that pass on {{master}} but are incorrect in the shell: {code} BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe'); ++ | EXPR$0 | ++ | hehe__hehe | ++ {code} {code} BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe'); ++ | EXPR$0 | ++ | hehe__heh | ++ {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4703) SQL operators should have overloading resolved statically
Kenneth Knowles created BEAM-4703: - Summary: SQL operators should have overloading resolved statically Key: BEAM-4703 URL: https://issues.apache.org/jira/browse/BEAM-4703 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Kenneth Knowles Currently, operators like {{>}}, {{<=}}, and others dynamically dispatch on the type of their operands. But SQL is statically typed, and we should both validate the types (if Calcite doesn't do it for us) and gain performance by avoiding this dispatch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
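The idea can be sketched in plain Java. This is a hypothetical illustration, not Beam SQL code: the operator implementation is chosen once from the statically known operand type (at planning time), so the per-row evaluation does no type dispatch, and unsupported types fail early.

```java
import java.util.function.BiPredicate;

// Sketch of static overload resolution for a comparison operator.
// resolveLessEq is an invented name, not a Beam API.
public class StaticDispatch {
  static BiPredicate<Object, Object> resolveLessEq(Class<?> operandType) {
    if (operandType == Integer.class) {
      return (a, b) -> (Integer) a <= (Integer) b;
    }
    if (operandType == String.class) {
      return (a, b) -> ((String) a).compareTo((String) b) <= 0;
    }
    // Static typing lets us reject bad operand types at plan time,
    // rather than discovering them per element at execution time.
    throw new IllegalArgumentException("unsupported operand type: " + operandType);
  }

  public static void main(String[] args) {
    // Resolved once during planning; the hot loop just calls test().
    BiPredicate<Object, Object> le = resolveLessEq(Integer.class);
    System.out.println(le.test(3, 5)); // true
  }
}
```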
[jira] [Created] (BEAM-4702) After SQL GROUP BY the result should be globally windowed
Kenneth Knowles created BEAM-4702: - Summary: After SQL GROUP BY the result should be globally windowed Key: BEAM-4702 URL: https://issues.apache.org/jira/browse/BEAM-4702 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Kenneth Knowles Assignee: Kenneth Knowles -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-4700) JDBC driver cannot support TIMESTAMP data type
[ https://issues.apache.org/jira/browse/BEAM-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529250#comment-16529250 ] Kenneth Knowles commented on BEAM-4700: --- Downgrading to "Major" to get some better test coverage. > JDBC driver cannot support TIMESTAMP data type > -- > > Key: BEAM-4700 > URL: https://issues.apache.org/jira/browse/BEAM-4700 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Avatica allows column representation to be customized, so a timestamp can be > stored as a variety of types. Joda ReadableInstant is none of these types: > https://github.com/apache/calcite-avatica/blob/acb675de97b9b0743c09368820a770e2ceda05f8/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L162 > By default, it seems to be configured to store {{TIMESTAMP}} columns as > {{long}} values. If you run the SQL shell and select a {{TIMESTAMP}} column, > you get: > {code} > java.lang.ClassCastException: org.joda.time.Instant cannot be cast to > java.lang.Number > at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$NumberAccessor.getNumber(AbstractCursor.java:726) > at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$TimestampFromNumberAccessor.getString(AbstractCursor.java:1026) > at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.AvaticaResultSet.getString(AvaticaResultSet.java:225) > at sqlline.Rows$Row.<init>(Rows.java:183) > {code} > So, essentially, Beam SQL Shell does not support timestamps. 
> We may be able to: > - override how the accessor for our existing storage is created > - configure what the column representation is (this doesn't really help, > since none of the choices are ours) > - convert timestamps to longs in BeamEnumerableConverter; not sure how many > conversions will be required here -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4700) JDBC driver cannot support TIMESTAMP data type
[ https://issues.apache.org/jira/browse/BEAM-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-4700: -- Priority: Major (was: Blocker) > JDBC driver cannot support TIMESTAMP data type > -- > > Key: BEAM-4700 > URL: https://issues.apache.org/jira/browse/BEAM-4700 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Avatica allows column representation to be customized, so a timestamp can be > stored as a variety of types. Joda ReadableInstant is none of these types: > https://github.com/apache/calcite-avatica/blob/acb675de97b9b0743c09368820a770e2ceda05f8/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L162 > By default, it seems to be configured to store {{TIMESTAMP}} columns as > {{long}} values. If you run the SQL shell and select a {{TIMESTAMP}} column, > you get: > {code} > java.lang.ClassCastException: org.joda.time.Instant cannot be cast to > java.lang.Number > at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$NumberAccessor.getNumber(AbstractCursor.java:726) > at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$TimestampFromNumberAccessor.getString(AbstractCursor.java:1026) > at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.AvaticaResultSet.getString(AvaticaResultSet.java:225) > at sqlline.Rows$Row.<init>(Rows.java:183) > {code} > So, essentially, Beam SQL Shell does not support timestamps. 
> We may be able to: > - override how the accessor for our existing storage is created > - configure what the column representation is (this doesn't really help, > since none of the choices are ours) > - convert timestamps to longs in BeamEnumerableConverter; not sure how many > conversions will be required here -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4701) Run SQL DSL-level tests through JDBC driver
Kenneth Knowles created BEAM-4701: - Summary: Run SQL DSL-level tests through JDBC driver Key: BEAM-4701 URL: https://issues.apache.org/jira/browse/BEAM-4701 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Kenneth Knowles Assignee: Kenneth Knowles Currently there's a testing gap where tests at the DSL level mostly go through the QueryTransform. So we can hit issues where we have a mismatch in Calcite Avatica, which provides the JDBC layer. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4700) JDBC driver cannot support TIMESTAMP data type
[ https://issues.apache.org/jira/browse/BEAM-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-4700: -- Issue Type: Bug (was: New Feature) > JDBC driver cannot support TIMESTAMP data type > -- > > Key: BEAM-4700 > URL: https://issues.apache.org/jira/browse/BEAM-4700 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Blocker > > Avatica allows column representation to be customized, so a timestamp can be > stored as a variety of types. Joda ReadableInstant is none of these types: > https://github.com/apache/calcite-avatica/blob/acb675de97b9b0743c09368820a770e2ceda05f8/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L162 > By default, it seems to be configured to store {{TIMESTAMP}} columns as > {{long}} values. If you run the SQL shell and select a {{TIMESTAMP}} column, > you get: > {code} > java.lang.ClassCastException: org.joda.time.Instant cannot be cast to > java.lang.Number > at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$NumberAccessor.getNumber(AbstractCursor.java:726) > at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$TimestampFromNumberAccessor.getString(AbstractCursor.java:1026) > at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.AvaticaResultSet.getString(AvaticaResultSet.java:225) > at sqlline.Rows$Row.<init>(Rows.java:183) > {code} > So, essentially, Beam SQL Shell does not support timestamps. > We may be able to: > - override how the accessor for our existing storage is created > - configure what the column representation is (this doesn't really help, > since none of the choices are ours) > - convert timestamps to longs in BeamEnumerableConverter; not sure how many > conversions will be required here -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4700) JDBC driver cannot support TIMESTAMP data type
Kenneth Knowles created BEAM-4700: - Summary: JDBC driver cannot support TIMESTAMP data type Key: BEAM-4700 URL: https://issues.apache.org/jira/browse/BEAM-4700 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Kenneth Knowles Assignee: Kenneth Knowles Avatica allows column representation to be customized, so a timestamp can be stored as a variety of types. Joda ReadableInstant is none of these types: https://github.com/apache/calcite-avatica/blob/acb675de97b9b0743c09368820a770e2ceda05f8/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L162 By default, it seems to be configured to store {{TIMESTAMP}} columns as {{long}} values. If you run the SQL shell and select a {{TIMESTAMP}} column, you get: {code} java.lang.ClassCastException: org.joda.time.Instant cannot be cast to java.lang.Number at org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$NumberAccessor.getNumber(AbstractCursor.java:726) at org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$TimestampFromNumberAccessor.getString(AbstractCursor.java:1026) at org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.AvaticaResultSet.getString(AvaticaResultSet.java:225) at sqlline.Rows$Row.<init>(Rows.java:183) {code} So, essentially, Beam SQL Shell does not support timestamps. We may be able to: - override how the accessor for our existing storage is created - configure what the column representation is (this doesn't really help, since none of the choices are ours) - convert timestamps to longs in BeamEnumerableConverter; not sure how many conversions will be required here -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4699) BeamFileSystemArtifactServicesTest.putArtifactsSingleSmallFileTest flake
Kenneth Knowles created BEAM-4699: - Summary: BeamFileSystemArtifactServicesTest.putArtifactsSingleSmallFileTest flake Key: BEAM-4699 URL: https://issues.apache.org/jira/browse/BEAM-4699 Project: Beam Issue Type: New Feature Components: runner-core Reporter: Kenneth Knowles Assignee: Henning Rohde I've seen a few transient failures from {{BeamFileSystemArtifactServicesTest}}. I don't recall if they are all {{putArtifactsSingleSmallFileTest}} or how often they occur. https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/ {code} java.io.FileNotFoundException: /tmp/junit8499382858780569091/staging/123/artifacts/artifact_c147efcfc2d7ea666a9e4f5187b115c90903f0fc896a56df9a6ef5d8f3fc9f31 (No such file or directory) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4698) SchemaTest not sickbayed for Samza runner
Kenneth Knowles created BEAM-4698: - Summary: SchemaTest not sickbayed for Samza runner Key: BEAM-4698 URL: https://issues.apache.org/jira/browse/BEAM-4698 Project: Beam Issue Type: New Feature Components: runner-samza Reporter: Kenneth Knowles Assignee: Reuven Lax https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/ This failed the SchemaTest, which we know all runners will fail since it uses a different entry point for {{SimpleDoFnRunner}}. It should be excluded from their test suites. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4695) Add Samza to "works with" on the Beam site landing page
Kenneth Knowles created BEAM-4695: - Summary: Add Samza to "works with" on the Beam site landing page Key: BEAM-4695 URL: https://issues.apache.org/jira/browse/BEAM-4695 Project: Beam Issue Type: New Feature Components: website Reporter: Kenneth Knowles Assignee: Xinyu Liu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-4652) PubsubIO: create subscription on different project than the topic
[ https://issues.apache.org/jira/browse/BEAM-4652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles resolved BEAM-4652. --- Resolution: Fixed Fix Version/s: 2.6.0 > PubsubIO: create subscription on different project than the topic > - > > Key: BEAM-4652 > URL: https://issues.apache.org/jira/browse/BEAM-4652 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Critical > Fix For: 2.6.0 > > Time Spent: 5h 40m > Remaining Estimate: 0h > > If you try to read a public pubsub topic in the DirectRunner, it will fail > with 403 when trying to create a subscription. This is because it tries to > create a subscription on the shared public data set. > There is an example used in > https://github.com/googlecodelabs/cloud-dataflow-nyc-taxi-tycoon and the > dataset is {{projects/pubsub-public-data/topics/taxirides-realtime}}. I > discovered that I could not read this in the DirectRunner even though the > codelab works. But that 1.x codelab also does not work in the > InProcessPipelineRunner, so it has been broken all along. > So you cannot read public data or any other read-only data using PubsubIO. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-4622) Many Beam SQL expressions never have their validation called
[ https://issues.apache.org/jira/browse/BEAM-4622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-4622: - Assignee: Alexey Romanenko > Many Beam SQL expressions never have their validation called > > > Key: BEAM-4622 > URL: https://issues.apache.org/jira/browse/BEAM-4622 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Alexey Romanenko >Priority: Major > Labels: easyfix, newbie, starter > > In {{BeamSqlFnExecutor}} there is a pattern where first the returned > expression is assigned to a variable {{ret}} and then after a giant switch > statement the validation is invoked. But there are many code paths that just > call {{return}} and skip validation. This should be refactored so it is > impossible to short-circuit by accident like this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4689) Dataflow cannot deserialize SplittableParDo DoFns
[ https://issues.apache.org/jira/browse/BEAM-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-4689: -- Summary: Dataflow cannot deserialize SplittableParDo DoFns (was: Dataflow postcommit broken) > Dataflow cannot deserialize SplittableParDo DoFns > - > > Key: BEAM-4689 > URL: https://issues.apache.org/jira/browse/BEAM-4689 > Project: Beam > Issue Type: New Feature > Components: runner-dataflow >Reporter: Kenneth Knowles >Assignee: Eugene Kirpichov >Priority: Blocker > > The Dataflow postcommit is broken in a way that seems real and user-impacting: > https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/testReport/junit/org.apache.beam.sdk.transforms/SplittableDoFnTest/testSideInput/ > {code} > Caused by: java.lang.IllegalArgumentException: unable to deserialize > Serialized DoFnInfo > ... > Caused by: java.io.InvalidClassException: > org.apache.beam.runners.core.construction.SplittableParDo$RandomUniqueKeyFn; > local class incompatible: stream classdesc serialVersionUID = > 6068396661487412884, local class serialVersionUID = -617521663543732196 > {code} > This means that the worker is using a version of the class from its own > classpath, not the version from the user's staged pipeline. It implies that > the worker is not shading runners-core-construction. Because that is where a > ton of utility DoFns live, it is critical that it be shaded.
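[Editorial note] The kind of shading the issue calls for can be sketched as a Gradle Shadow plugin configuration; the relocation target package below is hypothetical, not the worker build's actual coordinates. Relocating the worker's copy of {{runners-core-construction}} gives it a distinct package name, so it can never collide with the user's staged classes:

```groovy
// Hedged sketch (Gradle Shadow plugin): relocate the worker's bundled copy
// of runners-core-construction so deserialization of the user's staged
// DoFns always resolves against the user's own classes.
shadowJar {
    relocate 'org.apache.beam.runners.core.construction',
             'org.apache.beam.runners.dataflow.worker.repackaged.runners.core.construction'
}
```

After relocation, the worker's classes no longer share a name with the staged ones, so the serialVersionUID comparison in the stack trace never happens.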
[jira] [Assigned] (BEAM-4689) Dataflow postcommit broken
[ https://issues.apache.org/jira/browse/BEAM-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-4689: - Assignee: Eugene Kirpichov (was: Thomas Groh) > Dataflow postcommit broken > -- > > Key: BEAM-4689 > URL: https://issues.apache.org/jira/browse/BEAM-4689 > Project: Beam > Issue Type: New Feature > Components: runner-dataflow >Reporter: Kenneth Knowles >Assignee: Eugene Kirpichov >Priority: Blocker > > The Dataflow postcommit is broken in a way that seems real and user-impacting: > https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/testReport/junit/org.apache.beam.sdk.transforms/SplittableDoFnTest/testSideInput/ > {code} > Caused by: java.lang.IllegalArgumentException: unable to deserialize > Serialized DoFnInfo > ... > Caused by: java.io.InvalidClassException: > org.apache.beam.runners.core.construction.SplittableParDo$RandomUniqueKeyFn; > local class incompatible: stream classdesc serialVersionUID = > 6068396661487412884, local class serialVersionUID = -617521663543732196 > {code} > This means that the worker is using a version of the class from its own > classpath, not the version from the user's staged pipeline. It implies that > the worker is not shading runners-core-construction. Because that is where a > ton of utility DoFns live, it is critical that it be shaded.
[jira] [Created] (BEAM-4689) Dataflow postcommit broken
Kenneth Knowles created BEAM-4689: - Summary: Dataflow postcommit broken Key: BEAM-4689 URL: https://issues.apache.org/jira/browse/BEAM-4689 Project: Beam Issue Type: New Feature Components: runner-dataflow Reporter: Kenneth Knowles Assignee: Thomas Groh The Dataflow postcommit is broken in a way that seems real and user-impacting: https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/testReport/junit/org.apache.beam.sdk.transforms/SplittableDoFnTest/testSideInput/ {code} Caused by: java.lang.IllegalArgumentException: unable to deserialize Serialized DoFnInfo ... Caused by: java.io.InvalidClassException: org.apache.beam.runners.core.construction.SplittableParDo$RandomUniqueKeyFn; local class incompatible: stream classdesc serialVersionUID = 6068396661487412884, local class serialVersionUID = -617521663543732196 {code} This means that the worker is using a version of the class from its own classpath, not the version from the user's staged pipeline. It implies that the worker is not shading runners-core-construction. Because that is where a ton of utility DoFns live, it is critical that it be shaded.