[jira] [Reopened] (BEAM-5709) Tests in BeamFnControlServiceTest are flaky.

2018-10-12 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reopened BEAM-5709:
---

> Tests in BeamFnControlServiceTest are flaky.
> 
>
> Key: BEAM-5709
> URL: https://issues.apache.org/jira/browse/BEAM-5709
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlServiceTest.java
> Tests for BeamFnControlService are currently flaky. The test attempts to 
> verify that onCompleted was called on the mock streams, but that function 
> gets called on a separate thread, so occasionally the function will not have 
> been called yet, despite the server being shut down.
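
The race described in the report can be reproduced without Beam or gRPC. The following is a minimal sketch only, not the change that resolved BEAM-5709: RecordingObserver and shutdownAndAwait are hypothetical names standing in for the mocked stream and the server shutdown. It assumes the test can be given a latch to block on, so that the assertion waits (with a timeout) for onCompleted instead of checking immediately after shutdown.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AwaitCompletionSketch {

  /** Hypothetical stand-in for a mocked stream observer; not a Beam or gRPC class. */
  public static class RecordingObserver {
    private final CountDownLatch completed = new CountDownLatch(1);

    /** Called by the "server" thread when the stream is closed. */
    public void onCompleted() {
      completed.countDown();
    }

    /** Blocks until onCompleted() has run or the timeout elapses. */
    public boolean awaitCompleted(long timeout, TimeUnit unit) {
      try {
        return completed.await(timeout, unit);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  /** Simulates a server that completes the stream on another thread during shutdown. */
  public static boolean shutdownAndAwait(RecordingObserver observer) {
    ExecutorService serverThread = Executors.newSingleThreadExecutor();
    serverThread.execute(observer::onCompleted);
    // shutdown() returns immediately; the queued task may not have run yet.
    serverThread.shutdown();
    // Asserting right here without waiting is the race; waiting with a timeout removes it.
    return observer.awaitCompleted(5, TimeUnit.SECONDS);
  }

  public static void main(String[] args) {
    System.out.println("completed=" + shutdownAndAwait(new RecordingObserver()));
  }
}
```

If the mock streams are Mockito mocks, verify(mock, timeout(millis)).onCompleted() is another way to express the same wait; whether that suits this test is not stated in the report.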



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-5709) Tests in BeamFnControlServiceTest are flaky.

2018-10-12 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-5709.
---
   Resolution: Fixed
Fix Version/s: Not applicable

> Tests in BeamFnControlServiceTest are flaky.
> 
>
> Key: BEAM-5709
> URL: https://issues.apache.org/jira/browse/BEAM-5709
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>





[jira] [Commented] (BEAM-5098) Combine.Globally::asSingletonView clears side inputs

2018-10-11 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646829#comment-16646829
 ] 

Kenneth Knowles commented on BEAM-5098:
---

[~mpedersen] this one might be a quick fix. Would you be interested in 
contributing, perhaps?

> Combine.Globally::asSingletonView clears side inputs
> 
>
> Key: BEAM-5098
> URL: https://issues.apache.org/jira/browse/BEAM-5098
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.5.0
>Reporter: Mike Pedersen
>Assignee: Kenneth Knowles
>Priority: Critical
>  Labels: beginner, starter
>
> It seems like calling .asSingletonView on Combine.Globally clears all side 
> inputs. Take this code for example:
>  
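> {code:java}
> import org.apache.beam.sdk.Pipeline;
> import org.apache.beam.sdk.options.PipelineOptions;
> import org.apache.beam.sdk.options.PipelineOptionsFactory;
> import org.apache.beam.sdk.transforms.Combine;
> import org.apache.beam.sdk.transforms.CombineWithContext;
> import org.apache.beam.sdk.transforms.Create;
> import org.apache.beam.sdk.transforms.View;
> import org.apache.beam.sdk.values.PCollection;
> import org.apache.beam.sdk.values.PCollectionView;
>
> public class Main {
>   public static void main(String[] args) {
>     PipelineOptions options = PipelineOptionsFactory.create();
>     Pipeline p = Pipeline.create(options);
>     PCollection<Integer> a = p.apply(Create.of(1, 2, 3));
>     // Side input that supplies the initial accumulator value.
>     PCollectionView<Integer> b = p.apply(Create.of(10)).apply(View.asSingleton());
>     a.apply(Combine.globally(
>         new CombineWithContext.CombineFnWithContext<Integer, Integer, Integer>() {
>           @Override
>           public Integer createAccumulator(CombineWithContext.Context c) {
>             // Reads the side input; this is the call that fails.
>             return c.sideInput(b);
>           }
>           @Override
>           public Integer addInput(
>               Integer accumulator, Integer input, CombineWithContext.Context c) {
>             return accumulator + input;
>           }
>           @Override
>           public Integer mergeAccumulators(
>               Iterable<Integer> accumulators, CombineWithContext.Context c) {
>             int sum = 0;
>             for (int i : accumulators) {
>               sum += i;
>             }
>             return sum;
>           }
>           @Override
>           public Integer extractOutput(Integer accumulator, CombineWithContext.Context c) {
>             return accumulator;
>           }
>           @Override
>           public Integer defaultValue() {
>             return 0;
>           }
>         }).withSideInputs(b).asSingletonView());
>     p.run().waitUntilFinish();
>   }
> }{code}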
> This fails with the following exception:
> {code:java}
> Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.IllegalArgumentException: calling sideInput() with unknown view
>     at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:349)
>     at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:319)
>     at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:210)
>     at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:66)
>     at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
>     at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
>     at Main.main(Main.java:287)
> Caused by: java.lang.IllegalArgumentException: calling sideInput() with unknown view
>     at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner.sideInput(SimpleDoFnRunner.java:212)
>     at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner.access$900(SimpleDoFnRunner.java:69)
>     at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner$DoFnProcessContext.sideInput(SimpleDoFnRunner.java:489)
>     at org.apache.beam.sdk.transforms.Combine$GroupedValues$1$1.sideInput(Combine.java:2137)
>     at Main$1.createAccumulator(Main.java:258)
>     at Main$1.createAccumulator(Main.java:255)
>     at org.apache.beam.sdk.transforms.CombineWithContext$CombineFnWithContext.apply(CombineWithContext.java:120)
>     at org.apache.beam.sdk.transforms.Combine$GroupedValues$1.processElement(Combine.java:2129){code}
> But if you change
> {code:java}
> .withSideInputs(b).asSingletonView()){code}
> to
> {code:java}
> .withSideInputs(b)).apply(View.asSingleton()){code}
> then it works just fine.
>  





[jira] [Updated] (BEAM-5098) Combine.Globally::asSingletonView clears side inputs

2018-10-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-5098:
--
Component/s: (was: beam-model)
 sdk-java-core

> Combine.Globally::asSingletonView clears side inputs
> 
>
> Key: BEAM-5098
> URL: https://issues.apache.org/jira/browse/BEAM-5098
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.5.0
>Reporter: Mike Pedersen
>Assignee: Kenneth Knowles
>Priority: Critical
>  Labels: beginner, starter
>





[jira] [Updated] (BEAM-5098) Combine.Globally::asSingletonView clears side inputs

2018-10-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-5098:
--
Labels: beginner starter  (was: )

> Combine.Globally::asSingletonView clears side inputs
> 
>
> Key: BEAM-5098
> URL: https://issues.apache.org/jira/browse/BEAM-5098
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.5.0
>Reporter: Mike Pedersen
>Assignee: Kenneth Knowles
>Priority: Critical
>  Labels: beginner, starter
>





[jira] [Assigned] (BEAM-3656) Port DatastoreV1Test off DoFnTester

2018-10-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-3656:
-

Assignee: Sergei Puzanov

> Port DatastoreV1Test off DoFnTester
> ---
>
> Key: BEAM-3656
> URL: https://issues.apache.org/jira/browse/BEAM-3656
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Kenneth Knowles
>Assignee: Sergei Puzanov
>Priority: Major
>  Labels: beginner, newbie, starter
>






[jira] [Commented] (BEAM-3656) Port DatastoreV1Test off DoFnTester

2018-10-11 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646698#comment-16646698
 ] 

Kenneth Knowles commented on BEAM-3656:
---

Yes. Thanks!

> Port DatastoreV1Test off DoFnTester
> ---
>
> Key: BEAM-3656
> URL: https://issues.apache.org/jira/browse/BEAM-3656
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Kenneth Knowles
>Assignee: Sergei Puzanov
>Priority: Major
>  Labels: beginner, newbie, starter
>






[jira] [Assigned] (BEAM-5156) Apache Beam on dataflow runner can't find Tensorflow for workers

2018-10-10 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-5156:
-

Assignee: Ankur Goenka  (was: Robert Bradshaw)

> Apache Beam on dataflow runner can't find Tensorflow for workers
> 
>
> Key: BEAM-5156
> URL: https://issues.apache.org/jira/browse/BEAM-5156
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
> Environment: google cloud compute instance running linux
>Reporter: Robert Bradshaw
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.5.0, 2.6.0
>
>
> I am adding a serialized TensorFlow model to an Apache Beam pipeline with the 
> Python SDK, but no version of TensorFlow can be found when the pipeline runs 
> on the Dataflow runner, although it is not a problem locally. I tried various 
> versions of TensorFlow from 1.6 to 1.10. I thought it might be a conflicting 
> package somewhere, so I removed all other packages and tried to install only 
> TensorFlow, with the same result:
> Could not find a version that satisfies the requirement tensorflow==1.6.0 (from -r reqtest.txt (line 59)) (from versions: )
> No matching distribution found for tensorflow==1.6.0 (from -r reqtest.txt (line 59))





[jira] [Updated] (BEAM-5156) Apache Beam on dataflow runner can't find Tensorflow for workers

2018-10-10 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-5156:
--
Reporter: Thomas Johns  (was: Robert Bradshaw)

> Apache Beam on dataflow runner can't find Tensorflow for workers
> 
>
> Key: BEAM-5156
> URL: https://issues.apache.org/jira/browse/BEAM-5156
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
> Environment: google cloud compute instance running linux
>Reporter: Thomas Johns
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.5.0, 2.6.0
>
>





[jira] [Assigned] (BEAM-5156) Apache Beam on dataflow runner can't find Tensorflow for workers

2018-10-10 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-5156:
-

Assignee: Robert Bradshaw  (was: Kenneth Knowles)

> Apache Beam on dataflow runner can't find Tensorflow for workers
> 
>
> Key: BEAM-5156
> URL: https://issues.apache.org/jira/browse/BEAM-5156
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
> Environment: google cloud compute instance running linux
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: 2.5.0, 2.6.0
>
>





[jira] [Updated] (BEAM-5162) Document Metrics API for users

2018-10-10 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-5162:
--
Priority: Major  (was: Minor)

> Document Metrics API for users
> --
>
> Key: BEAM-5162
> URL: https://issues.apache.org/jira/browse/BEAM-5162
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core, website
>Reporter: Maximilian Michels
>Assignee: Kenneth Knowles
>Priority: Major
>
> The Metrics API is currently documented only in Beam's JavaDocs. 
> Complementary user documentation with examples would be desirable.





[jira] [Assigned] (BEAM-5336) Support for a Dask runner

2018-10-10 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-5336:
-

Assignee: (was: Kenneth Knowles)

> Support for a Dask runner
> -
>
> Key: BEAM-5336
> URL: https://issues.apache.org/jira/browse/BEAM-5336
> Project: Beam
>  Issue Type: Wish
>  Components: runner-ideas
> Environment: Python
>Reporter: Georvic Tur
>Priority: Trivial
>
> Is adding support for a Dask runner currently under consideration?





[jira] [Commented] (BEAM-5336) Support for a Dask runner

2018-10-10 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645941#comment-16645941
 ] 

Kenneth Knowles commented on BEAM-5336:
---

Contributions welcome :)

> Support for a Dask runner
> -
>
> Key: BEAM-5336
> URL: https://issues.apache.org/jira/browse/BEAM-5336
> Project: Beam
>  Issue Type: Wish
>  Components: runner-ideas
> Environment: Python
>Reporter: Georvic Tur
>Assignee: Kenneth Knowles
>Priority: Trivial
>





[jira] [Commented] (BEAM-5162) Document Metrics API for users

2018-10-10 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645940#comment-16645940
 ] 

Kenneth Knowles commented on BEAM-5162:
---

I'm not sure of a good owner, but I bumped the priority.

> Document Metrics API for users
> --
>
> Key: BEAM-5162
> URL: https://issues.apache.org/jira/browse/BEAM-5162
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core, website
>Reporter: Maximilian Michels
>Priority: Major
>





[jira] [Assigned] (BEAM-5162) Document Metrics API for users

2018-10-10 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-5162:
-

Assignee: (was: Kenneth Knowles)

> Document Metrics API for users
> --
>
> Key: BEAM-5162
> URL: https://issues.apache.org/jira/browse/BEAM-5162
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core, website
>Reporter: Maximilian Michels
>Priority: Major
>





[jira] [Updated] (BEAM-5098) Combine.Globally::asSingletonView clears side inputs

2018-10-10 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-5098:
--
Priority: Critical  (was: Major)

> Combine.Globally::asSingletonView clears side inputs
> 
>
> Key: BEAM-5098
> URL: https://issues.apache.org/jira/browse/BEAM-5098
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Affects Versions: 2.5.0
>Reporter: Mike Pedersen
>Assignee: Kenneth Knowles
>Priority: Critical
>





[jira] [Updated] (BEAM-5156) Apache Beam on dataflow runner can't find Tensorflow for workers

2018-10-10 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-5156:
--
Reporter: Robert Bradshaw  (was: Thomas Johns)

> Apache Beam on dataflow runner can't find Tensorflow for workers
> 
>
> Key: BEAM-5156
> URL: https://issues.apache.org/jira/browse/BEAM-5156
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
> Environment: google cloud compute instance running linux
>Reporter: Robert Bradshaw
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.5.0, 2.6.0
>
>





[jira] [Updated] (BEAM-5156) Apache Beam on dataflow runner can't find Tensorflow for workers

2018-10-10 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-5156:
--
Component/s: (was: beam-model)
 sdk-py-core

> Apache Beam on dataflow runner can't find Tensorflow for workers
> 
>
> Key: BEAM-5156
> URL: https://issues.apache.org/jira/browse/BEAM-5156
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
> Environment: google cloud compute instance running linux
>Reporter: Robert Bradshaw
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.5.0, 2.6.0
>
>





[jira] [Resolved] (BEAM-4845) Nexmark suites do not compile

2018-10-10 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-4845.
---
   Resolution: Fixed
Fix Version/s: Not applicable

> Nexmark suites do not compile
> -
>
> Key: BEAM-4845
> URL: https://issues.apache.org/jira/browse/BEAM-4845
> Project: Beam
>  Issue Type: Bug
>  Components: examples-nexmark
>Reporter: Lukasz Gajowy
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This is due to this PR: [https://github.com/apache/beam/pull/5341]. Some 
> interfaces/classes had their visibility changed, and this caused nexmark to 
> fail at the compilation phase. 
> [https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_Nexmark_Direct/112/console]





[jira] [Closed] (BEAM-2567) Port triggers design doc to a contributor technical reference

2018-10-10 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles closed BEAM-2567.
-
   Resolution: Won't Fix
Fix Version/s: Not applicable

Sink triggers are the way to go - no value in proselytizing this.

> Port triggers design doc to a contributor technical reference
> -
>
> Key: BEAM-2567
> URL: https://issues.apache.org/jira/browse/BEAM-2567
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: Not applicable
>
>
> There is a fairly old doc at https://s.apache.org/beam-triggers that could be 
> a useful reference doc for contributors. Since we don't catalog these docs 
> anywhere, it should be surfaced in a useful form.





[jira] [Commented] (BEAM-5176) FailOnWarnings behave differently between CLI and Intellij build

2018-10-10 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645909#comment-16645909
 ] 

Kenneth Knowles commented on BEAM-5176:
---

This is blocking me today so I will take it for a bit and report if I make any 
progress or if I have to give up.

> FailOnWarnings behave differently between CLI and Intellij build 
> -
>
> Key: BEAM-5176
> URL: https://issues.apache.org/jira/browse/BEAM-5176
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Etienne Chauchot
>Assignee: Kenneth Knowles
>Priority: Major
>
> On the command line the build passes, but it fails in the IDE because of 
> warnings. To make it pass I had to set failOnWarnings to false in ApplyJavaNature.





[jira] [Assigned] (BEAM-5176) FailOnWarnings behave differently between CLI and Intellij build

2018-10-10 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-5176:
-

Assignee: Kenneth Knowles  (was: Luke Cwik)

> FailOnWarnings behave differently between CLI and Intellij build 
> -
>
> Key: BEAM-5176
> URL: https://issues.apache.org/jira/browse/BEAM-5176
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Etienne Chauchot
>Assignee: Kenneth Knowles
>Priority: Major
>
> On the command line the build passes, but it fails in the IDE because of 
> warnings. To make it pass I had to set failOnWarnings to false in ApplyJavaNature.





[jira] [Comment Edited] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2018-10-09 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643951#comment-16643951
 ] 

Kenneth Knowles edited comment on BEAM-5690 at 10/9/18 8:10 PM:


Yes, that is the issue. Here is the thread: 
https://lists.apache.org/thread.html/5a0a4317e5bbb66bc3012704ae1b17cd6dd5b9cac51fb95365b95153@%3Cuser.beam.apache.org%3E


was (Author: kenn):
Yes, that is the issue. Here is the thread: 
lhttps://lists.apache.org/thread.html/5a0a4317e5bbb66bc3012704ae1b17cd6dd5b9cac51fb95365b95153@%3Cuser.beam.apache.org%3E

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
>
> Reported on user@
> {quote}We are trying to set up a pipeline using BeamSql, and the trigger 
> used is the default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00 0}
> {code}
> {quote}





[jira] [Commented] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2018-10-09 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643951#comment-16643951
 ] 

Kenneth Knowles commented on BEAM-5690:
---

Yes, that is the issue. Here is the thread: 
https://lists.apache.org/thread.html/5a0a4317e5bbb66bc3012704ae1b17cd6dd5b9cac51fb95365b95153@%3Cuser.beam.apache.org%3E

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
>
> Reported on user@
> {quote}We are trying to set up a pipeline using BeamSql, and the trigger 
> used is the default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00 0}
> {code}
> {quote}





[jira] [Commented] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2018-10-09 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643682#comment-16643682
 ] 

Kenneth Knowles commented on BEAM-5690:
---

CC [~kedin] [~apilloud] [~xumingming] [~mingmxu] [~amaliujia]

Since it is not reproduced in the Flink runner or Direct runner, the SQL 
implementation of GROUP BY is probably triggering some latent bug in the Spark 
runner's streaming mode.

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
>
> Reported on user@
> {quote}We are trying to set up a pipeline using BeamSql, and the trigger 
> used is the default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00 0}
> {code}
> {quote}





[jira] [Assigned] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2018-10-09 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-5690:
-

Assignee: (was: Amit Sela)

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
>
> Reported on user@
> {quote}We are trying to set up a pipeline using BeamSql, and the trigger 
> used is the default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00 0}
> {code}
> {quote}





[jira] [Created] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2018-10-09 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-5690:
-

 Summary: Issue with GroupByKey in BeamSql using SparkRunner
 Key: BEAM-5690
 URL: https://issues.apache.org/jira/browse/BEAM-5690
 Project: Beam
  Issue Type: Task
  Components: runner-spark
Reporter: Kenneth Knowles
Assignee: Amit Sela


Reported on user@

{quote}We are trying to set up a pipeline using BeamSql, and the trigger 
used is the default (AfterWatermark crosses the window). 
Below is the pipeline:
  
   KafkaSource (KafkaIO) 
   ---> Windowing (FixedWindow 1min)
   ---> BeamSql
   ---> KafkaSink (KafkaIO)
 
We are using Spark Runner for this. 
The BeamSql query is:
{code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}

We are grouping by Col3 which is a string. It can hold values string[0-9]. 
 
The records are getting emitted out at 1 min to kafka sink, but the output 
record in kafka is not as expected.
Below is the output observed: (WST and WET are indicators for window start time 
and window end time)

{code}
{"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00 0}
{code}
{quote}
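
For reference, the result the reporter appears to expect from the GROUP BY can be sketched outside of Beam. The following is plain Python, not Beam or Spark code; the helper names (window_start, count_per_window) and the sample records are hypothetical, and it assumes fixed one-minute windows aligned to minute boundaries. It shows one output row per (window, Col3) pair and no rows with a count of zero.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=1)  # assumed fixed window size, as in the report


def window_start(ts):
    # Align a timestamp to the start of its one-minute window.
    return ts.replace(second=0, microsecond=0)


def count_per_window(records):
    # records: iterable of (timestamp, col3) pairs.
    counts = Counter()
    for ts, col3 in records:
        counts[(window_start(ts), col3)] += 1
    # One row per (window, key); keys with no records never appear.
    return [
        {"Col3": col3, "count_col1": n, "WST": start, "WET": start + WINDOW}
        for (start, col3), n in sorted(counts.items())
    ]


# Hypothetical sample input, not taken from the report.
records = [
    (datetime(2018, 10, 9, 9, 55, 5), "string6"),
    (datetime(2018, 10, 9, 9, 55, 40), "string6"),
    (datetime(2018, 10, 9, 9, 55, 59), "string7"),
    (datetime(2018, 10, 9, 9, 56, 1), "string6"),
]
rows = count_per_window(records)
```

In the output quoted above, "string6" appears several times for the same window, mostly with a count of 0, which this model never produces.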






[jira] [Assigned] (BEAM-5639) Add exception handling to single message transforms in Python SDK

2018-10-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-5639:
-

Assignee: Jeff Klukas

> Add exception handling to single message transforms in Python SDK
> -
>
> Key: BEAM-5639
> URL: https://issues.apache.org/jira/browse/BEAM-5639
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Jeff Klukas
>Assignee: Jeff Klukas
>Priority: Minor
>
> Add methods to the Python SDK's Map and FlatMap that allow users to specify 
> expected exceptions and tuple tags to associate with the collections of 
> successfully and unsuccessfully processed elements.
> See discussion on dev list:
> [https://lists.apache.org/thread.html/936ed2a5f2c01be066fd903abf70130625e0b8cf4028c11b89b8b23f@%3Cdev.beam.apache.org%3E]
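
The requested behavior can be illustrated without Beam. This is a minimal sketch in plain Python, assuming a hypothetical helper map_with_exceptions; it is not the API proposed on the dev list. Elements that raise one of the expected exceptions go to a failure collection together with the error, and everything else goes to the success collection.

```python
def map_with_exceptions(fn, elements, expected=(Exception,)):
    # Hypothetical helper: apply fn to each element and split the results.
    successes = []
    failures = []
    for element in elements:
        try:
            successes.append(fn(element))
        except expected as err:
            # Only the listed exception types are captured.
            failures.append((element, err))
    return successes, failures


parsed, failed = map_with_exceptions(int, ["1", "2", "x"], expected=(ValueError,))
```

Exceptions outside the expected tuple still propagate, so an unexpected failure is not silently routed to the failure collection.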





[jira] [Assigned] (BEAM-5638) Add exception handling to single message transforms in Java SDK

2018-10-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-5638:
-

Assignee: Jeff Klukas  (was: Kenneth Knowles)

> Add exception handling to single message transforms in Java SDK
> ---
>
> Key: BEAM-5638
> URL: https://issues.apache.org/jira/browse/BEAM-5638
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Jeff Klukas
>Assignee: Jeff Klukas
>Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Add methods to MapElements, FlatMapElements, and Filter that allow users to 
> specify expected exceptions and tuple tags to associate with the 
> collections of successfully and unsuccessfully processed elements.
> See discussion on dev list:
> https://lists.apache.org/thread.html/936ed2a5f2c01be066fd903abf70130625e0b8cf4028c11b89b8b23f@%3Cdev.beam.apache.org%3E





[jira] [Commented] (BEAM-5526) Make ByteBuddyDoFnInvokerFactory injection strategy configurable + drop the singleton

2018-10-01 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634737#comment-16634737
 ] 

Kenneth Knowles commented on BEAM-5526:
---

Do we have an umbrella ticket for Java 11? Is there a dev thread about tracking 
it?

> Make ByteBuddyDoFnInvokerFactory injection strategy configurable + drop the 
> singleton
> -
>
> Key: BEAM-5526
> URL: https://issues.apache.org/jira/browse/BEAM-5526
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Romain Manni-Bucau
>Priority: Blocker
>
> The org.apache.beam.sdk.transforms.reflect.DoFnInvokers + DoFnInvokerFactory 
> design is meant to be an SPI that lets users plug in their own bytecode 
> manipulation library; however, in practice Beam uses 
> ByteBuddyDoFnInvokerFactory as a singleton, which makes this design useless.
> ByteBuddyDoFnInvokerFactory is also not configurable at all (typically the 
> injection strategy), so it assumes it runs in an environment and on a JVM where 
> it will work. It does not on Java 11, for instance.
> This ticket is about fixing all these small inconsistencies and blockers to 
> running on Java 11.





[jira] [Commented] (BEAM-5526) Make ByteBuddyDoFnInvokerFactory injection strategy configurable + drop the singleton

2018-10-01 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634733#comment-16634733
 ] 

Kenneth Knowles commented on BEAM-5526:
---

[~romain.manni-bucau] you've marked this as a "blocker" priority. It is an 
implementation detail. {{DoFnInvoker}} and {{DoFnInvokerFactory}} are not 
user-facing APIs. The interface exists because it is good practice (see 
Parnas 
https://www.win.tue.nl/~wstomv/edu/2ip30/references/criteria_for_modularization.pdf).
You should put an interface in place when a significant decision is made, even 
if there is only one implementation. Here, the interface is around the decision 
of what library to use, just like you say.

It is kind of a cool idea to implement some other DoFnInvoker factories for 
users, but then if the pipeline fails it is a user bug, not a Beam bug. That 
might get confusing. Have you opened this idea on the dev list?
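
To make the distinction concrete, here is a rough sketch in plain Python of an interface that wraps a single implementation decision while still allowing an alternative to be passed in explicitly. All names (InvokerFactory, DirectCallInvokerFactory, invoker_for) are hypothetical and do not correspond to Beam's Java classes.

```python
from abc import ABC, abstractmethod


class InvokerFactory(ABC):
    # The interface hides the decision of how invokers are built.
    @abstractmethod
    def new_invoker(self, fn):
        ...


class DirectCallInvokerFactory(InvokerFactory):
    # The only implementation shipped by default in this sketch.
    def new_invoker(self, fn):
        return lambda element: fn(element)


_DEFAULT_FACTORY = DirectCallInvokerFactory()


def invoker_for(fn, factory=None):
    # Internal callers use the default; passing a factory swaps the decision.
    chosen = factory if factory is not None else _DEFAULT_FACTORY
    return chosen.new_invoker(fn)


upper = invoker_for(str.upper)
```

With this shape the default stays an internal detail, and a failure caused by a caller-supplied factory is clearly attributable to that factory.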

> Make ByteBuddyDoFnInvokerFactory injection strategy configurable + drop the 
> singleton
> -
>
> Key: BEAM-5526
> URL: https://issues.apache.org/jira/browse/BEAM-5526
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Kenneth Knowles
>Priority: Blocker
>
> The org.apache.beam.sdk.transforms.reflect.DoFnInvokers + DoFnInvokerFactory 
> design is meant to be an SPI that lets users plug in their own bytecode 
> manipulation library; however, in practice Beam uses 
> ByteBuddyDoFnInvokerFactory as a singleton, which makes this design useless.
> ByteBuddyDoFnInvokerFactory is also not configurable at all (typically the 
> injection strategy), so it assumes it runs in an environment and on a JVM where 
> it will work. It does not on Java 11, for instance.
> This ticket is about fixing all these small inconsistencies and blockers to 
> running on Java 11.





[jira] [Assigned] (BEAM-5526) Make ByteBuddyDoFnInvokerFactory injection strategy configurable + drop the singleton

2018-10-01 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-5526:
-

Assignee: (was: Kenneth Knowles)

> Make ByteBuddyDoFnInvokerFactory injection strategy configurable + drop the 
> singleton
> -
>
> Key: BEAM-5526
> URL: https://issues.apache.org/jira/browse/BEAM-5526
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Romain Manni-Bucau
>Priority: Blocker
>
> The org.apache.beam.sdk.transforms.reflect.DoFnInvokers + DoFnInvokerFactory 
> design is meant to be an SPI that lets users plug in their own bytecode 
> manipulation library; however, in practice Beam uses 
> ByteBuddyDoFnInvokerFactory as a singleton, which makes this design useless.
> ByteBuddyDoFnInvokerFactory is also not configurable at all (typically the 
> injection strategy), so it assumes it runs in an environment and on a JVM where 
> it will work. It does not on Java 11, for instance.
> This ticket is about fixing all these small inconsistencies and blockers to 
> running on Java 11.





[jira] [Updated] (BEAM-5526) Make ByteBuddyDoFnInvokerFactory injection strategy configurable + drop the singleton

2018-10-01 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-5526:
--
Issue Type: New Feature  (was: Task)

> Make ByteBuddyDoFnInvokerFactory injection strategy configurable + drop the 
> singleton
> -
>
> Key: BEAM-5526
> URL: https://issues.apache.org/jira/browse/BEAM-5526
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Kenneth Knowles
>Priority: Blocker
>
> The org.apache.beam.sdk.transforms.reflect.DoFnInvokers + DoFnInvokerFactory 
> design is meant to be an SPI that lets users plug in their own bytecode 
> manipulation library; however, in practice Beam uses 
> ByteBuddyDoFnInvokerFactory as a singleton, which makes this design useless.
> ByteBuddyDoFnInvokerFactory is also not configurable at all (typically the 
> injection strategy), so it assumes it runs in an environment and on a JVM where 
> it will work. It does not on Java 11, for instance.
> This ticket is about fixing all these small inconsistencies and blockers to 
> running on Java 11.





[jira] [Updated] (BEAM-5526) Make ByteBuddyDoFnInvokerFactory injection strategy configurable + drop the singleton

2018-10-01 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-5526:
--
Issue Type: Improvement  (was: New Feature)

> Make ByteBuddyDoFnInvokerFactory injection strategy configurable + drop the 
> singleton
> -
>
> Key: BEAM-5526
> URL: https://issues.apache.org/jira/browse/BEAM-5526
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Kenneth Knowles
>Priority: Blocker
>
> The org.apache.beam.sdk.transforms.reflect.DoFnInvokers + DoFnInvokerFactory 
> design is meant to be an SPI that lets users plug in their own bytecode 
> manipulation library; however, in practice Beam uses 
> ByteBuddyDoFnInvokerFactory as a singleton, which makes this design useless.
> ByteBuddyDoFnInvokerFactory is also not configurable at all (typically the 
> injection strategy), so it assumes it runs in an environment and on a JVM where 
> it will work. It does not on Java 11, for instance.
> This ticket is about fixing all these small inconsistencies and blockers to 
> running on Java 11.





[jira] [Updated] (BEAM-4670) Re-enable checkstyle JavadocParagraph

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-4670:
--
Labels: easyfix newbie starter  (was: )

> Re-enable checkstyle JavadocParagraph
> -
>
> Key: BEAM-4670
> URL: https://issues.apache.org/jira/browse/BEAM-4670
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: easyfix, newbie, starter
>
> When {{spotless}} reformats javadoc, many of our javadoc comments are 
> revealed to be broken, missing paragraph tags. Rather than force them all to 
> be fixed in a single PR, we could disable the check and do it a few at a 
> time, since it is easy "idle hands" work. This JIRA tracks that (the actual 
> disabling is not yet committed).





[jira] [Updated] (BEAM-4439) Do not convert a join that we cannot support

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-4439:
--
Labels: newbie starter  (was: )

> Do not convert a join that we cannot support
> 
>
> Key: BEAM-4439
> URL: https://issues.apache.org/jira/browse/BEAM-4439
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: starter
>
> The BeamJoinRule matches all LogicalJoin operators. If we make it not match a 
> full join then Calcite should look for a different plan when it converts to 
> BeamLogicalConvention. That is a good step to make sure we can support things 
> and also force the logical optimizer to choose non-full joins.





[jira] [Assigned] (BEAM-4670) Re-enable checkstyle JavadocParagraph

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4670:
-

Assignee: (was: Kenneth Knowles)

> Re-enable checkstyle JavadocParagraph
> -
>
> Key: BEAM-4670
> URL: https://issues.apache.org/jira/browse/BEAM-4670
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: easyfix, newbie, starter
>
> When {{spotless}} reformats javadoc, many of our javadoc comments are 
> revealed to be broken, missing paragraph tags. Rather than force them all to 
> be fixed in a single PR, we could disable the check and do it a few at a 
> time, since it is easy "idle hands" work. This JIRA tracks that (the actual 
> disabling is not yet committed).





[jira] [Updated] (BEAM-4439) Do not convert a join that we cannot support

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-4439:
--
Labels: starter  (was: newbie starter)

> Do not convert a join that we cannot support
> 
>
> Key: BEAM-4439
> URL: https://issues.apache.org/jira/browse/BEAM-4439
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: starter
>
> The BeamJoinRule matches all LogicalJoin operators. If we make it not match a 
> full join then Calcite should look for a different plan when it converts to 
> BeamLogicalConvention. That is a good step to make sure we can support things 
> and also force the logical optimizer to choose non-full joins.





[jira] [Assigned] (BEAM-4439) Do not convert a join that we cannot support

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4439:
-

Assignee: (was: Kenneth Knowles)

> Do not convert a join that we cannot support
> 
>
> Key: BEAM-4439
> URL: https://issues.apache.org/jira/browse/BEAM-4439
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Priority: Major
>
> The BeamJoinRule matches all LogicalJoin operators. If we make it not match a 
> full join then Calcite should look for a different plan when it converts to 
> BeamLogicalConvention. That is a good step to make sure we can support things 
> and also force the logical optimizer to choose non-full joins.





[jira] [Resolved] (BEAM-4664) Website Staging is timing out at 30 minutes

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-4664.
---
   Resolution: Fixed
Fix Version/s: Not applicable

> Website Staging is timing out at 30 minutes
> ---
>
> Key: BEAM-4664
> URL: https://issues.apache.org/jira/browse/BEAM-4664
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, website
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We shouldn't be cutting it close. Let's double it and more while we figure 
> out how to stage less or more efficiently.





[jira] [Assigned] (BEAM-3608) Pre-shade Guava for things we want to keep using

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-3608:
-

Assignee: (was: Kenneth Knowles)

> Pre-shade Guava for things we want to keep using
> 
>
> Key: BEAM-3608
> URL: https://issues.apache.org/jira/browse/BEAM-3608
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core, sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Instead of shading as part of our build, we can shade before build so that it 
> is apparent when reading code, and in IDEs, that a particular class resides 
> in a hidden namespace.
> {{import com.google.common.reflect.TypeToken}}
> becomes something like
> {{import org.apache.beam.private.guava21.com.google.common.reflect.TypeToken}}
> So we can very trivially ban `org.apache.beam.private` from public APIs 
> unless they are annotated {{@Internal}}, and it makes sharing between our own 
> modules never get broken by shading again.
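
As a rough illustration of why a visible namespace makes the ban easy to enforce, the sketch below scans Java source text for imports from the pre-shaded prefix. It is plain Python with hypothetical names (PRIVATE_PREFIX, private_imports), it only looks at import lines, and it does not inspect public signatures or honor @Internal as a real check would need to.

```python
import re

# Prefix taken from the example in the description; treated as an assumption.
PRIVATE_PREFIX = "org.apache.beam.private."

# Matches "import a.b.C;" and "import static a.b.C.d;" (semicolon optional).
IMPORT_RE = re.compile(r"^\s*import\s+(?:static\s+)?([\w.*]+)\s*;?\s*$")


def private_imports(java_source):
    # Return every imported name that lives under the private namespace.
    found = []
    for line in java_source.splitlines():
        match = IMPORT_RE.match(line)
        if match and match.group(1).startswith(PRIVATE_PREFIX):
            found.append(match.group(1))
    return found


sample = """
import com.google.common.reflect.TypeToken;
import org.apache.beam.private.guava21.com.google.common.reflect.TypeToken;
"""
flagged = private_imports(sample)
```

With build-time shading the first import would look identical in source whether or not it ends up relocated, so a textual check like this has nothing to key on.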





[jira] [Assigned] (BEAM-4702) After SQL GROUP BY the result should be globally windowed

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4702:
-

Assignee: (was: Kenneth Knowles)

> After SQL GROUP BY  the result should be globally windowed
> -
>
> Key: BEAM-4702
> URL: https://issues.apache.org/jira/browse/BEAM-4702
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Beam SQL runs in two contexts:
> 1. As a PTransform in a pipeline. A PTransform operates on a PCollection, 
> which is always implicitly windowed, and a PTransform should operate per-window 
> so it automatically works on bounded and unbounded data. This only works if 
> the query has no windowing operators, in which case the GROUP BY  stuff> should operate per-window.
> 2. As a top-level shell that starts and ends with SQL. In the relational 
> model there are no implicit windows. Calcite has some extensions for 
> windowing, but they manifest (IMO correctly) as just items in the GROUP BY 
> list. The output of the aggregation is "just rows" again. So it should be 
> globally windowed.
> The problem is that this semantic fix makes it so we cannot join windowed 
> stream subqueries. Because we don't have retractions, we only support 
> GroupByKey-based equijoins over windowed streams, with the default trigger. 
> _These joins implicitly also join windows_. For example:
> {code}
> JOIN(left.id = right.id)
>   SELECT ... GROUP BY id, TUMBLE(1 hour)
>   SELECT ... GROUP BY id, TUMBLE(1 hour)  
> {code}
> Semantically, there may be a joined row for 1:00pm on the left and 10:00pm on 
> the right. But by the time the right-hand row for 10:00pm shows up, the left 
> one may be GC'd. So this is implicitly, but nondeterministically, joining on 
> the window as well. Before this PR, we left the windowing strategies for left 
> and right in place, and asserted that they matched.
> If we re-window into the global window always, there _are no windowed 
> streams_ so you just can't do stream joins. The solution is probably to track 
> which field of a stream is the window and allow joins which also explicitly 
> express the equijoin over the window field.
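
A minimal sketch of the direction proposed above, assuming rows are plain {id, timestampMillis} pairs with non-negative timestamps and one-hour tumbling windows. `WindowedJoinSketch` and its methods are hypothetical names for illustration and are not Beam APIs. The point is only that the window start becomes an explicit join key next to the id, instead of being joined implicitly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WindowedJoinSketch {
    static final long HOUR_MILLIS = 60L * 60L * 1000L;

    // Start of the one-hour tumbling window that contains the timestamp.
    static long tumbleStart(long timestampMillis) {
        return timestampMillis - (timestampMillis % HOUR_MILLIS);
    }

    // Equijoin of {id, timestampMillis} rows on id AND window start.
    // Each output row is {id, windowStart, leftTimestamp, rightTimestamp}.
    static List<long[]> joinOnIdAndWindow(List<long[]> left, List<long[]> right) {
        Map<String, List<long[]>> index = new HashMap<>();
        for (long[] row : left) {
            String key = row[0] + "@" + tumbleStart(row[1]);
            index.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        List<long[]> joined = new ArrayList<>();
        for (long[] r : right) {
            String key = r[0] + "@" + tumbleStart(r[1]);
            for (long[] l : index.getOrDefault(key, new ArrayList<>())) {
                joined.add(new long[] {l[0], tumbleStart(l[1]), l[1], r[1]});
            }
        }
        return joined;
    }
}
```

With the window in the key, a left row from the 1:00pm window can never pair with a right row from the 10:00pm window, so the result no longer depends on when state is garbage collected.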





[jira] [Assigned] (BEAM-4598) Test datetime operators at the DSL level

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4598:
-

Assignee: (was: Kenneth Knowles)

> Test datetime operators at the DSL level
> 
>
> Key: BEAM-4598
> URL: https://issues.apache.org/jira/browse/BEAM-4598
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Priority: Major
>






[jira] [Commented] (BEAM-4704) String operations yield incorrect results when executed through SQL shell

2018-07-03 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531674#comment-16531674
 ] 

Kenneth Knowles commented on BEAM-4704:
---

I'm going to leave this open, as we should either force our convention somehow 
and/or fix Calcite's implementation.

> String operations yield incorrect results when executed through SQL shell
> -
>
> Key: BEAM-4704
> URL: https://issues.apache.org/jira/browse/BEAM-4704
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>
> {{TRIM}} is defined to trim _all_ the characters in the first string from the 
> string-to-be-trimmed. Calcite has an incorrect implementation of this. We use 
> our own fixed implementation. But when executed through the SQL shell, the 
> results do not match what we get from the PTransform path. Here are test 
> cases that pass on {{master}} but are incorrect in the shell:
> {code:sql}
> BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe');
> +------------+
> |   EXPR$0   |
> +------------+
> | hehe__hehe |
> +------------+
> {code}
> {code:sql}
> BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe');
> +------------+
> |   EXPR$0   |
> +------------+
> | hehe__heh  |
> +------------+
> {code}
> {code:sql}
> BeamSQL> select TRIM(BOTH 'eh' FROM 'hehe__hehe');
> +------------+
> |   EXPR$0   |
> +------------+
> | hehe__heh  |
> +------------+
> {code}
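
For comparison, a minimal sketch of the semantics described in the issue, where every character of the first string is stripped from the chosen end(s). `TrimSketch` is a hypothetical stand-alone class for illustration; it is neither Beam's nor Calcite's implementation.

```java
public class TrimSketch {
    enum Mode { LEADING, TRAILING, BOTH }

    // Strips every character found in `chars` from the chosen end(s) of `input`.
    static String trim(String chars, String input, Mode mode) {
        int start = 0;
        int end = input.length();
        if (mode != Mode.TRAILING) {
            // Advance past leading characters that belong to the trim set.
            while (start < end && chars.indexOf(input.charAt(start)) >= 0) {
                start++;
            }
        }
        if (mode != Mode.LEADING) {
            // Retreat past trailing characters that belong to the trim set.
            while (end > start && chars.indexOf(input.charAt(end - 1)) >= 0) {
                end--;
            }
        }
        return input.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(trim("eh", "hehe__hehe", Mode.LEADING));  // __hehe
        System.out.println(trim("eh", "hehe__hehe", Mode.TRAILING)); // hehe__
        System.out.println(trim("eh", "hehe__hehe", Mode.BOTH));     // __
    }
}
```

Under this definition the three shell results quoted above are all wrong: none of them removes the full run of 'e' and 'h' characters.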





[jira] [Assigned] (BEAM-4704) String operations yield incorrect results when executed through SQL shell

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4704:
-

Assignee: (was: Kenneth Knowles)

> String operations yield incorrect results when executed through SQL shell
> -
>
> Key: BEAM-4704
> URL: https://issues.apache.org/jira/browse/BEAM-4704
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Priority: Major
>
> {{TRIM}} is defined to trim _all_ the characters in the first string from the 
> string-to-be-trimmed. Calcite has an incorrect implementation of this. We use 
> our own fixed implementation. But when executed through the SQL shell, the 
> results do not match what we get from the PTransform path. Here are test 
> cases that pass on {{master}} but are incorrect in the shell:
> {code:sql}
> BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe');
> +------------+
> |   EXPR$0   |
> +------------+
> | hehe__hehe |
> +------------+
> {code}
> {code:sql}
> BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe');
> +------------+
> |   EXPR$0   |
> +------------+
> | hehe__heh  |
> +------------+
> {code}
> {code:sql}
> BeamSQL> select TRIM(BOTH 'eh' FROM 'hehe__hehe');
> +------------+
> |   EXPR$0   |
> +------------+
> | hehe__heh  |
> +------------+
> {code}





[jira] [Commented] (BEAM-2888) Runner Comparison / Capability Matrix revamp

2018-07-03 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531671#comment-16531671
 ] 

Kenneth Knowles commented on BEAM-2888:
---

[~griscz] I will not be working on this for a while, so I wanted to find it a 
good owner. It looks closely related to something you have assigned.

> Runner Comparison / Capability Matrix revamp
> 
>
> Key: BEAM-2888
> URL: https://issues.apache.org/jira/browse/BEAM-2888
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Griselda Cuevas Zambrano
>Priority: Major
>
> Discussion: 
> https://lists.apache.org/thread.html/8aff7d70c254356f2dae3109fb605e0b60763602225a877d3dadf8b7@%3Cdev.beam.apache.org%3E
> Summarizing that discussion, we have a lot of issues/wishes. Some can be 
> addressed as one-off and some need a unified reorganization of the runner 
> comparison.
> Basic corrections:
>  - Remove rows that are impossible not to support (ParDo)
>  - Remove rows where "support" doesn't really make sense (Composite 
> transforms)
>  - Deduplicate rows that are actually the same model feature (all non-merging 
> windowing / all merging windowing)
>  - Clearly separate rows that represent optimizations (Combine)
>  - Correct rows in the wrong place (Timers are actually a "what...?" row)
>  - Separate or remove rows that have not been designed ([Meta]Data driven 
> triggers, retractions)
>  - Rename rows with names that appear nowhere else (Timestamp control, which 
> is called a TimestampCombiner in Java)
>  - Switch to a more distinct color scheme for full/partial support (currently 
> just solid/faded colors)
>  - Switch to something clearer than "~" for partial support, versus ✘ and ✓ 
> for none and full.
>  - Correct Gearpump support for merging windows (see BEAM-2759)
>  - Correct Spark support for non-merging and merging windows (see BEAM-2499)
> Minor rewrites:
>  - Lump all the basic stuff (ParDo, GroupByKey, Read, Window) into one row
>  - Make sections as users see them, like "ParDo" / "side Inputs" not "What?" 
> / "side inputs"
>  - Add rows for non-model things, like portability framework support, metrics 
> backends, etc
> Bigger rewrites:
>  - Add versioning to the comparison, as in BEAM-166
>  - Find a way to fit in a plain English summary of runner's support in Beam. 
> It should come first, as it is what new users need before getting to details.
>  - Find a way to describe production readiness of runners and/or testimonials 
> of who is using it in production.
>  - Have a place to compare non-model differences between runners
> Changes requiring engineering efforts:
>  - Gather and add quantitative runner metrics, perhaps Nexmark results for 
> mid-level, smaller benchmarks for measuring aspects of specific features, and 
> larger end-to-end benchmarks to get an idea how it might actually perform on 
> a use case
>  - Tighter coupling of the matrix portion of the comparison with tags on 
> ValidatesRunner tests
> If you care to address some aspect of this, please reach out and/or just file 
> a subtask and address it.





[jira] [Commented] (BEAM-2888) Runner Comparison / Capability Matrix revamp

2018-07-03 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531672#comment-16531672
 ] 

Kenneth Knowles commented on BEAM-2888:
---

Some of this can be distributed into subtasks, or [~melap] might have opinions 
about it.

> Runner Comparison / Capability Matrix revamp
> 
>
> Key: BEAM-2888
> URL: https://issues.apache.org/jira/browse/BEAM-2888
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Griselda Cuevas Zambrano
>Priority: Major
>
> Discussion: 
> https://lists.apache.org/thread.html/8aff7d70c254356f2dae3109fb605e0b60763602225a877d3dadf8b7@%3Cdev.beam.apache.org%3E
> Summarizing that discussion, we have a lot of issues/wishes. Some can be 
> addressed as one-off and some need a unified reorganization of the runner 
> comparison.
> Basic corrections:
>  - Remove rows that are impossible not to support (ParDo)
>  - Remove rows where "support" doesn't really make sense (Composite 
> transforms)
>  - Deduplicate rows that are actually the same model feature (all non-merging 
> windowing / all merging windowing)
>  - Clearly separate rows that represent optimizations (Combine)
>  - Correct rows in the wrong place (Timers are actually a "what...?" row)
>  - Separate or remove rows that have not been designed ([Meta]Data driven 
> triggers, retractions)
>  - Rename rows with names that appear nowhere else (Timestamp control, which 
> is called a TimestampCombiner in Java)
>  - Switch to a more distinct color scheme for full/partial support (currently 
> just solid/faded colors)
>  - Switch to something clearer than "~" for partial support, versus ✘ and ✓ 
> for none and full.
>  - Correct Gearpump support for merging windows (see BEAM-2759)
>  - Correct Spark support for non-merging and merging windows (see BEAM-2499)
> Minor rewrites:
>  - Lump all the basic stuff (ParDo, GroupByKey, Read, Window) into one row
>  - Make sections as users see them, like "ParDo" / "side Inputs" not "What?" 
> / "side inputs"
>  - Add rows for non-model things, like portability framework support, metrics 
> backends, etc
> Bigger rewrites:
>  - Add versioning to the comparison, as in BEAM-166
>  - Find a way to fit in a plain English summary of runner's support in Beam. 
> It should come first, as it is what new users need before getting to details.
>  - Find a way to describe production readiness of runners and/or testimonials 
> of who is using it in production.
>  - Have a place to compare non-model differences between runners
> Changes requiring engineering efforts:
>  - Gather and add quantitative runner metrics, perhaps Nexmark results for 
> mid-level, smaller benchmarks for measuring aspects of specific features, and 
> larger end-to-end benchmarks to get an idea how it might actually perform on 
> a use case
>  - Tighter coupling of the matrix portion of the comparison with tags on 
> ValidatesRunner tests
> If you care to address some aspect of this, please reach out and/or just file 
> a subtask and address it.





[jira] [Assigned] (BEAM-2888) Runner Comparison / Capability Matrix revamp

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-2888:
-

Assignee: Griselda Cuevas Zambrano  (was: Kenneth Knowles)

> Runner Comparison / Capability Matrix revamp
> 
>
> Key: BEAM-2888
> URL: https://issues.apache.org/jira/browse/BEAM-2888
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Griselda Cuevas Zambrano
>Priority: Major
>
> Discussion: 
> https://lists.apache.org/thread.html/8aff7d70c254356f2dae3109fb605e0b60763602225a877d3dadf8b7@%3Cdev.beam.apache.org%3E
> Summarizing that discussion, we have a lot of issues/wishes. Some can be 
> addressed as one-off and some need a unified reorganization of the runner 
> comparison.
> Basic corrections:
>  - Remove rows that are impossible not to support (ParDo)
>  - Remove rows where "support" doesn't really make sense (Composite 
> transforms)
>  - Deduplicate rows that are actually the same model feature (all non-merging 
> windowing / all merging windowing)
>  - Clearly separate rows that represent optimizations (Combine)
>  - Correct rows in the wrong place (Timers are actually a "what...?" row)
>  - Separate or remove rows that have not been designed ([Meta]Data driven 
> triggers, retractions)
>  - Rename rows with names that appear nowhere else (Timestamp control, which 
> is called a TimestampCombiner in Java)
>  - Switch to a more distinct color scheme for full/partial support (currently 
> just solid/faded colors)
>  - Switch to something clearer than "~" for partial support, versus ✘ and ✓ 
> for none and full.
>  - Correct Gearpump support for merging windows (see BEAM-2759)
>  - Correct Spark support for non-merging and merging windows (see BEAM-2499)
> Minor rewrites:
>  - Lump all the basic stuff (ParDo, GroupByKey, Read, Window) into one row
>  - Make sections as users see them, like "ParDo" / "side Inputs" not "What?" 
> / "side inputs"
>  - Add rows for non-model things, like portability framework support, metrics 
> backends, etc
> Bigger rewrites:
>  - Add versioning to the comparison, as in BEAM-166
>  - Find a way to fit in a plain English summary of runner's support in Beam. 
> It should come first, as it is what new users need before getting to details.
>  - Find a way to describe production readiness of runners and/or testimonials 
> of who is using it in production.
>  - Have a place to compare non-model differences between runners
> Changes requiring engineering efforts:
>  - Gather and add quantitative runner metrics, perhaps Nexmark results for 
> mid-level, smaller benchmarks for measuring aspects of specific features, and 
> larger end-to-end benchmarks to get an idea how it might actually perform on 
> a use case
>  - Tighter coupling of the matrix portion of the comparison with tags on 
> ValidatesRunner tests
> If you care to address some aspect of this, please reach out and/or just file 
> a subtask and address it.





[jira] [Resolved] (BEAM-4642) Allow setting PipelineOptions for JDBC connections

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-4642.
---
   Resolution: Fixed
Fix Version/s: 2.6.0

> Allow setting PipelineOptions for JDBC connections
> --
>
> Key: BEAM-4642
> URL: https://issues.apache.org/jira/browse/BEAM-4642
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.6.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently you can set pipeline options only through {{SET}} commands. It 
> would be convenient to set defaults in the JDBC connection string.





[jira] [Assigned] (BEAM-4667) Potential issue with QuantileStateCoder

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4667:
-

Assignee: Reuven Lax  (was: Kenneth Knowles)

> Potential issue with QuantileStateCoder
> ---
>
> Key: BEAM-4667
> URL: https://issues.apache.org/jira/browse/BEAM-4667
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Zhiheng Huang
>Assignee: Reuven Lax
>Priority: Minor
>
> [https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java#L687]
> The line above encodes the QuantileState buffers.size() as if it's 
> numBuffers. This seems wrong since before buffers are full, buffers.size() is 
> not equal to numBuffers. One thing I suspect will happen is that, if we 
> serialize before buffer is full, it will effectively reduce the number of 
> buffers we maintain after deserialization.
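
A toy model of the suspicion described above, not the actual coder: `QuantileCoderSketch`, its `State` class, and the flat `int[]` wire format are all hypothetical simplifications. It only shows how writing `buffers.size()` in the slot that is later read back as `numBuffers` would shrink the configured capacity when the state is encoded before all buffers are filled.

```java
import java.util.ArrayList;
import java.util.List;

public class QuantileCoderSketch {
    // Hypothetical, heavily simplified stand-in for the real QuantileState.
    static final class State {
        final int numBuffers;                          // configured capacity
        final List<int[]> buffers = new ArrayList<>(); // buffers filled so far

        State(int numBuffers) {
            this.numBuffers = numBuffers;
        }
    }

    // When writeConfiguredCapacity is false, the first slot holds buffers.size(),
    // which models the suspected behavior.
    static int[] encode(State s, boolean writeConfiguredCapacity) {
        List<Integer> out = new ArrayList<>();
        out.add(writeConfiguredCapacity ? s.numBuffers : s.buffers.size());
        out.add(s.buffers.size());
        for (int[] buffer : s.buffers) {
            out.add(buffer.length);
            for (int v : buffer) {
                out.add(v);
            }
        }
        int[] encoded = new int[out.size()];
        for (int i = 0; i < encoded.length; i++) {
            encoded[i] = out.get(i);
        }
        return encoded;
    }

    // Reads the first slot back as the capacity, whatever the encoder put there.
    static State decode(int[] encoded) {
        int pos = 0;
        State s = new State(encoded[pos++]);
        int count = encoded[pos++];
        for (int i = 0; i < count; i++) {
            int[] buffer = new int[encoded[pos++]];
            for (int j = 0; j < buffer.length; j++) {
                buffer[j] = encoded[pos++];
            }
            s.buffers.add(buffer);
        }
        return s;
    }
}
```

In this model, a state configured with four buffers but holding only two comes back from a round trip with a capacity of two unless the configured value is written explicitly.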





[jira] [Assigned] (BEAM-4425) JDBC driver load errors :jdbc:calcite and :jdbc:beam

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4425:
-

Assignee: (was: Kenneth Knowles)

> JDBC driver load errors :jdbc:calcite and :jdbc:beam
> 
>
> Key: BEAM-4425
> URL: https://issues.apache.org/jira/browse/BEAM-4425
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> These have different causes but the same fix - force load the JDBC driver 
> before it is used or otherwise factor code so it is not possible to hit this.





[jira] [Commented] (BEAM-4425) JDBC driver load errors :jdbc:calcite and :jdbc:beam

2018-07-03 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531665#comment-16531665
 ] 

Kenneth Knowles commented on BEAM-4425:
---

Is there a broader fix, or do you just have to fix it whenever it comes up?

> JDBC driver load errors :jdbc:calcite and :jdbc:beam
> 
>
> Key: BEAM-4425
> URL: https://issues.apache.org/jira/browse/BEAM-4425
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> These have different causes but the same fix - force load the JDBC driver 
> before it is used or otherwise factor code so it is not possible to hit this.





[jira] [Commented] (BEAM-4667) Potential issue with QuantileStateCoder

2018-07-03 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531666#comment-16531666
 ] 

Kenneth Knowles commented on BEAM-4667:
---

[~reuvenlax] any thoughts?

> Potential issue with QuantileStateCoder
> ---
>
> Key: BEAM-4667
> URL: https://issues.apache.org/jira/browse/BEAM-4667
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Zhiheng Huang
>Assignee: Reuven Lax
>Priority: Minor
>
> [https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java#L687]
> The line above encodes the QuantileState buffers.size() as if it's 
> numBuffers. This seems wrong since before buffers are full, buffers.size() is 
> not equal to numBuffers. One thing I suspect will happen is that, if we 
> serialize before buffer is full, it will effectively reduce the number of 
> buffers we maintain after deserialization.





[jira] [Assigned] (BEAM-4670) Re-enable checkstyle JavadocParagraph

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4670:
-

Assignee: (was: Kenneth Knowles)

> Re-enable checkstyle JavadocParagraph
> -
>
> Key: BEAM-4670
> URL: https://issues.apache.org/jira/browse/BEAM-4670
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kenneth Knowles
>Priority: Major
>
> When {{spotless}} reformats javadoc, many of our javadoc comments are 
> revealed to be broken, missing paragraph tags. Rather than force them all to 
> be fixed in a single PR, we could disable the check and do it a few at a 
> time, since it is easy "idle hands" work. This JIRA tracks that (the actual 
> disabling is not yet committed).





[jira] [Assigned] (BEAM-4670) Re-enable checkstyle JavadocParagraph

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4670:
-

Assignee: Kenneth Knowles

> Re-enable checkstyle JavadocParagraph
> -
>
> Key: BEAM-4670
> URL: https://issues.apache.org/jira/browse/BEAM-4670
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>
> When {{spotless}} reformats javadoc, many of our javadoc comments are 
> revealed to be broken, missing paragraph tags. Rather than force them all to 
> be fixed in a single PR, we could disable the check and do it a few at a 
> time, since it is easy "idle hands" work. This JIRA tracks that (the actual 
> disabling is not yet committed).





[jira] [Resolved] (BEAM-4697) Build broken

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-4697.
---
   Resolution: Fixed
Fix Version/s: Not applicable

> Build broken
> 
>
> Key: BEAM-4697
> URL: https://issues.apache.org/jira/browse/BEAM-4697
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Caused by a race between two separate PRs being merged.





[jira] [Assigned] (BEAM-4669) Re-enable checkstyle LineLength

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4669:
-

Assignee: (was: Kenneth Knowles)

> Re-enable checkstyle LineLength
> ---
>
> Key: BEAM-4669
> URL: https://issues.apache.org/jira/browse/BEAM-4669
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: easyfix, newbie, starter
>
> When {{spotless}} re-indents, many strings that are broken across lines end 
> up exceeding 100 characters. Rather than force them all to be fixed in a 
> single PR, we could disable the check and do it a few at a time, since it is 
> easy "idle hands" work. This JIRA tracks that (the actual disabling is not 
> yet committed).





[jira] [Assigned] (BEAM-4697) Build broken

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4697:
-

Assignee: Reuven Lax  (was: Kenneth Knowles)

> Build broken
> 
>
> Key: BEAM-4697
> URL: https://issues.apache.org/jira/browse/BEAM-4697
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Caused by a race between two separate PRs being merged.





[jira] [Updated] (BEAM-4669) Re-enable checkstyle LineLength

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-4669:
--
Labels: easyfix newbie starter  (was: )

> Re-enable checkstyle LineLength
> ---
>
> Key: BEAM-4669
> URL: https://issues.apache.org/jira/browse/BEAM-4669
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: easyfix, newbie, starter
>
> When {{spotless}} re-indents, many strings that are broken across lines end 
> up exceeding 100 characters. Rather than force them all to be fixed in a 
> single PR, we could disable the check and do it a few at a time, since it is 
> easy "idle hands" work. This JIRA tracks that (the actual disabling is not 
> yet committed).





[jira] [Resolved] (BEAM-4701) Run SQL DSL-level tests through JDBC driver

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-4701.
---
   Resolution: Fixed
Fix Version/s: 2.6.0

> Run SQL DSL-level tests through JDBC driver
> ---
>
> Key: BEAM-4701
> URL: https://issues.apache.org/jira/browse/BEAM-4701
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.6.0
>
>
> Currently there's a testing gap where tests at the DSL level mostly go 
> through the QueryTransform. So we can hit issues where we have a mismatch in 
> Calcite Avatica, which provides the JDBC layer.





[jira] [Commented] (BEAM-4700) JDBC driver cannot support TIMESTAMP data type

2018-07-03 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531663#comment-16531663
 ] 

Kenneth Knowles commented on BEAM-4700:
---

I think you know the most about Avatica and how we might work around this.

> JDBC driver cannot support TIMESTAMP data type
> --
>
> Key: BEAM-4700
> URL: https://issues.apache.org/jira/browse/BEAM-4700
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Avatica allows column representation to be customized, so a timestamp can be 
> stored as a variety of types. Joda ReadableInstant is none of these types: 
> https://github.com/apache/calcite-avatica/blob/acb675de97b9b0743c09368820a770e2ceda05f8/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L162
> By default, it seems to be configured to store {{TIMESTAMP}} columns as 
> {{long}} values. If you run the SQL shell and select a {{TIMESTAMP}} column, 
> you get:
> {code}
> java.lang.ClassCastException: org.joda.time.Instant cannot be cast to java.lang.Number
>     at org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$NumberAccessor.getNumber(AbstractCursor.java:726)
>     at org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$TimestampFromNumberAccessor.getString(AbstractCursor.java:1026)
>     at org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.AvaticaResultSet.getString(AvaticaResultSet.java:225)
>     at sqlline.Rows$Row.<init>(Rows.java:183)
> {code}
> So, essentially, Beam SQL Shell does not support timestamps.
> We may be able to:
>  - override how the accessor for our existing storage is created
>  - configure what the column representation is (this doesn't really help, 
> since none of the choices are ours)
>  - convert timestamps to longs in BeamEnumerableConverter; not sure how many 
> conversions will be required here
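
A minimal sketch of the failure and of the third option listed above. To stay self-contained it uses `java.time.Instant` as a stand-in for `org.joda.time.Instant`, and `TimestampAccessorSketch` with its two methods is hypothetical; it does not reproduce Avatica's accessors or `BeamEnumerableConverter`.

```java
import java.time.Instant;

public class TimestampAccessorSketch {
    // Mimics an accessor that assumes the stored column value is a Number.
    static long readAsNumber(Object stored) {
        return ((Number) stored).longValue();
    }

    // Converts instants to epoch milliseconds before storage so the cast above succeeds.
    static Object toStorage(Object value) {
        if (value instanceof Instant) {
            return Long.valueOf(((Instant) value).toEpochMilli());
        }
        return value;
    }

    public static void main(String[] args) {
        Instant ts = Instant.ofEpochMilli(1530576000000L);
        System.out.println(readAsNumber(toStorage(ts))); // 1530576000000
        try {
            readAsNumber(ts); // the instant is not a Number
        } catch (ClassCastException e) {
            System.out.println("cast failed, as in the stack trace above");
        }
    }
}
```

The open question from the issue remains how many places such a conversion would have to be applied.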





[jira] [Assigned] (BEAM-4700) JDBC driver cannot support TIMESTAMP data type

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4700:
-

Assignee: Andrew Pilloud  (was: Kenneth Knowles)

> JDBC driver cannot support TIMESTAMP data type
> --
>
> Key: BEAM-4700
> URL: https://issues.apache.org/jira/browse/BEAM-4700
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Avatica allows column representation to be customized, so a timestamp can be 
> stored as a variety of types. Joda ReadableInstant is none of these types: 
> https://github.com/apache/calcite-avatica/blob/acb675de97b9b0743c09368820a770e2ceda05f8/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L162
> By default, it seems to be configured to store {{TIMESTAMP}} columns as 
> {{long}} values. If you run the SQL shell and select a {{TIMESTAMP}} column, 
> you get:
> {code}
> java.lang.ClassCastException: org.joda.time.Instant cannot be cast to java.lang.Number
>     at org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$NumberAccessor.getNumber(AbstractCursor.java:726)
>     at org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$TimestampFromNumberAccessor.getString(AbstractCursor.java:1026)
>     at org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.AvaticaResultSet.getString(AvaticaResultSet.java:225)
>     at sqlline.Rows$Row.<init>(Rows.java:183)
> {code}
> So, essentially, Beam SQL Shell does not support timestamps.
> We may be able to:
>  - override how the accessor for our existing storage is created
>  - configure what the column representation is (this doesn't really help, 
> since none of the choices are ours)
>  - convert timestamps to longs in BeamEnumerableConverter; not sure how many 
> conversions will be required here





[jira] [Resolved] (BEAM-4714) Some DATETIME PLUS operators end up as ordinary PLUS and crash in accept()

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-4714.
---
   Resolution: Fixed
Fix Version/s: 2.6.0

> Some DATETIME PLUS operators end up as ordinary PLUS and crash in accept()
> --
>
> Key: BEAM-4714
> URL: https://issues.apache.org/jira/browse/BEAM-4714
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.6.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In this case it was in HOP_END (but not HOP_START, interestingly enough). 
> It is unclear why this manifests, but the "+" operator should just work for 
> datetime + interval here.





[jira] [Assigned] (BEAM-4684) Support @RequiresStableInput on Dataflow runner in Java SDK

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4684:
-

Assignee: Yueyang Qiu  (was: Kenneth Knowles)

> Support @RequiresStableInput on Dataflow runner in Java SDK
> ---
>
> Key: BEAM-4684
> URL: https://issues.apache.org/jira/browse/BEAM-4684
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> https://docs.google.com/document/d/117yRKbbcEdm3eIKB_26BHOJGmHSZl1YNoF0RqWGtqAM



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4547) implement sum0 aggregation function

2018-07-03 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531662#comment-16531662
 ] 

Kenneth Knowles commented on BEAM-4547:
---

I'm sure that either {{SUM}} or {{SUM0}} is incorrect, since we use the same 
implementation for both... I'm not sure it matters for Beam, but we should get 
testing for it all, including the edge cases. That can be separate.
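
For reference, the edge case that separates the two functions is input with no non-null values: SQL {{SUM}} yields NULL there, while Calcite's {{SUM0}} yields 0. A minimal sketch in plain Java, assuming a list of nullable longs stands in for the aggregated column (the class and method names are hypothetical, not Beam's combiner API):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: models the SUM / SUM0 edge case, not Beam's aggregation code.
final class SumSketch {
  /** SQL SUM: skips NULL inputs and returns null when there is nothing to add. */
  static Long sum(List<Long> values) {
    Long total = null;
    for (Long v : values) {
      if (v == null) {
        continue; // NULL inputs are ignored by SUM
      }
      total = (total == null) ? v : total + v;
    }
    return total;
  }

  /** SUM0: same as SUM, except that "nothing to add" becomes 0 instead of null. */
  static long sum0(List<Long> values) {
    Long total = sum(values);
    return total == null ? 0L : total;
  }

  public static void main(String[] args) {
    List<Long> empty = Collections.emptyList();
    System.out.println(sum(empty));                     // null
    System.out.println(sum0(empty));                    // 0
    System.out.println(sum(Arrays.asList(1L, null, 2L))); // 3
  }
}
```

Sharing one implementation for both can only be right for one of them on empty or all-NULL input, which is the case a test should pin down.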

> implement sum0 aggregation function
> ---
>
> Key: BEAM-4547
> URL: https://issues.apache.org/jira/browse/BEAM-4547
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kai Jiang
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.6.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-4547) implement sum0 aggregation function

2018-07-03 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-4547.
---
   Resolution: Fixed
Fix Version/s: 2.6.0

> implement sum0 aggregation function
> ---
>
> Key: BEAM-4547
> URL: https://issues.apache.org/jira/browse/BEAM-4547
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kai Jiang
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.6.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4721) WindowingStrategy, Window fields in the project as a trait of a Rel

2018-07-02 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4721:
-

 Summary: WindowingStrategy, Window fields in the project as a 
trait of a Rel
 Key: BEAM-4721
 URL: https://issues.apache.org/jira/browse/BEAM-4721
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Kenneth Knowles


Currently, WindowingStrategy is just a property of a PCollection but is unknown 
during planning. As with boundedness / unboundedness, this should be available 
statically so it can affect the plan. It may not be literally a 
WindowingStrategy object, but a SQL-level equivalent.

One windowing-related thing that is interesting is which fields represent the 
window. If we know this, then we know we can support per-window equijoins as 
long as a join condition also does an equijoin on the window.
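
A rough sketch of the check this would enable, assuming the planner can name the window field on each side. All types and names here are hypothetical stand-ins, not Calcite or Beam classes:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: decides whether an equijoin also equates the window fields.
final class WindowJoinCheckSketch {
  /** One "left.field = right.field" conjunct of a join condition (hypothetical type). */
  static final class EquiPair {
    final String leftField;
    final String rightField;

    EquiPair(String leftField, String rightField) {
      this.leftField = leftField;
      this.rightField = rightField;
    }
  }

  /** True if some conjunct equates the left window field with the right window field. */
  static boolean joinsOnWindow(
      List<EquiPair> condition, String leftWindowField, String rightWindowField) {
    for (EquiPair pair : condition) {
      if (pair.leftField.equals(leftWindowField) && pair.rightField.equals(rightWindowField)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<EquiPair> condition =
        Arrays.asList(new EquiPair("id", "id"), new EquiPair("w_start", "w_start"));
    System.out.println(joinsOnWindow(condition, "w_start", "w_start")); // true
  }
}
```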



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4720) Boundedness and Unboundedness as a trait of a Rel so we know during planning

2018-07-02 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4720:
-

 Summary: Boundedness and Unboundedness as a trait of a Rel so we 
know during planning
 Key: BEAM-4720
 URL: https://issues.apache.org/jira/browse/BEAM-4720
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Kenneth Knowles


Currently we do not know the boundedness or unboundedness of a collection until 
after planning is complete, so it cannot influence the plan. Instead, 
BeamJoinRel makes post-planning decisions about what sort of join to implement. 
We should move this into the planning phase. I think Calcite traits are the 
intended way to do this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4719) Enhanced LIMIT support

2018-07-02 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4719:
-

 Summary: Enhanced LIMIT support
 Key: BEAM-4719
 URL: https://issues.apache.org/jira/browse/BEAM-4719
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Kenneth Knowles


Currently, Beam SQL supports LIMIT in two ways:

1. Within a query, the results are subject to LIMIT. This works.
2. The shell knows to cancel a pipeline when the limit is reached, even if 
there is unfinished unbounded data.

The canceling of a pipeline works via a basic pattern match against the query 
execution plan, checking a few child nodes of the BeamEnumerableConverter for a 
BeamSortRel without a collation. If it can figure out what the limit is for the 
outermost query, then it will cancel the pipeline.

A more robust approach might be to use traits (or some other thorough analysis) 
to see if there is a known size for the outermost query. This would, for 
example, be unaffected by any number of layers of non-size-changing 
transformations.
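
To make the idea concrete, here is a minimal sketch that walks down a single-input plan and looks through any number of size-preserving nodes until it finds a limit. The {{Node}} type is a hypothetical stand-in for a plan node, not BeamSortRel or any Calcite class, and a real version would presumably live in a trait or metadata handler:

```java
import java.util.OptionalInt;

// Illustrative only: finds a known row limit for the outermost query of a toy plan.
final class LimitSketch {
  /** Hypothetical single-input plan node. */
  static final class Node {
    final String name;
    final Integer fetch;          // row limit imposed by this node, or null if none
    final boolean sizePreserving; // true if output row count equals input row count
    final Node input;             // null for a leaf

    Node(String name, Integer fetch, boolean sizePreserving, Node input) {
      this.name = name;
      this.fetch = fetch;
      this.sizePreserving = sizePreserving;
      this.input = input;
    }
  }

  /** Returns the limit of the outermost query if it can be determined, else empty. */
  static OptionalInt knownLimit(Node root) {
    for (Node n = root; n != null; n = n.input) {
      if (n.fetch != null) {
        return OptionalInt.of(n.fetch); // found the limit
      }
      if (!n.sizePreserving) {
        return OptionalInt.empty(); // row count changes here, so the size is unknown
      }
    }
    return OptionalInt.empty();
  }

  public static void main(String[] args) {
    Node scan = new Node("scan", null, false, null);
    Node limit = new Node("limit", 10, false, scan);
    Node project = new Node("project", null, true, new Node("project", null, true, limit));
    System.out.println(knownLimit(project)); // OptionalInt[10]
  }
}
```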



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-4714) Some DATETIME PLUS operators end up as ordinary PLUS and crash in accept()

2018-07-02 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-4714:
--
Description: In this case it was in HOP_END (but not HOP_START, 
interestingly enough). Unclear why this manifests, but the "+" operator should 
just work for datetime + interval for this.

> Some DATETIME PLUS operators end up as ordinary PLUS and crash in accept()
> --
>
> Key: BEAM-4714
> URL: https://issues.apache.org/jira/browse/BEAM-4714
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>
> In this case it was in HOP_END (but not HOP_START, interestingly enough). 
> Unclear why this manifests, but the "+" operator should just work for 
> datetime + interval for this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4714) Some DATETIME PLUS operators end up as ordinary PLUS and crash in accept()

2018-07-02 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4714:
-

 Summary: Some DATETIME PLUS operators end up as ordinary PLUS and 
crash in accept()
 Key: BEAM-4714
 URL: https://issues.apache.org/jira/browse/BEAM-4714
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4547) implement sum0 aggregation function

2018-07-02 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4547:
-

Assignee: Kenneth Knowles  (was: Kai Jiang)

> implement sum0 aggregation function
> ---
>
> Key: BEAM-4547
> URL: https://issues.apache.org/jira/browse/BEAM-4547
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kai Jiang
>Assignee: Kenneth Knowles
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4700) JDBC driver cannot support TIMESTAMP data type

2018-07-02 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530355#comment-16530355
 ] 

Kenneth Knowles commented on BEAM-4700:
---

Discussion on CALCITE-2394 implies that it is necessary for us to always store 
as {{TIMESTAMP WITH TIME ZONE}} set to 0 since Avatica's SQL-to-JDBC treats it 
as "millis since time zone offset".

> JDBC driver cannot support TIMESTAMP data type
> --
>
> Key: BEAM-4700
> URL: https://issues.apache.org/jira/browse/BEAM-4700
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Avatica allows column representation to be customized, so a timestamp can be 
> stored as a variety of types. Joda ReadableInstant is none of these types: 
> https://github.com/apache/calcite-avatica/blob/acb675de97b9b0743c09368820a770e2ceda05f8/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L162
> By default, it seems to be configured to store {{TIMESTAMP}} columns as 
> {{long}} values. If you run the SQL shell and select a {{TIMESTAMP}} column, 
> you get:
> {code}
> java.lang.ClassCastException: org.joda.time.Instant cannot be cast to 
> java.lang.Number
> at 
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$NumberAccessor.getNumber(AbstractCursor.java:726)
> at 
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$TimestampFromNumberAccessor.getString(AbstractCursor.java:1026)
> at 
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.AvaticaResultSet.getString(AvaticaResultSet.java:225)
> at sqlline.Rows$Row.<init>(Rows.java:183)
> {code}
> So, essentially, Beam SQL Shell does not support timestamps.
> We may be able to:
>  - override how the accessor for our existing storage is created
>  - configure what the column representation is (this doesn't really help, 
> since none of the choices are ours)
>  - convert timestamps to longs in BeamEnumerableConverter; not sure how many 
> conversions will be required here
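
The third option could look roughly like the sketch below: every timestamp field is replaced by its epoch-millisecond {{long}} before the row is handed to Avatica. This is an assumption-laden illustration, not the code in BeamEnumerableConverter. It uses {{java.time.Instant}} so that it runs without extra dependencies, whereas Beam rows carry Joda instants (where {{getMillis()}} would play the role of {{toEpochMilli()}}), and the method names are hypothetical:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: converts timestamp fields to the long representation a
// number-based timestamp accessor expects.
final class TimestampToMillisSketch {
  /** Instants become epoch millis; every other value passes through unchanged. */
  static Object toAvaticaValue(Object field) {
    if (field instanceof Instant) {
      return ((Instant) field).toEpochMilli();
    }
    return field;
  }

  /** Applies the conversion to each field of a row. */
  static List<Object> convertRow(List<Object> row) {
    List<Object> converted = new ArrayList<>(row.size());
    for (Object field : row) {
      converted.add(toAvaticaValue(field));
    }
    return converted;
  }

  public static void main(String[] args) {
    List<Object> row = Arrays.asList("order-1", Instant.ofEpochMilli(1530489600000L), 42);
    System.out.println(convertRow(row)); // [order-1, 1530489600000, 42]
  }
}
```

Nested types (arrays or rows containing timestamps) would need the same treatment recursively, which is likely where the "how many conversions" question comes in.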



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-4626) Support text table format with a single column of the lines of the files

2018-07-02 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-4626.
---
   Resolution: Fixed
Fix Version/s: 2.6.0

> Support text table format with a single column of the lines of the files
> 
>
> Key: BEAM-4626
> URL: https://issues.apache.org/jira/browse/BEAM-4626
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.6.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Today, SQL can read CSV and allows a {{format}} flag to control what CSV 
> variant is used. But to do easy things and write pure SQL jobs it would be 
> nice to just read the text file as a one-column table and do transformations 
> in SQL.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4713) DECIMAL support in RowJsonDeserializer

2018-07-02 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4713:
-

 Summary: DECIMAL support in RowJsonDeserializer
 Key: BEAM-4713
 URL: https://issues.apache.org/jira/browse/BEAM-4713
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Kenneth Knowles
Assignee: Rui Wang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-4702) After SQL GROUP BY the result should be globally windowed

2018-07-02 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-4702:
--
Description: 
Beam SQL runs in two contexts:

1. As a PTransform in a pipeline. A PTransform operates on a PCollection, which 
is always implicitly windowed, and a PTransform should operate per-window so it 
automatically works on bounded and unbounded data. This only works if the query 
has no windowing operators, in which case the GROUP BY  
should operate per-window.
2. As a top-level shell that starts and ends with SQL. In the relational model 
there are no implicit windows. Calcite has some extensions for windowing, but 
they manifest (IMO correctly) as just items in the GROUP BY list. The output of 
the aggregation is "just rows" again. So it should be globally windowed.

The problem is that this semantic fix makes it so we cannot join windowing 
stream subqueries. Because we don't have retractions, we only support 
GroupByKey-based equijoins over windowed streams, with the default trigger. 
_These joins implicitly also join windows_. For example:

{code}
JOIN(left.id = right.id)
  SELECT ... GROUP BY id, TUMBLE(1 hour)
  SELECT ... GROUP BY id, TUMBLE(1 hour)  
{code}

Semantically, there may be a joined row for 1:00pm on the left and 10:00pm on 
the right. But by the time the right-hand row for 10:00pm shows up, the left 
one may be GC'd. So this is implicitly, but nondeterministically, joining on 
the window as well. Before this PR, we left the windowing strategies for left 
and right in place, and asserted that they matched.

If we re-window into the global window always, there _are no windowed streams_ 
so you just can't do stream joins. The solution is probably to track which 
field of a stream is the window and allow joins which also explicitly express 
the equijoin over the window field.
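
A small sketch of what "explicitly express the equijoin over the window field" means at the row level: the join key is the pair (id, window start) rather than id alone. This is plain Java over in-memory lists, not Beam's GroupByKey-based join, and the {{Row}} type with an hour-of-day window start is a made-up simplification:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hash join keyed on (id, windowStart).
final class WindowJoinSketch {
  /** Hypothetical aggregated row: one value per id per tumbling window. */
  static final class Row {
    final String id;
    final long windowStart;
    final long value;

    Row(String id, long windowStart, long value) {
      this.id = id;
      this.windowStart = windowStart;
      this.value = value;
    }
  }

  /** The composite join key: the window is part of the key, not an implicit side condition. */
  static List<Object> key(Row row) {
    return Arrays.asList(row.id, row.windowStart);
  }

  static List<String> join(List<Row> left, List<Row> right) {
    Map<List<Object>, List<Row>> rightByKey = new HashMap<>();
    for (Row r : right) {
      rightByKey.computeIfAbsent(key(r), k -> new ArrayList<>()).add(r);
    }
    List<String> joined = new ArrayList<>();
    for (Row l : left) {
      for (Row r : rightByKey.getOrDefault(key(l), Collections.emptyList())) {
        joined.add(l.id + "@" + l.windowStart + ": " + l.value + "," + r.value);
      }
    }
    return joined;
  }

  public static void main(String[] args) {
    List<Row> left = Arrays.asList(new Row("a", 13, 1), new Row("a", 22, 2));
    List<Row> right = Arrays.asList(new Row("a", 13, 10), new Row("a", 22, 20));
    System.out.println(join(left, right)); // [a@13: 1,10, a@22: 2,20]
  }
}
```

Joining on id alone would also pair the 13:00 row with the 22:00 row, which is exactly the result that a windowed stream join can only produce nondeterministically.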

> After SQL GROUP BY  the result should be globally windowed
> -
>
> Key: BEAM-4702
> URL: https://issues.apache.org/jira/browse/BEAM-4702
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Beam SQL runs in two contexts:
> 1. As a PTransform in a pipeline. A PTransform operates on a PCollection, 
> which is always implicitly windowed, and a PTransform should operate per-window 
> so it automatically works on bounded and unbounded data. This only works if 
> the query has no windowing operators, in which case the GROUP BY  stuff> should operate per-window.
> 2. As a top-level shell that starts and ends with SQL. In the relational 
> model there are no implicit windows. Calcite has some extensions for 
> windowing, but they manifest (IMO correctly) as just items in the GROUP BY 
> list. The output of the aggregation is "just rows" again. So it should be 
> globally windowed.
> The problem is that this semantic fix makes it so we cannot join windowing 
> stream subqueries. Because we don't have retractions, we only support 
> GroupByKey-based equijoins over windowed streams, with the default trigger. 
> _These joins implicitly also join windows_. For example:
> {code}
> JOIN(left.id = right.id)
>   SELECT ... GROUP BY id, TUMBLE(1 hour)
>   SELECT ... GROUP BY id, TUMBLE(1 hour)  
> {code}
> Semantically, there may be a joined row for 1:00pm on the left and 10:00pm on 
> the right. But by the time the right-hand row for 10:00pm shows up, the left 
> one may be GC'd. So this is implicitly, but nondeterministically, joining on 
> the window as well. Before this PR, we left the windowing strategies for left 
> and right in place, and asserted that they matched.
> If we re-window into the global window always, there _are no windowed 
> streams_ so you just can't do stream joins. The solution is probably to track 
> which field of a stream is the window and allow joins which also explicitly 
> express the equijoin over the window field.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-4708) Runners-core javadoc broken, blocked snapshot release

2018-07-02 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-4708.
---
   Resolution: Fixed
Fix Version/s: Not applicable

> Runners-core javadoc broken, blocked snapshot release
> -
>
> Key: BEAM-4708
> URL: https://issues.apache.org/jira/browse/BEAM-4708
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4709) Javadoc build only tested during release

2018-07-02 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4709:
-

 Summary: Javadoc build only tested during release
 Key: BEAM-4709
 URL: https://issues.apache.org/jira/browse/BEAM-4709
 Project: Beam
  Issue Type: Bug
  Components: build-system
Reporter: Kenneth Knowles


As seen in breakage like BEAM-4708, the javadoc is only built as part of the 
publish process. This seems like something that could comfortably fit in a 
postcommit test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4708) Runners-core javadoc broken, blocked snapshot release

2018-07-02 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4708:
-

 Summary: Runners-core javadoc broken, blocked snapshot release
 Key: BEAM-4708
 URL: https://issues.apache.org/jira/browse/BEAM-4708
 Project: Beam
  Issue Type: Bug
  Components: runner-core
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4704) String operations yield incorrect results when executed through SQL shell

2018-07-01 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529370#comment-16529370
 ] 

Kenneth Knowles commented on BEAM-4704:
---

[~apilloud] wild guess here - the shell only invokes our code when the calling 
convention insists that it do so?

> String operations yield incorrect results when executed through SQL shell
> -
>
> Key: BEAM-4704
> URL: https://issues.apache.org/jira/browse/BEAM-4704
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>
> {{TRIM}} is defined to trim _all_ the characters in the first string from the 
> string-to-be-trimmed. Calcite has an incorrect implementation of this. We use 
> our own fixed implementation. But when executed through the SQL shell, the 
> results do not match what we get from the PTransform path. Here are two test 
> cases that pass on {{master}} but are incorrect in the shell:
> {code:sql}
> BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe');
> ++
> | EXPR$0 |
> ++
> | hehe__hehe |
> ++
> {code}
> {code:sql}
> BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe');
> ++
> |   EXPR$0   |
> ++
> | hehe__heh  |
> ++
> {code}
> {code:sql}
> BeamSQL> select TRIM(BOTH 'eh' FROM 'hehe__hehe');
> ++
> |   EXPR$0   |
> ++
> | hehe__heh  |
> ++
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-4704) String operations yield incorrect results when executed through SQL shell

2018-07-01 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529370#comment-16529370
 ] 

Kenneth Knowles edited comment on BEAM-4704 at 7/2/18 4:14 AM:
---

[~apilloud] -wild- confident guess here - the shell only invokes our code when 
the calling convention insists that it do so.


was (Author: kenn):
[~apilloud] wild guess here - the shell only invokes our code when the calling 
convention insists that it do so?

> String operations yield incorrect results when executed through SQL shell
> -
>
> Key: BEAM-4704
> URL: https://issues.apache.org/jira/browse/BEAM-4704
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>
> {{TRIM}} is defined to trim _all_ the characters in the first string from the 
> string-to-be-trimmed. Calcite has an incorrect implementation of this. We use 
> our own fixed implementation. But when executed through the SQL shell, the 
> results do not match what we get from the PTransform path. Here are two test 
> cases that pass on {{master}} but are incorrect in the shell:
> {code:sql}
> BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe');
> ++
> | EXPR$0 |
> ++
> | hehe__hehe |
> ++
> {code}
> {code:sql}
> BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe');
> ++
> |   EXPR$0   |
> ++
> | hehe__heh  |
> ++
> {code}
> {code:sql}
> BeamSQL> select TRIM(BOTH 'eh' FROM 'hehe__hehe');
> ++
> |   EXPR$0   |
> ++
> | hehe__heh  |
> ++
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-4704) String operations yield incorrect results when executed through SQL shell

2018-07-01 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-4704:
--
Description: 
{{TRIM}} is defined to trim _all_ the characters in the first string from the 
string-to-be-trimmed. Calcite has an incorrect implementation of this. We use 
our own fixed implementation. But when executed through the SQL shell, the 
results do not match what we get from the PTransform path. Here are two test cases 
that pass on {{master}} but are incorrect in the shell:
{code:sql}
BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe');
++
| EXPR$0 |
++
| hehe__hehe |
++
{code}
{code:sql}
BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe');
++
|   EXPR$0   |
++
| hehe__heh  |
++
{code}

{code:sql}
BeamSQL> select TRIM(BOTH 'eh' FROM 'hehe__hehe');
++
|   EXPR$0   |
++
| hehe__heh  |
++
{code}

  was:
{{TRIM}} is defined to trim _all_ the characters in the first string from the 
string-to-be-trimmed. Calcite has an incorrect implementation of this. We use 
our own fixed implementation. But when executed through the SQL shell, the 
results do not match what we get from the PTransform path. Here are two test 
cases that pass on {{master}} but are incorrect in the shell:
{code:java}
BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe');
++
| EXPR$0 |
++
| hehe__hehe |
++
{code}
{code:java}
BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe');
++
|   EXPR$0   |
++
| hehe__heh  |
++
{code}


> String operations yield incorrect results when executed through SQL shell
> -
>
> Key: BEAM-4704
> URL: https://issues.apache.org/jira/browse/BEAM-4704
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>
> {{TRIM}} is defined to trim _all_ the characters in the first string from the 
> string-to-be-trimmed. Calcite has an incorrect implementation of this. We use 
> our own fixed implementation. But when executed through the SQL shell, the 
> results do not match what we get from the PTransform path. Here are two test 
> cases that pass on {{master}} but are incorrect in the shell:
> {code:sql}
> BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe');
> ++
> | EXPR$0 |
> ++
> | hehe__hehe |
> ++
> {code}
> {code:sql}
> BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe');
> ++
> |   EXPR$0   |
> ++
> | hehe__heh  |
> ++
> {code}
> {code:sql}
> BeamSQL> select TRIM(BOTH 'eh' FROM 'hehe__hehe');
> ++
> |   EXPR$0   |
> ++
> | hehe__heh  |
> ++
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-4704) String operations yield incorrect results when executed through SQL shell

2018-07-01 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-4704:
--
Description: 
{{TRIM}} is defined to trim _all_ the characters in the first string from the 
string-to-be-trimmed. Calcite has an incorrect implementation of this. We use 
our own fixed implementation. But when executed through the SQL shell, the 
results do not match what we get from the PTransform path. Here are two test 
cases that pass on {{master}} but are incorrect in the shell:
{code:java}
BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe');
++
| EXPR$0 |
++
| hehe__hehe |
++
{code}
{code:java}
BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe');
++
|   EXPR$0   |
++
| hehe__heh  |
++
{code}

  was:
{{TRIM}} is defined to trim _all_ the characters in a string. Calcite has an 
incorrect implementation of this. We use our own fixed implementation. But when 
executed through the SQL shell, the results do not match what we get from the 
PTransform path. Here are two test cases that pass on {{master}} but are 
incorrect in the shell:

{code}
BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe');
++
| EXPR$0 |
++
| hehe__hehe |
++
{code}

{code}
BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe');
++
|   EXPR$0   |
++
| hehe__heh  |
++
{code}


> String operations yield incorrect results when executed through SQL shell
> -
>
> Key: BEAM-4704
> URL: https://issues.apache.org/jira/browse/BEAM-4704
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>
> {{TRIM}} is defined to trim _all_ the characters in the first string from the 
> string-to-be-trimmed. Calcite has an incorrect implementation of this. We use 
> our own fixed implementation. But when executed through the SQL shell, the 
> results do not match what we get from the PTransform path. Here are two test 
> cases that pass on {{master}} but are incorrect in the shell:
> {code:java}
> BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe');
> ++
> | EXPR$0 |
> ++
> | hehe__hehe |
> ++
> {code}
> {code:java}
> BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe');
> ++
> |   EXPR$0   |
> ++
> | hehe__heh  |
> ++
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-4704) String operations yield incorrect results when executed through SQL shell

2018-07-01 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-4704:
--
Issue Type: Bug  (was: New Feature)

> String operations yield incorrect results when executed through SQL shell
> -
>
> Key: BEAM-4704
> URL: https://issues.apache.org/jira/browse/BEAM-4704
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>
> {{TRIM}} is defined to trim _all_ the characters in a string. Calcite has an 
> incorrect implementation of this. We use our own fixed implementation. But 
> when executed through the SQL shell, the results do not match what we get 
> from the PTransform path. Here are two test cases that pass on {{master}} but 
> are incorrect in the shell:
> {code}
> BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe');
> ++
> | EXPR$0 |
> ++
> | hehe__hehe |
> ++
> {code}
> {code}
> BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe');
> ++
> |   EXPR$0   |
> ++
> | hehe__heh  |
> ++
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4704) String operations yield incorrect results when executed through SQL shell

2018-07-01 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4704:
-

 Summary: String operations yield incorrect results when executed 
through SQL shell
 Key: BEAM-4704
 URL: https://issues.apache.org/jira/browse/BEAM-4704
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles


{{TRIM}} is defined to trim _all_ the characters in a string. Calcite has an 
incorrect implementation of this. We use our own fixed implementation. But when 
executed through the SQL shell, the results do not match what we get from the 
PTransform path. Here are two test cases that pass on {{master}} but are 
incorrect in the shell:

{code}
BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe');
++
| EXPR$0 |
++
| hehe__hehe |
++
{code}

{code}
BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe');
++
|   EXPR$0   |
++
| hehe__heh  |
++
{code}
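
For comparison, a minimal sketch of the character-set reading of {{TRIM}} described above, where every leading and/or trailing character that appears anywhere in the first string is removed. It is plain Java written for illustration and is neither Calcite's nor Beam's implementation; the class, enum, and method names are hypothetical:

```java
// Illustrative only: TRIM that strips any character contained in `chars`.
final class TrimSketch {
  enum Mode { LEADING, TRAILING, BOTH }

  static String trim(Mode mode, String chars, String target) {
    int start = 0;
    int end = target.length();
    if (mode != Mode.TRAILING) {
      // Advance past leading characters that occur in the trim set.
      while (start < end && chars.indexOf(target.charAt(start)) >= 0) {
        start++;
      }
    }
    if (mode != Mode.LEADING) {
      // Back up over trailing characters that occur in the trim set.
      while (end > start && chars.indexOf(target.charAt(end - 1)) >= 0) {
        end--;
      }
    }
    return target.substring(start, end);
  }

  public static void main(String[] args) {
    System.out.println(trim(Mode.LEADING, "eh", "hehe__hehe"));  // __hehe
    System.out.println(trim(Mode.TRAILING, "eh", "hehe__hehe")); // hehe__
    System.out.println(trim(Mode.BOTH, "eh", "hehe__hehe"));     // __
  }
}
```

Under this reading the shell output shown above ({{hehe__hehe}} and {{hehe__heh}}) is wrong in both cases.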



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4703) SQL operators should have overloading resolved statically

2018-07-01 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4703:
-

 Summary: SQL operators should have overloading resolved statically
 Key: BEAM-4703
 URL: https://issues.apache.org/jira/browse/BEAM-4703
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Kenneth Knowles


Currently, operators like {{>}}, {{<=}}, and others dynamically dispatch on the 
type of their operands. But SQL is statically typed, and we should both 
validate the types (if Calcite doesn't do it for us) and gain performance by 
avoiding this dispatch.
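
A rough sketch of the difference, assuming the operand types are known when the expression is built. The {{SqlType}} enum and method names are hypothetical and far simpler than Calcite's type system. The resolver picks the comparison once, rejects mismatched types up front, and the returned predicate does no type inspection per row:

```java
import java.math.BigDecimal;
import java.util.function.BiPredicate;

// Illustrative only: resolve the ">" overload once from declared types.
final class StaticDispatchSketch {
  enum SqlType { INTEGER, DECIMAL, VARCHAR }

  /** Chooses the implementation of ">" at plan time; fails fast on a type mismatch. */
  static BiPredicate<Object, Object> resolveGreaterThan(SqlType left, SqlType right) {
    if (left != right) {
      throw new IllegalArgumentException("Cannot compare " + left + " with " + right);
    }
    switch (left) {
      case INTEGER:
        return (a, b) -> ((Integer) a).intValue() > ((Integer) b).intValue();
      case DECIMAL:
        return (a, b) -> ((BigDecimal) a).compareTo((BigDecimal) b) > 0;
      case VARCHAR:
        return (a, b) -> ((String) a).compareTo((String) b) > 0;
      default:
        throw new AssertionError(left);
    }
  }

  public static void main(String[] args) {
    // Resolved once, then applied to every row.
    BiPredicate<Object, Object> gt = resolveGreaterThan(SqlType.INTEGER, SqlType.INTEGER);
    System.out.println(gt.test(3, 2)); // true
    System.out.println(gt.test(2, 3)); // false
  }
}
```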



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4702) After SQL GROUP BY the result should be globally windowed

2018-07-01 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4702:
-

 Summary: After SQL GROUP BY  the result should be 
globally windowed
 Key: BEAM-4702
 URL: https://issues.apache.org/jira/browse/BEAM-4702
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4700) JDBC driver cannot support TIMESTAMP data type

2018-07-01 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529250#comment-16529250
 ] 

Kenneth Knowles commented on BEAM-4700:
---

Downgrading to "Major" to get some better test coverage.

> JDBC driver cannot support TIMESTAMP data type
> --
>
> Key: BEAM-4700
> URL: https://issues.apache.org/jira/browse/BEAM-4700
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Avatica allows column representation to be customized, so a timestamp can be 
> stored as a variety of types. Joda ReadableInstant is none of these types: 
> https://github.com/apache/calcite-avatica/blob/acb675de97b9b0743c09368820a770e2ceda05f8/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L162
> By default, it seems to be configured to store {{TIMESTAMP}} columns as 
> {{long}} values. If you run the SQL shell and select a {{TIMESTAMP}} column, 
> you get:
> {code}
> java.lang.ClassCastException: org.joda.time.Instant cannot be cast to 
> java.lang.Number
> at 
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$NumberAccessor.getNumber(AbstractCursor.java:726)
> at 
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$TimestampFromNumberAccessor.getString(AbstractCursor.java:1026)
> at 
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.AvaticaResultSet.getString(AvaticaResultSet.java:225)
> at sqlline.Rows$Row.<init>(Rows.java:183)
> {code}
> So, essentially, Beam SQL Shell does not support timestamps.
> We may be able to:
>  - override how the accessor for our existing storage is created
>  - configure what the column representation is (this doesn't really help, 
> since none of the choices are ours)
>  - convert timestamps to longs in BeamEnumerableConverter; not sure how many 
> conversions will be required here



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-4700) JDBC driver cannot support TIMESTAMP data type

2018-07-01 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-4700:
--
Priority: Major  (was: Blocker)

> JDBC driver cannot support TIMESTAMP data type
> --
>
> Key: BEAM-4700
> URL: https://issues.apache.org/jira/browse/BEAM-4700
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Avatica allows column representation to be customized, so a timestamp can be 
> stored as a variety of types. Joda ReadableInstant is none of these types: 
> https://github.com/apache/calcite-avatica/blob/acb675de97b9b0743c09368820a770e2ceda05f8/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L162
> By default, it seems to be configured to store {{TIMESTAMP}} columns as 
> {{long}} values. If you run the SQL shell and select a {{TIMESTAMP}} column, 
> you get:
> {code}
> java.lang.ClassCastException: org.joda.time.Instant cannot be cast to java.lang.Number
> at org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$NumberAccessor.getNumber(AbstractCursor.java:726)
> at org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$TimestampFromNumberAccessor.getString(AbstractCursor.java:1026)
> at org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.AvaticaResultSet.getString(AvaticaResultSet.java:225)
> at sqlline.Rows$Row.<init>(Rows.java:183)
> {code}
> So, essentially, Beam SQL Shell does not support timestamps.
> We may be able to:
>  - override how the accessor for our existing storage is created
>  - configure what the column representation is (this doesn't really help, 
> since none of the choices are ours)
>  - convert timestamps to longs in BeamEnumerableConverter; not sure how many 
> conversions will be required here





[jira] [Created] (BEAM-4701) Run SQL DSL-level tests through JDBC driver

2018-07-01 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4701:
-

 Summary: Run SQL DSL-level tests through JDBC driver
 Key: BEAM-4701
 URL: https://issues.apache.org/jira/browse/BEAM-4701
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles


Currently there's a testing gap where tests at the DSL level mostly go through 
the QueryTransform. So we can hit issues where we have a mismatch in Calcite 
Avatica, which provides the JDBC layer.





[jira] [Updated] (BEAM-4700) JDBC driver cannot support TIMESTAMP data type

2018-06-30 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-4700:
--
Issue Type: Bug  (was: New Feature)

> JDBC driver cannot support TIMESTAMP data type
> --
>
> Key: BEAM-4700
> URL: https://issues.apache.org/jira/browse/BEAM-4700
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Blocker
>
> Avatica allows column representation to be customized, so a timestamp can be 
> stored as a variety of types. Joda ReadableInstant is none of these types: 
> https://github.com/apache/calcite-avatica/blob/acb675de97b9b0743c09368820a770e2ceda05f8/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L162
> By default, it seems to be configured to store {{TIMESTAMP}} columns as 
> {{long}} values. If you run the SQL shell and select a {{TIMESTAMP}} column, 
> you get:
> {code}
> java.lang.ClassCastException: org.joda.time.Instant cannot be cast to java.lang.Number
> at org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$NumberAccessor.getNumber(AbstractCursor.java:726)
> at org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$TimestampFromNumberAccessor.getString(AbstractCursor.java:1026)
> at org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.AvaticaResultSet.getString(AvaticaResultSet.java:225)
> at sqlline.Rows$Row.<init>(Rows.java:183)
> {code}
> So, essentially, Beam SQL Shell does not support timestamps.
> We may be able to:
>  - override how the accessor for our existing storage is created
>  - configure what the column representation is (this doesn't really help, 
> since none of the choices are ours)
>  - convert timestamps to longs in BeamEnumerableConverter; not sure how many 
> conversions will be required here





[jira] [Created] (BEAM-4700) JDBC driver cannot support TIMESTAMP data type

2018-06-30 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4700:
-

 Summary: JDBC driver cannot support TIMESTAMP data type
 Key: BEAM-4700
 URL: https://issues.apache.org/jira/browse/BEAM-4700
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles


Avatica allows column representation to be customized, so a timestamp can be 
stored as a variety of types. Joda ReadableInstant is none of these types: 
https://github.com/apache/calcite-avatica/blob/acb675de97b9b0743c09368820a770e2ceda05f8/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L162

By default, it seems to be configured to store {{TIMESTAMP}} columns as 
{{long}} values. If you run the SQL shell and select a {{TIMESTAMP}} column, 
you get:

{code}
java.lang.ClassCastException: org.joda.time.Instant cannot be cast to java.lang.Number
at org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$NumberAccessor.getNumber(AbstractCursor.java:726)
at org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.util.AbstractCursor$TimestampFromNumberAccessor.getString(AbstractCursor.java:1026)
at org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.avatica.AvaticaResultSet.getString(AvaticaResultSet.java:225)
at sqlline.Rows$Row.<init>(Rows.java:183)
{code}

So, essentially, Beam SQL Shell does not support timestamps.

We may be able to:

 - override how the accessor for our existing storage is created
 - configure what the column representation is (this doesn't really help, since 
none of the choices are ours)
 - convert timestamps to longs in BeamEnumerableConverter; not sure how many 
conversions will be required here
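
As a rough illustration of the third option, here is a minimal sketch of converting instant-like values to epoch milliseconds before they reach the JDBC layer. It is not the actual BeamEnumerableConverter code. {{MillisInstant}} is a hypothetical stand-in for Joda's {{ReadableInstant}} (which exposes {{getMillis()}}), and {{toAvaticaValue}} and {{convertRow}} are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for Joda's ReadableInstant; only getMillis() is used. */
interface MillisInstant {
  long getMillis();
}

final class TimestampToLongSketch {
  /** Replaces instant-like values with epoch millis; other values pass through unchanged. */
  static Object toAvaticaValue(Object fieldValue) {
    if (fieldValue instanceof MillisInstant) {
      // Boxed as a Long, which is a java.lang.Number, so a number-based accessor can read it.
      return ((MillisInstant) fieldValue).getMillis();
    }
    return fieldValue;
  }

  /** Applies the conversion to every field of a row (a row is assumed to be a list of objects). */
  static List<Object> convertRow(List<Object> row) {
    List<Object> converted = new ArrayList<>(row.size());
    for (Object value : row) {
      converted.add(toAvaticaValue(value));
    }
    return converted;
  }

  public static void main(String[] args) {
    MillisInstant timestamp = () -> 1530316800000L;
    List<Object> row = convertRow(Arrays.<Object>asList("some-key", timestamp));
    // This cast is the one that fails in the stack trace when the value is still an Instant.
    Number millis = (Number) row.get(1);
    System.out.println(millis.longValue());
  }
}
```

The open question from the list above still applies: this only covers top-level fields, and nested or array-typed timestamps would need the same treatment.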





[jira] [Created] (BEAM-4699) BeamFileSystemArtifactServicesTest.putArtifactsSingleSmallFileTest flake

2018-06-30 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4699:
-

 Summary: BeamFileSystemArtifactServicesTest.putArtifactsSingleSmallFileTest flake
 Key: BEAM-4699
 URL: https://issues.apache.org/jira/browse/BEAM-4699
 Project: Beam
  Issue Type: New Feature
  Components: runner-core
Reporter: Kenneth Knowles
Assignee: Henning Rohde


I've seen a few transient failures from {{BeamFileSystemArtifactServicesTest}}. 
I don't recall if they are all {{putArtifactsSingleSmallFileTest}} or how often 
they occur.

https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/

{code}
java.io.FileNotFoundException: /tmp/junit8499382858780569091/staging/123/artifacts/artifact_c147efcfc2d7ea666a9e4f5187b115c90903f0fc896a56df9a6ef5d8f3fc9f31 (No such file or directory)
{code}





[jira] [Created] (BEAM-4698) SchemaTest not sickbayed for Samza runner

2018-06-30 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4698:
-

 Summary: SchemaTest not sickbayed for Samza runner
 Key: BEAM-4698
 URL: https://issues.apache.org/jira/browse/BEAM-4698
 Project: Beam
  Issue Type: New Feature
  Components: runner-samza
Reporter: Kenneth Knowles
Assignee: Reuven Lax


https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/

This failed the SchemaTest, which we know all runners will fail since it uses a 
different entry point for {{SimpleDoFnRunner}}. It should be excluded from 
their test suites.





[jira] [Created] (BEAM-4695) Add Samza to "works with" on the Beam site landing page

2018-06-29 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4695:
-

 Summary: Add Samza to "works with" on the Beam site landing page
 Key: BEAM-4695
 URL: https://issues.apache.org/jira/browse/BEAM-4695
 Project: Beam
  Issue Type: New Feature
  Components: website
Reporter: Kenneth Knowles
Assignee: Xinyu Liu








[jira] [Resolved] (BEAM-4652) PubsubIO: create subscription on different project than the topic

2018-06-29 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-4652.
---
   Resolution: Fixed
Fix Version/s: 2.6.0

> PubsubIO: create subscription on different project than the topic
> -
>
> Key: BEAM-4652
> URL: https://issues.apache.org/jira/browse/BEAM-4652
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Critical
> Fix For: 2.6.0
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> If you try to read a public pubsub topic in the DirectRunner, it will fail 
> with 403 when trying to create a subscription. This is because it tries to 
> create a subscription on the shared public data set.
> There is an example used in 
> https://github.com/googlecodelabs/cloud-dataflow-nyc-taxi-tycoon and the 
> dataset is {{projects/pubsub-public-data/topics/taxirides-realtime}}. I 
> discovered that I could not read this in the DirectRunner even though the 
> codelab works. But that 1.x codelab also does not work in the 
> InProcessPipelineRunner, so it has been broken all along.
> So you cannot read public data or any other read-only data using PubsubIO.





[jira] [Assigned] (BEAM-4622) Many Beam SQL expressions never have their validation called

2018-06-29 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4622:
-

Assignee: Alexey Romanenko

> Many Beam SQL expressions never have their validation called
> 
>
> Key: BEAM-4622
> URL: https://issues.apache.org/jira/browse/BEAM-4622
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Alexey Romanenko
>Priority: Major
>  Labels: easyfix, newbie, starter
>
> In {{BeamSqlFnExecutor}} there is a pattern where first the returned 
> expression is assigned to a variable {{ret}} and then after a giant switch 
> statement the validation is invoked. But there are many code paths that just 
> call {{return}} and skip validation. This should be refactored so it is 
> impossible to short-circuit by accident like this.
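
One possible shape for that refactoring, as a minimal sketch only: keep the large switch in a private method that is free to return early, and make the single public entry point responsible for validation. None of the names below come from {{BeamSqlFnExecutor}}; {{Expression}}, {{isValid}}, {{build}} and {{create}} are hypothetical.

```java
/** Hypothetical expression type; only the validation hook matters for this sketch. */
interface Expression {
  boolean isValid();
}

final class ExpressionFactorySketch {
  /** The only entry point: every created expression passes through validation here. */
  static Expression build(String op, int operandCount) {
    Expression ret = create(op, operandCount);
    if (!ret.isValid()) {
      throw new IllegalStateException("Invalid expression for operator: " + op);
    }
    return ret;
  }

  /** The large switch lives here; an early return can no longer skip validation. */
  private static Expression create(String op, int operandCount) {
    switch (op) {
      case "NOT":
        return () -> operandCount == 1;
      case "AND":
      case "OR":
        return () -> operandCount >= 2;
      default:
        throw new UnsupportedOperationException("Unknown operator: " + op);
    }
  }
}
```

With this split, adding a new case with a bare {{return}} is harmless, because callers can only reach {{create}} through {{build}}.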





[jira] [Updated] (BEAM-4689) Dataflow cannot deserialize SplittableParDo DoFns

2018-06-29 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-4689:
--
Summary: Dataflow cannot deserialize SplittableParDo DoFns  (was: Dataflow postcommit broken)

> Dataflow cannot deserialize SplittableParDo DoFns
> -
>
> Key: BEAM-4689
> URL: https://issues.apache.org/jira/browse/BEAM-4689
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Eugene Kirpichov
>Priority: Blocker
>
> The Dataflow postcommit is broken in a way that seems real and user-impacting:
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/testReport/junit/org.apache.beam.sdk.transforms/SplittableDoFnTest/testSideInput/
> {code}
> Caused by: java.lang.IllegalArgumentException: unable to deserialize Serialized DoFnInfo
> ...
> Caused by: java.io.InvalidClassException: org.apache.beam.runners.core.construction.SplittableParDo$RandomUniqueKeyFn; local class incompatible: stream classdesc serialVersionUID = 6068396661487412884, local class serialVersionUID = -617521663543732196
> {code}
> This means that the worker is using a version of the class from its own 
> classpath, not the version from the user's staged pipeline. It implies that 
> the worker is not shading runners-core-construction. Because that is where a 
> ton of utility DoFns live, it is critical that it be shaded.





[jira] [Assigned] (BEAM-4689) Dataflow postcommit broken

2018-06-29 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4689:
-

Assignee: Eugene Kirpichov  (was: Thomas Groh)

> Dataflow postcommit broken
> --
>
> Key: BEAM-4689
> URL: https://issues.apache.org/jira/browse/BEAM-4689
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Eugene Kirpichov
>Priority: Blocker
>
> The Dataflow postcommit is broken in a way that seems real and user-impacting:
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/testReport/junit/org.apache.beam.sdk.transforms/SplittableDoFnTest/testSideInput/
> {code}
> Caused by: java.lang.IllegalArgumentException: unable to deserialize Serialized DoFnInfo
> ...
> Caused by: java.io.InvalidClassException: org.apache.beam.runners.core.construction.SplittableParDo$RandomUniqueKeyFn; local class incompatible: stream classdesc serialVersionUID = 6068396661487412884, local class serialVersionUID = -617521663543732196
> {code}
> This means that the worker is using a version of the class from its own 
> classpath, not the version from the user's staged pipeline. It implies that 
> the worker is not shading runners-core-construction. Because that is where a 
> ton of utility DoFns live, it is critical that it be shaded.





[jira] [Created] (BEAM-4689) Dataflow postcommit broken

2018-06-29 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4689:
-

 Summary: Dataflow postcommit broken
 Key: BEAM-4689
 URL: https://issues.apache.org/jira/browse/BEAM-4689
 Project: Beam
  Issue Type: New Feature
  Components: runner-dataflow
Reporter: Kenneth Knowles
Assignee: Thomas Groh


The Dataflow postcommit is broken in a way that seems real and user-impacting:

https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/testReport/junit/org.apache.beam.sdk.transforms/SplittableDoFnTest/testSideInput/

{code}
Caused by: java.lang.IllegalArgumentException: unable to deserialize Serialized DoFnInfo
...
Caused by: java.io.InvalidClassException: org.apache.beam.runners.core.construction.SplittableParDo$RandomUniqueKeyFn; local class incompatible: stream classdesc serialVersionUID = 6068396661487412884, local class serialVersionUID = -617521663543732196
{code}

This means that the worker is using a version of the class from its own 
classpath, not the version from the user's staged pipeline. It implies that the 
worker is not shading runners-core-construction. Because that is where a ton of 
utility DoFns live, it is critical that it be shaded.
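
For anyone unfamiliar with the error, here is a small sketch of where the two numbers in the message come from. It uses only the JDK's {{ObjectStreamClass}}; the classes {{ImplicitUidFn}} and {{PinnedUidFn}} are made up for illustration and have nothing to do with {{RandomUniqueKeyFn}} itself.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

final class SerialVersionSketch {
  /** No explicit serialVersionUID: the JVM derives one from the class's name, fields and methods. */
  static class ImplicitUidFn implements Serializable {
    int seed;
  }

  /** Explicit serialVersionUID: the value is whatever the class declares. */
  static class PinnedUidFn implements Serializable {
    private static final long serialVersionUID = 1L;
    int seed;
  }

  /** Returns the serialVersionUID that Java serialization would write for this class. */
  static long uidOf(Class<? extends Serializable> cls) {
    return ObjectStreamClass.lookup(cls).getSerialVersionUID();
  }

  public static void main(String[] args) {
    // A derived UID changes when the class's shape changes between builds, which is
    // how two copies of "the same" class can disagree at deserialization time.
    System.out.println("derived: " + uidOf(ImplicitUidFn.class));
    System.out.println("pinned:  " + uidOf(PinnedUidFn.class));
  }
}
```

Pinning a UID would only hide the symptom. The mismatch reported here indicates that two different builds of the class are in play, which is why shading is the fix being asked for.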




