[jira] [Commented] (BEAM-4706) BigQueryTornadoesIT cannot be run using integrationTest and performanceTest tasks

2018-07-02 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-4706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529546#comment-16529546
 ] 

Łukasz Gajowy commented on BEAM-4706:
-

More context here: https://github.com/apache/beam/pull/5755

> BigQueryTornadoesIT cannot be run using integrationTest and performanceTest 
> tasks
> -
>
> Key: BEAM-4706
> URL: https://issues.apache.org/jira/browse/BEAM-4706
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Łukasz Gajowy
>Assignee: Kasia Kucharczyk
>Priority: Major
>
> It seems that we cannot run this test using tasks from BuildModulePlugin 
> designed to run such tests. Those are not included in build.gradle of 
> examples module. There's a possibility that some other tests are not able to 
> run due to this reason (this also has to be checked).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4707) Change fields and table names in nexmark perfkit tables

2018-07-02 Thread Etienne Chauchot (JIRA)
Etienne Chauchot created BEAM-4707:
--

 Summary: Change fields and table names in nexmark perfkit tables
 Key: BEAM-4707
 URL: https://issues.apache.org/jira/browse/BEAM-4707
 Project: Beam
  Issue Type: Improvement
  Components: testing
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot


Nexmark BQ tables for perfkit lack timestamp field. Also the table name 
contains a boolean than shows the mode of execution. It would be better to have 
batch or streaming label in place of the boolean.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4706) BigQueryTornadoesIT cannot be run using integrationTest and performanceTest tasks

2018-07-02 Thread JIRA
Łukasz Gajowy created BEAM-4706:
---

 Summary: BigQueryTornadoesIT cannot be run using integrationTest 
and performanceTest tasks
 Key: BEAM-4706
 URL: https://issues.apache.org/jira/browse/BEAM-4706
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: Łukasz Gajowy
Assignee: Kasia Kucharczyk


It seems that we cannot run this test using tasks from BuildModulePlugin 
designed to run such tests. Those are not included in build.gradle of examples 
module. There's a possibility that some other tests are not able to run due to 
this reason (this also has to be checked).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4705) Better Kryo integration

2018-07-02 Thread Vaclav Plajt (JIRA)
Vaclav Plajt created BEAM-4705:
--

 Summary: Better Kryo integration
 Key: BEAM-4705
 URL: https://issues.apache.org/jira/browse/BEAM-4705
 Project: Beam
  Issue Type: Sub-task
  Components: dsl-euphoria
Reporter: Vaclav Plajt
Assignee: David Moravek






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3446) RedisIO non-prefix read operations

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3446?focusedWorklogId=118096=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118096
 ]

ASF GitHub Bot logged work on BEAM-3446:


Author: ASF GitHub Bot
Created on: 02/Jul/18 07:36
Start Date: 02/Jul/18 07:36
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #5841: Fixes 
https://issues.apache.org/jira/browse/BEAM-3446.
URL: https://github.com/apache/beam/pull/5841#issuecomment-401697979
 
 
   Run Java Precommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118096)
Time Spent: 2h 20m  (was: 2h 10m)

> RedisIO non-prefix read operations
> --
>
> Key: BEAM-3446
> URL: https://issues.apache.org/jira/browse/BEAM-3446
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-redis
>Reporter: Vinay varma
>Assignee: Vinay varma
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Read operation in RedisIO is for prefix based look ups. While this can be 
> used for exact key matches as well, the number of operations limits the 
> through put of the function.
> I suggest exposing current readAll operation as readbyprefix and using more 
> simpler operations for readAll functionality.
> ex:
> {code:java}
> String output = jedis.get(element);
> if (output != null) {
> processContext.output(KV.of(element, output));
> }
> {code}
> instead of:
> https://github.com/apache/beam/blob/7d240c0bb171af6868f1a6e95196c9dcfc9ac640/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java#L292



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PreCommit_Java_Cron #55

2018-07-02 Thread Apache Jenkins Server
See 

--
[...truncated 16.56 MB...]
INFO: 2018-07-02T06:15:57.990Z: Lifting ValueCombiningMappingFns into 
MergeBucketsMappingFns
Jul 02, 2018 6:16:00 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T06:15:58.170Z: Fusing adjacent ParDo, Read, Write, and 
Flatten operations
Jul 02, 2018 6:16:00 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T06:15:58.202Z: Unzipping flatten s13 for input 
s12.org.apache.beam.sdk.values.PCollection.:349#1d275f544daf228c
Jul 02, 2018 6:16:00 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T06:15:58.235Z: Fusing unzipped copy of 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Add void 
key/AddKeys/Map, through flatten 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/WriteUnshardedBundlesToTempFiles/Flatten.PCollections,
 into producer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/WriteUnshardedBundlesToTempFiles/DropShardNum
Jul 02, 2018 6:16:00 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T06:15:58.263Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle/ExpandIterable
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle/GroupByKey/GroupByWindow
Jul 02, 2018 6:16:00 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T06:15:58.286Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/Window.Into()/Window.Assign
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Pair
 with random key
Jul 02, 2018 6:16:00 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T06:15:58.310Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Write
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Reify
Jul 02, 2018 6:16:00 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T06:15:58.340Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/GroupByWindow
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Read
Jul 02, 2018 6:16:00 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T06:15:58.370Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Reify
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/Window.Into()/Window.Assign
Jul 02, 2018 6:16:00 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T06:15:58.395Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Drop 
key/Values/Map into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle/ExpandIterable
Jul 02, 2018 6:16:00 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T06:15:58.426Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Gather 
bundles into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Drop 
key/Values/Map
Jul 02, 2018 6:16:00 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T06:15:58.451Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Pair
 with random key into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Gather 
bundles
Jul 02, 2018 6:16:00 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T06:15:58.475Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/FinalizeTempFileBundles/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/GroupByWindow
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/FinalizeTempFileBundles/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Read
Jul 02, 2018 6:16:00 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T06:15:58.505Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle/GroupByKey/Reify
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle/Window.Into()/Window.Assign
Jul 

[jira] [Work logged] (BEAM-4417) BigqueryIO Numeric datatype Support

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4417?focusedWorklogId=118112=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118112
 ]

ASF GitHub Bot logged work on BEAM-4417:


Author: ASF GitHub Bot
Created on: 02/Jul/18 08:46
Start Date: 02/Jul/18 08:46
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #5755: [BEAM-4417] Support 
BigQuery's NUMERIC type using Java
URL: https://github.com/apache/beam/pull/5755#issuecomment-401716875
 
 
   @ElliottBrossard I think you've found an issue in the gradle code. 
`performanceTest` task is defined in `BeamModulePlugin`. Therefore every module 
that uses it, has to append it by adding 
   
   ```
   provideIntegrationTestingDependencies()
   enableJavaPerformanceTesting()
   ```
   
   in the build.gradle file. For some reason, those are not added in examples' 
build.gradle, so this is why you are unable to use it and run the test. We'll 
have to fix it. Thanks for reporting this. 
   
   JIRA for this: https://issues.apache.org/jira/browse/BEAM-4706


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118112)
Time Spent: 3h 20m  (was: 3h 10m)

> BigqueryIO Numeric datatype Support
> ---
>
> Key: BEAM-4417
> URL: https://issues.apache.org/jira/browse/BEAM-4417
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.4.0
>Reporter: Kishan Kumar
>Assignee: Chamikara Jayalath
>Priority: Critical
>  Labels: newbie, patch
> Fix For: 2.6.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The BigQueryIO.read fails while parsing the data from the avro file generated 
> while reading the data from the table which has columns with *Numeric* 
> datatypes. 
> We have gone through the source code at Git-Hub and noticed that *Numeric 
> data type is not yet supported.* 
>  
> Caused by: com.google.common.base.VerifyException: Unsupported BigQuery type: 
> NUMERIC
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4643) Allow to check early panes of a window

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4643?focusedWorklogId=118142=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118142
 ]

ASF GitHub Bot logged work on BEAM-4643:


Author: ASF GitHub Bot
Created on: 02/Jul/18 12:14
Start Date: 02/Jul/18 12:14
Worklog Time Spent: 10m 
  Work Description: lhauspie edited a comment on issue #5811: [BEAM-4643] 
Allow to check early panes of a window
URL: https://github.com/apache/beam/pull/5811#issuecomment-401279907
 
 
   I don't understand what's going wrong with the [Jenkins 
Build](https://builds.apache.org/job/beam_PreCommit_Java_Commit/87/console).
   
   On local, I've got the following outputs (BUILD SUCCESSFUL):
   ```logs
   $ ./gradlew :beam-sdks-java-io-hadoop-input-format:test
   [...]
   > Task :beam-sdks-java-io-hadoop-input-format:testClasses
   
   > Task :beam-sdks-java-io-hadoop-input-format:test
   Java HotSpot(TM) Server VM warning: You have loaded library 
/tmp/libnetty-transport-native-epoll7569020236429498545.so which might have 
disabled stack guard. The VM will try to fix the stack guard now.
   It's highly recommended that you fix the library with 'execstack -c 
', or link it with '-z noexecstack'.
   
   Deprecated Gradle features were used in this build, making it 
incompatible with Gradle 5.0.
   See 
https://docs.gradle.org/4.8/userguide/command_line_interface.html#sec:command_line_warnings
   
   BUILD SUCCESSFUL in 2m 3s
   53 actionable tasks: 41 executed, 12 up-to-date
   ```
   
   ```logs
   $ ./gradlew :beam-sdks-java-io-elasticsearch-tests-2:test
   [...]
   > Task :beam-sdks-java-io-elasticsearch-tests-2:testClasses
   > Task :beam-sdks-java-io-elasticsearch-tests-2:test
   
   Deprecated Gradle features were used in this build, making it 
incompatible with Gradle 5.0.
   See 
https://docs.gradle.org/4.8/userguide/command_line_interface.html#sec:command_line_warnings
   
   BUILD SUCCESSFUL in 25s
   48 actionable tasks: 6 executed, 42 up-to-date
   ```
   
   @kennknowles, could you please tell me if I introduced some bugs or did 
something wrong ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118142)
Time Spent: 40m  (was: 0.5h)
Remaining Estimate: 23h 20m  (was: 23.5h)

> Allow to check early panes of a window
> --
>
> Key: BEAM-4643
> URL: https://issues.apache.org/jira/browse/BEAM-4643
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, testing
>Affects Versions: 2.5.0
>Reporter: Logan HAUSPIE
>Assignee: Logan HAUSPIE
>Priority: Minor
>   Original Estimate: 24h
>  Time Spent: 40m
>  Remaining Estimate: 23h 20m
>
> What I would like to do is:
> {{PAssert.that(teamScores)}}
>  {{    .inEarlyPanes(intervalWindow(05, 20))}}
>  {{        .containsInAnyOrder(KV.of("black", 1), KV.of("black", 2)) // 
> Window triggered 2 times earlier (black, 1) + (black, 1)}}
>  {{    .inOnTimePane(intervalWindow(05, 20))}}
>  {{         .containsInAnyOrder(KV.of("black", 2)) // Then triggered again by 
> reach the watermark (no additionnal data)}}
>  {{    .inFinalPane(intervalWindow(05, 20))}}
>  {{         .containsInAnyOrder(KV.of("black", 10))}}{{; // And then fired by 
> receiving a late data (black, 8)}}
> NB: intervalWindow(05, 20) return an IntervalWindow from 5 minutes to 20 
> minutes
>  
> The workaround I found is to filter the PCollection to keep only the EARLY 
> elements with this method:
> {{public static  PCollection filter(PCollection values, 
> PaneInfo.Timing timing) {}}
> {{  PCollection filtered = values}}
> {{      .apply("Wrap into ValueInSingleWindow for filtering",}}
> {{          ParDo.of(}}
> {{              new DoFn>() {}}
> {{                  @ProcessElement}}
> {{                  public void processElement(ProcessContext c, 
> BoundedWindow window) {}}
> {{                    
> c.outputWithTimestamp(ValueInSingleWindow.of(c.element(), c.timestamp(), 
> window, c.pane()), c.timestamp());}}
> {{                    }}}
> {{              }}}
> {{          )}}
> {{      )}}
>  {{      .setCoder(}}
>  {{          ValueInSingleWindow.Coder.of(}}
>  {{              values.getCoder(), 
> values.getWindowingStrategy().getWindowFn().windowCoder()}}
> {{          )}}
> {{      )}}
> {{      .apply(Filter.by(a -> a.getPane().getTiming() == timing))}}
> {{      .apply("Unwrap from ValueInSingleWindow 

Build failed in Jenkins: beam_PreCommit_Java_Cron #56

2018-07-02 Thread Apache Jenkins Server
See 

--
[...truncated 16.53 MB...]
INFO: 2018-07-02T12:20:30.381Z: Fusing adjacent ParDo, Read, Write, and 
Flatten operations
Jul 02, 2018 12:20:36 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:20:30.415Z: Unzipping flatten s13 for input 
s12.org.apache.beam.sdk.values.PCollection.:349#1d275f544daf228c
Jul 02, 2018 12:20:36 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:20:30.444Z: Fusing unzipped copy of 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Add void 
key/AddKeys/Map, through flatten 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/WriteUnshardedBundlesToTempFiles/Flatten.PCollections,
 into producer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/WriteUnshardedBundlesToTempFiles/DropShardNum
Jul 02, 2018 12:20:36 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:20:30.474Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle/ExpandIterable
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle/GroupByKey/GroupByWindow
Jul 02, 2018 12:20:36 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:20:30.502Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/Window.Into()/Window.Assign
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Pair
 with random key
Jul 02, 2018 12:20:36 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:20:30.536Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Write
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Reify
Jul 02, 2018 12:20:36 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:20:30.563Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/GroupByWindow
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Read
Jul 02, 2018 12:20:36 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:20:30.593Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Reify
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/Window.Into()/Window.Assign
Jul 02, 2018 12:20:36 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:20:30.622Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Drop 
key/Values/Map into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle/ExpandIterable
Jul 02, 2018 12:20:36 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:20:30.653Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Gather 
bundles into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Drop 
key/Values/Map
Jul 02, 2018 12:20:36 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:20:30.682Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Pair
 with random key into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Gather 
bundles
Jul 02, 2018 12:20:36 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:20:30.708Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/FinalizeTempFileBundles/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/GroupByWindow
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/FinalizeTempFileBundles/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Read
Jul 02, 2018 12:20:36 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:20:30.738Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle/GroupByKey/Reify
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle/Window.Into()/Window.Assign
Jul 02, 2018 12:20:36 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:20:30.763Z: Fusing consumer 

[jira] [Work logged] (BEAM-4708) Runners-core javadoc broken, blocked snapshot release

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4708?focusedWorklogId=118162=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118162
 ]

ASF GitHub Bot logged work on BEAM-4708:


Author: ASF GitHub Bot
Created on: 02/Jul/18 13:09
Start Date: 02/Jul/18 13:09
Worklog Time Spent: 10m 
  Work Description: kennknowles closed pull request #5853: [BEAM-4708] Fix 
broken javadoc
URL: https://github.com/apache/beam/pull/5853
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Timer.java
 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Timer.java
index 77962815b04..072dbc70416 100644
--- 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Timer.java
+++ 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Timer.java
@@ -33,10 +33,9 @@
 /**
  * A timer consists of a timestamp and a corresponding user supplied payload.
  *
- * Note that this implementation is specifically used during execution 
within runners and inside
- * the SDK harness. The {@link org.apache.beam.sdk.state.Timer} represents the 
current user facing
- * API. Consider consolidating the two once {@link 
org.apache.beam.runners.core.TimerInternals} is
- * no longer the way in which users construct/interact with timers.
+ * Note that this is an implementation helper specifically intended for use 
during execution by
+ * runners and the Java SDK harness. The API for pipeline authors is {@link
+ * org.apache.beam.sdk.state.Timer}.
  */
 @AutoValue
 public abstract class Timer {


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118162)
Time Spent: 0.5h  (was: 20m)

> Runners-core javadoc broken, blocked snapshot release
> -
>
> Key: BEAM-4708
> URL: https://issues.apache.org/jira/browse/BEAM-4708
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Blocker
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] 01/01: Merge pull request #5853: [BEAM-4708] Fix broken javadoc

2018-07-02 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03
Merge: d703868 d3b8830
Author: Kenn Knowles 
AuthorDate: Mon Jul 2 06:09:23 2018 -0700

Merge pull request #5853: [BEAM-4708] Fix broken javadoc

 .../main/java/org/apache/beam/runners/core/construction/Timer.java | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)



[beam] branch master updated (d703868 -> fbfe6ce)

2018-07-02 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from d703868  Merge pull request #5849: [BEAM-4700] Convert Beam Row to 
Avatica Row in BeamEnumerableCollector
 add d3b8830  Fix broken javadoc
 new fbfe6ce  Merge pull request #5853: [BEAM-4708] Fix broken javadoc

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../main/java/org/apache/beam/runners/core/construction/Timer.java | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)



[jira] [Work logged] (BEAM-4708) Runners-core javadoc broken, blocked snapshot release

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4708?focusedWorklogId=118163=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118163
 ]

ASF GitHub Bot logged work on BEAM-4708:


Author: ASF GitHub Bot
Created on: 02/Jul/18 13:11
Start Date: 02/Jul/18 13:11
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #5853: [BEAM-4708] Fix 
broken javadoc
URL: https://github.com/apache/beam/pull/5853#issuecomment-401799189
 
 
   Filed https://issues.apache.org/jira/browse/BEAM-4709 to track testing this 
in a postcommit before the nightly. Seems like we could.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118163)
Time Spent: 40m  (was: 0.5h)

> Runners-core javadoc broken, blocked snapshot release
> -
>
> Key: BEAM-4708
> URL: https://issues.apache.org/jira/browse/BEAM-4708
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-4708) Runners-core javadoc broken, blocked snapshot release

2018-07-02 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-4708.
---
   Resolution: Fixed
Fix Version/s: Not applicable

> Runners-core javadoc broken, blocked snapshot release
> -
>
> Key: BEAM-4708
> URL: https://issues.apache.org/jira/browse/BEAM-4708
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4709) Javadoc build only tested during release

2018-07-02 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4709:
-

 Summary: Javadoc build only tested during release
 Key: BEAM-4709
 URL: https://issues.apache.org/jira/browse/BEAM-4709
 Project: Beam
  Issue Type: Bug
  Components: build-system
Reporter: Kenneth Knowles


As seen in breakage like BEAM-4708, the javadoc is only build as part of the 
publish process. This seems like something that could comfortably fit in a 
postcommit test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4691) Rename (and reorganize?) jenkins jobs

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4691?focusedWorklogId=118140=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118140
 ]

ASF GitHub Bot logged work on BEAM-4691:


Author: ASF GitHub Bot
Created on: 02/Jul/18 12:07
Start Date: 02/Jul/18 12:07
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #5831: [BEAM-4691] 
(do-not-merge-yet!) Move perf tests to a separate directory and rename them 
(conventionally)
URL: https://github.com/apache/beam/pull/5831#issuecomment-401783039
 
 
   Run seed job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118140)
Time Spent: 0.5h  (was: 20m)

> Rename (and reorganize?) jenkins jobs
> -
>
> Key: BEAM-4691
> URL: https://issues.apache.org/jira/browse/BEAM-4691
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Łukasz Gajowy
>Assignee: Łukasz Gajowy
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Link to discussion: 
> [https://lists.apache.org/thread.html/ebe220ec1cebc73c8fb7190cf115fb9b23165fdbf950d58e05db544d@%3Cdev.beam.apache.org%3E]
> Since jobs are Groovy files their names should be CamelCase. We could also 
> place them in subdirectories instead of prefixing job names. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PerformanceTests_HadoopInputFormat #467

2018-07-02 Thread Apache Jenkins Server
See 


--
[...truncated 161.90 KB...]
org.postgresql.util.PSQLException: The connection attempt failed.
at 
org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:257)
at 
org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
at org.postgresql.jdbc.PgConnection.(PgConnection.java:195)
at org.postgresql.Driver.makeConnection(Driver.java:452)
at org.postgresql.Driver.connect(Driver.java:254)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at 
org.postgresql.ds.common.BaseDataSource.getConnection(BaseDataSource.java:94)
at 
org.postgresql.ds.common.BaseDataSource.getConnection(BaseDataSource.java:79)
at 
org.apache.beam.sdk.io.common.DatabaseTestHelper.deleteTable(DatabaseTestHelper.java:52)
at 
org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIOIT.deleteTable(HadoopInputFormatIOIT.java:131)
at 
org.apache.beam.sdk.io.common.IOITHelper.executeWithRetry(IOITHelper.java:86)
at 
org.apache.beam.sdk.io.common.IOITHelper.executeWithRetry(IOITHelper.java:66)
at 
org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIOIT.tearDown(HadoopInputFormatIOIT.java:127)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:109)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:155)
at 
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:137)
at 
org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
at 
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
at 
org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
at 

[jira] [Work logged] (BEAM-4691) Rename (and reorganize?) jenkins jobs

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4691?focusedWorklogId=118145=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118145
 ]

ASF GitHub Bot logged work on BEAM-4691:


Author: ASF GitHub Bot
Created on: 02/Jul/18 12:24
Start Date: 02/Jul/18 12:24
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #5831: [BEAM-4691] 
(do-not-merge-yet!) Move perf tests to a separate directory and rename them 
(conventionally)
URL: https://github.com/apache/beam/pull/5831#issuecomment-401786970
 
 
   Run seed job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118145)
Time Spent: 40m  (was: 0.5h)

> Rename (and reorganize?) jenkins jobs
> -
>
> Key: BEAM-4691
> URL: https://issues.apache.org/jira/browse/BEAM-4691
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Łukasz Gajowy
>Assignee: Łukasz Gajowy
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Link to discussion: 
> [https://lists.apache.org/thread.html/ebe220ec1cebc73c8fb7190cf115fb9b23165fdbf950d58e05db544d@%3Cdev.beam.apache.org%3E]
> Since jobs are Groovy files their names should be CamelCase. We could also 
> place them in subdirectories instead of prefixing job names. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #1011

2018-07-02 Thread Apache Jenkins Server
See 


--
[...truncated 19.95 MB...]
INFO: 2018-07-02T12:38:41.821Z: Autoscaling is enabled for job 
2018-07-02_05_38_41-263554828202134781. The number of workers will be between 1 
and 1000.
Jul 02, 2018 12:38:53 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:38:41.846Z: Autoscaling was automatically enabled for 
job 2018-07-02_05_38_41-263554828202134781.
Jul 02, 2018 12:38:53 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:38:44.389Z: Checking required Cloud APIs are enabled.
Jul 02, 2018 12:38:53 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:38:44.786Z: Checking permissions granted to controller 
Service Account.
Jul 02, 2018 12:38:53 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:38:48.883Z: Worker configuration: n1-standard-1 in 
us-central1-f.
Jul 02, 2018 12:38:53 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:38:49.437Z: Expanding CoGroupByKey operations into 
optimizable parts.
Jul 02, 2018 12:38:53 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:38:49.662Z: Expanding GroupByKey operations into 
optimizable parts.
Jul 02, 2018 12:38:53 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:38:49.697Z: Lifting ValueCombiningMappingFns into 
MergeBucketsMappingFns
Jul 02, 2018 12:38:53 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:38:49.913Z: Fusing adjacent ParDo, Read, Write, and 
Flatten operations
Jul 02, 2018 12:38:53 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:38:49.943Z: Elided trivial flatten 
Jul 02, 2018 12:38:53 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:38:49.978Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map into SpannerIO.Write/Write 
mutations to Cloud Spanner/Create seed/Read(CreateSource)
Jul 02, 2018 12:38:53 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:38:50.006Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Read information schema into SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map
Jul 02, 2018 12:38:53 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:38:50.030Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Write
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
Jul 02, 2018 12:38:53 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:38:50.061Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/ParDo(IsmRecordForSingularValuePerWindow) 
into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Read
Jul 02, 2018 12:38:53 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:38:50.218Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/WithKeys/AddKeys/Map
 into SpannerIO.Write/Write mutations to Cloud Spanner/Read information schema
Jul 02, 2018 12:38:53 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:38:50.588Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey/Read
Jul 02, 2018 12:38:53 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-02T12:38:50.847Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Values/Values/Map
 into 

[jira] [Work logged] (BEAM-4691) Rename (and reorganize?) jenkins jobs

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4691?focusedWorklogId=118153=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118153
 ]

ASF GitHub Bot logged work on BEAM-4691:


Author: ASF GitHub Bot
Created on: 02/Jul/18 12:45
Start Date: 02/Jul/18 12:45
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #5831: [BEAM-4691] 
(do-not-merge-yet!) Move perf tests to a separate directory and rename them 
(conventionally)
URL: https://github.com/apache/beam/pull/5831#issuecomment-401792393
 
 
   Run seed job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118153)
Time Spent: 50m  (was: 40m)

> Rename (and reorganize?) jenkins jobs
> -
>
> Key: BEAM-4691
> URL: https://issues.apache.org/jira/browse/BEAM-4691
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Łukasz Gajowy
>Assignee: Łukasz Gajowy
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Link to discussion: 
> [https://lists.apache.org/thread.html/ebe220ec1cebc73c8fb7190cf115fb9b23165fdbf950d58e05db544d@%3Cdev.beam.apache.org%3E]
> Since jobs are Groovy files their names should be CamelCase. We could also 
> place them in subdirectories instead of prefixing job names. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4708) Runners-core javadoc broken, blocked snapshot release

2018-07-02 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4708:
-

 Summary: Runners-core javadoc broken, blocked snapshot release
 Key: BEAM-4708
 URL: https://issues.apache.org/jira/browse/BEAM-4708
 Project: Beam
  Issue Type: Bug
  Components: runner-core
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4708) Runners-core javadoc broken, blocked snapshot release

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4708?focusedWorklogId=118160=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118160
 ]

ASF GitHub Bot logged work on BEAM-4708:


Author: ASF GitHub Bot
Created on: 02/Jul/18 13:07
Start Date: 02/Jul/18 13:07
Worklog Time Spent: 10m 
  Work Description: kennknowles opened a new pull request #5853: 
[BEAM-4708] Fix broken javadoc
URL: https://github.com/apache/beam/pull/5853
 
 
   This javadoc has a broken link, per the actual dependencies declared for the 
module. For now, a quick fix.
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | --- | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118160)
Time Spent: 10m
Remaining Estimate: 0h

> Runners-core javadoc broken, blocked snapshot release
> -
>
> Key: BEAM-4708
> URL: https://issues.apache.org/jira/browse/BEAM-4708
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Blocker
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This 

[jira] [Work logged] (BEAM-4708) Runners-core javadoc broken, blocked snapshot release

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4708?focusedWorklogId=118161=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118161
 ]

ASF GitHub Bot logged work on BEAM-4708:


Author: ASF GitHub Bot
Created on: 02/Jul/18 13:08
Start Date: 02/Jul/18 13:08
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #5853: [BEAM-4708] Fix 
broken javadoc
URL: https://github.com/apache/beam/pull/5853#issuecomment-401798317
 
 
   Commit-then-review for build break.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118161)
Time Spent: 20m  (was: 10m)

> Runners-core javadoc broken, blocked snapshot release
> -
>
> Key: BEAM-4708
> URL: https://issues.apache.org/jira/browse/BEAM-4708
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Blocker
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3446) RedisIO non-prefix read operations

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3446?focusedWorklogId=118164=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118164
 ]

ASF GitHub Bot logged work on BEAM-3446:


Author: ASF GitHub Bot
Created on: 02/Jul/18 13:12
Start Date: 02/Jul/18 13:12
Worklog Time Spent: 10m 
  Work Description: iemejia commented on a change in pull request #5841: 
Fixes https://issues.apache.org/jira/browse/BEAM-3446.
URL: https://github.com/apache/beam/pull/5841#discussion_r199491469
 
 

 ##
 File path: 
sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java
 ##
 @@ -279,28 +290,31 @@ public void processElement(ProcessContext 
processContext) throws Exception {
   while (!finished) {
 ScanResult scanResult = jedis.scan(cursor, scanParams);
 List keys = scanResult.getResult();
-
-Pipeline pipeline = jedis.pipelined();
-if (keys != null) {
-  for (String key : keys) {
-pipeline.get(key);
-  }
-  List values = pipeline.syncAndReturnAll();
-  for (int i = 0; i < values.size(); i++) {
-processContext.output(KV.of(keys.get(i), (String) values.get(i)));
-  }
+for (String k : keys) {
+  processContext.output(k);
 }
-
 cursor = scanResult.getStringCursor();
 if ("0".equals(cursor)) {
   finished = true;
 }
   }
 }
+  }
+  /** A {@link DoFn} requesting Redis server to get key/value pairs. */
+  private static class ReadFn extends BaseReadFn> {
 
-@Teardown
-public void teardown() {
-  jedis.close();
+ReadFn(RedisConnectionConfiguration connectionConfiguration) {
+  super(connectionConfiguration);
+}
+
+@ProcessElement
+public void processElement(ProcessContext processContext) throws Exception 
{
+  String key = processContext.element();
+
+  String value = jedis.get(key);
 
 Review comment:
   As mentioned in the previous PR I am a bit concerned about losing the 
multiple data request capability, any chance you can work on this with the 
approach you mentioned based on `MGET` for ReadFn.
   The simplest approach probably is to do like other IOs and have a default 
size that can be parametrized via a `withBatchSize` method. WDYT ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118164)
Time Spent: 2.5h  (was: 2h 20m)

> RedisIO non-prefix read operations
> --
>
> Key: BEAM-3446
> URL: https://issues.apache.org/jira/browse/BEAM-3446
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-redis
>Reporter: Vinay varma
>Assignee: Vinay varma
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Read operation in RedisIO is for prefix based look ups. While this can be 
> used for exact key matches as well, the number of operations limits the 
> through put of the function.
> I suggest exposing current readAll operation as readbyprefix and using more 
> simpler operations for readAll functionality.
> ex:
> {code:java}
> String output = jedis.get(element);
> if (output != null) {
> processContext.output(KV.of(element, output));
> }
> {code}
> instead of:
> https://github.com/apache/beam/blob/7d240c0bb171af6868f1a6e95196c9dcfc9ac640/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java#L292



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4691) Rename (and reorganize?) jenkins jobs

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4691?focusedWorklogId=118173=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118173
 ]

ASF GitHub Bot logged work on BEAM-4691:


Author: ASF GitHub Bot
Created on: 02/Jul/18 13:33
Start Date: 02/Jul/18 13:33
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #5831: [BEAM-4691] 
(do-not-merge-yet!) Move perf tests to a separate directory and rename them 
(conventionally)
URL: https://github.com/apache/beam/pull/5831#issuecomment-401805584
 
 
   I did a small test in order to check if we can refactor the code this way 
(split to directories, change to CamelCase). Below are some things I've found 
out:
   
   1. I am not sure if I can change the job_00_seed.groovy name. Jenkins didn't 
change the glob pattern used to search for other job definitions when the name 
was `Seed.groovy` (like it wasn't able to find the seed job when it's name 
changed). Am I missing something here? Is path to seed job hardcoded in the 
workspace/jenkins config (see: [the seed job run that cannot find jobs using 
the old "job_" 
pattern](https://builds.apache.org/view/A-D/view/Beam/job/beam_SeedJob/2133/console)).
   
   2. I cannot import CommonProperties everywhere when the directory structure 
is more than 1 level deep. This is described here:  
[LINK](https://github.com/jenkinsci/job-dsl-plugin/wiki/Real-World-Examples#import-other-files-ie-with-class-definitions-into-your-script).
 I will try to simplify the directory tree to be one level and check if it 
works for our case.
   
   3. I think we should split the CommonProperties file (former 
common_job_properties). It's a "god object" to me and knows way too much.
   
   Another good thing of having directories would be the possibility to hide 
PreCommitBuilder in pre_commit directory (and so on) - so this is more than 
just clean structure. 
   
   CC: @kennknowles @swegner request for comments :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118173)
Time Spent: 1h  (was: 50m)

> Rename (and reorganize?) jenkins jobs
> -
>
> Key: BEAM-4691
> URL: https://issues.apache.org/jira/browse/BEAM-4691
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Łukasz Gajowy
>Assignee: Łukasz Gajowy
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Link to discussion: 
> [https://lists.apache.org/thread.html/ebe220ec1cebc73c8fb7190cf115fb9b23165fdbf950d58e05db544d@%3Cdev.beam.apache.org%3E]
> Since jobs are Groovy files their names should be CamelCase. We could also 
> place them in subdirectories instead of prefixing job names. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4707) Change fields and table names in nexmark perfkit tables

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4707?focusedWorklogId=118175=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118175
 ]

ASF GitHub Bot logged work on BEAM-4707:


Author: ASF GitHub Bot
Created on: 02/Jul/18 13:35
Start Date: 02/Jul/18 13:35
Worklog Time Spent: 10m 
  Work Description: echauchot opened a new pull request #5855: [BEAM-4707] 
Add timestamp field to Nexmark tables and use explicit mode in table name
URL: https://github.com/apache/beam/pull/5855
 
 
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [X] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [X] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | --- | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118175)
Time Spent: 10m
Remaining Estimate: 0h

> Change fields and table names in nexmark perfkit tables
> ---
>
> Key: BEAM-4707
> URL: https://issues.apache.org/jira/browse/BEAM-4707
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Nexmark BQ tables for perfkit lack timestamp field. Also the table 

[jira] [Work logged] (BEAM-4126) Delete maven build files

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4126?focusedWorklogId=118251=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118251
 ]

ASF GitHub Bot logged work on BEAM-4126:


Author: ASF GitHub Bot
Created on: 02/Jul/18 17:12
Start Date: 02/Jul/18 17:12
Worklog Time Spent: 10m 
  Work Description: lukecwik closed pull request #5571: DO NOT MERGE YET: 
[BEAM-4126] Delete Maven build files.
URL: https://github.com/apache/beam/pull/5571
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):



 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118251)
Time Spent: 50m  (was: 40m)

> Delete maven build files
> 
>
> Key: BEAM-4126
> URL: https://issues.apache.org/jira/browse/BEAM-4126
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Scott Wegner
>Assignee: Luke Cwik
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Once we are fully-migrated to Gradle, we should proactively remove Maven 
> build files (pom.xml etc) so that it's clear they no longer need to be 
> maintained.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (fbfe6ce -> 98534d9)

2018-07-02 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from fbfe6ce  Merge pull request #5853: [BEAM-4708] Fix broken javadoc
 add a8fcca4  Revert "Enabling gradle build to receive extra buildscript 
repositories"
 add 9351539  Don't enable plugins which require you to be online when run 
in --offline mode.
 new 98534d9  [BEAM-3249] Update the build script to be compatible with 
building Gradle within Google

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 build.gradle | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)



[jira] [Work logged] (BEAM-3971) Pipeline translation utilities should not use SDK construction classes

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3971?focusedWorklogId=118281=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118281
 ]

ASF GitHub Bot logged work on BEAM-3971:


Author: ASF GitHub Bot
Created on: 02/Jul/18 17:43
Start Date: 02/Jul/18 17:43
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5833: 
[BEAM-3971, BEAM-4284] Remove fromProto for Pipeline and PTransform translation.
URL: https://github.com/apache/beam/pull/5833#discussion_r199571916
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CombineTranslationTest.java
 ##
 @@ -179,6 +185,12 @@ public void leaveCompositeTransform(Node node) {
 }
   }
 
+  private static Coder getAccumulatorCoder(
+  CombinePayload payload, RehydratedComponents components) throws 
IOException {
 
 Review comment:
   RehyrdatedComponents is still useful as an object cache for coders/windowing 
strategy/...
   
   Still undecided what to do in this space.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118281)
Time Spent: 6h 10m  (was: 6h)

> Pipeline translation utilities should not use SDK construction classes
> --
>
> Key: BEAM-3971
> URL: https://issues.apache.org/jira/browse/BEAM-3971
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Ben Sidhom
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> In general, portable runners will require access to pipeline information not 
> available in rehydrated pipelines while constructing physical plans. 
> Translation utilities should operate directly on protos or on thin, 
> information-preserving wrappers.
> The pipeline fusion utilities already operate on protos directly and can be 
> used as an example of how this could be done.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3971) Pipeline translation utilities should not use SDK construction classes

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3971?focusedWorklogId=118280=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118280
 ]

ASF GitHub Bot logged work on BEAM-3971:


Author: ASF GitHub Bot
Created on: 02/Jul/18 17:42
Start Date: 02/Jul/18 17:42
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5833: 
[BEAM-3971, BEAM-4284] Remove fromProto for Pipeline and PTransform translation.
URL: https://github.com/apache/beam/pull/5833#discussion_r199571699
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTransformTranslators.java
 ##
 @@ -912,30 +922,81 @@ public void translateNode(
   TypeInformation>> outputTypeInfo =
   context.getTypeInfo(context.getOutput(transform));
 
-  TupleTag> mainTag = new TupleTag<>("main output");
-  WindowDoFnOperator doFnOperator =
-  new WindowDoFnOperator<>(
-  reduceFn,
-  fullName,
-  (Coder) windowedWorkItemCoder,
-  mainTag,
-  Collections.emptyList(),
-  new DoFnOperator.MultiOutputOutputManagerFactory<>(mainTag, 
outputCoder),
-  windowingStrategy,
-  new HashMap<>(), /* side-input mapping */
-  Collections.emptyList(), /* side inputs */
-  context.getPipelineOptions(),
-  inputKvCoder.getKeyCoder(),
-  keySelector);
+  List> sideInputs = ((Combine.PerKey) 
transform).getSideInputs();
 
-  // our operator excepts WindowedValue while our input 
stream
-  // is WindowedValue, which is fine but Java 
doesn't like it ...
-  @SuppressWarnings("unchecked")
-  SingleOutputStreamOperator>> outDataStream =
-  keyedWorkItemStream
-  .transform(fullName, outputTypeInfo, (OneInputStreamOperator) 
doFnOperator)
-  .uid(fullName);
-  context.setOutputDataStream(context.getOutput(transform), outDataStream);
+  if (sideInputs.isEmpty()) {
+TupleTag> mainTag = new TupleTag<>("main output");
+WindowDoFnOperator doFnOperator =
+new WindowDoFnOperator<>(
+reduceFn,
+fullName,
+(Coder) windowedWorkItemCoder,
+mainTag,
+Collections.emptyList(),
+new DoFnOperator.MultiOutputOutputManagerFactory<>(mainTag, 
outputCoder),
+windowingStrategy,
+new HashMap<>(), /* side-input mapping */
+Collections.emptyList(), /* side inputs */
+context.getPipelineOptions(),
+inputKvCoder.getKeyCoder(),
+keySelector);
+
+// our operator excepts WindowedValue while our input 
stream
 
 Review comment:
   fixed here and elsewhere.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118280)
Time Spent: 6h  (was: 5h 50m)

> Pipeline translation utilities should not use SDK construction classes
> --
>
> Key: BEAM-3971
> URL: https://issues.apache.org/jira/browse/BEAM-3971
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Ben Sidhom
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> In general, portable runners will require access to pipeline information not 
> available in rehydrated pipelines while constructing physical plans. 
> Translation utilities should operate directly on protos or on thin, 
> information-preserving wrappers.
> The pipeline fusion utilities already operate on protos directly and can be 
> used as an example of how this could be done.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3708) Implement the portable lifted Combiner transforms in Java SDK

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3708?focusedWorklogId=118320=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118320
 ]

ASF GitHub Bot logged work on BEAM-3708:


Author: ASF GitHub Bot
Created on: 02/Jul/18 18:06
Start Date: 02/Jul/18 18:06
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5795: 
[BEAM-3708] Adding grouping table to Precombine step.
URL: https://github.com/apache/beam/pull/5795#discussion_r199304718
 
 

 ##
 File path: 
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/PrecombineGroupingTable.java
 ##
 @@ -0,0 +1,568 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.fn.harness;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.io.ByteStreams;
+import com.google.common.io.CountingOutputStream;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.Map;
+import java.util.Random;
+import org.apache.beam.runners.core.GlobalCombineFnRunner;
+import org.apache.beam.runners.core.GlobalCombineFnRunners;
+import org.apache.beam.runners.core.NullSideInputReader;
+import org.apache.beam.runners.core.SideInputReader;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.transforms.Combine.CombineFn;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.apache.beam.sdk.util.common.ElementByteSizeObserver;
+import org.apache.beam.sdk.values.KV;
+import org.joda.time.Instant;
+
+/** Static utility methods that provide {@link GroupingTable} implementations. 
*/
+public class PrecombineGroupingTable
+implements GroupingTable {
+  /** Returns a {@link GroupingTable} that combines inputs into a accumulator. 
*/
+  public static  GroupingTable, InputT, 
AccumT> combining(
+  PipelineOptions options,
+  CombineFn combineFn,
+  Coder keyCoder,
+  Coder accumulatorCoder) {
+Combiner, InputT, AccumT, ?> valueCombiner =
+new ValueCombiner<>(
+GlobalCombineFnRunners.create(combineFn), 
NullSideInputReader.empty(), options);
+return new PrecombineGroupingTable<>(
+DEFAULT_MAX_GROUPING_TABLE_BYTES,
+new WindowingCoderGroupingKeyCreator<>(keyCoder),
+WindowedPairInfo.create(),
+valueCombiner,
+new CoderSizeEstimator<>(WindowedValue.getValueOnlyCoder(keyCoder)),
+new CoderSizeEstimator<>(accumulatorCoder));
+  }
+
+  /**
+   * Returns a {@link GroupingTable} that combines inputs into a accumulator 
with sampling {@link
+   * SizeEstimator SizeEstimators}.
+   */
+  public static 
+  GroupingTable, InputT, AccumT> combiningAndSampling(
+  PipelineOptions options,
+  CombineFn combineFn,
+  Coder keyCoder,
+  Coder accumulatorCoder,
+  double sizeEstimatorSampleRate) {
+Combiner, InputT, AccumT, ?> valueCombiner =
+new ValueCombiner<>(
+GlobalCombineFnRunners.create(combineFn), 
NullSideInputReader.empty(), options);
+return new PrecombineGroupingTable<>(
+DEFAULT_MAX_GROUPING_TABLE_BYTES,
+new WindowingCoderGroupingKeyCreator<>(keyCoder),
+WindowedPairInfo.create(),
+valueCombiner,
+new SamplingSizeEstimator<>(
+new 
CoderSizeEstimator<>(WindowedValue.getValueOnlyCoder(keyCoder)),
+sizeEstimatorSampleRate,
+1.0),
+new SamplingSizeEstimator<>(
+new CoderSizeEstimator<>(accumulatorCoder), 
sizeEstimatorSampleRate, 1.0));
+  }
+
+  /** Provides client-specific operations for grouping keys. */
+  public interface GroupingKeyCreator {
+Object createGroupingKey(K key) throws Exception;
+  }
+
+  /** Implements Precombine GroupingKeyCreator via Coder. */
+  public static class WindowingCoderGroupingKeyCreator
+  implements GroupingKeyCreator> {
+
+private static final Instant ignored = BoundedWindow.TIMESTAMP_MIN_VALUE;
+
+private final Coder coder;
+
+

[jira] [Work logged] (BEAM-3708) Implement the portable lifted Combiner transforms in Java SDK

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3708?focusedWorklogId=118310=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118310
 ]

ASF GitHub Bot logged work on BEAM-3708:


Author: ASF GitHub Bot
Created on: 02/Jul/18 18:06
Start Date: 02/Jul/18 18:06
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5795: 
[BEAM-3708] Adding grouping table to Precombine step.
URL: https://github.com/apache/beam/pull/5795#discussion_r199303366
 
 

 ##
 File path: 
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/CombineRunners.java
 ##
 @@ -51,17 +71,120 @@
 }
   }
 
-  static 
-  ThrowingFunction, KV> 
createPrecombineMapFunction(
-  String pTransformId, PTransform pTransform) throws IOException {
-CombinePayload combinePayload = 
CombinePayload.parseFrom(pTransform.getSpec().getPayload());
-CombineFn combineFn =
-(CombineFn)
-SerializableUtils.deserializeFromByteArray(
-
combinePayload.getCombineFn().getSpec().getPayload().toByteArray(), 
"CombineFn");
+  private static class PrecombineRunner {
+private PipelineOptions options;
+private CombineFn combineFn;
+private FnDataReceiver>> output;
+private Coder keyCoder;
+private GroupingTable, InputT, AccumT> groupingTable;
+private Coder accumCoder;
+
+PrecombineRunner(
+PipelineOptions options,
+CombineFn combineFn,
+FnDataReceiver>> output,
+Coder keyCoder,
+Coder accumCoder) {
+  this.options = options;
+  this.combineFn = combineFn;
+  this.output = output;
+  this.keyCoder = keyCoder;
+  this.accumCoder = accumCoder;
+}
+
+void startBundle() {
+  groupingTable =
+  PrecombineGroupingTable.combiningAndSampling(
+  options, combineFn, keyCoder, accumCoder, 0.001 
/*sizeEstimatorSampleRate*/);
+}
+
+void processElement(WindowedValue> elem) throws Exception 
{
+  groupingTable.put(
+  elem, (Object outputElem) -> output.accept((WindowedValue>) outputElem));
+}
+
+void finishBundle() throws Exception {
+  groupingTable.flush(
+  (Object outputElem) -> output.accept((WindowedValue>) outputElem));
+}
+  }
+
+  /** A factory for {@link PrecombineRunner}s. */
+  @VisibleForTesting
+  public static class PrecombineFactory
+  implements PTransformRunnerFactory> {
+
+@Override
+public PrecombineRunner createRunnerForPTransform(
+PipelineOptions pipelineOptions,
+BeamFnDataClient beamFnDataClient,
+BeamFnStateClient beamFnStateClient,
+String pTransformId,
+PTransform pTransform,
+Supplier processBundleInstructionId,
+Map pCollections,
+Map coders,
+Map windowingStrategies,
+Multimap>> 
pCollectionIdsToConsumers,
+Consumer addStartFunction,
+Consumer addFinishFunction,
+BundleSplitListener splitListener)
+throws IOException {
+  // Get objects needed to create the runner.
+  RehydratedComponents rehydratedComponents =
+  RehydratedComponents.forComponents(
+  RunnerApi.Components.newBuilder()
+  .putAllCoders(coders)
+  .putAllWindowingStrategies(windowingStrategies)
+  .build());
+  String mainInputTag = 
Iterables.getOnlyElement(pTransform.getInputsMap().keySet());
+  RunnerApi.PCollection mainInput = 
pCollections.get(pTransform.getInputsOrThrow(mainInputTag));
+
+  // Input coder may sometimes be WindowedValueCoder depending on runner, 
instead of the
+  // expected KvCoder.
+  Coder uncastInputCoder = 
rehydratedComponents.getCoder(mainInput.getCoderId());
+  KvCoder inputCoder;
 
 Review comment:
   You don't use the `inputCoder` anywhere except to get the key coder.
   
   Consider dropping the local variable `inputCoder` and setting `keyCoder` 
directly.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118310)
Time Spent: 4h 20m  (was: 4h 10m)

> Implement the portable lifted Combiner transforms in Java SDK
> -
>
> Key: BEAM-3708
> URL: https://issues.apache.org/jira/browse/BEAM-3708
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>  

[jira] [Work logged] (BEAM-3708) Implement the portable lifted Combiner transforms in Java SDK

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3708?focusedWorklogId=118321=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118321
 ]

ASF GitHub Bot logged work on BEAM-3708:


Author: ASF GitHub Bot
Created on: 02/Jul/18 18:06
Start Date: 02/Jul/18 18:06
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5795: 
[BEAM-3708] Adding grouping table to Precombine step.
URL: https://github.com/apache/beam/pull/5795#discussion_r199305044
 
 

 ##
 File path: 
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/PrecombineGroupingTable.java
 ##
 @@ -0,0 +1,568 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.fn.harness;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.io.ByteStreams;
+import com.google.common.io.CountingOutputStream;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.Map;
+import java.util.Random;
+import org.apache.beam.runners.core.GlobalCombineFnRunner;
+import org.apache.beam.runners.core.GlobalCombineFnRunners;
+import org.apache.beam.runners.core.NullSideInputReader;
+import org.apache.beam.runners.core.SideInputReader;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.transforms.Combine.CombineFn;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.apache.beam.sdk.util.common.ElementByteSizeObserver;
+import org.apache.beam.sdk.values.KV;
+import org.joda.time.Instant;
+
+/** Static utility methods that provide {@link GroupingTable} implementations. 
*/
+public class PrecombineGroupingTable
+implements GroupingTable {
+  /** Returns a {@link GroupingTable} that combines inputs into a accumulator. 
*/
+  public static  GroupingTable, InputT, 
AccumT> combining(
+  PipelineOptions options,
+  CombineFn combineFn,
+  Coder keyCoder,
+  Coder accumulatorCoder) {
+Combiner, InputT, AccumT, ?> valueCombiner =
+new ValueCombiner<>(
+GlobalCombineFnRunners.create(combineFn), 
NullSideInputReader.empty(), options);
+return new PrecombineGroupingTable<>(
+DEFAULT_MAX_GROUPING_TABLE_BYTES,
+new WindowingCoderGroupingKeyCreator<>(keyCoder),
+WindowedPairInfo.create(),
+valueCombiner,
+new CoderSizeEstimator<>(WindowedValue.getValueOnlyCoder(keyCoder)),
+new CoderSizeEstimator<>(accumulatorCoder));
+  }
+
+  /**
+   * Returns a {@link GroupingTable} that combines inputs into a accumulator 
with sampling {@link
+   * SizeEstimator SizeEstimators}.
+   */
+  public static 
+  GroupingTable, InputT, AccumT> combiningAndSampling(
+  PipelineOptions options,
+  CombineFn combineFn,
+  Coder keyCoder,
+  Coder accumulatorCoder,
+  double sizeEstimatorSampleRate) {
+Combiner, InputT, AccumT, ?> valueCombiner =
+new ValueCombiner<>(
+GlobalCombineFnRunners.create(combineFn), 
NullSideInputReader.empty(), options);
+return new PrecombineGroupingTable<>(
+DEFAULT_MAX_GROUPING_TABLE_BYTES,
+new WindowingCoderGroupingKeyCreator<>(keyCoder),
+WindowedPairInfo.create(),
+valueCombiner,
+new SamplingSizeEstimator<>(
+new 
CoderSizeEstimator<>(WindowedValue.getValueOnlyCoder(keyCoder)),
+sizeEstimatorSampleRate,
+1.0),
+new SamplingSizeEstimator<>(
+new CoderSizeEstimator<>(accumulatorCoder), 
sizeEstimatorSampleRate, 1.0));
+  }
+
+  /** Provides client-specific operations for grouping keys. */
+  public interface GroupingKeyCreator {
+Object createGroupingKey(K key) throws Exception;
+  }
+
+  /** Implements Precombine GroupingKeyCreator via Coder. */
+  public static class WindowingCoderGroupingKeyCreator
+  implements GroupingKeyCreator> {
+
+private static final Instant ignored = BoundedWindow.TIMESTAMP_MIN_VALUE;
+
+private final Coder coder;
+
+

[jira] [Work logged] (BEAM-3708) Implement the portable lifted Combiner transforms in Java SDK

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3708?focusedWorklogId=118323=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118323
 ]

ASF GitHub Bot logged work on BEAM-3708:


Author: ASF GitHub Bot
Created on: 02/Jul/18 18:06
Start Date: 02/Jul/18 18:06
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #5795: [BEAM-3708] Adding 
grouping table to Precombine step.
URL: https://github.com/apache/beam/pull/5795#issuecomment-401887478
 
 
   Run Java PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118323)
Time Spent: 5h 40m  (was: 5.5h)

> Implement the portable lifted Combiner transforms in Java SDK
> -
>
> Key: BEAM-3708
> URL: https://issues.apache.org/jira/browse/BEAM-3708
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>  Labels: portability
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Lifted combines are split into separate parts with different URNs. These 
> parts need to be implemented in the Java SDK harness so that the SDK can 
> actually execute them when receiving Combine transforms with the 
> corresponding URNs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3708) Implement the portable lifted Combiner transforms in Java SDK

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3708?focusedWorklogId=118319=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118319
 ]

ASF GitHub Bot logged work on BEAM-3708:


Author: ASF GitHub Bot
Created on: 02/Jul/18 18:06
Start Date: 02/Jul/18 18:06
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5795: 
[BEAM-3708] Adding grouping table to Precombine step.
URL: https://github.com/apache/beam/pull/5795#discussion_r199576986
 
 

 ##
 File path: 
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/PrecombineGroupingTable.java
 ##
 @@ -0,0 +1,568 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.fn.harness;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.io.ByteStreams;
+import com.google.common.io.CountingOutputStream;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.Map;
+import java.util.Random;
+import org.apache.beam.runners.core.GlobalCombineFnRunner;
+import org.apache.beam.runners.core.GlobalCombineFnRunners;
+import org.apache.beam.runners.core.NullSideInputReader;
+import org.apache.beam.runners.core.SideInputReader;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.transforms.Combine.CombineFn;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.apache.beam.sdk.util.common.ElementByteSizeObserver;
+import org.apache.beam.sdk.values.KV;
+import org.joda.time.Instant;
+
+/** Static utility methods that provide {@link GroupingTable} implementations. 
*/
+public class PrecombineGroupingTable
+implements GroupingTable {
+  /** Returns a {@link GroupingTable} that combines inputs into a accumulator. 
*/
+  public static  GroupingTable, InputT, 
AccumT> combining(
+  PipelineOptions options,
+  CombineFn combineFn,
+  Coder keyCoder,
+  Coder accumulatorCoder) {
+Combiner, InputT, AccumT, ?> valueCombiner =
+new ValueCombiner<>(
+GlobalCombineFnRunners.create(combineFn), 
NullSideInputReader.empty(), options);
+return new PrecombineGroupingTable<>(
+DEFAULT_MAX_GROUPING_TABLE_BYTES,
+new WindowingCoderGroupingKeyCreator<>(keyCoder),
+WindowedPairInfo.create(),
+valueCombiner,
+new CoderSizeEstimator<>(WindowedValue.getValueOnlyCoder(keyCoder)),
+new CoderSizeEstimator<>(accumulatorCoder));
+  }
+
+  /**
+   * Returns a {@link GroupingTable} that combines inputs into a accumulator 
with sampling {@link
+   * SizeEstimator SizeEstimators}.
+   */
+  public static 
+  GroupingTable, InputT, AccumT> combiningAndSampling(
+  PipelineOptions options,
+  CombineFn combineFn,
+  Coder keyCoder,
+  Coder accumulatorCoder,
+  double sizeEstimatorSampleRate) {
+Combiner, InputT, AccumT, ?> valueCombiner =
+new ValueCombiner<>(
+GlobalCombineFnRunners.create(combineFn), 
NullSideInputReader.empty(), options);
+return new PrecombineGroupingTable<>(
+DEFAULT_MAX_GROUPING_TABLE_BYTES,
+new WindowingCoderGroupingKeyCreator<>(keyCoder),
+WindowedPairInfo.create(),
+valueCombiner,
+new SamplingSizeEstimator<>(
+new 
CoderSizeEstimator<>(WindowedValue.getValueOnlyCoder(keyCoder)),
+sizeEstimatorSampleRate,
+1.0),
+new SamplingSizeEstimator<>(
+new CoderSizeEstimator<>(accumulatorCoder), 
sizeEstimatorSampleRate, 1.0));
+  }
+
+  /** Provides client-specific operations for grouping keys. */
+  public interface GroupingKeyCreator {
+Object createGroupingKey(K key) throws Exception;
+  }
+
+  /** Implements Precombine GroupingKeyCreator via Coder. */
+  public static class WindowingCoderGroupingKeyCreator
+  implements GroupingKeyCreator> {
+
+private static final Instant ignored = BoundedWindow.TIMESTAMP_MIN_VALUE;
+
+private final Coder coder;
+
+

[jira] [Work logged] (BEAM-3708) Implement the portable lifted Combiner transforms in Java SDK

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3708?focusedWorklogId=118313=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118313
 ]

ASF GitHub Bot logged work on BEAM-3708:


Author: ASF GitHub Bot
Created on: 02/Jul/18 18:06
Start Date: 02/Jul/18 18:06
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5795: 
[BEAM-3708] Adding grouping table to Precombine step.
URL: https://github.com/apache/beam/pull/5795#discussion_r199303580
 
 

 ##
 File path: 
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/CombineRunners.java
 ##
 @@ -51,17 +71,120 @@
 }
   }
 
-  static 
-  ThrowingFunction, KV> 
createPrecombineMapFunction(
-  String pTransformId, PTransform pTransform) throws IOException {
-CombinePayload combinePayload = 
CombinePayload.parseFrom(pTransform.getSpec().getPayload());
-CombineFn combineFn =
-(CombineFn)
-SerializableUtils.deserializeFromByteArray(
-
combinePayload.getCombineFn().getSpec().getPayload().toByteArray(), 
"CombineFn");
+  private static class PrecombineRunner {
+private PipelineOptions options;
+private CombineFn combineFn;
+private FnDataReceiver>> output;
+private Coder keyCoder;
+private GroupingTable, InputT, AccumT> groupingTable;
+private Coder accumCoder;
+
+PrecombineRunner(
+PipelineOptions options,
+CombineFn combineFn,
+FnDataReceiver>> output,
+Coder keyCoder,
+Coder accumCoder) {
+  this.options = options;
+  this.combineFn = combineFn;
+  this.output = output;
+  this.keyCoder = keyCoder;
+  this.accumCoder = accumCoder;
+}
+
+void startBundle() {
+  groupingTable =
+  PrecombineGroupingTable.combiningAndSampling(
+  options, combineFn, keyCoder, accumCoder, 0.001 
/*sizeEstimatorSampleRate*/);
+}
+
+void processElement(WindowedValue> elem) throws Exception 
{
+  groupingTable.put(
+  elem, (Object outputElem) -> output.accept((WindowedValue>) outputElem));
+}
+
+void finishBundle() throws Exception {
+  groupingTable.flush(
+  (Object outputElem) -> output.accept((WindowedValue>) outputElem));
 
 Review comment:
   ditto here, if you use a cast, you should be able to pass this in as a 
method reference instead of using a lambda


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118313)
Time Spent: 4h 50m  (was: 4h 40m)

> Implement the portable lifted Combiner transforms in Java SDK
> -
>
> Key: BEAM-3708
> URL: https://issues.apache.org/jira/browse/BEAM-3708
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>  Labels: portability
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Lifted combines are split into separate parts with different URNs. These 
> parts need to be implemented in the Java SDK harness so that the SDK can 
> actually execute them when receiving Combine transforms with the 
> corresponding URNs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3708) Implement the portable lifted Combiner transforms in Java SDK

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3708?focusedWorklogId=118314=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118314
 ]

ASF GitHub Bot logged work on BEAM-3708:


Author: ASF GitHub Bot
Created on: 02/Jul/18 18:06
Start Date: 02/Jul/18 18:06
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5795: 
[BEAM-3708] Adding grouping table to Precombine step.
URL: https://github.com/apache/beam/pull/5795#discussion_r199303721
 
 

 ##
 File path: 
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/GroupingTable.java
 ##
 @@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.fn.harness;
+
+/** An interface that groups inputs to an accumulator and flushes the output. 
*/
+public interface GroupingTable {
+
+  /** Abstract interface of things that accept inputs one at a time via 
process(). */
+  interface Receiver {
+/** Processes the element. */
+void process(Object outputElem) throws Exception;
+  }
+
+  /** Adds a pair to this table, possibly flushing some entries to output if 
the table is full. */
 
 Review comment:
   `pair` -> `keyed value`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118314)
Time Spent: 5h  (was: 4h 50m)

> Implement the portable lifted Combiner transforms in Java SDK
> -
>
> Key: BEAM-3708
> URL: https://issues.apache.org/jira/browse/BEAM-3708
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>  Labels: portability
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Lifted combines are split into separate parts with different URNs. These 
> parts need to be implemented in the Java SDK harness so that the SDK can 
> actually execute them when receiving Combine transforms with the 
> corresponding URNs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-4710) Read complex types of data by SQL

2018-07-02 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-4710:
---
Description: 
Support reading complex types of data by SQL. Typical complex types include 
nested ROW, nested ARRAY, etc. 

 

Complex types might be different for different data sources.

  was:Support reading complex types of data by SQL. Typical complex types 
include nested ROW, nested ARRAY, etc. 


> Read complex types of data by SQL
> -
>
> Key: BEAM-4710
> URL: https://issues.apache.org/jira/browse/BEAM-4710
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>
> Support reading complex types of data by SQL. Typical complex types include 
> nested ROW, nested ARRAY, etc. 
>  
> Complex types might be different for different data sources.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4601) BigQuery reads basic types from pure SQL

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4601?focusedWorklogId=118322=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118322
 ]

ASF GitHub Bot logged work on BEAM-4601:


Author: ASF GitHub Bot
Created on: 02/Jul/18 18:06
Start Date: 02/Jul/18 18:06
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #5830: [BEAM-4601][SQL] 
Support BigQuery read from SQL.
URL: https://github.com/apache/beam/pull/5830#issuecomment-401887456
 
 
   Yep, a JIRA is created to track it: 
https://issues.apache.org/jira/browse/BEAM-4710


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118322)
Time Spent: 6h 20m  (was: 6h 10m)

> BigQuery reads basic types from pure SQL
> 
>
> Key: BEAM-4601
> URL: https://issues.apache.org/jira/browse/BEAM-4601
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Right now Beam SQL can created a BigQuery table, however, read from BigQuery 
> table is not supported yet. We want to support reading basic data types from 
> BigQuery table (not including ROW type).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3708) Implement the portable lifted Combiner transforms in Java SDK

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3708?focusedWorklogId=118312=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118312
 ]

ASF GitHub Bot logged work on BEAM-3708:


Author: ASF GitHub Bot
Created on: 02/Jul/18 18:06
Start Date: 02/Jul/18 18:06
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5795: 
[BEAM-3708] Adding grouping table to Precombine step.
URL: https://github.com/apache/beam/pull/5795#discussion_r199303656
 
 

 ##
 File path: 
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/GroupingTable.java
 ##
 @@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.fn.harness;
+
+/** An interface that groups inputs to an accumulator and flushes the output. 
*/
+public interface GroupingTable {
+
+  /** Abstract interface of things that accept inputs one at a time via 
process(). */
+  interface Receiver {
 
 Review comment:
   I don't think we'll need to make this generic in this sense. Consider using 
`FnDataReceiver` directly here instead of `Receiver`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118312)
Time Spent: 4h 40m  (was: 4.5h)

> Implement the portable lifted Combiner transforms in Java SDK
> -
>
> Key: BEAM-3708
> URL: https://issues.apache.org/jira/browse/BEAM-3708
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>  Labels: portability
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Lifted combines are split into separate parts with different URNs. These 
> parts need to be implemented in the Java SDK harness so that the SDK can 
> actually execute them when receiving Combine transforms with the 
> corresponding URNs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3708) Implement the portable lifted Combiner transforms in Java SDK

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3708?focusedWorklogId=118317=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118317
 ]

ASF GitHub Bot logged work on BEAM-3708:


Author: ASF GitHub Bot
Created on: 02/Jul/18 18:06
Start Date: 02/Jul/18 18:06
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5795: 
[BEAM-3708] Adding grouping table to Precombine step.
URL: https://github.com/apache/beam/pull/5795#discussion_r199305122
 
 

 ##
 File path: 
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/PrecombineGroupingTable.java
 ##
 @@ -0,0 +1,568 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.fn.harness;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.io.ByteStreams;
+import com.google.common.io.CountingOutputStream;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.Map;
+import java.util.Random;
+import org.apache.beam.runners.core.GlobalCombineFnRunner;
+import org.apache.beam.runners.core.GlobalCombineFnRunners;
+import org.apache.beam.runners.core.NullSideInputReader;
+import org.apache.beam.runners.core.SideInputReader;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.transforms.Combine.CombineFn;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.apache.beam.sdk.util.common.ElementByteSizeObserver;
+import org.apache.beam.sdk.values.KV;
+import org.joda.time.Instant;
+
+/** Static utility methods that provide {@link GroupingTable} implementations. 
*/
+public class PrecombineGroupingTable
+implements GroupingTable {
+  /** Returns a {@link GroupingTable} that combines inputs into a accumulator. 
*/
+  public static  GroupingTable, InputT, 
AccumT> combining(
+  PipelineOptions options,
+  CombineFn combineFn,
+  Coder keyCoder,
+  Coder accumulatorCoder) {
+Combiner, InputT, AccumT, ?> valueCombiner =
+new ValueCombiner<>(
+GlobalCombineFnRunners.create(combineFn), 
NullSideInputReader.empty(), options);
+return new PrecombineGroupingTable<>(
+DEFAULT_MAX_GROUPING_TABLE_BYTES,
+new WindowingCoderGroupingKeyCreator<>(keyCoder),
+WindowedPairInfo.create(),
+valueCombiner,
+new CoderSizeEstimator<>(WindowedValue.getValueOnlyCoder(keyCoder)),
+new CoderSizeEstimator<>(accumulatorCoder));
+  }
+
+  /**
+   * Returns a {@link GroupingTable} that combines inputs into a accumulator 
with sampling {@link
+   * SizeEstimator SizeEstimators}.
+   */
+  public static 
+  GroupingTable, InputT, AccumT> combiningAndSampling(
+  PipelineOptions options,
+  CombineFn combineFn,
+  Coder keyCoder,
+  Coder accumulatorCoder,
+  double sizeEstimatorSampleRate) {
+Combiner, InputT, AccumT, ?> valueCombiner =
+new ValueCombiner<>(
+GlobalCombineFnRunners.create(combineFn), 
NullSideInputReader.empty(), options);
+return new PrecombineGroupingTable<>(
+DEFAULT_MAX_GROUPING_TABLE_BYTES,
+new WindowingCoderGroupingKeyCreator<>(keyCoder),
+WindowedPairInfo.create(),
+valueCombiner,
+new SamplingSizeEstimator<>(
+new 
CoderSizeEstimator<>(WindowedValue.getValueOnlyCoder(keyCoder)),
+sizeEstimatorSampleRate,
+1.0),
+new SamplingSizeEstimator<>(
+new CoderSizeEstimator<>(accumulatorCoder), 
sizeEstimatorSampleRate, 1.0));
+  }
+
+  /** Provides client-specific operations for grouping keys. */
+  public interface GroupingKeyCreator {
+Object createGroupingKey(K key) throws Exception;
+  }
+
+  /** Implements Precombine GroupingKeyCreator via Coder. */
+  public static class WindowingCoderGroupingKeyCreator
+  implements GroupingKeyCreator> {
+
+private static final Instant ignored = BoundedWindow.TIMESTAMP_MIN_VALUE;
+
+private final Coder coder;
+
+

[jira] [Work logged] (BEAM-3708) Implement the portable lifted Combiner transforms in Java SDK

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3708?focusedWorklogId=118315=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118315
 ]

ASF GitHub Bot logged work on BEAM-3708:


Author: ASF GitHub Bot
Created on: 02/Jul/18 18:06
Start Date: 02/Jul/18 18:06
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5795: 
[BEAM-3708] Adding grouping table to Precombine step.
URL: https://github.com/apache/beam/pull/5795#discussion_r199303702
 
 

 ##
 File path: 
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/GroupingTable.java
 ##
 @@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.fn.harness;
+
+/** An interface that groups inputs to an accumulator and flushes the output. 
*/
+public interface GroupingTable {
+
+  /** Abstract interface of things that accept inputs one at a time via 
process(). */
+  interface Receiver {
+/** Processes the element. */
+void process(Object outputElem) throws Exception;
+  }
+
+  /** Adds a pair to this table, possibly flushing some entries to output if 
the table is full. */
+  void put(Object pair, Receiver receiver) throws Exception;
 
 Review comment:
   You can use `KV`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118315)

> Implement the portable lifted Combiner transforms in Java SDK
> -
>
> Key: BEAM-3708
> URL: https://issues.apache.org/jira/browse/BEAM-3708
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>  Labels: portability
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Lifted combines are split into separate parts with different URNs. These 
> parts need to be implemented in the Java SDK harness so that the SDK can 
> actually execute them when receiving Combine transforms with the 
> corresponding URNs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3708) Implement the portable lifted Combiner transforms in Java SDK

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3708?focusedWorklogId=118318=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118318
 ]

ASF GitHub Bot logged work on BEAM-3708:


Author: ASF GitHub Bot
Created on: 02/Jul/18 18:06
Start Date: 02/Jul/18 18:06
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5795: 
[BEAM-3708] Adding grouping table to Precombine step.
URL: https://github.com/apache/beam/pull/5795#discussion_r199577433
 
 

 ##
 File path: 
sdks/java/harness/src/test/java/org/apache/beam/fn/harness/CombineRunnersTest.java
 ##
 @@ -93,6 +93,7 @@ public Integer extractOutput(Integer accum) {
   private RunnerApi.PTransform pTransform;
   private String inputPCollectionId;
   private String outputPCollectionId;
+  private RunnerApi.Pipeline pProto;
 
 Review comment:
   nit: `pProto` -> `pipeline` or `pipelineProto`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118318)
Time Spent: 5h 20m  (was: 5h 10m)

> Implement the portable lifted Combiner transforms in Java SDK
> -
>
> Key: BEAM-3708
> URL: https://issues.apache.org/jira/browse/BEAM-3708
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>  Labels: portability
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Lifted combines are split into separate parts with different URNs. These 
> parts need to be implemented in the Java SDK harness so that the SDK can 
> actually execute them when receiving Combine transforms with the 
> corresponding URNs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3708) Implement the portable lifted Combiner transforms in Java SDK

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3708?focusedWorklogId=118316=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118316
 ]

ASF GitHub Bot logged work on BEAM-3708:


Author: ASF GitHub Bot
Created on: 02/Jul/18 18:06
Start Date: 02/Jul/18 18:06
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5795: 
[BEAM-3708] Adding grouping table to Precombine step.
URL: https://github.com/apache/beam/pull/5795#discussion_r199303794
 
 

 ##
 File path: 
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/PrecombineGroupingTable.java
 ##
 @@ -0,0 +1,568 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.fn.harness;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.io.ByteStreams;
+import com.google.common.io.CountingOutputStream;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.Map;
+import java.util.Random;
+import org.apache.beam.runners.core.GlobalCombineFnRunner;
+import org.apache.beam.runners.core.GlobalCombineFnRunners;
+import org.apache.beam.runners.core.NullSideInputReader;
+import org.apache.beam.runners.core.SideInputReader;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.transforms.Combine.CombineFn;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.apache.beam.sdk.util.common.ElementByteSizeObserver;
+import org.apache.beam.sdk.values.KV;
+import org.joda.time.Instant;
+
+/** Static utility methods that provide {@link GroupingTable} implementations. 
*/
+public class PrecombineGroupingTable
 
 Review comment:
   Consider calling this class `GroupingTables`.
   
   I don't know how much re-use we will get on this class, you can make it an 
internal detail of `CombineRunners` instead of exposing it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118316)
Time Spent: 5h 10m  (was: 5h)

> Implement the portable lifted Combiner transforms in Java SDK
> -
>
> Key: BEAM-3708
> URL: https://issues.apache.org/jira/browse/BEAM-3708
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>  Labels: portability
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Lifted combines are split into separate parts with different URNs. These 
> parts need to be implemented in the Java SDK harness so that the SDK can 
> actually execute them when receiving Combine transforms with the 
> corresponding URNs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3708) Implement the portable lifted Combiner transforms in Java SDK

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3708?focusedWorklogId=118311=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118311
 ]

ASF GitHub Bot logged work on BEAM-3708:


Author: ASF GitHub Bot
Created on: 02/Jul/18 18:06
Start Date: 02/Jul/18 18:06
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5795: 
[BEAM-3708] Adding grouping table to Precombine step.
URL: https://github.com/apache/beam/pull/5795#discussion_r199303571
 
 

 ##
 File path: 
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/CombineRunners.java
 ##
 @@ -51,17 +71,120 @@
 }
   }
 
-  static 
-  ThrowingFunction, KV> 
createPrecombineMapFunction(
-  String pTransformId, PTransform pTransform) throws IOException {
-CombinePayload combinePayload = 
CombinePayload.parseFrom(pTransform.getSpec().getPayload());
-CombineFn combineFn =
-(CombineFn)
-SerializableUtils.deserializeFromByteArray(
-
combinePayload.getCombineFn().getSpec().getPayload().toByteArray(), 
"CombineFn");
+  private static class PrecombineRunner {
+private PipelineOptions options;
+private CombineFn combineFn;
+private FnDataReceiver>> output;
+private Coder keyCoder;
+private GroupingTable, InputT, AccumT> groupingTable;
+private Coder accumCoder;
+
+PrecombineRunner(
+PipelineOptions options,
+CombineFn combineFn,
+FnDataReceiver>> output,
+Coder keyCoder,
+Coder accumCoder) {
+  this.options = options;
+  this.combineFn = combineFn;
+  this.output = output;
+  this.keyCoder = keyCoder;
+  this.accumCoder = accumCoder;
+}
+
+void startBundle() {
+  groupingTable =
+  PrecombineGroupingTable.combiningAndSampling(
+  options, combineFn, keyCoder, accumCoder, 0.001 
/*sizeEstimatorSampleRate*/);
+}
+
+void processElement(WindowedValue> elem) throws Exception 
{
+  groupingTable.put(
+  elem, (Object outputElem) -> output.accept((WindowedValue>) outputElem));
 
 Review comment:
   if you use a cast, you should be able to pass this in as a method reference 
instead of using a lambda


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118311)
Time Spent: 4.5h  (was: 4h 20m)

> Implement the portable lifted Combiner transforms in Java SDK
> -
>
> Key: BEAM-3708
> URL: https://issues.apache.org/jira/browse/BEAM-3708
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>  Labels: portability
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Lifted combines are split into separate parts with different URNs. These 
> parts need to be implemented in the Java SDK harness so that the SDK can 
> actually execute them when receiving Combine transforms with the 
> corresponding URNs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4711) LocalFileSystem.delete doesn't support globbing

2018-07-02 Thread Ryan Williams (JIRA)
Ryan Williams created BEAM-4711:
---

 Summary: LocalFileSystem.delete doesn't support globbing
 Key: BEAM-4711
 URL: https://issues.apache.org/jira/browse/BEAM-4711
 Project: Beam
  Issue Type: Task
  Components: sdk-py-core
Affects Versions: 2.5.0
Reporter: Ryan Williams
Assignee: Ryan Williams


I attempted to run {{wordcount_it_test:WordCountIT.test_wordcount_it}} locally 
with {{DirectRunner}}:

{code}
python setup.py nosetests \
  --tests apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it \
  --test-pipeline-options="--output=foo"
{code}

It failed in [the {{delete_files}} cleanup 
command|https://github.com/apache/beam/blob/a58f1ffaafb0e2ebcc73a1c5abfb05a15ec6a84b/sdks/python/apache_beam/examples/wordcount_it_test.py#L64]:

{code}
root: WARNING: Retry with exponential backoff: waiting for 11.1454450937 
seconds before retrying delete_files because we caught exception: BeamIOError: 
Delete operation failed with exceptions {'foo/1530557644/results*': 
IOError(OSError(2, 'No such file or directory'),)}
 Traceback for above exception (most recent call last):
  File "/Users/ryan/c/beam/sdks/python/apache_beam/utils/retry.py", line 184, 
in wrapper
return fun(*args, **kwargs)
  File "/Users/ryan/c/beam/sdks/python/apache_beam/testing/test_utils.py", line 
136, in delete_files
FileSystems.delete(file_paths)
  File "/Users/ryan/c/beam/sdks/python/apache_beam/io/filesystems.py", line 
282, in delete
return filesystem.delete(paths)
  File "/Users/ryan/c/beam/sdks/python/apache_beam/io/localfilesystem.py", line 
304, in delete
raise BeamIOError("Delete operation failed", exceptions)
{code}

The line:

{code}
self.addCleanup(delete_files, [output + '*'])
{code}

works as expected in GCS, and deletes a test's output-directory, but it fails 
in on the local-filesystem, which doesn't expand globs before attempting to 
delete paths.

It would be good to make these consistent, presumably by adding glob-support to 
{{LocalFileSystem}}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4716) Remove findbugs declarations in build.gradle files since it is now globally a compileOnly and testCompileOnly dependency

2018-07-02 Thread Luke Cwik (JIRA)
Luke Cwik created BEAM-4716:
---

 Summary: Remove findbugs declarations in build.gradle files since 
it is now globally a compileOnly and testCompileOnly dependency
 Key: BEAM-4716
 URL: https://issues.apache.org/jira/browse/BEAM-4716
 Project: Beam
  Issue Type: Sub-task
  Components: build-system
Reporter: Luke Cwik
Assignee: Luke Cwik






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4547) implement sum0 aggregation function

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4547?focusedWorklogId=118383=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118383
 ]

ASF GitHub Bot logged work on BEAM-4547:


Author: ASF GitHub Bot
Created on: 02/Jul/18 20:51
Start Date: 02/Jul/18 20:51
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #5864: [BEAM-4547] 
Instantiate $SUM0 as a SUM operator
URL: https://github.com/apache/beam/pull/5864#issuecomment-401930961
 
 
   JMS


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118383)
Time Spent: 40m  (was: 0.5h)

> implement sum0 aggregation function
> ---
>
> Key: BEAM-4547
> URL: https://issues.apache.org/jira/browse/BEAM-4547
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kai Jiang
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4547) implement sum0 aggregation function

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4547?focusedWorklogId=118384=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118384
 ]

ASF GitHub Bot logged work on BEAM-4547:


Author: ASF GitHub Bot
Created on: 02/Jul/18 20:51
Start Date: 02/Jul/18 20:51
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #5864: [BEAM-4547] 
Instantiate $SUM0 as a SUM operator
URL: https://github.com/apache/beam/pull/5864#issuecomment-401930979
 
 
   run java precommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118384)
Time Spent: 50m  (was: 40m)

> implement sum0 aggregation function
> ---
>
> Key: BEAM-4547
> URL: https://issues.apache.org/jira/browse/BEAM-4547
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kai Jiang
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4714) Some DATETIME PLUS operators end up as ordinary PLUS and crash in accept()

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4714?focusedWorklogId=118418=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118418
 ]

ASF GitHub Bot logged work on BEAM-4714:


Author: ASF GitHub Bot
Created on: 02/Jul/18 21:37
Start Date: 02/Jul/18 21:37
Worklog Time Spent: 10m 
  Work Description: kennknowles closed pull request #5865: [BEAM-4714] 
Instantiate "+" as DATETIME_PLUS
URL: https://github.com/apache/beam/pull/5865
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/BeamSqlFnExecutor.java
 
b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/BeamSqlFnExecutor.java
index 4a9e96a89b8..de667f0a1e7 100644
--- 
a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/BeamSqlFnExecutor.java
+++ 
b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/BeamSqlFnExecutor.java
@@ -270,7 +270,11 @@ static BeamSqlExpression buildExpression(RexNode rexNode) {
 
   // arithmetic operators
 case "+":
-  ret = new BeamSqlPlusExpression(subExps);
+  if (SqlTypeName.NUMERIC_TYPES.contains(node.type.getSqlTypeName())) {
+ret = new BeamSqlPlusExpression(subExps);
+  } else {
+ret = new BeamSqlDatetimePlusExpression(subExps);
+  }
   break;
 case "-":
   if (SqlTypeName.NUMERIC_TYPES.contains(node.type.getSqlTypeName())) {


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118418)
Time Spent: 1h  (was: 50m)

> Some DATETIME PLUS operators end up as ordinary PLUS and crash in accept()
> --
>
> Key: BEAM-4714
> URL: https://issues.apache.org/jira/browse/BEAM-4714
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In this case it was in HOP_END (but not HOP_START, interestingly enough). 
> Unclear why this manifests, but the "+" operator should just work for 
> datetime + interval for this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] 01/01: Merge pull request #5864: [BEAM-4547] Instantiate $SUM0 as a SUM operator

2018-07-02 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit b6d4eebb8a87c2b9e095f9d3b9b4f7b61a4749ef
Merge: b519de4 63b7ba0
Author: Kenn Knowles 
AuthorDate: Mon Jul 2 14:37:45 2018 -0700

Merge pull request #5864: [BEAM-4547] Instantiate $SUM0 as a SUM operator

 .../sdk/extensions/sql/impl/transform/BeamAggregationTransforms.java | 1 +
 1 file changed, 1 insertion(+)



[jira] [Work logged] (BEAM-4547) implement sum0 aggregation function

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4547?focusedWorklogId=118419=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118419
 ]

ASF GitHub Bot logged work on BEAM-4547:


Author: ASF GitHub Bot
Created on: 02/Jul/18 21:37
Start Date: 02/Jul/18 21:37
Worklog Time Spent: 10m 
  Work Description: kennknowles closed pull request #5864: [BEAM-4547] 
Instantiate $SUM0 as a SUM operator
URL: https://github.com/apache/beam/pull/5864
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamAggregationTransforms.java
 
b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamAggregationTransforms.java
index 532da615449..2d257ecb03b 100644
--- 
a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamAggregationTransforms.java
+++ 
b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamAggregationTransforms.java
@@ -185,6 +185,7 @@ public AggregationAdaptor(
 
aggregators.add(BeamBuiltinAggregations.createMin(call.type.getSqlTypeName()));
 break;
   case "SUM":
+  case "$SUM0":
 
aggregators.add(BeamBuiltinAggregations.createSum(call.type.getSqlTypeName()));
 break;
   case "AVG":


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118419)
Time Spent: 1h  (was: 50m)

> implement sum0 aggregation function
> ---
>
> Key: BEAM-4547
> URL: https://issues.apache.org/jira/browse/BEAM-4547
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kai Jiang
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (b519de4 -> b6d4eeb)

2018-07-02 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from b519de4  Merge pull request #5865: [BEAM-4714] Instantiate "+" as 
DATETIME_PLUS
 add 63b7ba0  Instantiate $SUM0 as a SUM operator
 new b6d4eeb  Merge pull request #5864: [BEAM-4547] Instantiate $SUM0 as a 
SUM operator

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../sdk/extensions/sql/impl/transform/BeamAggregationTransforms.java | 1 +
 1 file changed, 1 insertion(+)



[beam] branch master updated (e963dbb -> b519de4)

2018-07-02 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from e963dbb  Merge pull request #5836: Makes FileIO.match watermark 
advance even without new files
 add 43ae71c  Instantiate "+" as DATETIME_PLUS
 new b519de4  Merge pull request #5865: [BEAM-4714] Instantiate "+" as 
DATETIME_PLUS

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../beam/sdk/extensions/sql/impl/interpreter/BeamSqlFnExecutor.java | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)



[beam] 01/01: Merge pull request #5865: [BEAM-4714] Instantiate "+" as DATETIME_PLUS

2018-07-02 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit b519de44112cf63e7f39ef2488cf50d54879af3a
Merge: e963dbb 43ae71c
Author: Kenn Knowles 
AuthorDate: Mon Jul 2 14:37:21 2018 -0700

Merge pull request #5865: [BEAM-4714] Instantiate "+" as DATETIME_PLUS

 .../beam/sdk/extensions/sql/impl/interpreter/BeamSqlFnExecutor.java | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)



[jira] [Created] (BEAM-4720) Boundedness and Unboundedness as a trait of a Rel so we know during planning

2018-07-02 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4720:
-

 Summary: Boundedness and Unboundedness as a trait of a Rel so we 
know during planning
 Key: BEAM-4720
 URL: https://issues.apache.org/jira/browse/BEAM-4720
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Kenneth Knowles


Currently we do not know the boundedness or unboundedness of a collection until 
after planning is complete, so it cannot influence the plan. Instead, 
BeamJoinRel makes post-planning decisions about what sort of join to implement. 
We should move this into the planning phase. I think Calcite traits are the 
intended way to do this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2826) Need to generate a single XML file when write is performed on small amount of data

2018-07-02 Thread Eugene Kirpichov (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov closed BEAM-2826.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

This is indeed addressed by FileIO.write which I think was added in 2.2.

> Need to generate a single XML file when write is performed on small amount of 
> data
> --
>
> Key: BEAM-2826
> URL: https://issues.apache.org/jira/browse/BEAM-2826
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Affects Versions: 2.0.0
>Reporter: Balajee Venkatesh
>Assignee: Eugene Kirpichov
>Priority: Major
> Fix For: 2.2.0
>
>
> I'm trying to write an XML file where the source is a text file stored in 
> GCS. The code is running fine but instead of a single XML file, it is 
> generating multiple XML files. (No. of XML files seem to follow total no. of 
> records present in source text file). I have observed this scenario while 
> using 'DataflowRunner'.
> When I run the same code in local then two files get generated. First one 
> contains all the records with proper elements and the second one contains 
> only opening and closing root element.
> As I learnt,it is expected that it may produce multiple files: e.g. if the 
> runner chooses to process your data parallelizing it into 3 tasks 
> ("bundles"), you'll get 3 files. Some of the parts may turn out empty in some 
> cases, but the total data written will always add up to the expected data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2883) Poor error message when forgetting to specify a Datastore project.

2018-07-02 Thread Eugene Kirpichov (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov closed BEAM-2883.
--
   Resolution: Fixed
Fix Version/s: Not applicable

Fixed a long time ago.

> Poor error message when forgetting to specify a Datastore project.
> --
>
> Key: BEAM-2883
> URL: https://issues.apache.org/jira/browse/BEAM-2883
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Scott Wegner
>Assignee: Eugene Kirpichov
>Priority: Minor
> Fix For: Not applicable
>
>
> From: 
> https://stackoverflow.com/questions/46155781/mojoexecutionexception-while-writing-to-bigquery
> A Beam user using DatastoreIO.v1() encountered an error because their 
> pipeline didn't specify a Datastore project via 
> DatastoreIO.v1().read().withProject(..). But the error was not very 
> informative. Even when referencing the source code, it is difficult to trace 
> through the code due to use of AutoValue and ValueProvider.
> The error was:
> [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.4.0:java 
> (default-cli) on project pubsub-bigquery: An exception occured while 
> executing the Java class. null: InvocationTargetException: projectId -> [Help 
> 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.codehaus.mojo:exec-maven-plugin:1.4.0:java (default-cli) on project 
> pubsub-bigquery: An exception occured while executing the Java class. null
> at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
> at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
> at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
> at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
> at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
> at 
> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
> at 
> org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
> at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
> at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
> at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
> at org.apache.maven.cli.MavenCli.execute(MavenCli.java:863)
> at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
> at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
> at 
> org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
> at 
> org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
> at 
> org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
> Caused by: org.apache.maven.plugin.MojoExecutionException: An exception 
> occured while executing the Java class. null
> at org.codehaus.mojo.exec.ExecJavaMojo.execute(ExecJavaMojo.java:345)
> at 
> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
> at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:207)
> ... 20 more
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException: projectId
> at 
> com.google.common.base.Preconditions.checkNotNull(Preconditions.java:787)
> at 
> org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$Read.validate(DatastoreV1.java:624)
> at 
> org.apache.beam.sdk.Pipeline$ValidateVisitor.enterCompositeTransform(Pipeline.java:610)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:590)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:594)
> at 
> 

[jira] [Updated] (BEAM-3962) Add support for aborting bundles in the SdkHarnessClient on failure during processing

2018-07-02 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-3962:

Issue Type: Improvement  (was: Sub-task)
Parent: (was: BEAM-2899)

> Add support for aborting bundles in the SdkHarnessClient on failure during 
> processing
> -
>
> Key: BEAM-3962
> URL: https://issues.apache.org/jira/browse/BEAM-3962
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Luke Cwik
>Priority: Minor
>
> Specifically here:
> https://github.com/apache/beam/blob/6b357c67122195780d3fb89c32868bf5de4723b9/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/SdkHarnessClient.java#L214



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3761) Fix Python 3 cmp function

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3761?focusedWorklogId=118483=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118483
 ]

ASF GitHub Bot logged work on BEAM-3761:


Author: ASF GitHub Bot
Created on: 02/Jul/18 23:18
Start Date: 02/Jul/18 23:18
Worklog Time Spent: 10m 
  Work Description: cclauss commented on issue #5843: [BEAM-3761] Define 
cmp() in Python 3
URL: https://github.com/apache/beam/pull/5843#issuecomment-401965759
 
 
   The current approach is safer...
   
   We only have [18 more months](https://pythonclock.org) to get this done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118483)
Time Spent: 10h 50m  (was: 10h 40m)

> Fix Python 3 cmp function
> -
>
> Key: BEAM-3761
> URL: https://issues.apache.org/jira/browse/BEAM-3761
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: holdenk
>Priority: Major
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> Various functions don't exist in Python 3 that did in python 2. This Jira is 
> to fix the use of cmp (which often will involve rewriting __cmp__ as well).
>  
> Note: there are existing PRs for basestring and unicode ( 
> [https://github.com/apache/beam/pull/4697|https://github.com/apache/beam/pull/4697,]
>  , [https://github.com/apache/beam/pull/4730] )
>  
> Note once all of the missing names/functions are fixed we can enable F821 in 
> falke8 python 3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4713) DECIMAL support in RowJsonDeserializer

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4713?focusedWorklogId=118482=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118482
 ]

ASF GitHub Bot logged work on BEAM-4713:


Author: ASF GitHub Bot
Created on: 02/Jul/18 23:18
Start Date: 02/Jul/18 23:18
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #5867: [BEAM-4713] 
DECIMAL support in RowJsonDeserializer
URL: https://github.com/apache/beam/pull/5867#issuecomment-401965741
 
 
   `RowJsonDeserializerTest`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118482)
Time Spent: 0.5h  (was: 20m)

> DECIMAL support in RowJsonDeserializer
> --
>
> Key: BEAM-4713
> URL: https://issues.apache.org/jira/browse/BEAM-4713
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4663) Implement Cost calculations for Cost-Based Optimization (CBO)

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4663?focusedWorklogId=118481=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118481
 ]

ASF GitHub Bot logged work on BEAM-4663:


Author: ASF GitHub Bot
Created on: 02/Jul/18 23:17
Start Date: 02/Jul/18 23:17
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #5825: [WIP] [BEAM-4663] 
CBO cost calculation
URL: https://github.com/apache/beam/pull/5825#issuecomment-401965565
 
 
   Still more spotless, now you can run it globally `./gradlew 
spotlessJavaApply`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118481)
Time Spent: 20m  (was: 10m)

> Implement Cost calculations for Cost-Based Optimization (CBO) 
> --
>
> Key: BEAM-4663
> URL: https://issues.apache.org/jira/browse/BEAM-4663
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kai Jiang
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> To support CBO, we should implement methods in each Beam*Rel.java.  
> computeSelfCost(...) as our first step.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] 01/01: Merge pull request #5856: Remove aliased tables in Nexmark SQL query 5

2018-07-02 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit f53bc0997c47708955a8f7ce0ac342294e9039ac
Merge: b6d4eeb 4c3e288
Author: Kenn Knowles 
AuthorDate: Mon Jul 2 16:28:28 2018 -0700

Merge pull request #5856: Remove aliased tables in Nexmark SQL query 5

 .../beam/sdk/nexmark/queries/sql/SqlQuery5.java| 49 +-
 1 file changed, 30 insertions(+), 19 deletions(-)



[jira] [Work logged] (BEAM-4451) SchemaRegistry should support a ServiceLoader interface

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4451?focusedWorklogId=118399=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118399
 ]

ASF GitHub Bot logged work on BEAM-4451:


Author: ASF GitHub Bot
Created on: 02/Jul/18 21:14
Start Date: 02/Jul/18 21:14
Worklog Time Spent: 10m 
  Work Description: apilloud commented on a change in pull request #5840: 
[BEAM-4451] Schemas default provider and serviceloader
URL: https://github.com/apache/beam/pull/5840#discussion_r199621243
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaProviderRegistrar.java
 ##
 @@ -0,0 +1,42 @@
+package org.apache.beam.sdk.schemas;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import com.google.auto.service.AutoService;
+import java.util.List;
+import java.util.ServiceLoader;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+
+/**
+ * {@link SchemaProvider} creators have the ability to automatically have 
their {@link
+ * SchemaProvider schemaProvider} registered with this SDK by creating a 
{@link ServiceLoader} entry
+ * and a concrete implementation of this interface.
+ *
+ * It is optional but recommended to use one of the many build time tools 
such as {@link
+ * AutoService} to generate the necessary META-INF files automatically.
+ */
+@Experimental(Kind.SCHEMAS)
+public interface SchemaProviderRegistrar {
+  /**
+   * Returns a list of {@link SchemaProvider schema providers} which will be 
registered by default
 
 Review comment:
   I'm curious as to why this always returns the providers that are "registered 
by default" instead of just the providers that are registered. Why is this 
extra restriction necessary?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118399)

> SchemaRegistry should support a ServiceLoader interface
> ---
>
> Key: BEAM-4451
> URL: https://issues.apache.org/jira/browse/BEAM-4451
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This will allow JARs to register schemas only when they are linked in.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4451) SchemaRegistry should support a ServiceLoader interface

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4451?focusedWorklogId=118398=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118398
 ]

ASF GitHub Bot logged work on BEAM-4451:


Author: ASF GitHub Bot
Created on: 02/Jul/18 21:14
Start Date: 02/Jul/18 21:14
Worklog Time Spent: 10m 
  Work Description: apilloud commented on a change in pull request #5840: 
[BEAM-4451] Schemas default provider and serviceloader
URL: https://github.com/apache/beam/pull/5840#discussion_r199622358
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaRegistry.java
 ##
 @@ -55,8 +64,12 @@
 }
   }
 
-  private final Map entries = Maps.newHashMap();
-  private final List providers = Lists.newArrayList();
+  Map entries = Maps.newHashMap();
 
 Review comment:
   It doesn't look like anything requires the `private final` to be removed?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118398)
Time Spent: 20m  (was: 10m)

> SchemaRegistry should support a ServiceLoader interface
> ---
>
> Key: BEAM-4451
> URL: https://issues.apache.org/jira/browse/BEAM-4451
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This will allow JARs to register schemas only when they are linked in.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4713) DECIMAL support in RowJsonDeserializer

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4713?focusedWorklogId=118424=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118424
 ]

ASF GitHub Bot logged work on BEAM-4713:


Author: ASF GitHub Bot
Created on: 02/Jul/18 21:55
Start Date: 02/Jul/18 21:55
Worklog Time Spent: 10m 
  Work Description: amaliujia opened a new pull request #5867: [BEAM-4713] 
DECIMAL support in RowJsonDeserializer
URL: https://github.com/apache/beam/pull/5867
 
 
   DECIMAL support in RowJsonDeserializer
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | --- | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118424)
Time Spent: 10m
Remaining Estimate: 0h

> DECIMAL support in RowJsonDeserializer
> --
>
> Key: BEAM-4713
> URL: https://issues.apache.org/jira/browse/BEAM-4713
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4713) DECIMAL support in RowJsonDeserializer

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4713?focusedWorklogId=118425=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118425
 ]

ASF GitHub Bot logged work on BEAM-4713:


Author: ASF GitHub Bot
Created on: 02/Jul/18 21:56
Start Date: 02/Jul/18 21:56
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #5867: [BEAM-4713] DECIMAL 
support in RowJsonDeserializer
URL: https://github.com/apache/beam/pull/5867#issuecomment-401950218
 
 
   R: @kennknowles 
   
   Also, we need improve Pubsub integration test to cover more data types. 
Tracking here: https://issues.apache.org/jira/browse/BEAM-4717


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118425)
Time Spent: 20m  (was: 10m)

> DECIMAL support in RowJsonDeserializer
> --
>
> Key: BEAM-4713
> URL: https://issues.apache.org/jira/browse/BEAM-4713
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4719) Enhanced LIMIT support

2018-07-02 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4719:
-

 Summary: Enhanced LIMIT support
 Key: BEAM-4719
 URL: https://issues.apache.org/jira/browse/BEAM-4719
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Kenneth Knowles


Currently, Beam SQL supports LIMIT in two ways:

1. Within a query, the results are subject to LIMIT. This works.
2. The shell knows to cancel a pipeline when the limit is reached, even if 
there is unfinished unbounded data.

The canceling of a pipeline works via a basic pattern match against the query 
execution plan, checking a few child nodes of the BeamEnumerableConverter for a 
BeamSortRel without a collation. If it can figure out what the limit is for the 
outermost query, then it will cancel the pipeline.

A more robust approach might be to use traits (or some other thorough analysis) 
to see if there is a known size for the outermost query. This would, for 
example, be unaffected by any number of layer of non-size-changing 
transformations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4721) WindowingStrategy, Window fields in the project as a trait of a Rel

2018-07-02 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-4721:
-

 Summary: WindowingStrategy, Window fields in the project as a 
trait of a Rel
 Key: BEAM-4721
 URL: https://issues.apache.org/jira/browse/BEAM-4721
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Kenneth Knowles


Currently, WindowingStrategy is just a property of a PCollection but is unknown 
during planning. As with boundedness / unboundedness, this should be available 
statically so it can affect the plan. It may not be literally a 
WindowingStrategy object, but a SQL-level equivalent.

One windowing-related thing that is interesting is which fields represent the 
window. If we know this, then we know we can support per-window equijoins as 
long as a join condition also does an equijoin on the window.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4718) ./gradlew build should run nightly before ./gradlew publish

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4718?focusedWorklogId=118445=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118445
 ]

ASF GitHub Bot logged work on BEAM-4718:


Author: ASF GitHub Bot
Created on: 02/Jul/18 22:08
Start Date: 02/Jul/18 22:08
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #5868: [BEAM-4718]Run 
gradle build before publish
URL: https://github.com/apache/beam/pull/5868#issuecomment-401952745
 
 
   Run Seed Job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118445)
Time Spent: 20m  (was: 10m)

> ./gradlew build should run nightly before ./gradlew publish
> ---
>
> Key: BEAM-4718
> URL: https://issues.apache.org/jira/browse/BEAM-4718
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-1841) FileBasedSource should have safeguards for when set of files grows while job is running

2018-07-02 Thread Eugene Kirpichov (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530528#comment-16530528
 ] 

Eugene Kirpichov commented on BEAM-1841:


(especially because there is already a transform for handling a growing set of 
files - Watch / FileIO.match().continuously())

> FileBasedSource should have safeguards for when set of files grows while job 
> is running
> ---
>
> Key: BEAM-1841
> URL: https://issues.apache.org/jira/browse/BEAM-1841
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>Priority: Major
> Fix For: Not applicable
>
>
> In some cases people run pipelines over directories where the set of files in 
> the directory grows while the job runs. This may lead to a situation like 
> this, in particular with Dataflow runner:
> At job submission time, the FileBasedSource estimates the current size of the 
> filepattern, and ends up with a small number. Dataflow runner chooses thus a 
> small desiredBundleSizeBytes to pass to .splitIntoBundles(). However, at the 
> time splitIntoBundles() runs, the set of files has greatly grown, and we 
> produce many more, unnecessarily small bundles, than anticipated.
> I see a few things we could do:
> - In splitIntoBundles(), compute the actual size and detect when the desired 
> size is unreasonably small for it; e.g. set an upper threshold on how many 
> bundles we produce in total.
> - Somehow remember, at submission time, what was the estimated size. Then, in 
> splitIntoBundles(), compute the actual current size, and scale 
> desiredBundleSizeBytes accordingly to get approximately the intended number 
> of bundles. Caveat: files may still change between the moment size is 
> estimated and the moment splitting happens.
> - (much larger in scope) Change the whole protocol to use number of bundles 
> instead of bundle size bytes. This probably won't happen with BoundedSource, 
> but it is going to be the case with Splittable DoFn.
> Option 1 seems by far the simplest.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-3595) Normalize URNs across SDKs and runners.

2018-07-02 Thread Eugene Kirpichov (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov closed BEAM-3595.
--
   Resolution: Fixed
Fix Version/s: 2.5.0

Yeah this was done to a sufficient degree some time ago, by introducing the 
"(beam_urn)" enum value option.

> Normalize URNs across SDKs and runners.
> ---
>
> Key: BEAM-3595
> URL: https://issues.apache.org/jira/browse/BEAM-3595
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Eugene Kirpichov
>Priority: Major
> Fix For: 2.5.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-2901) Containerize ULR

2018-07-02 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-2901:

Issue Type: Improvement  (was: Sub-task)
Parent: (was: BEAM-2899)

> Containerize ULR
> 
>
> Key: BEAM-2901
> URL: https://issues.apache.org/jira/browse/BEAM-2901
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
>
> We should containerize ULR as a convenience options for users who have docker 
> installed, but perhaps not java.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-2900) ULR: configurable container/process management

2018-07-02 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-2900:

Issue Type: Improvement  (was: Sub-task)
Parent: (was: BEAM-2899)

> ULR: configurable container/process management
> --
>
> Key: BEAM-2900
> URL: https://issues.apache.org/jira/browse/BEAM-2900
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Henning Rohde
>Priority: Major
>  Labels: portability
>
> The ULR should support configurable container/process management as per 
> https://s.apache.org/beam-fn-api-container-contract. It would be convenient 
> if containers are optional for testing purposes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-2900) ULR: configurable container/process management

2018-07-02 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-2900:

Priority: Minor  (was: Major)

> ULR: configurable container/process management
> --
>
> Key: BEAM-2900
> URL: https://issues.apache.org/jira/browse/BEAM-2900
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
>
> The ULR should support configurable container/process management as per 
> https://s.apache.org/beam-fn-api-container-contract. It would be convenient 
> if containers are optional for testing purposes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3761) Fix Python 3 cmp function

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3761?focusedWorklogId=118479=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118479
 ]

ASF GitHub Bot logged work on BEAM-3761:


Author: ASF GitHub Bot
Created on: 02/Jul/18 23:13
Start Date: 02/Jul/18 23:13
Worklog Time Spent: 10m 
  Work Description: holdenk commented on issue #5843: [BEAM-3761] Define 
cmp() in Python 3
URL: https://github.com/apache/beam/pull/5843#issuecomment-401964936
 
 
   You could try updating tox.ini to explicitly mention that as a dep?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118479)
Time Spent: 10h 40m  (was: 10.5h)

> Fix Python 3 cmp function
> -
>
> Key: BEAM-3761
> URL: https://issues.apache.org/jira/browse/BEAM-3761
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: holdenk
>Priority: Major
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> Various functions don't exist in Python 3 that did in python 2. This Jira is 
> to fix the use of cmp (which often will involve rewriting __cmp__ as well).
>  
> Note: there are existing PRs for basestring and unicode ( 
> [https://github.com/apache/beam/pull/4697|https://github.com/apache/beam/pull/4697,]
>  , [https://github.com/apache/beam/pull/4730] )
>  
> Note once all of the missing names/functions are fixed we can enable F821 in 
> falke8 python 3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-3874) Switch AvroIO sink default codec to Snappy

2018-07-02 Thread Eugene Kirpichov (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov reassigned BEAM-3874:
--

Assignee: (was: Eugene Kirpichov)

> Switch AvroIO sink default codec to Snappy
> --
>
> Key: BEAM-3874
> URL: https://issues.apache.org/jira/browse/BEAM-3874
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-avro
>Reporter: Marian Dvorsky
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> AvroIO currently uses CodecFactory.deflateCodec(6) as the default codec for 
> writes.
> That compresses well, but is quite expensive.
> Snappy codec offers sparser, but much faster compression, and is typically a 
> better CPU/storage tradeoff except for very long lived files. 
> We should consider switching the default to Snappy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2810) Consider a faster Avro library in Python

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2810?focusedWorklogId=118485=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118485
 ]

ASF GitHub Bot logged work on BEAM-2810:


Author: ASF GitHub Bot
Created on: 02/Jul/18 23:23
Start Date: 02/Jul/18 23:23
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #5496: [BEAM-2810] 
use fastavro in Avro IO
URL: https://github.com/apache/beam/pull/5496#issuecomment-401966713
 
 
   This change fails the isort lint test:
   
   ```
   Running isort for module apache_beam  gen_protos.py  setup.py  
test_config.py:
   ERROR: /home/git/beam/sdks/python/apache_beam/io/avroio.py Imports are 
incorrectly sorted.
   --- /home/git/beam/sdks/python/apache_beam/io/avroio.py:before   
2018-07-02 12:57:38.472207
   +++ /home/git/beam/sdks/python/apache_beam/io/avroio.py:after
2018-07-02 16:20:40.310009
   @@ -51,8 +51,6 @@
from avro import io as avroio
from avro import datafile
from avro import schema
   -from fastavro.read import block_reader
   -from fastavro.write import Writer

import apache_beam as beam
from apache_beam.io import filebasedsink
   @@ -61,6 +59,8 @@
from apache_beam.io.filesystem import CompressionTypes
from apache_beam.io.iobase import Read
from apache_beam.transforms import PTransform
   +from fastavro.read import block_reader
   +from fastavro.write import Writer

__all__ = ['ReadFromAvro', 'ReadAllFromAvro', 'WriteToAvro']

   Command exited with non-zero status 1
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118485)
Time Spent: 5h 40m  (was: 5.5h)

> Consider a faster Avro library in Python
> 
>
> Key: BEAM-2810
> URL: https://issues.apache.org/jira/browse/BEAM-2810
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Ryan Williams
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> https://stackoverflow.com/questions/45870789/bottleneck-on-data-source
> Seems like this job is reading Avro files (exported by BigQuery) at about 2 
> MB/s.
> We use the standard Python "avro" library which is apparently known to be 
> very slow (10x+ slower than Java) 
> http://apache-avro.679487.n3.nabble.com/Avro-decode-very-slow-in-Python-td4034422.html,
>  and there are alternatives e.g. https://pypi.python.org/pypi/fastavro/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4064) ClassCastExeption when reading Avro files using specific records with org.apache.avro.util.Utf8 fields

2018-07-02 Thread Eugene Kirpichov (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov reassigned BEAM-4064:
--

Assignee: Chamikara Jayalath  (was: Eugene Kirpichov)

> ClassCastExeption when reading Avro files using specific records with 
> org.apache.avro.util.Utf8 fields
> --
>
> Key: BEAM-4064
> URL: https://issues.apache.org/jira/browse/BEAM-4064
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-avro
>Affects Versions: 2.4.0
>Reporter: Przemyslaw Dubaniewicz
>Assignee: Chamikara Jayalath
>Priority: Major
>
> Reading Avro files using Avro-generated classes with 
> org.apache.avro.util.Utf8 fields fails with an exception:
> {code:java}
> Exception in thread "main" java.lang.ClassCastException: java.lang.String 
> cannot be cast to org.apache.avro.util.Utf8
>     at com.example.avro.AvroRecord.put(AvroRecord.java:129)
>     at org.apache.avro.generic.GenericData.setField(GenericData.java:690)
>     at org.apache.avro.reflect.ReflectData.setField(ReflectData.java:135)
>     at org.apache.avro.reflect.ReflectData.setField(ReflectData.java:128)
>     at 
> org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:119)
>     at 
> org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:310)
>     at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
>     at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
>     at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
>     at 
> org.apache.beam.sdk.io.AvroSource$AvroBlock.readNextRecord(AvroSource.java:577)
>     at 
> org.apache.beam.sdk.io.BlockBasedSource$BlockBasedReader.readNextRecord(BlockBasedSource.java:223)
>     at 
> org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:473)
>     at 
> org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:468)
>     at 
> org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:261)
>     at 
> org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.processElement(BoundedReadEvaluatorFactory.java:141)
>     at 
> org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:161)
>     at 
> org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:125)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-4722) Dataflow post-commits failing due to insufficient INSTANCE_TEMPLATES quota

2018-07-02 Thread Rafael Fernandez (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rafael Fernandez resolved BEAM-4722.

   Resolution: Fixed
Fix Version/s: Not applicable

3x'd instance templates.

> Dataflow post-commits failing due to insufficient INSTANCE_TEMPLATES quota
> --
>
> Key: BEAM-4722
> URL: https://issues.apache.org/jira/browse/BEAM-4722
> Project: Beam
>  Issue Type: Improvement
>  Components: gcp-quota, runner-dataflow, testing
>Reporter: Scott Wegner
>Assignee: Rafael Fernandez
>Priority: Major
> Fix For: Not applicable
>
>
> See: https://github.com/apache/beam/pull/5861
> We recently increased the parallelism of Dataflow ValidatesRunner tests. 
> However, when we ran 3 concurrent builds we saw them all fail with 
> insufficient INSTANCE_TEMPLATES quota errors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3761) Fix Python 3 cmp function

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3761?focusedWorklogId=118487=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118487
 ]

ASF GitHub Bot logged work on BEAM-3761:


Author: ASF GitHub Bot
Created on: 02/Jul/18 23:34
Start Date: 02/Jul/18 23:34
Worklog Time Spent: 10m 
  Work Description: cclauss commented on issue #5843: [BEAM-3761] Define 
cmp() in Python 3
URL: https://github.com/apache/beam/pull/5843#issuecomment-401968502
 
 
   There are no Jenkins build issues using my approach  ;-)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118487)
Time Spent: 11h 20m  (was: 11h 10m)

> Fix Python 3 cmp function
> -
>
> Key: BEAM-3761
> URL: https://issues.apache.org/jira/browse/BEAM-3761
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: holdenk
>Priority: Major
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> Various functions don't exist in Python 3 that did in python 2. This Jira is 
> to fix the use of cmp (which often will involve rewriting __cmp__ as well).
>  
> Note: there are existing PRs for basestring and unicode ( 
> [https://github.com/apache/beam/pull/4697|https://github.com/apache/beam/pull/4697,]
>  , [https://github.com/apache/beam/pull/4730] )
>  
> Note once all of the missing names/functions are fixed we can enable F821 in 
> falke8 python 3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4714) Some DATETIME PLUS operators end up as ordinary PLUS and crash in accept()

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4714?focusedWorklogId=118387=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118387
 ]

ASF GitHub Bot logged work on BEAM-4714:


Author: ASF GitHub Bot
Created on: 02/Jul/18 20:52
Start Date: 02/Jul/18 20:52
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #5865: [BEAM-4714] 
Instantiate "+" as DATETIME_PLUS
URL: https://github.com/apache/beam/pull/5865#issuecomment-401931459
 
 
   run java precommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118387)
Time Spent: 50m  (was: 40m)

> Some DATETIME PLUS operators end up as ordinary PLUS and crash in accept()
> --
>
> Key: BEAM-4714
> URL: https://issues.apache.org/jira/browse/BEAM-4714
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In this case it was in HOP_END (but not HOP_START, interestingly enough). 
> Unclear why this manifests, but the "+" operator should just work for 
> datetime + interval for this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4714) Some DATETIME PLUS operators end up as ordinary PLUS and crash in accept()

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4714?focusedWorklogId=118386=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118386
 ]

ASF GitHub Bot logged work on BEAM-4714:


Author: ASF GitHub Bot
Created on: 02/Jul/18 20:52
Start Date: 02/Jul/18 20:52
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #5865: [BEAM-4714] 
Instantiate "+" as DATETIME_PLUS
URL: https://github.com/apache/beam/pull/5865#issuecomment-401931437
 
 
   file system artifact service flake


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118386)
Time Spent: 40m  (was: 0.5h)

> Some DATETIME PLUS operators end up as ordinary PLUS and crash in accept()
> --
>
> Key: BEAM-4714
> URL: https://issues.apache.org/jira/browse/BEAM-4714
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In this case it was in HOP_END (but not HOP_START, interestingly enough). 
> Unclear why this manifests, but the "+" operator should just work for 
> datetime + interval for this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] 01/01: Merge pull request #5836: Makes FileIO.match watermark advance even without new files

2018-07-02 Thread jkff
This is an automated email from the ASF dual-hosted git repository.

jkff pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit e963dbb476b911298cba93f8a38c6366b8293506
Merge: cb72b48 6daf403
Author: Eugene Kirpichov 
AuthorDate: Mon Jul 2 14:02:28 2018 -0700

Merge pull request #5836: Makes FileIO.match watermark advance even without 
new files

Makes FileIO.match watermark advance even without new files

 sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)



[beam] branch master updated (cb72b48 -> e963dbb)

2018-07-02 Thread jkff
This is an automated email from the ASF dual-hosted git repository.

jkff pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from cb72b48  Merge pull request #5847: Remove more extraneous printlns 
from build
 add 6daf403  Makes FileIO.match watermark advance even without new files
 new e963dbb  Merge pull request #5836: Makes FileIO.match watermark 
advance even without new files

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)



[jira] [Work logged] (BEAM-4718) ./gradlew build should run nightly before ./gradlew publish

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4718?focusedWorklogId=118444=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118444
 ]

ASF GitHub Bot logged work on BEAM-4718:


Author: ASF GitHub Bot
Created on: 02/Jul/18 22:06
Start Date: 02/Jul/18 22:06
Worklog Time Spent: 10m 
  Work Description: boyuanzz opened a new pull request #5868: 
[BEAM-4718]Run gradle build before publish
URL: https://github.com/apache/beam/pull/5868
 
 
   r: @alanmyrvold 
   cc: @yifanzou
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118444)
Time Spent: 10m
Remaining Estimate: 0h

> ./gradlew build should run nightly before ./gradlew publish
> ---
>
> Key: BEAM-4718
> URL: https://issues.apache.org/jira/browse/BEAM-4718
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4718) ./gradlew build should run nightly before ./gradlew publish

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4718?focusedWorklogId=118448=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118448
 ]

ASF GitHub Bot logged work on BEAM-4718:


Author: ASF GitHub Bot
Created on: 02/Jul/18 22:12
Start Date: 02/Jul/18 22:12
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #5868: [BEAM-4718]Run 
gradle build before publish
URL: https://github.com/apache/beam/pull/5868#issuecomment-401953530
 
 
   Run Gradle Build


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118448)
Time Spent: 0.5h  (was: 20m)

> ./gradlew build should run nightly before ./gradlew publish
> ---
>
> Key: BEAM-4718
> URL: https://issues.apache.org/jira/browse/BEAM-4718
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4594) Implement Beam Python User State and Timer API

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4594?focusedWorklogId=118449=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118449
 ]

ASF GitHub Bot logged work on BEAM-4594:


Author: ASF GitHub Bot
Created on: 02/Jul/18 22:23
Start Date: 02/Jul/18 22:23
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5691: [BEAM-4594] Beam Python state and timers user-facing API
URL: https://github.com/apache/beam/pull/5691#discussion_r199606220
 
 

 ##
 File path: sdks/python/apache_beam/runners/common.py
 ##
 @@ -207,18 +208,29 @@ def _validate(self):
 self._validate_process()
 self._validate_bundle_method(self.start_bundle_method)
 self._validate_bundle_method(self.finish_bundle_method)
+self._validate_stateful_dofn()
 
   def _validate_process(self):
 """Validate that none of the DoFnParameters are repeated in the function
 """
-for param in core.DoFn.DoFnParams:
-  assert self.process_method.defaults.count(param) <= 1
+param_ids = list(d.param_id for d in self.process_method.defaults
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118449)
Time Spent: 2h 50m  (was: 2h 40m)

> Implement Beam Python User State and Timer API
> --
>
> Key: BEAM-4594
> URL: https://issues.apache.org/jira/browse/BEAM-4594
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Major
>  Labels: portability
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the implementation of the Beam Python User State and Timer 
> API, described here: [https://s.apache.org/beam-python-user-state-and-timers].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4594) Implement Beam Python User State and Timer API

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4594?focusedWorklogId=118452=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118452
 ]

ASF GitHub Bot logged work on BEAM-4594:


Author: ASF GitHub Bot
Created on: 02/Jul/18 22:23
Start Date: 02/Jul/18 22:23
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5691: [BEAM-4594] Beam Python state and timers user-facing API
URL: https://github.com/apache/beam/pull/5691#discussion_r199608394
 
 

 ##
 File path: sdks/python/apache_beam/transforms/userstate.py
 ##
 @@ -0,0 +1,200 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""User-facing interfaces for the Beam State and Timer APIs.
+
+Experimental; no backwards-compatibility guarantees.
+"""
+
+from __future__ import absolute_import
+
+import logging
+
+from apache_beam.coders import Coder
+from apache_beam.transforms.timeutil import TimeDomain
+
+
+class StateSpec(object):
+  """Specification for a user DoFn state cell."""
+
+  def __init__(self):
+raise NotImplementedError
+
+  def __repr__(self):
+return '%s(%s)' % (self.__class__.__name__, self.name)
+
+
+class BagStateSpec(StateSpec):
+  """Specification for a user DoFn bag state cell."""
+
+  def __init__(self, name, coder):
+assert isinstance(name, str)
+assert isinstance(coder, Coder)
+self.name = name
+self.coder = coder
+
+
+class CombiningValueStateSpec(StateSpec):
+  """Specification for a user DoFn combining value state cell."""
+
+  def __init__(self, name, coder, combiner):
+# Avoid circular import.
+from apache_beam.transforms.core import CombineFn
+
+assert isinstance(name, str)
+assert isinstance(coder, Coder)
+assert isinstance(combiner, CombineFn)
+self.name = name
+self.coder = coder
+self.combiner = combiner
+
+
+class TimerSpec(object):
+  """Specification for a user stateful DoFn timer."""
+
+  def __init__(self, name, time_domain):
+self.name = name
+if time_domain not in (TimeDomain.WATERMARK, TimeDomain.REAL_TIME):
+  raise ValueError('Unsupported TimeDomain: %r.' % (time_domain,))
+self.time_domain = time_domain
+
+  def __repr__(self):
+return '%s(%s)' % (self.__class__.__name__, self.name)
+
+
+# Attribute for attaching a TimerSpec to an on_timer method.
+_TIMER_SPEC_ATTRIBUTE_NAME = '_on_timer_spec'
+
+
+def on_timer(timer_spec):
+  """Decorator for timer firing DoFn method.
+
+  This decorator allows a user to specify an on_timer processing method
+  in a stateful DoFn.  Sample usage:
+
+  > class MyDoFn(DoFn):
+  >   TIMER_SPEC = TimerSpec('timer', TimeDomain.WATERMARK)
+  >
+  >   @on_timer(TIMER_SPEC)
+  >   def my_timer_expiry_callback(self):
+  > logging.info('Timer expired!')
+  """
+
+  if not isinstance(timer_spec, TimerSpec):
+raise ValueError('@on_timer decorator expected TimerSpec.')
+
+  def _inner(method):
+if not callable(method):
+  raise ValueError('@on_timer decorator expected callable.')
+setattr(method, _TIMER_SPEC_ATTRIBUTE_NAME, timer_spec)
+return method
+
+  return _inner
+
+
+class UserStateUtils(object):
+
+  @staticmethod
+  def get_on_timer_methods(dofn):
+# Avoid circular import.
+from apache_beam.transforms.core import DoFn
+if not isinstance(dofn, DoFn):
 
 Review comment:
   This is just a sanity check.  Would you suggest removing this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118452)

> Implement Beam Python User State and Timer API
> --
>
> Key: BEAM-4594
> URL: https://issues.apache.org/jira/browse/BEAM-4594
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Charles Chen
>

[jira] [Work logged] (BEAM-4594) Implement Beam Python User State and Timer API

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4594?focusedWorklogId=118450=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118450
 ]

ASF GitHub Bot logged work on BEAM-4594:


Author: ASF GitHub Bot
Created on: 02/Jul/18 22:23
Start Date: 02/Jul/18 22:23
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5691: [BEAM-4594] Beam Python state and timers user-facing API
URL: https://github.com/apache/beam/pull/5691#discussion_r199606354
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -288,13 +329,21 @@ class DoFn(WithTypeHints, HasDisplayData, 
urns.RunnerApiFn):
   callable object using the CallableWrapperDoFn class.
   """
 
-  ElementParam = 'ElementParam'
-  SideInputParam = 'SideInputParam'
-  TimestampParam = 'TimestampParam'
-  WindowParam = 'WindowParam'
-  WatermarkReporterParam = 'WatermarkReporterParam'
-
-  DoFnParams = [ElementParam, SideInputParam, TimestampParam, WindowParam]
+  # Parameters that can be used in the .process() method.
+  ElementParam = _BuiltinDoFnParam('ElementParam')
+  SideInputParam = _BuiltinDoFnParam('SideInputParam')
+  TimestampParam = _BuiltinDoFnParam('TimestampParam')
+  WindowParam = _BuiltinDoFnParam('WindowParam')
+  WatermarkReporterParam = _BuiltinDoFnParam('WatermarkReporterParam')
+
+  DoFnProcessParams = [ElementParam, SideInputParam, TimestampParam,
+   WindowParam, WatermarkReporterParam]
+
+  # Parameters to access state and timers.  Not restricted to use only in the
+  # .process() method. Usage: DoFn.StateParam(state_spec),
+  # DoFn.TimerParam(timer_spec).
+  StateParam = _StateDoFnParam
 
 Review comment:
   Acknowledged.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118450)
Time Spent: 3h  (was: 2h 50m)

> Implement Beam Python User State and Timer API
> --
>
> Key: BEAM-4594
> URL: https://issues.apache.org/jira/browse/BEAM-4594
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Major
>  Labels: portability
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> This issue tracks the implementation of the Beam Python User State and Timer 
> API, described here: [https://s.apache.org/beam-python-user-state-and-timers].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4594) Implement Beam Python User State and Timer API

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4594?focusedWorklogId=118454=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118454
 ]

ASF GitHub Bot logged work on BEAM-4594:


Author: ASF GitHub Bot
Created on: 02/Jul/18 22:23
Start Date: 02/Jul/18 22:23
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5691: [BEAM-4594] Beam Python state and timers user-facing API
URL: https://github.com/apache/beam/pull/5691#discussion_r199606238
 
 

 ##
 File path: sdks/python/apache_beam/runners/common.py
 ##
 @@ -207,18 +208,29 @@ def _validate(self):
 self._validate_process()
 self._validate_bundle_method(self.start_bundle_method)
 self._validate_bundle_method(self.finish_bundle_method)
+self._validate_stateful_dofn()
 
   def _validate_process(self):
 """Validate that none of the DoFnParameters are repeated in the function
 """
-for param in core.DoFn.DoFnParams:
-  assert self.process_method.defaults.count(param) <= 1
+param_ids = list(d.param_id for d in self.process_method.defaults
+ if isinstance(d, core._DoFnParam))
+if len(param_ids) != len(set(param_ids)):
+  raise ValueError(
+  'DoFn %r has duplicate process method parameters: %s.' % (
+  self.do_fn, param_ids))
 
   def _validate_bundle_method(self, method_wrapper):
 """Validate that none of the DoFnParameters are used in the function
 """
-for param in core.DoFn.DoFnParams:
-  assert param not in method_wrapper.defaults
+for param in core.DoFn.DoFnProcessParams:
+  if param in method_wrapper.defaults:
+raise ValueError(
+'DoFn.process() method-only parameters cannot be used in %s.' %
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118454)
Time Spent: 3.5h  (was: 3h 20m)

> Implement Beam Python User State and Timer API
> --
>
> Key: BEAM-4594
> URL: https://issues.apache.org/jira/browse/BEAM-4594
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Major
>  Labels: portability
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This issue tracks the implementation of the Beam Python User State and Timer 
> API, described here: [https://s.apache.org/beam-python-user-state-and-timers].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4594) Implement Beam Python User State and Timer API

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4594?focusedWorklogId=118451=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118451
 ]

ASF GitHub Bot logged work on BEAM-4594:


Author: ASF GitHub Bot
Created on: 02/Jul/18 22:23
Start Date: 02/Jul/18 22:23
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5691: [BEAM-4594] Beam Python state and timers user-facing API
URL: https://github.com/apache/beam/pull/5691#discussion_r199606282
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -276,6 +278,45 @@ def get_function_arguments(obj, func):
   return inspect.getargspec(f)
 
 
+class _DoFnParam(object):
+  """DoFn parameter."""
+
+  def __repr__(self):
+return '_DoFnParam(%s)' % self.param_id
+
+  def __eq__(self, other):
+if type(self) == type(other):
+  return self.param_id == other.param_id
+return False
+
+
+class _BuiltinDoFnParam(_DoFnParam):
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118451)
Time Spent: 3h 10m  (was: 3h)

> Implement Beam Python User State and Timer API
> --
>
> Key: BEAM-4594
> URL: https://issues.apache.org/jira/browse/BEAM-4594
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Major
>  Labels: portability
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the implementation of the Beam Python User State and Timer 
> API, described here: [https://s.apache.org/beam-python-user-state-and-timers].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4594) Implement Beam Python User State and Timer API

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4594?focusedWorklogId=118453=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118453
 ]

ASF GitHub Bot logged work on BEAM-4594:


Author: ASF GitHub Bot
Created on: 02/Jul/18 22:23
Start Date: 02/Jul/18 22:23
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5691: [BEAM-4594] Beam Python state and timers user-facing API
URL: https://github.com/apache/beam/pull/5691#discussion_r199606308
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -276,6 +278,45 @@ def get_function_arguments(obj, func):
   return inspect.getargspec(f)
 
 
+class _DoFnParam(object):
+  """DoFn parameter."""
+
+  def __repr__(self):
+return '_DoFnParam(%s)' % self.param_id
+
+  def __eq__(self, other):
+if type(self) == type(other):
+  return self.param_id == other.param_id
+return False
+
+
+class _BuiltinDoFnParam(_DoFnParam):
+  """Built-in DoFn parameter."""
+
+  def __init__(self, param_id):
+self.param_id = param_id
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118453)
Time Spent: 3h 20m  (was: 3h 10m)

> Implement Beam Python User State and Timer API
> --
>
> Key: BEAM-4594
> URL: https://issues.apache.org/jira/browse/BEAM-4594
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Major
>  Labels: portability
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the implementation of the Beam Python User State and Timer 
> API, described here: [https://s.apache.org/beam-python-user-state-and-timers].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4594) Implement Beam Python User State and Timer API

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4594?focusedWorklogId=118455=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118455
 ]

ASF GitHub Bot logged work on BEAM-4594:


Author: ASF GitHub Bot
Created on: 02/Jul/18 22:23
Start Date: 02/Jul/18 22:23
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5691: [BEAM-4594] Beam Python state and timers user-facing API
URL: https://github.com/apache/beam/pull/5691#discussion_r199637622
 
 

 ##
 File path: sdks/python/scripts/generate_pydoc.sh
 ##
 @@ -165,7 +165,11 @@ ignore_identifiers = [
   'WindowedTypeConstraint',  # apache_beam.typehints.typehints
 
   # stdlib classes without documentation
-  'unittest.case.TestCase'
+  'unittest.case.TestCase',
+
+  # DoFn param inner classes
+  '_StateDoFnParam',
+  '_TimerDoFnParam'
 
 Review comment:
   Done. As discussed offline, this is due to an issue Sphinx has with 
extracting docs for these classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118455)

> Implement Beam Python User State and Timer API
> --
>
> Key: BEAM-4594
> URL: https://issues.apache.org/jira/browse/BEAM-4594
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Major
>  Labels: portability
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This issue tracks the implementation of the Beam Python User State and Timer 
> API, described here: [https://s.apache.org/beam-python-user-state-and-timers].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4594) Implement Beam Python User State and Timer API

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4594?focusedWorklogId=118458=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118458
 ]

ASF GitHub Bot logged work on BEAM-4594:


Author: ASF GitHub Bot
Created on: 02/Jul/18 22:23
Start Date: 02/Jul/18 22:23
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5691: [BEAM-4594] Beam Python state and timers user-facing API
URL: https://github.com/apache/beam/pull/5691#discussion_r199637833
 
 

 ##
 File path: sdks/python/apache_beam/transforms/userstate.py
 ##
 @@ -0,0 +1,200 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""User-facing interfaces for the Beam State and Timer APIs.
+
+Experimental; no backwards-compatibility guarantees.
+"""
+
+from __future__ import absolute_import
+
+import logging
+
+from apache_beam.coders import Coder
+from apache_beam.transforms.timeutil import TimeDomain
+
+
+class StateSpec(object):
+  """Specification for a user DoFn state cell."""
+
+  def __init__(self):
+raise NotImplementedError
+
+  def __repr__(self):
+return '%s(%s)' % (self.__class__.__name__, self.name)
+
+
+class BagStateSpec(StateSpec):
+  """Specification for a user DoFn bag state cell."""
+
+  def __init__(self, name, coder):
+assert isinstance(name, str)
+assert isinstance(coder, Coder)
+self.name = name
+self.coder = coder
+
+
+class CombiningValueStateSpec(StateSpec):
+  """Specification for a user DoFn combining value state cell."""
+
+  def __init__(self, name, coder, combiner):
+# Avoid circular import.
+from apache_beam.transforms.core import CombineFn
+
+assert isinstance(name, str)
+assert isinstance(coder, Coder)
+assert isinstance(combiner, CombineFn)
+self.name = name
+self.coder = coder
+self.combiner = combiner
+
+
+class TimerSpec(object):
+  """Specification for a user stateful DoFn timer."""
+
+  def __init__(self, name, time_domain):
+self.name = name
+if time_domain not in (TimeDomain.WATERMARK, TimeDomain.REAL_TIME):
+  raise ValueError('Unsupported TimeDomain: %r.' % (time_domain,))
+self.time_domain = time_domain
+
+  def __repr__(self):
+return '%s(%s)' % (self.__class__.__name__, self.name)
+
+
+# Attribute for attaching a TimerSpec to an on_timer method.
+_TIMER_SPEC_ATTRIBUTE_NAME = '_on_timer_spec'
+
+
+def on_timer(timer_spec):
+  """Decorator for timer firing DoFn method.
+
+  This decorator allows a user to specify an on_timer processing method
+  in a stateful DoFn.  Sample usage:
+
+  > class MyDoFn(DoFn):
+  >   TIMER_SPEC = TimerSpec('timer', TimeDomain.WATERMARK)
+  >
+  >   @on_timer(TIMER_SPEC)
+  >   def my_timer_expiry_callback(self):
+  > logging.info('Timer expired!')
+  """
+
+  if not isinstance(timer_spec, TimerSpec):
+raise ValueError('@on_timer decorator expected TimerSpec.')
+
+  def _inner(method):
+if not callable(method):
+  raise ValueError('@on_timer decorator expected callable.')
+setattr(method, _TIMER_SPEC_ATTRIBUTE_NAME, timer_spec)
+return method
+
+  return _inner
+
+
+class UserStateUtils(object):
+
+  @staticmethod
+  def get_on_timer_methods(dofn):
+# Avoid circular import.
+from apache_beam.transforms.core import DoFn
+if not isinstance(dofn, DoFn):
+  raise ValueError('Expected DoFn.')
+
+timerid_to_methods = {}
+result = {}
+
+for name in dir(dofn):
+  value = getattr(dofn, name, None)
+  if (callable(value) and
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118458)
Time Spent: 4h  (was: 3h 50m)

> Implement Beam Python User State and Timer API
> --
>
> Key: BEAM-4594
> URL: 

[jira] [Work logged] (BEAM-4594) Implement Beam Python User State and Timer API

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4594?focusedWorklogId=118457=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118457
 ]

ASF GitHub Bot logged work on BEAM-4594:


Author: ASF GitHub Bot
Created on: 02/Jul/18 22:23
Start Date: 02/Jul/18 22:23
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5691: [BEAM-4594] Beam Python state and timers user-facing API
URL: https://github.com/apache/beam/pull/5691#discussion_r199637809
 
 

 ##
 File path: sdks/python/apache_beam/transforms/userstate.py
 ##
 @@ -0,0 +1,200 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""User-facing interfaces for the Beam State and Timer APIs.
+
+Experimental; no backwards-compatibility guarantees.
+"""
+
+from __future__ import absolute_import
+
+import logging
+
+from apache_beam.coders import Coder
+from apache_beam.transforms.timeutil import TimeDomain
+
+
+class StateSpec(object):
+  """Specification for a user DoFn state cell."""
+
+  def __init__(self):
+raise NotImplementedError
+
+  def __repr__(self):
+return '%s(%s)' % (self.__class__.__name__, self.name)
+
+
+class BagStateSpec(StateSpec):
+  """Specification for a user DoFn bag state cell."""
+
+  def __init__(self, name, coder):
+assert isinstance(name, str)
+assert isinstance(coder, Coder)
+self.name = name
+self.coder = coder
+
+
+class CombiningValueStateSpec(StateSpec):
+  """Specification for a user DoFn combining value state cell."""
+
+  def __init__(self, name, coder, combiner):
+# Avoid circular import.
+from apache_beam.transforms.core import CombineFn
+
+assert isinstance(name, str)
+assert isinstance(coder, Coder)
+assert isinstance(combiner, CombineFn)
+self.name = name
+self.coder = coder
+self.combiner = combiner
+
+
+class TimerSpec(object):
+  """Specification for a user stateful DoFn timer."""
+
+  def __init__(self, name, time_domain):
+self.name = name
+if time_domain not in (TimeDomain.WATERMARK, TimeDomain.REAL_TIME):
+  raise ValueError('Unsupported TimeDomain: %r.' % (time_domain,))
+self.time_domain = time_domain
+
+  def __repr__(self):
+return '%s(%s)' % (self.__class__.__name__, self.name)
+
+
+# Attribute for attaching a TimerSpec to an on_timer method.
+_TIMER_SPEC_ATTRIBUTE_NAME = '_on_timer_spec'
+
+
+def on_timer(timer_spec):
+  """Decorator for timer firing DoFn method.
+
+  This decorator allows a user to specify an on_timer processing method
+  in a stateful DoFn.  Sample usage:
+
+  > class MyDoFn(DoFn):
+  >   TIMER_SPEC = TimerSpec('timer', TimeDomain.WATERMARK)
+  >
+  >   @on_timer(TIMER_SPEC)
+  >   def my_timer_expiry_callback(self):
+  > logging.info('Timer expired!')
+  """
+
+  if not isinstance(timer_spec, TimerSpec):
 
 Review comment:
   I kept the asserts on the Spec constructors for brevity.  Let me know if you 
feel strongly about this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118457)
Time Spent: 3h 50m  (was: 3h 40m)

> Implement Beam Python User State and Timer API
> --
>
> Key: BEAM-4594
> URL: https://issues.apache.org/jira/browse/BEAM-4594
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Major
>  Labels: portability
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the implementation of the Beam Python User State and Timer 
> API, described here: [https://s.apache.org/beam-python-user-state-and-timers].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4594) Implement Beam Python User State and Timer API

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4594?focusedWorklogId=118464=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118464
 ]

ASF GitHub Bot logged work on BEAM-4594:


Author: ASF GitHub Bot
Created on: 02/Jul/18 22:25
Start Date: 02/Jul/18 22:25
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5691: [BEAM-4594] Beam Python state and timers user-facing API
URL: https://github.com/apache/beam/pull/5691#discussion_r199638210
 
 

 ##
 File path: sdks/python/apache_beam/transforms/userstate.py
 ##
 @@ -0,0 +1,200 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""User-facing interfaces for the Beam State and Timer APIs.
+
+Experimental; no backwards-compatibility guarantees.
+"""
+
+from __future__ import absolute_import
+
+import logging
+
+from apache_beam.coders import Coder
+from apache_beam.transforms.timeutil import TimeDomain
+
+
+class StateSpec(object):
+  """Specification for a user DoFn state cell."""
+
+  def __init__(self):
+raise NotImplementedError
+
+  def __repr__(self):
+return '%s(%s)' % (self.__class__.__name__, self.name)
+
+
+class BagStateSpec(StateSpec):
+  """Specification for a user DoFn bag state cell."""
+
+  def __init__(self, name, coder):
+assert isinstance(name, str)
+assert isinstance(coder, Coder)
+self.name = name
+self.coder = coder
+
+
+class CombiningValueStateSpec(StateSpec):
+  """Specification for a user DoFn combining value state cell."""
+
+  def __init__(self, name, coder, combiner):
+# Avoid circular import.
+from apache_beam.transforms.core import CombineFn
+
+assert isinstance(name, str)
+assert isinstance(coder, Coder)
+assert isinstance(combiner, CombineFn)
+self.name = name
+self.coder = coder
+self.combiner = combiner
+
+
+class TimerSpec(object):
+  """Specification for a user stateful DoFn timer."""
+
+  def __init__(self, name, time_domain):
+self.name = name
+if time_domain not in (TimeDomain.WATERMARK, TimeDomain.REAL_TIME):
+  raise ValueError('Unsupported TimeDomain: %r.' % (time_domain,))
+self.time_domain = time_domain
+
+  def __repr__(self):
+return '%s(%s)' % (self.__class__.__name__, self.name)
+
+
+# Attribute for attaching a TimerSpec to an on_timer method.
+_TIMER_SPEC_ATTRIBUTE_NAME = '_on_timer_spec'
+
+
+def on_timer(timer_spec):
+  """Decorator for timer firing DoFn method.
+
+  This decorator allows a user to specify an on_timer processing method
+  in a stateful DoFn.  Sample usage:
+
+  > class MyDoFn(DoFn):
+  >   TIMER_SPEC = TimerSpec('timer', TimeDomain.WATERMARK)
+  >
+  >   @on_timer(TIMER_SPEC)
+  >   def my_timer_expiry_callback(self):
+  > logging.info('Timer expired!')
+  """
+
+  if not isinstance(timer_spec, TimerSpec):
+raise ValueError('@on_timer decorator expected TimerSpec.')
+
+  def _inner(method):
+if not callable(method):
+  raise ValueError('@on_timer decorator expected callable.')
+setattr(method, _TIMER_SPEC_ATTRIBUTE_NAME, timer_spec)
+return method
+
+  return _inner
+
+
+class UserStateUtils(object):
+
+  @staticmethod
+  def get_on_timer_methods(dofn):
+# Avoid circular import.
+from apache_beam.transforms.core import DoFn
+if not isinstance(dofn, DoFn):
+  raise ValueError('Expected DoFn.')
+
+timerid_to_methods = {}
+result = {}
+
+for name in dir(dofn):
+  value = getattr(dofn, name, None)
+  if (callable(value) and
+  getattr(value, _TIMER_SPEC_ATTRIBUTE_NAME, None) is not None):
+timer_spec = getattr(value, _TIMER_SPEC_ATTRIBUTE_NAME)
+assert isinstance(timer_spec, TimerSpec)
+if timer_spec.name in timerid_to_methods:
+  raise ValueError(
+  'Duplicate on_timer method for timer of name %r.' %
+  timer_spec.name)
+timerid_to_methods[timer_spec.name] = value
+result[timer_spec] = value
+
+return result
+
+  @staticmethod
+  def validate_stateful_dofn(dofn):
+# Avoid circular import.
+from apache_beam.runners.common import MethodWrapper
+from 

[jira] [Work logged] (BEAM-4594) Implement Beam Python User State and Timer API

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4594?focusedWorklogId=118463=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118463
 ]

ASF GitHub Bot logged work on BEAM-4594:


Author: ASF GitHub Bot
Created on: 02/Jul/18 22:25
Start Date: 02/Jul/18 22:25
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5691: [BEAM-4594] Beam Python state and timers user-facing API
URL: https://github.com/apache/beam/pull/5691#discussion_r199638138
 
 

 ##
 File path: sdks/python/apache_beam/transforms/userstate.py
 ##
 @@ -0,0 +1,200 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""User-facing interfaces for the Beam State and Timer APIs.
+
+Experimental; no backwards-compatibility guarantees.
+"""
+
+from __future__ import absolute_import
+
+import logging
+
+from apache_beam.coders import Coder
+from apache_beam.transforms.timeutil import TimeDomain
+
+
+class StateSpec(object):
+  """Specification for a user DoFn state cell."""
+
+  def __init__(self):
+raise NotImplementedError
+
+  def __repr__(self):
+return '%s(%s)' % (self.__class__.__name__, self.name)
+
+
+class BagStateSpec(StateSpec):
+  """Specification for a user DoFn bag state cell."""
+
+  def __init__(self, name, coder):
+assert isinstance(name, str)
+assert isinstance(coder, Coder)
+self.name = name
+self.coder = coder
+
+
+class CombiningValueStateSpec(StateSpec):
+  """Specification for a user DoFn combining value state cell."""
+
+  def __init__(self, name, coder, combiner):
+# Avoid circular import.
+from apache_beam.transforms.core import CombineFn
+
+assert isinstance(name, str)
+assert isinstance(coder, Coder)
+assert isinstance(combiner, CombineFn)
+self.name = name
+self.coder = coder
+self.combiner = combiner
+
+
+class TimerSpec(object):
+  """Specification for a user stateful DoFn timer."""
+
+  def __init__(self, name, time_domain):
+self.name = name
+if time_domain not in (TimeDomain.WATERMARK, TimeDomain.REAL_TIME):
+  raise ValueError('Unsupported TimeDomain: %r.' % (time_domain,))
+self.time_domain = time_domain
+
+  def __repr__(self):
+return '%s(%s)' % (self.__class__.__name__, self.name)
+
+
+# Attribute for attaching a TimerSpec to an on_timer method.
+_TIMER_SPEC_ATTRIBUTE_NAME = '_on_timer_spec'
+
+
+def on_timer(timer_spec):
+  """Decorator for timer firing DoFn method.
+
+  This decorator allows a user to specify an on_timer processing method
+  in a stateful DoFn.  Sample usage:
+
+  > class MyDoFn(DoFn):
+  >   TIMER_SPEC = TimerSpec('timer', TimeDomain.WATERMARK)
+  >
+  >   @on_timer(TIMER_SPEC)
+  >   def my_timer_expiry_callback(self):
+  > logging.info('Timer expired!')
+  """
+
+  if not isinstance(timer_spec, TimerSpec):
+raise ValueError('@on_timer decorator expected TimerSpec.')
+
+  def _inner(method):
+if not callable(method):
+  raise ValueError('@on_timer decorator expected callable.')
+setattr(method, _TIMER_SPEC_ATTRIBUTE_NAME, timer_spec)
+return method
+
+  return _inner
+
+
+class UserStateUtils(object):
+
+  @staticmethod
+  def get_on_timer_methods(dofn):
+# Avoid circular import.
+from apache_beam.transforms.core import DoFn
+if not isinstance(dofn, DoFn):
+  raise ValueError('Expected DoFn.')
+
+timerid_to_methods = {}
+result = {}
+
+for name in dir(dofn):
+  value = getattr(dofn, name, None)
+  if (callable(value) and
+  getattr(value, _TIMER_SPEC_ATTRIBUTE_NAME, None) is not None):
+timer_spec = getattr(value, _TIMER_SPEC_ATTRIBUTE_NAME)
+assert isinstance(timer_spec, TimerSpec)
+if timer_spec.name in timerid_to_methods:
+  raise ValueError(
+  'Duplicate on_timer method for timer of name %r.' %
+  timer_spec.name)
+timerid_to_methods[timer_spec.name] = value
+result[timer_spec] = value
+
+return result
+
+  @staticmethod
+  def validate_stateful_dofn(dofn):
+# Avoid circular import.
+from apache_beam.runners.common import MethodWrapper
+from 

[jira] [Work logged] (BEAM-4594) Implement Beam Python User State and Timer API

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4594?focusedWorklogId=118462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118462
 ]

ASF GitHub Bot logged work on BEAM-4594:


Author: ASF GitHub Bot
Created on: 02/Jul/18 22:25
Start Date: 02/Jul/18 22:25
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5691: [BEAM-4594] Beam Python state and timers user-facing API
URL: https://github.com/apache/beam/pull/5691#discussion_r199638065
 
 

 ##
 File path: sdks/python/apache_beam/transforms/userstate.py
 ##
 @@ -0,0 +1,200 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""User-facing interfaces for the Beam State and Timer APIs.
+
+Experimental; no backwards-compatibility guarantees.
+"""
+
+from __future__ import absolute_import
+
+import logging
+
+from apache_beam.coders import Coder
+from apache_beam.transforms.timeutil import TimeDomain
+
+
+class StateSpec(object):
+  """Specification for a user DoFn state cell."""
+
+  def __init__(self):
+raise NotImplementedError
+
+  def __repr__(self):
+return '%s(%s)' % (self.__class__.__name__, self.name)
+
+
+class BagStateSpec(StateSpec):
+  """Specification for a user DoFn bag state cell."""
+
+  def __init__(self, name, coder):
+assert isinstance(name, str)
+assert isinstance(coder, Coder)
+self.name = name
+self.coder = coder
+
+
+class CombiningValueStateSpec(StateSpec):
+  """Specification for a user DoFn combining value state cell."""
+
+  def __init__(self, name, coder, combiner):
+# Avoid circular import.
+from apache_beam.transforms.core import CombineFn
+
+assert isinstance(name, str)
+assert isinstance(coder, Coder)
+assert isinstance(combiner, CombineFn)
+self.name = name
+self.coder = coder
+self.combiner = combiner
+
+
+class TimerSpec(object):
+  """Specification for a user stateful DoFn timer."""
+
+  def __init__(self, name, time_domain):
+self.name = name
+if time_domain not in (TimeDomain.WATERMARK, TimeDomain.REAL_TIME):
+  raise ValueError('Unsupported TimeDomain: %r.' % (time_domain,))
+self.time_domain = time_domain
+
+  def __repr__(self):
+return '%s(%s)' % (self.__class__.__name__, self.name)
+
+
+# Attribute for attaching a TimerSpec to an on_timer method.
+_TIMER_SPEC_ATTRIBUTE_NAME = '_on_timer_spec'
+
+
+def on_timer(timer_spec):
+  """Decorator for timer firing DoFn method.
+
+  This decorator allows a user to specify an on_timer processing method
+  in a stateful DoFn.  Sample usage:
+
+  > class MyDoFn(DoFn):
+  >   TIMER_SPEC = TimerSpec('timer', TimeDomain.WATERMARK)
+  >
+  >   @on_timer(TIMER_SPEC)
+  >   def my_timer_expiry_callback(self):
+  > logging.info('Timer expired!')
+  """
+
+  if not isinstance(timer_spec, TimerSpec):
+raise ValueError('@on_timer decorator expected TimerSpec.')
+
+  def _inner(method):
+if not callable(method):
+  raise ValueError('@on_timer decorator expected callable.')
+setattr(method, _TIMER_SPEC_ATTRIBUTE_NAME, timer_spec)
+return method
+
+  return _inner
+
+
+class UserStateUtils(object):
+
+  @staticmethod
+  def get_on_timer_methods(dofn):
+# Avoid circular import.
+from apache_beam.transforms.core import DoFn
+if not isinstance(dofn, DoFn):
+  raise ValueError('Expected DoFn.')
+
+timerid_to_methods = {}
+result = {}
+
+for name in dir(dofn):
+  value = getattr(dofn, name, None)
+  if (callable(value) and
+  getattr(value, _TIMER_SPEC_ATTRIBUTE_NAME, None) is not None):
+timer_spec = getattr(value, _TIMER_SPEC_ATTRIBUTE_NAME)
+assert isinstance(timer_spec, TimerSpec)
+if timer_spec.name in timerid_to_methods:
+  raise ValueError(
+  'Duplicate on_timer method for timer of name %r.' %
+  timer_spec.name)
+timerid_to_methods[timer_spec.name] = value
+result[timer_spec] = value
+
+return result
+
+  @staticmethod
+  def validate_stateful_dofn(dofn):
+# Avoid circular import.
+from apache_beam.runners.common import MethodWrapper
+from 

[jira] [Work logged] (BEAM-4451) SchemaRegistry should support a ServiceLoader interface

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4451?focusedWorklogId=118471=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118471
 ]

ASF GitHub Bot logged work on BEAM-4451:


Author: ASF GitHub Bot
Created on: 02/Jul/18 22:46
Start Date: 02/Jul/18 22:46
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #5840: 
[BEAM-4451] Schemas default provider and serviceloader
URL: https://github.com/apache/beam/pull/5840#discussion_r199641531
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaProviderRegistrar.java
 ##
 @@ -0,0 +1,42 @@
+package org.apache.beam.sdk.schemas;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import com.google.auto.service.AutoService;
+import java.util.List;
+import java.util.ServiceLoader;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+
+/**
+ * {@link SchemaProvider} creators have the ability to automatically have 
their {@link
+ * SchemaProvider schemaProvider} registered with this SDK by creating a 
{@link ServiceLoader} entry
+ * and a concrete implementation of this interface.
+ *
+ * It is optional but recommended to use one of the many build time tools 
such as {@link
+ * AutoService} to generate the necessary META-INF files automatically.
+ */
+@Experimental(Kind.SCHEMAS)
+public interface SchemaProviderRegistrar {
+  /**
+   * Returns a list of {@link SchemaProvider schema providers} which will be 
registered by default
 
 Review comment:
   I'm not sure I understand. This returns "a" list of providers that will be 
registered by default. Users will put instances of this in their own modules. 
e.g. the GCP module can contain one of these to register schemas for all 
TableRow objects.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118471)
Time Spent: 50m  (was: 40m)

> SchemaRegistry should support a ServiceLoader interface
> ---
>
> Key: BEAM-4451
> URL: https://issues.apache.org/jira/browse/BEAM-4451
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This will allow JARs to register schemas only when they are linked in.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


  1   2   3   4   >