[jira] [Work logged] (BEAM-6855) Side inputs are not supported when using the state API

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6855?focusedWorklogId=323322&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323322
 ]

ASF GitHub Bot logged work on BEAM-6855:


Author: ASF GitHub Bot
Created on: 04/Oct/19 10:43
Start Date: 04/Oct/19 10:43
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #9612: [BEAM-6855] 
Side inputs are not supported when using the state API
URL: https://github.com/apache/beam/pull/9612
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323322)
Time Spent: 6h 40m  (was: 6.5h)

> Side inputs are not supported when using the state API
> --
>
> Key: BEAM-6855
> URL: https://issues.apache.org/jira/browse/BEAM-6855
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core, runner-dataflow, runner-direct
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-8162) NPE error when add flink 1.9 runner

2019-10-04 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed BEAM-8162.

Fix Version/s: Not applicable
   Resolution: Duplicate

This has been fixed and merged for the 2.17.0 release. We decided not to 
backport because it only becomes relevant with Flink 1.9, which will also be 
introduced in 2.17.0.

> NPE error when add flink 1.9 runner
> ---
>
> Key: BEAM-8162
> URL: https://issues.apache.org/jira/browse/BEAM-8162
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When adding the Flink 1.9 runner in https://github.com/apache/beam/pull/9296, we 
> found an NPE when running the `PortableTimersExecutionTest`. 
> The details can be found here: 
> https://github.com/apache/beam/pull/9296#issuecomment-525262607



--


[jira] [Created] (BEAM-8353) Replace Flink StreamingImpulseSource with generic timer-based source

2019-10-04 Thread Maximilian Michels (Jira)
Maximilian Michels created BEAM-8353:


 Summary: Replace Flink StreamingImpulseSource with generic 
timer-based source
 Key: BEAM-8353
 URL: https://issues.apache.org/jira/browse/BEAM-8353
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Maximilian Michels


The {{StreamingImpulseSource}} is a Flink Runner specific transform for load 
tests. Now that timers are supported, we should look into whether a timer-based 
generator could generate the same load as the current source. This would allow 
us to remove the proprietary implementation.
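As a rough illustration (plain Python, not Beam API code; all names below are invented for the sketch), the core mechanism of such a timer-based generator is a timer that re-arms itself on every firing:

```python
import heapq

def looping_timer_impulses(interval_ms, stop_at_ms):
    """Simulate a timer that re-arms itself each time it fires, emitting one
    impulse per interval -- the mechanism a timer-based generator would use
    in place of a dedicated streaming impulse source."""
    emitted = []
    # Priority queue of pending timers: (firing_time_ms, tag).
    timers = [(0, "impulse")]
    while timers:
        fire_time, tag = heapq.heappop(timers)
        if fire_time >= stop_at_ms:
            break  # in a real pipeline the loop would run until cancelled
        emitted.append(fire_time)  # "emit" one impulse element
        heapq.heappush(timers, (fire_time + interval_ms, tag))  # re-arm
    return emitted
```

With `interval_ms=100` and `stop_at_ms=500`, the timer fires at 0, 100, 200, 300, and 400 ms, producing one impulse per firing.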



--


[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=323338&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323338
 ]

ASF GitHub Bot logged work on BEAM-5707:


Author: ASF GitHub Bot
Created on: 04/Oct/19 11:25
Start Date: 04/Oct/19 11:25
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #9728: [BEAM-5707] Modify 
Flink streaming impulse function to include a counter
URL: https://github.com/apache/beam/pull/9728
 
 
   Flink's streaming impulse function is used for load tests that used to be
   impossible to realize with timers. This extends the functionality to send a
   per-partition counter with each impulse, instead of just an empty byte array.
   
   I've also filed https://jira.apache.org/jira/browse/BEAM-8353 to start the
   process of removing the proprietary source.
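A minimal sketch of the payload change (plain Python; the wire format here is invented purely for illustration, not what the Flink runner actually uses): instead of an empty byte array, each impulse carries its partition id and a counter:

```python
import struct

def encode_impulse(partition, counter):
    """Pack a per-partition counter into the impulse payload.
    Illustrative wire format: two big-endian unsigned 64-bit ints."""
    return struct.pack(">QQ", partition, counter)

def decode_impulse(payload):
    """Recover (partition, counter) from an impulse payload."""
    partition, counter = struct.unpack(">QQ", payload)
    return partition, counter
```

A consumer can then distinguish impulses per partition and observe the counter advancing, which an empty payload cannot express.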
   
   Post-Commit Tests Status (on master branch)
   [Table of build-status badge links for the Go, Java, and Python post-commit Jenkins jobs; truncated in the archive and omitted.]

[jira] [Comment Edited] (BEAM-8319) Errorprone 0.0.13 fails during JDK11 build

2019-10-04 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942969#comment-16942969
 ] 

Lukasz Gajowy edited comment on BEAM-8319 at 10/4/19 1:47 PM:
--

I've found out that we're fine as long as guava's version is greater than or 
equal to 23.5-jre. 

EDIT: version 23.1-jre is also fine. Versions 23.0 and 23.1-android fail for 
the following reason: 
{code:java}
java.lang.NoSuchMethodError: 
com.google.common.collect.ImmutableSet.toImmutableSet()Ljava/util/stream/Collector;
at com.google.errorprone.BugCheckerInfo.<init>(BugCheckerInfo.java:119)
at com.google.errorprone.BugCheckerInfo.create(BugCheckerInfo.java:105)
at 
com.google.errorprone.scanner.BuiltInCheckerSuppliers.getSuppliers(BuiltInCheckerSuppliers.java:420)
at 
com.google.errorprone.scanner.BuiltInCheckerSuppliers.getSuppliers(BuiltInCheckerSuppliers.java:413)
at 
com.google.errorprone.scanner.BuiltInCheckerSuppliers.<clinit>(BuiltInCheckerSuppliers.java:449)
at 
com.google.errorprone.ErrorProneJavacPlugin.init(ErrorProneJavacPlugin.java:44)
at 
com.sun.tools.javac.api.BasicJavacTask.initPlugins(BasicJavacTask.java:214)
at 
com.sun.tools.javac.api.JavacTaskImpl.prepareCompiler(JavacTaskImpl.java:192)
at 
com.sun.tools.javac.api.JavacTaskImpl.lambda$doCall$0(JavacTaskImpl.java:97)
at 
com.sun.tools.javac.api.JavacTaskImpl.handleExceptions(JavacTaskImpl.java:142)
at com.sun.tools.javac.api.JavacTaskImpl.doCall(JavacTaskImpl.java:96)
at com.sun.tools.javac.api.JavacTaskImpl.call(JavacTaskImpl.java:90)
at 
org.gradle.api.internal.tasks.compile.AnnotationProcessingCompileTask.call(AnnotationProcessingCompileTask.java:92)
at 
org.gradle.api.internal.tasks.compile.ResourceCleaningCompilationTask.call(ResourceCleaningCompilationTask.java:57)
at 
org.gradle.api.internal.tasks.compile.JdkJavaCompiler.execute(JdkJavaCompiler.java:50)
at 
org.gradle.api.internal.tasks.compile.JdkJavaCompiler.execute(JdkJavaCompiler.java:36)
at 
org.gradle.api.internal.tasks.compile.daemon.AbstractDaemonCompiler$CompilerCallable.call(AbstractDaemonCompiler.java:86)
at 
org.gradle.api.internal.tasks.compile.daemon.AbstractDaemonCompiler$CompilerCallable.call(AbstractDaemonCompiler.java:74)
at 
org.gradle.workers.internal.DefaultWorkerServer.execute(DefaultWorkerServer.java:42)
at 
org.gradle.workers.internal.WorkerDaemonServer.execute(WorkerDaemonServer.java:38)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.process.internal.worker.request.WorkerAction.run(WorkerAction.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
at 
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
at 
org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
at 
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
at 
org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at 
org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
at java.lang.Thread.run(Thread.java:745)
{code}


was (Author: łukaszg):
I've found out that we're fine as long as guava's version is greater than or 
equal to 23.5-jre.

> Errorprone 0.0.13 fails during JDK11 build
> --
>
> Key: BEAM-8319
> URL: https://issues.apache.org/jira/browse/BEAM-8319
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Lukasz Gajowy
>Assignee: Lukasz

[jira] [Work logged] (BEAM-8319) Errorprone 0.0.13 fails during JDK11 build

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8319?focusedWorklogId=323490&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323490
 ]

ASF GitHub Bot logged work on BEAM-8319:


Author: ASF GitHub Bot
Created on: 04/Oct/19 15:52
Start Date: 04/Oct/19 15:52
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #9729: [BEAM-8319] 
Errorprone + java11 prototype, not to be merged
URL: https://github.com/apache/beam/pull/9729
 
 
   This configuration was tested only on the `sdks/java/core` module. The build 
on openjdk 11.0.2 succeeds when configured as follows: 
   
   ```
   ./gradlew clean build -p sdks/java/core/ -xtest -xspotbugsMain 
   ```
   
   Sharing this so that more people can experiment, spot potential errors, 
or suggest new directions (if any). 
   
   
   
   [Standard Beam PR-template contributor checklist and post-commit build-status badge table; truncated in the archive and omitted.]

[jira] [Commented] (BEAM-8319) Errorprone 0.0.13 fails during JDK11 build

2019-10-04 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944618#comment-16944618
 ] 

Lukasz Gajowy commented on BEAM-8319:
-

Thank you for your help. When guava is not forced, the resolved version is 
27.0.1 (I'm compiling only sdks/java/core to get quick feedback). This is 
natural for Gradle: by default it uses the newest of the conflicting versions, 
which for sdks/java/core is 27.0.1. 
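Gradle's default conflict resolution can be sketched in a few lines (illustrative Python, handling only simple numeric versions; real Gradle version ordering also deals with qualifiers like `-jre` and `-android`):

```python
def resolve_conflict(requested_versions):
    """Pick the highest of the requested versions -- Gradle's default
    strategy when dependencies ask for conflicting versions of a module."""
    return max(requested_versions,
               key=lambda v: tuple(int(part) for part in v.split(".")))
```

So when one path requests guava 23.0 and another 27.0.1, the build resolves to 27.0.1 unless a version is forced.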

I double-checked once again; to sum up, I got errorprone working with the 
following versions (or greater): 
 - errorprone plugin version: 0.8.1
 - com.google.errorprone:error_prone_annotations:2.3.3
 - com.google.errorprone:error_prone_core:2.3.3
 - com.google.errorprone:javac:9+181-r4173-1
 - guava version 23.1-jre

I used openjdk11.0.2.

Link to a pull request draft where I'm currently working on errorprone: 
[https://github.com/apache/beam/pull/9729]  



And the command I used for testing it on core:
{code:java}
 ./gradlew clean build -p sdks/java/core/ -xtest -xspotbugsMain {code}
Spotbugs and tests failed so I skipped them for now.


Surprisingly, errorprone in this configuration shows lots of warnings. Is 
errorprone actually working properly now?



My plan for now is to try to upgrade guava to the desired version (at least 
23.1-jre). That may require bumping other dependencies too. I will track 
progress here: https://issues.apache.org/jira/browse/BEAM-5559

> Errorprone 0.0.13 fails during JDK11 build
> --
>
> Key: BEAM-8319
> URL: https://issues.apache.org/jira/browse/BEAM-8319
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I'm using openjdk 11.0.2. After switching the version to:
> {code:java}
> javaVersion = 11 {code}
> in BeamModulePlugin and running
> {code:java}
> ./gradlew clean build -p sdks/java/core -xtest {code}
> building fails. I was able to run errorprone after upgrading it but had 
> problems with conflicting guava version. See more here: 
> https://issues.apache.org/jira/browse/BEAM-5085
>  
> Stacktrace:
> {code:java}
> org.gradle.api.tasks.TaskExecutionException: Execution failed for task 
> ':model:pipeline:compileJava'.
> at 
> org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$2.accept(ExecuteActionsTaskExecuter.java:121)
> at 
> org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$2.accept(ExecuteActionsTaskExecuter.java:117)
> at org.gradle.internal.Try$Failure.ifSuccessfulOrElse(Try.java:184)
> at 
> org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.execute(ExecuteActionsTaskExecuter.java:110)
> at 
> org.gradle.api.internal.tasks.execution.ResolveIncrementalChangesTaskExecuter.execute(ResolveIncrementalChangesTaskExecuter.java:84)
> at 
> org.gradle.api.internal.tasks.execution.ResolveTaskOutputCachingStateExecuter.execute(ResolveTaskOutputCachingStateExecuter.java:91)
> at 
> org.gradle.api.internal.tasks.execution.FinishSnapshotTaskInputsBuildOperationTaskExecuter.execute(FinishSnapshotTaskInputsBuildOperationTaskExecuter.java:51)
> at 
> org.gradle.api.internal.tasks.execution.ResolveBuildCacheKeyExecuter.execute(ResolveBuildCacheKeyExecuter.java:102)
> at 
> org.gradle.api.internal.tasks.execution.ResolveBeforeExecutionStateTaskExecuter.execute(ResolveBeforeExecutionStateTaskExecuter.java:74)
> at 
> org.gradle.api.internal.tasks.execution.ValidatingTaskExecuter.execute(ValidatingTaskExecuter.java:58)
> at 
> org.gradle.api.internal.tasks.execution.SkipEmptySourceFilesTaskExecuter.execute(SkipEmptySourceFilesTaskExecuter.java:109)
> at 
> org.gradle.api.internal.tasks.execution.ResolveBeforeExecutionOutputsTaskExecuter.execute(ResolveBeforeExecutionOutputsTaskExecuter.java:67)
> at 
> org.gradle.api.internal.tasks.execution.StartSnapshotTaskInputsBuildOperationTaskExecuter.execute(StartSnapshotTaskInputsBuildOperationTaskExecuter.java:52)
> at 
> org.gradle.api.internal.tasks.execution.ResolveAfterPreviousExecutionStateTaskExecuter.execute(ResolveAfterPreviousExecutionStateTaskExecuter.java:46)
> at 
> org.gradle.api.internal.tasks.execution.CleanupStaleOutputsExecuter.execute(CleanupStaleOutputsExecuter.java:93)
> at 
> org.gradle.api.internal.tasks.execution.FinalizePropertiesTaskExecuter.execute(FinalizePropertiesTaskExecuter.java:45)
> at 
> org.gradle.api.internal.tasks.execution.ResolveTaskExecutionModeExecuter.execute(ResolveTaskExecutionModeExecuter.java:94)
> at 
> org.gradle.api.internal.tasks.execution.SkipTaskWithNoActionsExecuter.execute(SkipTaskWithNoActionsExecuter.java:57)

[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=323497&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323497
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 04/Oct/19 15:58
Start Date: 04/Oct/19 15:58
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-538457947
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 323497)
Time Spent: 50m  (was: 40m)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.
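The spurious unused-import problem described above can be illustrated with a small (hypothetical) snippet: the `List` import is referenced only in annotations, so a linter that does not understand annotations reports it as unused even though the code depends on it.

```python
from typing import List  # referenced only in the annotations below

def dedupe(items: List[str]) -> List[str]:
    """Drop duplicate strings while preserving first-seen order."""
    seen = set()
    # set.add() returns None (falsy), so this keeps the first occurrence only.
    return [x for x in items if not (x in seen or seen.add(x))]
```

pylint 2.4's annotation support lets it count such references as uses, so the import is no longer flagged.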



--


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=323498&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323498
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 04/Oct/19 15:58
Start Date: 04/Oct/19 15:58
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-538458010
 
 
   Run Python2_PVR_Flink PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 323498)
Time Spent: 1h  (was: 50m)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.



--


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=323503&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323503
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 04/Oct/19 15:59
Start Date: 04/Oct/19 15:59
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-538169256
 
 
   Here's a breakdown of the changes required to get to pylint 2.4:
   
   - fix a bunch of warnings about deprecated methods.  mostly `logger.warn` 
and various unittest methods
   - update the names of a few error codes: `disable=unused-import` and 
`possibly-unused-variable`
   - ignore a bunch of newly introduced style warnings that did not seem 
important
   - run the lint on python 3.7: this ensures that it can run on test files 
that only work on python 3.7 due to syntax features
   - merge the lint tests into one test:
 - `run_pylint_2to3.sh` was just testing futurization.  seems fine to do 
this all the time now that our code is python2 compliant
 - there was a "mini" test just for python3-compatibility.  not needed 
anymore now that everything is running on python3
 - stop running `pycodestyle`: it's run as part of `flake8`
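The deprecated-method fixes in the first bullet are mechanical; for loggers (hypothetical snippet), `Logger.warn` is a deprecated alias that pylint 2.4 flags, and `Logger.warning` is the supported spelling with identical behavior:

```python
import logging

logger = logging.getLogger("beam.example")  # illustrative logger name

def log_retry(attempt):
    # Before: logger.warn("retrying, attempt %d", attempt)  -- deprecated alias
    logger.warning("retrying, attempt %d", attempt)
```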
   
 



Issue Time Tracking
---

Worklog Id: (was: 323503)
Time Spent: 1h 20m  (was: 1h 10m)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.



--


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=323502&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323502
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 04/Oct/19 15:59
Start Date: 04/Oct/19 15:59
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-538169256
 
 
   Here's a breakdown of the changes required to get to pylint 2.4:
   
   - fix a bunch of warnings about deprecated methods.  mostly `logger.warn` 
and various unittest methods
   - update the names of a few error codes: `disable=unused-import` and 
`possibly-unused-variable`
   - ignore a bunch of newly introduced style warnings that did not seem 
important
   - run the lint on python 3.7: this ensures that it can run on test files 
that only work on python 3.7 due to syntax features
   - merge the lint tests into one test:
 - `run_pylint_2to3.sh` was just testing futurization.  seems fine to do 
this all the time now that our code is python2 compliant
 - there was a "mini" test just for python3-compatibility.  not needed 
anymore now that everything is running on python3
 - stop running `pycodestyle`: it's run as part of `flake8`
   
 



Issue Time Tracking
---

Worklog Id: (was: 323502)
Time Spent: 1h 10m  (was: 1h)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode"; there is no option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python3-only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.
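
For illustration, the kind of spurious unused-import warning that the 
annotation support mentioned above fixes (hypothetical names): an import used 
only inside a type annotation is now correctly counted as used.

```python
from typing import Optional  # used only in the annotation below

def find_user(name: str) -> Optional[dict]:
    # Before 2.4, pylint could flag `Optional` as an unused import because
    # it appears only in annotations; 2.4 understands annotations.
    users = {"ada": {"id": 1}}
    return users.get(name)
```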



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-5559) Beam Dependency Update Request: com.google.guava:guava

2019-10-04 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944623#comment-16944623
 ] 

Lukasz Gajowy commented on BEAM-5559:
-

Thank you Luke - this is super helpful and I learned a lot.

I tried to find a higher consistent guava version and bump it as high as 
possible, but I'm currently blocked on: 
https://issues.apache.org/jira/browse/HADOOP-14891. We use a version that is 
affected by the issue - for example when sdks/java/io/hadoop-file-system/ is 
built and guava is at version 21.0. I will try to update the hadoop version as 
well and see what happens. I will keep Jira posted. 

> Beam Dependency Update Request: com.google.guava:guava
> --
>
> Key: BEAM-5559
> URL: https://issues.apache.org/jira/browse/BEAM-5559
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: Major
> Fix For: 2.15.0
>
>
>  - 2018-10-01 19:30:53.471497 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 26.0-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-08 12:18:05.174889 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 26.0-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-04-15 12:32:27.737694 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 27.1-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-04-22 12:10:18.539470 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 27.1-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 





[jira] [Work logged] (BEAM-3845) Avoid calling Class#newInstance

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3845?focusedWorklogId=323511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323511
 ]

ASF GitHub Bot logged work on BEAM-3845:


Author: ASF GitHub Bot
Created on: 04/Oct/19 16:04
Start Date: 04/Oct/19 16:04
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #9613: [BEAM-3845] Remove 
deprecated Class.newInstance() method usage
URL: https://github.com/apache/beam/pull/9613#issuecomment-538460375
 
 
   @iemejia does this PR LGTY? :)
 



Issue Time Tracking
---

Worklog Id: (was: 323511)
Time Spent: 1h 10m  (was: 1h)

> Avoid calling Class#newInstance
> ---
>
> Key: BEAM-3845
> URL: https://issues.apache.org/jira/browse/BEAM-3845
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Ted Yu
>Assignee: Lukasz Gajowy
>Priority: Minor
>  Labels: triaged
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Class#newInstance is deprecated starting in Java 9 - 
> https://bugs.openjdk.java.net/browse/JDK-6850612 - because it may throw 
> undeclared checked exceptions.
> The suggested replacement is getDeclaredConstructor().newInstance(), which 
> wraps the checked exceptions in InvocationTargetException.
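
A minimal Java sketch of the migration described above (illustrative `Widget` 
class, not Beam code):

```java
import java.lang.reflect.InvocationTargetException;

class NewInstanceExample {
    /** Trivial class to instantiate reflectively (illustrative). */
    public static class Widget {
        public Widget() {}
        @Override public String toString() { return "widget"; }
    }

    public static Widget create(Class<Widget> clazz) {
        // Deprecated pattern: clazz.newInstance() -- it can propagate
        // undeclared checked exceptions thrown by the constructor.
        // Replacement: any constructor exception arrives wrapped in
        // InvocationTargetException, so it must be declared or handled.
        try {
            return clazz.getDeclaredConstructor().newInstance();
        } catch (InvocationTargetException e) {
            // The constructor's own exception is the cause.
            throw new RuntimeException(e.getCause());
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(create(Widget.class));
    }
}
```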



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7596) List is not parallel: Make it parallel

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7596?focusedWorklogId=323528&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323528
 ]

ASF GitHub Bot logged work on BEAM-7596:


Author: ASF GitHub Bot
Created on: 04/Oct/19 16:16
Start Date: 04/Oct/19 16:16
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #8912: [BEAM-7596] Align 
wording of list in web docs
URL: https://github.com/apache/beam/pull/8912#issuecomment-538464772
 
 
   Should this be re-based and merged? Or closed?
 



Issue Time Tracking
---

Worklog Id: (was: 323528)
Time Spent: 2h 20m  (was: 2h 10m)

> List is not parallel: Make it parallel
> --
>
> Key: BEAM-7596
> URL: https://issues.apache.org/jira/browse/BEAM-7596
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Riona MacNamara
>Assignee: Jonas Grabber
>Priority: Trivial
>  Labels: Starter, ccoss2019
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> In [https://beam.apache.org/documentation/], the list under *Concepts* is not 
> parallel:
>  * The [Programming 
> Guide|https://beam.apache.org/documentation/programming-guide/] introduces 
> all the key Beam concepts.
>  * Learn about Beam’s [execution 
> model|https://beam.apache.org/documentation/execution-model/] to better 
> understand how pipelines execute.
>  * Visit [Learning 
> Resources|https://beam.apache.org/documentation/resources/learning-resources] 
> for some of our favorite articles and talks about Beam.
> The first item should either begin with a verb, or the second two items 
> should be framed as a description of the doc.





[jira] [Work logged] (BEAM-7917) Python datastore v1new fails on retry

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7917?focusedWorklogId=323538&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323538
 ]

ASF GitHub Bot logged work on BEAM-7917:


Author: ASF GitHub Bot
Created on: 04/Oct/19 16:23
Start Date: 04/Oct/19 16:23
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9294: [BEAM-7917] Fix 
datastore writes failing on retry
URL: https://github.com/apache/beam/pull/9294#issuecomment-538467447
 
 
   What are the next steps for this PR?
 



Issue Time Tracking
---

Worklog Id: (was: 323538)
Time Spent: 2h 50m  (was: 2h 40m)

> Python datastore v1new fails on retry
> -
>
> Key: BEAM-7917
> URL: https://issues.apache.org/jira/browse/BEAM-7917
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, runner-dataflow
>Affects Versions: 2.14.0
> Environment: Python 3.7 on Dataflow
>Reporter: Dmytro Sadovnychyi
>Assignee: Dmytro Sadovnychyi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Traceback (most recent call last):
>   File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>   File "apache_beam/runners/common.py", line 454, in 
> apache_beam.runners.common.SimpleInvoker.invoke_process
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py",
>  line 334, in process
> self._flush_batch()
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py",
>  line 349, in _flush_batch
> throttle_delay=util.WRITE_BATCH_TARGET_LATENCY_MS // 1000)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", 
> line 197, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/helper.py",
>  line 99, in write_mutations
> batch.commit()
>   File 
> "/usr/local/lib/python3.7/site-packages/google/cloud/datastore/batch.py", 
> line 271, in commit
> raise ValueError("Batch must be in progress to commit()")
> ValueError: Batch must be in progress to commit()
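
The failure mode can be reproduced without GCP: the retry wrapper re-invokes a 
function that reuses a batch whose one-shot state machine has already left the 
"in progress" state. A hedged, self-contained sketch (stand-in `Batch` class, 
not the real google-cloud-datastore API) of the bug, and of the fix of 
re-creating the batch inside the retried function:

```python
class Batch:
    """Minimal stand-in for the datastore Batch's one-shot state machine."""
    def __init__(self):
        self.in_progress = False

    def begin(self):
        self.in_progress = True

    def commit(self, fail=False):
        if not self.in_progress:
            raise ValueError("Batch must be in progress to commit()")
        self.in_progress = False  # one-shot: cannot commit the same batch twice
        if fail:
            raise TimeoutError("transient RPC error")

def with_retry(fn, attempts=2):
    """Toy retry loop: retries only on transient (TimeoutError) failures."""
    for i in range(attempts):
        try:
            return fn(attempt=i)
        except TimeoutError:
            continue
    raise RuntimeError("out of retries")

# Buggy shape: the batch is created once, outside the retried function,
# so the retry re-commits a batch that is no longer in progress.
shared = Batch()
shared.begin()
def buggy(attempt):
    shared.commit(fail=(attempt == 0))

# Fixed shape: build a fresh batch on every attempt.
def fixed(attempt):
    batch = Batch()
    batch.begin()
    batch.commit(fail=(attempt == 0))
```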





[jira] [Work logged] (BEAM-7495) Add support for dynamic worker re-balancing when reading BigQuery data using Cloud Dataflow

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7495?focusedWorklogId=323547&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323547
 ]

ASF GitHub Bot logged work on BEAM-7495:


Author: ASF GitHub Bot
Created on: 04/Oct/19 16:30
Start Date: 04/Oct/19 16:30
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9083: [BEAM-7495] Improve 
the test that compares EXPORT and DIRECT_READ
URL: https://github.com/apache/beam/pull/9083#issuecomment-538469590
 
 
   Run Java PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 323547)
Remaining Estimate: 490h 40m  (was: 490h 50m)
Time Spent: 13h 20m  (was: 13h 10m)

> Add support for dynamic worker re-balancing when reading BigQuery data using 
> Cloud Dataflow
> ---
>
> Key: BEAM-7495
> URL: https://issues.apache.org/jira/browse/BEAM-7495
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Aryan Naraghi
>Assignee: Aryan Naraghi
>Priority: Major
>   Original Estimate: 504h
>  Time Spent: 13h 20m
>  Remaining Estimate: 490h 40m
>
> Currently, the BigQuery connector for reading data using the BigQuery Storage 
> API does not support any of the facilities on the source for Dataflow to 
> split streams.
>  
> On the server side, the BigQuery Storage API supports splitting streams at a 
> fraction. By adding support to the connector, we enable Dataflow to split 
> streams, which unlocks dynamic worker re-balancing.
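
As a toy illustration of the fraction-based splitting described above (this is 
not the actual BigQuery Storage API surface, just the arithmetic): the worker 
keeps the first part of its range and hands the remainder back for 
re-balancing, provided the split point has not already been consumed.

```python
def split_at_fraction(start, end, position, fraction):
    """Toy model of splitting a half-open range [start, end) at a fraction.

    The worker keeps [start, split) and hands [split, end) back to the
    service. Returns None when the requested split point is unusable
    (already consumed, or degenerate).
    """
    split = start + int((end - start) * fraction)
    if split <= position or split >= end:
        return None
    return (start, split), (split, end)

# A worker that has consumed up to offset 30 of [0, 100) can still split at 0.5:
result = split_at_fraction(0, 100, position=30, fraction=0.5)
```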





[jira] [Work logged] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8351?focusedWorklogId=323549&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323549
 ]

ASF GitHub Bot logged work on BEAM-8351:


Author: ASF GitHub Bot
Created on: 04/Oct/19 16:32
Start Date: 04/Oct/19 16:32
Worklog Time Spent: 10m 
  Work Description: violalyu commented on pull request #9730: [BEAM-8351] 
Support passing in arbitrary KV pairs to sdk worker via external environment 
config
URL: https://github.com/apache/beam/pull/9730
 
 
   Originally, the environment config for the EXTERNAL environment type only 
supported passing in a URL for the external worker pool. We want to support 
passing arbitrary KV pairs to the SDK worker via the external environment 
config, so that when starting the SDK harness we can read the values from 
`StartWorkerRequest.params`.
   
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Bu

[jira] [Work logged] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8351?focusedWorklogId=323550&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323550
 ]

ASF GitHub Bot logged work on BEAM-8351:


Author: ASF GitHub Bot
Created on: 04/Oct/19 16:33
Start Date: 04/Oct/19 16:33
Worklog Time Spent: 10m 
  Work Description: violalyu commented on issue #9730: [BEAM-8351] Support 
passing in arbitrary KV pairs to sdk worker via external environment config
URL: https://github.com/apache/beam/pull/9730#issuecomment-538470626
 
 
   R: @robertwb 
   R: @chadrik 
 



Issue Time Tracking
---

Worklog Id: (was: 323550)
Time Spent: 20m  (was: 10m)

> Support passing in arbitrary KV pairs to sdk worker via external environment 
> config
> ---
>
> Key: BEAM-8351
> URL: https://issues.apache.org/jira/browse/BEAM-8351
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Wanqi Lyu
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Originally, the environment config for the EXTERNAL environment type only 
> supported passing in a URL for the external worker pool. We want to support 
> passing arbitrary KV pairs to the SDK worker via the external environment 
> config, so that when starting the SDK harness we can read the values from 
> `StartWorkerRequest.params`.
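
As an illustrative sketch (the key names and fallback behavior here are 
assumptions, not the actual Beam option format), an external environment 
config could stay backward compatible by accepting either a bare URL or a 
JSON object of KV pairs:

```python
import json

def parse_external_config(environment_config):
    """Sketch: accept either a bare worker-pool URL (the old behavior) or a
    JSON object of arbitrary KV params (the new behavior)."""
    try:
        params = json.loads(environment_config)
    except ValueError:
        params = None
    if not isinstance(params, dict):
        # Not JSON (or not a JSON object): treat the whole string as the URL.
        return {"url": environment_config}
    return params

legacy = parse_external_config("localhost:50000")
extended = parse_external_config('{"url": "localhost:50000", "log_level": "DEBUG"}')
```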





[jira] [Updated] (BEAM-8209) Document custom docker containers

2019-10-04 Thread Mark Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu updated BEAM-8209:
---
Fix Version/s: (was: 2.16.0)
   2.17.0

> Document custom docker containers
> -
>
> Key: BEAM-8209
> URL: https://issues.apache.org/jira/browse/BEAM-8209
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
> Fix For: 2.17.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-8209) Document custom docker containers

2019-10-04 Thread Mark Liu (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944654#comment-16944654
 ] 

Mark Liu commented on BEAM-8209:


Moving this issue to 2.17 since it's a documentation change which should not 
block the release itself.

> Document custom docker containers
> -
>
> Key: BEAM-8209
> URL: https://issues.apache.org/jira/browse/BEAM-8209
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
> Fix For: 2.17.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7917) Python datastore v1new fails on retry

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7917?focusedWorklogId=323553&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323553
 ]

ASF GitHub Bot logged work on BEAM-7917:


Author: ASF GitHub Bot
Created on: 04/Oct/19 16:38
Start Date: 04/Oct/19 16:38
Worklog Time Spent: 10m 
  Work Description: sadovnychyi commented on issue #9294: [BEAM-7917] Fix 
datastore writes failing on retry
URL: https://github.com/apache/beam/pull/9294#issuecomment-538472297
 
 
   Another review pass would be helpful.
 



Issue Time Tracking
---

Worklog Id: (was: 323553)
Time Spent: 3h  (was: 2h 50m)

> Python datastore v1new fails on retry
> -
>
> Key: BEAM-7917
> URL: https://issues.apache.org/jira/browse/BEAM-7917
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, runner-dataflow
>Affects Versions: 2.14.0
> Environment: Python 3.7 on Dataflow
>Reporter: Dmytro Sadovnychyi
>Assignee: Dmytro Sadovnychyi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Traceback (most recent call last):
>   File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>   File "apache_beam/runners/common.py", line 454, in 
> apache_beam.runners.common.SimpleInvoker.invoke_process
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py",
>  line 334, in process
> self._flush_batch()
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py",
>  line 349, in _flush_batch
> throttle_delay=util.WRITE_BATCH_TARGET_LATENCY_MS // 1000)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", 
> line 197, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/helper.py",
>  line 99, in write_mutations
> batch.commit()
>   File 
> "/usr/local/lib/python3.7/site-packages/google/cloud/datastore/batch.py", 
> line 271, in commit
> raise ValueError("Batch must be in progress to commit()")
> ValueError: Batch must be in progress to commit()





[jira] [Resolved] (BEAM-7907) Support customized container

2019-10-04 Thread Mark Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu resolved BEAM-7907.

Resolution: Fixed

> Support customized container
> 
>
> Key: BEAM-7907
> URL: https://issues.apache.org/jira/browse/BEAM-7907
> Project: Beam
>  Issue Type: New Feature
>  Components: build-system, sdk-py-harness
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Labels: portability
> Fix For: 2.16.0
>
>
> Support customized container.





[jira] [Commented] (BEAM-7907) Support customized container

2019-10-04 Thread Mark Liu (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944655#comment-16944655
 ] 

Mark Liu commented on BEAM-7907:


The major tasks are closed in 2.16. The only one left is documentation, which 
has been moved to 2.17. I think it's okay to close this as the 2.16 release 
tracker.

> Support customized container
> 
>
> Key: BEAM-7907
> URL: https://issues.apache.org/jira/browse/BEAM-7907
> Project: Beam
>  Issue Type: New Feature
>  Components: build-system, sdk-py-harness
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Labels: portability
> Fix For: 2.16.0
>
>
> Support customized container.





[jira] [Work logged] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8351?focusedWorklogId=323559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323559
 ]

ASF GitHub Bot logged work on BEAM-8351:


Author: ASF GitHub Bot
Created on: 04/Oct/19 16:45
Start Date: 04/Oct/19 16:45
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9730: [BEAM-8351] Support 
passing in arbitrary KV pairs to sdk worker via external environment config
URL: https://github.com/apache/beam/pull/9730#issuecomment-538474791
 
 
   R: @mxm
   
 



Issue Time Tracking
---

Worklog Id: (was: 323559)
Time Spent: 0.5h  (was: 20m)

> Support passing in arbitrary KV pairs to sdk worker via external environment 
> config
> ---
>
> Key: BEAM-8351
> URL: https://issues.apache.org/jira/browse/BEAM-8351
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Wanqi Lyu
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Originally, the environment config for the EXTERNAL environment type only 
> supported passing in a URL for the external worker pool. We want to support 
> passing arbitrary KV pairs to the SDK worker via the external environment 
> config, so that when starting the SDK harness we can read the values from 
> `StartWorkerRequest.params`.





[jira] [Updated] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-04 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8343:

Description: 
The objective is to create a universal way for Beam SQL IO APIs to support 
predicate/project push-down.
 A proposed way to achieve that is by introducing an interface responsible for 
identifying what portion(s) of a Calc can be moved down to the IO layer, and 
adding the following methods to the BeamSqlTable interface to pass the 
necessary parameters to IO APIs:
 - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
 - Boolean supportsProjects()
 - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
List fieldNames)
  

Design doc 
[link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].

  was:
The objective is to create a universal way for Beam SQL IO APIs to support 
predicate/project push-down.
A proposed way to achieve that is by introducing an interface responsible for 
identifying what portion(s) of a Calc can be moved down to IO layer. Also, 
adding following methods to a BeamSqlTable interface to pass necessary 
parameters to IO APIs:
- BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
- Boolean supportsProjects()
- PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
List fieldNames)
 

Design doc 
[link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].


> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to the IO layer, 
> and adding the following methods to the BeamSqlTable interface to pass the 
> necessary parameters to IO APIs:
>  - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
>  - Boolean supportsProjects()
>  - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
> List fieldNames)
>   
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].
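
Read as Java, the proposed additions sketch out roughly as below. The generic 
parameters (`PCollection<Row>`, `List<String>`) are assumptions — the archived 
text dropped them — and the Beam/Calcite types are stubbed for brevity.

```java
import java.util.List;

class PushDownSketch {
    // Stubs standing in for Beam/Calcite types (assumptions, illustration only).
    interface BeamSqlTableFilter {}
    static class RexProgram {}
    static class RexNode {}
    static class PBegin {}
    static class PCollection<T> {}
    static class Row {}

    // The proposed BeamSqlTable additions; the <Row> and <String> element
    // types are assumptions, since the archive dropped the generics.
    interface BeamSqlTable {
        // Which part of the Calc's filter can be pushed into the IO layer.
        BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter);

        // Whether the table can drop unused columns at the source.
        Boolean supportsProjects();

        // Build the read, given the pushed-down filters and projected fields.
        PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters,
                                       List<String> fieldNames);
    }
}
```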





[jira] [Work logged] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8351?focusedWorklogId=323564&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323564
 ]

ASF GitHub Bot logged work on BEAM-8351:


Author: ASF GitHub Bot
Created on: 04/Oct/19 16:49
Start Date: 04/Oct/19 16:49
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9730: [BEAM-8351] 
Support passing in arbitrary KV pairs to sdk worker via external environment 
config
URL: https://github.com/apache/beam/pull/9730#discussion_r331592994
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/portable_runner.py
 ##
 @@ -129,11 +129,25 @@ def _create_environment(options):
   env=(config.get('env') or '')
   ).SerializeToString())
 elif environment_urn == common_urns.environments.EXTERNAL.urn:
+  def _looks_like_json(environment_config):
+import re
+return re.match(r'\{.+\}', environment_config)
 
 Review comment:
   use `re.search()`.  also, I don't think this needs to be private, so remove 
the leading underscore.
   
   maybe add a comment like:  "we don't use json.loads to test validity because 
we don't want to propagate json syntax errors downstream to the runner"
   
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 323564)
Time Spent: 40m  (was: 0.5h)

> Support passing in arbitrary KV pairs to sdk worker via external environment 
> config
> ---
>
> Key: BEAM-8351
> URL: https://issues.apache.org/jira/browse/BEAM-8351
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Wanqi Lyu
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Originally, the environment config for the EXTERNAL environment type only 
> supported passing in a URL for the external worker pool. We want to support 
> passing arbitrary KV pairs to the SDK worker via the external environment 
> config, so that when starting the SDK harness we can read the values from 
> `StartWorkerRequest.params`.





[jira] [Work logged] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8351?focusedWorklogId=323570&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323570
 ]

ASF GitHub Bot logged work on BEAM-8351:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:00
Start Date: 04/Oct/19 17:00
Worklog Time Spent: 10m 
  Work Description: violalyu commented on pull request #9730: [BEAM-8351] 
Support passing in arbitrary KV pairs to sdk worker via external environment 
config
URL: https://github.com/apache/beam/pull/9730#discussion_r331597120
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/portable_runner.py
 ##
 @@ -129,11 +129,25 @@ def _create_environment(options):
   env=(config.get('env') or '')
   ).SerializeToString())
 elif environment_urn == common_urns.environments.EXTERNAL.urn:
+  def _looks_like_json(environment_config):
+import re
+return re.match(r'\{.+\}', environment_config)
 
 Review comment:
  Thanks! For the `re.search()` part, do we really want it to match anywhere 
rather than from the start? I was thinking that a valid JSON string should 
start with '{' and end with '}', so maybe `re.match(r'\{.+\}$')` or 
`re.search(r'^\{.+\}$')`?
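A quick sketch of the difference being discussed, assuming the goal is to accept only strings that look like a whole JSON object:

```python
import re

# Candidate configs: a whole JSON object, one with trailing text, and a
# plain host:port string.
configs = ['{"k": "v"}', '{"k": "v"} extra', 'host:port']

# re.match anchors at the start only, so trailing text still matches.
unanchored = [bool(re.match(r'\{.+\}', c)) for c in configs]

# Anchoring both ends rejects anything outside the braces.
anchored = [bool(re.search(r'^\{.+\}$', c)) for c in configs]

print(unanchored)  # [True, True, False]
print(anchored)    # [True, False, False]
```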
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323570)
Time Spent: 50m  (was: 40m)

> Support passing in arbitrary KV pairs to sdk worker via external environment 
> config
> ---
>
> Key: BEAM-8351
> URL: https://issues.apache.org/jira/browse/BEAM-8351
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Wanqi Lyu
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Originally, the environment config for the EXTERNAL environment type only 
> supported passing in a URL for the external worker pool; We want to support 
> passing arbitrary KV pairs to the SDK worker via the external environment 
> config, so that when starting the SDK harness we can read the values from 
> `StartWorkerRequest.params`.





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=323579&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323579
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:01
Start Date: 04/Oct/19 17:01
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9720: [BEAM-8335] Add 
initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r331594582
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -0,0 +1,123 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
InteractiveStreamRecord
+from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.utils import timestamp
+
+from google.protobuf import timestamp_pb2
+
+
+def to_timestamp(timestamp_secs):
+  """Converts seconds since epoch to an apache_beam.utils.timestamp.Timestamp.
+  """
+  return timestamp.Timestamp.of(timestamp_secs)
+
+def from_timestamp_proto(timestamp_proto):
+  return timestamp.Timestamp(seconds=timestamp_proto.seconds,
+ micros=timestamp_proto.nanos * 1000)
+
+def to_timestamp_usecs(ts):
+  """Converts a google.protobuf.Timestamp or an
+ apache_beam.utils.timestamp.Timestamp to microseconds since epoch.
+  """
+  if isinstance(ts, timestamp_pb2.Timestamp):
+return (ts.seconds * 10**6) + (ts.nanos * 10**-3)
+  if isinstance(ts, timestamp.Timestamp):
+return ts.micros
+
+class StreamingCache(object):
+  """Abstraction that holds the logic for reading and writing to cache.
+  """
+  def __init__(self, readers):
+self._readers = readers
+
+  class Reader(object):
+"""Abstraction that reads from PCollection readers.
+
+This class is an Abstraction layer over multiple PCollection readers to be
+used for supplying the Interactive Service with TestStream events.
+
+This class is also responsible for holding the state of the clock, 
injecting
+clock advancement events, and watermark advancement events.
+"""
+def __init__(self, readers):
+  self._readers = [reader.read() for reader in readers]
 
 Review comment:
   Is `self._readers` actually the `events` returned from the passed-in readers?
 



Issue Time Tracking
---

Worklog Id: (was: 323579)
Time Spent: 1.5h  (was: 1h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=323575&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323575
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:01
Start Date: 04/Oct/19 17:01
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9720: [BEAM-8335] Add 
initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r331592783
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -0,0 +1,123 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
InteractiveStreamRecord
+from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.utils import timestamp
+
+from google.protobuf import timestamp_pb2
+
+
+def to_timestamp(timestamp_secs):
+  """Converts seconds since epoch to an apache_beam.utils.timestamp.Timestamp.
+  """
+  return timestamp.Timestamp.of(timestamp_secs)
+
+def from_timestamp_proto(timestamp_proto):
+  return timestamp.Timestamp(seconds=timestamp_proto.seconds,
+ micros=timestamp_proto.nanos * 1000)
+
+def to_timestamp_usecs(ts):
 
 Review comment:
   We have a very similar method in monitoring_infos.py. Should we move them to 
a common place?
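For reference, the unit conversion those helpers perform, rewritten here with plain integers (an illustrative sketch, not Beam code; integer floor division avoids the float result that `nanos * 10**-3` would produce):

```python
def proto_to_usecs(seconds, nanos):
    # A protobuf-style timestamp carries (seconds, nanos); microseconds
    # since epoch is seconds * 10**6 plus nanos // 10**3.
    return seconds * 10**6 + nanos // 1000

print(proto_to_usecs(1, 500000000))  # 1500000
```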
 



Issue Time Tracking
---

Worklog Id: (was: 323575)
Time Spent: 1h 10m  (was: 1h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=323571&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323571
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:01
Start Date: 04/Oct/19 17:01
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9720: [BEAM-8335] Add 
initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r331589935
 
 

 ##
 File path: model/fn-execution/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.fn_execution.v1;
+
+option go_package = "fnexecution_v1";
+option java_package = "org.apache.beam.model.fnexecution.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+
+service InteractiveService {
 
 Review comment:
   Do we need a Reset request?
 



Issue Time Tracking
---

Worklog Id: (was: 323571)
Time Spent: 50m  (was: 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=323574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323574
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:01
Start Date: 04/Oct/19 17:01
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9720: [BEAM-8335] Add 
initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r331594849
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -0,0 +1,123 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
InteractiveStreamRecord
+from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.utils import timestamp
+
+from google.protobuf import timestamp_pb2
+
+
+def to_timestamp(timestamp_secs):
+  """Converts seconds since epoch to an apache_beam.utils.timestamp.Timestamp.
+  """
+  return timestamp.Timestamp.of(timestamp_secs)
+
+def from_timestamp_proto(timestamp_proto):
+  return timestamp.Timestamp(seconds=timestamp_proto.seconds,
+ micros=timestamp_proto.nanos * 1000)
+
+def to_timestamp_usecs(ts):
+  """Converts a google.protobuf.Timestamp or an
+ apache_beam.utils.timestamp.Timestamp to microseconds since epoch.
+  """
+  if isinstance(ts, timestamp_pb2.Timestamp):
+return (ts.seconds * 10**6) + (ts.nanos * 10**-3)
+  if isinstance(ts, timestamp.Timestamp):
+return ts.micros
+
+class StreamingCache(object):
+  """Abstraction that holds the logic for reading and writing to cache.
+  """
+  def __init__(self, readers):
+self._readers = readers
+
+  class Reader(object):
+"""Abstraction that reads from PCollection readers.
+
+This class is an Abstraction layer over multiple PCollection readers to be
+used for supplying the Interactive Service with TestStream events.
+
+This class is also responsible for holding the state of the clock, 
injecting
+clock advancement events, and watermark advancement events.
+"""
+def __init__(self, readers):
+  self._readers = [reader.read() for reader in readers]
+  self._watermark = timestamp.MIN_TIMESTAMP
 
 Review comment:
   Should the defaults for watermark and timestamp reflect something from the 
`readers`?
 



Issue Time Tracking
---

Worklog Id: (was: 323574)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=323572&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323572
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:01
Start Date: 04/Oct/19 17:01
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9720: [BEAM-8335] Add 
initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r331596793
 
 

 ##
 File path: sdks/python/apache_beam/testing/interactive_stream.py
 ##
 @@ -0,0 +1,123 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import grpc
+import time
+
+from apache_beam.portability.api import beam_interactive_api_pb2
+from apache_beam.portability.api import beam_interactive_api_pb2_grpc
+from apache_beam.portability.api.beam_interactive_api_pb2_grpc import 
InteractiveServiceServicer
+from concurrent.futures import ThreadPoolExecutor
+
+
+def to_api_state(state):
+  if state == 'STOPPED':
 
 Review comment:
   Suggest a dict from strings -> proto states and return directly from 
dictionary.get(). That way if a bogus state is passed, this can throw an error 
instead of defaulting to running.
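The suggestion above can be sketched as follows; the enum values are stand-ins for the generated `beam_interactive_api_pb2.StatusResponse` constants, which are not reproduced here:

```python
# Stand-in values for the proto enum constants (assumption for the sketch).
STOPPED, PAUSED, RUNNING = 0, 1, 2

_API_STATES = {
    'STOPPED': STOPPED,
    'PAUSED': PAUSED,
    'RUNNING': RUNNING,
}

def to_api_state(state):
    # A plain dict lookup raises KeyError for bogus states instead of
    # silently defaulting to RUNNING.
    return _API_STATES[state]

print(to_api_state('PAUSED'))  # 1
```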
 



Issue Time Tracking
---

Worklog Id: (was: 323572)
Time Spent: 1h  (was: 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=323576&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323576
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:01
Start Date: 04/Oct/19 17:01
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9720: [BEAM-8335] Add 
initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r331592276
 
 

 ##
 File path: model/fn-execution/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,106 @@
+/*
 
 Review comment:
   I am not sure if model/fn-execution is the right place. Maybe 
model/interactive.
   
   Also, this file is very streaming-related, not generically usable for 
controlling all interactive pipelines. If that is accurate, the name could 
reflect that.
 



Issue Time Tracking
---

Worklog Id: (was: 323576)
Time Spent: 1h 10m  (was: 1h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=323577&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323577
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:01
Start Date: 04/Oct/19 17:01
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9720: [BEAM-8335] Add 
initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r331596932
 
 

 ##
 File path: sdks/python/apache_beam/testing/interactive_stream.py
 ##
 @@ -0,0 +1,123 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import grpc
+import time
+
+from apache_beam.portability.api import beam_interactive_api_pb2
+from apache_beam.portability.api import beam_interactive_api_pb2_grpc
+from apache_beam.portability.api.beam_interactive_api_pb2_grpc import 
InteractiveServiceServicer
+from concurrent.futures import ThreadPoolExecutor
+
+
+def to_api_state(state):
+  if state == 'STOPPED':
+return beam_interactive_api_pb2.StatusResponse.STOPPED
+  if state == 'PAUSED':
+return beam_interactive_api_pb2.StatusResponse.PAUSED
+  return beam_interactive_api_pb2.StatusResponse.RUNNING
+
+class InteractiveStreamController(InteractiveServiceServicer):
+  def __init__(self, endpoint, streaming_cache):
+self._endpoint = endpoint
+self._server = grpc.server(ThreadPoolExecutor(max_workers=10))
 
 Review comment:
   Do we need a max of 10 workers for this controller?
 



Issue Time Tracking
---

Worklog Id: (was: 323577)
Time Spent: 1h 10m  (was: 1h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=323578&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323578
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:01
Start Date: 04/Oct/19 17:01
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9720: [BEAM-8335] Add 
initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r331591095
 
 

 ##
 File path: sdks/python/apache_beam/testing/interactive_stream.py
 ##
 @@ -0,0 +1,123 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import grpc
+import time
+
+from apache_beam.portability.api import beam_interactive_api_pb2
+from apache_beam.portability.api import beam_interactive_api_pb2_grpc
+from apache_beam.portability.api.beam_interactive_api_pb2_grpc import 
InteractiveServiceServicer
+from concurrent.futures import ThreadPoolExecutor
+
+
+def to_api_state(state):
+  if state == 'STOPPED':
+return beam_interactive_api_pb2.StatusResponse.STOPPED
+  if state == 'PAUSED':
+return beam_interactive_api_pb2.StatusResponse.PAUSED
+  return beam_interactive_api_pb2.StatusResponse.RUNNING
+
+class InteractiveStreamController(InteractiveServiceServicer):
+  def __init__(self, endpoint, streaming_cache):
+self._endpoint = endpoint
+self._server = grpc.server(ThreadPoolExecutor(max_workers=10))
+beam_interactive_api_pb2_grpc.add_InteractiveServiceServicer_to_server(
+self, self._server)
+self._server.add_insecure_port(self._endpoint)
+self._server.start()
+
+self._streaming_cache = streaming_cache
+self._state = 'STOPPED'
+self._playback_speed = 1.0
+
+  def Start(self, request, context):
+"""Requests that the Service starts emitting elements.
+"""
+
+self._next_state('RUNNING')
+self._playback_speed = request.playback_speed or 1.0
+self._playback_speed = max(min(self._playback_speed, 100.0), 0.001)
+return beam_interactive_api_pb2.StartResponse()
+
+  def Stop(self, request, context):
+"""Requests that the Service stop emitting elements.
+"""
+self._next_state('STOPPED')
+return beam_interactive_api_pb2.StartResponse()
+
+  def Pause(self, request, context):
+"""Requests that the Service pause emitting elements.
+"""
+self._next_state('PAUSED')
+return beam_interactive_api_pb2.PauseResponse()
+
+  def Step(self, request, context):
+"""Requests that the Service emit a single element from each cached source.
+"""
+self._next_state('STEP')
+return beam_interactive_api_pb2.StepResponse()
+
+  def Status(self, request, context):
+"""Returns the status of the service.
+"""
+resp = beam_interactive_api_pb2.StatusResponse()
+resp.stream_time.GetCurrentTime()
+resp.state = to_api_state(self._state)
+return resp
+
+  def _reset_state(self):
+self._reader = None
+self._playback_speed = 1.0
+self._state = 'STOPPED'
+
+  def _next_state(self, state):
+if not self._state or self._state == 'STOPPED':
 
 Review comment:
   When would `not self._state` be true? It seems to be always set to something.
 



Issue Time Tracking
---

Worklog Id: (was: 323578)
Time Spent: 1h 20m  (was: 1h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=323573&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323573
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:01
Start Date: 04/Oct/19 17:01
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9720: [BEAM-8335] Add 
initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r331596241
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -0,0 +1,123 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
InteractiveStreamRecord
+from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.utils import timestamp
+
+from google.protobuf import timestamp_pb2
+
+
+def to_timestamp(timestamp_secs):
+  """Converts seconds since epoch to an apache_beam.util.Timestamp.
+  """
+  return timestamp.Timestamp.of(timestamp_secs)
+
+def from_timestamp_proto(timestamp_proto):
+  return timestamp.Timestamp(seconds=timestamp_proto.seconds,
+ micros=timestamp_proto.nanos * 1000)
+
+def to_timestamp_usecs(ts):
+  """Converts a google.protobuf.Timestamp and
+ apache_beam.util.timestamp.Timestamp to seconds since epoch.
+  """
+  if isinstance(ts, timestamp_pb2.Timestamp):
+return (ts.seconds * 10**6) + (ts.nanos * 10**-3)
+  if isinstance(ts, timestamp.Timestamp):
+return ts.micros
+
+class StreamingCache(object):
+  """Abstraction that holds the logic for reading and writing to cache.
+  """
+  def __init__(self, readers):
+self._readers = readers
+
+  class Reader(object):
+"""Abstraction that reads from PCollection readers.
+
+This class is an Abstraction layer over multiple PCollection readers to be
+used for supplying the Interactive Service with TestStream events.
+
+This class is also responsible for holding the state of the clock, injecting
+clock advancement events, and watermark advancement events.
+"""
+def __init__(self, readers):
+  self._readers = [reader.read() for reader in readers]
+  self._watermark = timestamp.MIN_TIMESTAMP
+  self._timestamp = timestamp.MIN_TIMESTAMP
+
+def read(self):
+  """Reads records from PCollection readers.
+  """
+  records = []
+  for r in self._readers:
+try:
+  record = InteractiveStreamRecord()
+  record.ParseFromString(next(r))
+  records.append(record)
+except StopIteration:
+  pass
+
+  events = []
+  if not records:
+self.advance_watermark(timestamp.MAX_TIMESTAMP, events)
+
+  records.sort(key=lambda x: x.processing_time)
+  for r in records:
+self.advance_processing_time(
+from_timestamp_proto(r.processing_time), events)
+self.advance_watermark(from_timestamp_proto(r.watermark), events)
+
+events.append(TestStreamPayload.Event(
+element_event=TestStreamPayload.Event.AddElements(
+elements=[r.element])))
+  return events
+
+def advance_processing_time(self, processing_time, events):
+  """Advances the internal clock state and injects an AdvanceProcessingTime
+ event.
+  """
+  if self._timestamp != processing_time:
+duration = timestamp.Duration(
+micros=processing_time.micros - self._timestamp.micros)
+if self._timestamp == timestamp.MIN_TIMESTAMP:
+  duration = timestamp.Duration(micros=processing_time.micros)
 
 Review comment:
   For my learning, is this special case required for MIN_TIMESTAMP? Would not 
MIN_TIMESTAMP also have some non-zero self._timestamp.micros that needs to be 
accounted for in the duration calculation?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=323580&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323580
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:04
Start Date: 04/Oct/19 17:04
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9664: [BEAM-7389] Created 
elementwise for consistency with docs
URL: https://github.com/apache/beam/pull/9664#issuecomment-538481052
 
 
   I do not see any deleted files or doc changes. Am I missing something?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323580)
Time Spent: 63h  (was: 62h 50m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 63h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=323581&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323581
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:05
Start Date: 04/Oct/19 17:05
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9669: [BEAM-7389] 
Update include buttons to support multiple languages
URL: https://github.com/apache/beam/pull/9669
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323581)
Time Spent: 63h 10m  (was: 63h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 63h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=323584&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323584
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:05
Start Date: 04/Oct/19 17:05
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9692: [BEAM-7389] Update 
docs with matching code files
URL: https://github.com/apache/beam/pull/9692#issuecomment-538481357
 
 
   Please rebase this.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323584)
Time Spent: 63h 20m  (was: 63h 10m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 63h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config

2019-10-04 Thread Thomas Weise (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise reassigned BEAM-8351:
--

Assignee: Wanqi Lyu

> Support passing in arbitrary KV pairs to sdk worker via external environment 
> config
> ---
>
> Key: BEAM-8351
> URL: https://issues.apache.org/jira/browse/BEAM-8351
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Wanqi Lyu
>Assignee: Wanqi Lyu
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Originally, the environment config for the EXTERNAL environment type only 
> supported passing in a URL for the external worker pool. We want to support 
> passing arbitrary KV pairs to the SDK worker via the external environment 
> config, so that when starting the SDK harness we can read the values from 
> `StartWorkerRequest.params`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=323586&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323586
 ]

ASF GitHub Bot logged work on BEAM-5707:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:15
Start Date: 04/Oct/19 17:15
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9728: [BEAM-5707] 
Modify Flink streaming impulse function to include a counter
URL: https://github.com/apache/beam/pull/9728
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323586)
Time Spent: 8h  (was: 7h 50m)

> Add a portable Flink streaming synthetic source for testing
> ---
>
> Key: BEAM-5707
> URL: https://issues.apache.org/jira/browse/BEAM-5707
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Micah Wylde
>Assignee: Micah Wylde
>Priority: Minor
> Fix For: 2.9.0
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Currently there are no built-in streaming sources for portable pipelines. 
> This makes it hard to test streaming functionality in the Python SDK.
> It would be very useful to add a periodic impulse source that (with some 
> configurable frequency) outputs an empty byte array, which can then be 
> transformed as desired inside the python pipeline. More context in this 
> [mailing list 
> discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8351?focusedWorklogId=323587&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323587
 ]

ASF GitHub Bot logged work on BEAM-8351:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:16
Start Date: 04/Oct/19 17:16
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9730: [BEAM-8351] 
Support passing in arbitrary KV pairs to sdk worker via external environment 
config
URL: https://github.com/apache/beam/pull/9730#discussion_r331603033
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/portable_runner.py
 ##
 @@ -129,11 +129,25 @@ def _create_environment(options):
   env=(config.get('env') or '')
   ).SerializeToString())
 elif environment_urn == common_urns.environments.EXTERNAL.urn:
+  def _looks_like_json(environment_config):
+import re
+return re.match(r'\{.+\}', environment_config)
 
 Review comment:
   There's also whitespace to consider.  So if we're using `re.match` we have 
to do `re.match(r'\s*\{.*\}\s*$')`.  
   
   We just need a simple heuristic that tells us that it looks like JSON and 
not a URL. The most important thing is that we don't end up with a regex that 
incorrectly returns False for something that is *actually* JSON, so I suggested 
the `re.search` option because I think it's sufficient at detecting something 
that's *not a URL* and thus probably JSON. That said, I think the `re.match` 
regex above looks pretty safe. Obviously, we're only talking about JSON 
objects (i.e. a dict) and not arrays and other scalars.
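The two heuristics discussed above can be sketched as follows (function names are hypothetical; only the regex and the brace check come from the thread):

```python
import re

# Goal: distinguish "user passed a JSON object" from "user passed a URL".
# The failure mode to avoid is returning False for something that *is* JSON.

def looks_like_json_match(config):
    # Stricter variant: the whole string must be a brace-delimited object,
    # allowing surrounding whitespace (re.match anchors at the start only,
    # so the trailing $ is needed).
    return bool(re.match(r'\s*\{.*\}\s*$', config))

def looks_like_json_braces(config):
    # Loosest variant: curly braces should never appear in a worker-pool URL.
    return '{' in config or '}' in config

assert looks_like_json_match('  {"env": {"FOO": "bar"}}  ')
assert not looks_like_json_match('localhost:50000')
assert looks_like_json_braces('{"env": {}}')
assert not looks_like_json_braces('localhost:50000')
```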
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323587)
Time Spent: 1h  (was: 50m)

> Support passing in arbitrary KV pairs to sdk worker via external environment 
> config
> ---
>
> Key: BEAM-8351
> URL: https://issues.apache.org/jira/browse/BEAM-8351
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Wanqi Lyu
>Assignee: Wanqi Lyu
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Originally, the environment config for the EXTERNAL environment type only 
> supported passing in a URL for the external worker pool. We want to support 
> passing arbitrary KV pairs to the SDK worker via the external environment 
> config, so that when starting the SDK harness we can read the values from 
> `StartWorkerRequest.params`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=323588&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323588
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:18
Start Date: 04/Oct/19 17:18
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on issue #9664: [BEAM-7389] 
Created elementwise for consistency with docs
URL: https://github.com/apache/beam/pull/9664#issuecomment-538486069
 
 
   That would break the staged versions and the notebooks would have to be 
generated in a second pass anyways. I'm doing that here #9692, that way we 
don't temporarily break things in the website.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323588)
Time Spent: 63.5h  (was: 63h 20m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 63.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8351?focusedWorklogId=323589&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323589
 ]

ASF GitHub Bot logged work on BEAM-8351:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:19
Start Date: 04/Oct/19 17:19
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9730: [BEAM-8351] 
Support passing in arbitrary KV pairs to sdk worker via external environment 
config
URL: https://github.com/apache/beam/pull/9730#discussion_r331604295
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/portable_runner.py
 ##
 @@ -129,11 +129,25 @@ def _create_environment(options):
   env=(config.get('env') or '')
   ).SerializeToString())
 elif environment_urn == common_urns.environments.EXTERNAL.urn:
+  def _looks_like_json(environment_config):
+import re
+return re.match(r'\{.+\}', environment_config)
 
 Review comment:
   Just to reiterate, we want json syntax errors to occur here, at submission 
time, and not later on, so we want to answer "did the user attempt to pass a 
json map or a url?"  So I was even considering `return '{' in config or '}' in 
config`, since curly braces should not be in a url.  
   
   I'll defer the final answer on this to the reviewer.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323589)
Time Spent: 1h 10m  (was: 1h)

> Support passing in arbitrary KV pairs to sdk worker via external environment 
> config
> ---
>
> Key: BEAM-8351
> URL: https://issues.apache.org/jira/browse/BEAM-8351
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Wanqi Lyu
>Assignee: Wanqi Lyu
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Originally, the environment config for the EXTERNAL environment type only 
> supported passing in a URL for the external worker pool. We want to support 
> passing arbitrary KV pairs to the SDK worker via the external environment 
> config, so that when starting the SDK harness we can read the values from 
> `StartWorkerRequest.params`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=323592&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323592
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:24
Start Date: 04/Oct/19 17:24
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-538488002
 
 
   Run Python2_PVR_Flink PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323592)
Time Spent: 1.5h  (was: 1h 20m)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=323593&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323593
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:24
Start Date: 04/Oct/19 17:24
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-538488071
 
 
   Python2_PVR_Flink seems incredibly flaky
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323593)
Time Spent: 1h 40m  (was: 1.5h)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=323602&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323602
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:38
Start Date: 04/Oct/19 17:38
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-538169256
 
 
   Here's a breakdown of the changes required to get to pylint 2.4:
   
   - fix a bunch of warnings about deprecated methods.  mostly `logger.warn` 
and various unittest methods
   - update the names of a few error codes: `disable=unused-import` and 
`possibly-unused-variable`
   - ignore a bunch of newly introduced style warnings that did not seem 
important
   - run the lint using python-3.7: this ensures that it can run on test files 
that only work on python-37 due to syntax features
   - merge the lint tests into one test:
 - `run_pylint_2to3.sh` was just testing futurization.  seems fine to do 
this all the time now that our code is python2 compliant
 - there was a "mini" test just for python3-compatibility.  not needed 
anymore now that everything is running on python3
 - stop running `pycodestyle`: it's run as part of `flake8`
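The deprecated-method fixes in the first bullet look roughly like this (a minimal sketch; the logger name and message are illustrative, not taken from the Beam codebase):

```python
import logging

logger = logging.getLogger(__name__)

# Before: pylint 2.4 flags Logger.warn as a deprecated-method.
# logger.warn("pipeline %s failed", "p1")

# After: warning() is the supported spelling on both Python 2 and 3.
logger.warning("pipeline %s failed", "p1")

# The same pattern applies to deprecated unittest aliases, e.g.
# assertEquals -> assertEqual, assertRaisesRegexp -> assertRaisesRegex.
```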
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323602)
Time Spent: 1h 50m  (was: 1h 40m)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8303) Filesystems not properly registered using FileIO.write() on FlinkRunner

2019-10-04 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8303:
--
Summary: Filesystems not properly registered using FileIO.write() on 
FlinkRunner  (was: Filesystems not properly registered using FileIO.write())

> Filesystems not properly registered using FileIO.write() on FlinkRunner
> ---
>
> Key: BEAM-8303
> URL: https://issues.apache.org/jira/browse/BEAM-8303
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.15.0
>Reporter: Preston Koprivica
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 2.17.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> I’m getting the following error when attempting to use the FileIO apis 
> (beam-2.15.0) and integrating with AWS S3.  I have setup the PipelineOptions 
> with all the relevant AWS options, so the filesystem registry **should** be 
> properly seeded by the time the graph is compiled and executed:
> {code:java}
>  java.lang.IllegalArgumentException: No filesystem found for scheme s3
>     at 
> org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456)
>     at 
> org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526)
>     at 
> org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149)
>     at 
> org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105)
>     at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159)
>     at 
> org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:83)
>     at 
> org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:32)
>     at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:543)
>     at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:534)
>     at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:480)
>     at 
> org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.deserialize(CoderTypeSerializer.java:93)
>     at 
> org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
>     at 
> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:106)
>     at 
> org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:72)
>     at 
> org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:47)
>     at 
> org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
>     at 
> org.apache.flink.runtime.operators.FlatMapDriver.run(FlatMapDriver.java:107)
>     at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:503)
>     at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>     at java.lang.Thread.run(Thread.java:748)
>  {code}
> For reference, the write code resembles this:
> {code:java}
>  FileIO.Write write = FileIO.write()
>     .via(ParquetIO.sink(schema))
>     .to(options.getOutputDir()). // will be something like: 
> s3:///
>     .withSuffix(".parquet");
> records.apply(String.format("Write(%s)", options.getOutputDir()), 
> write);{code}
> The issue does not appear to be related to ParquetIO.sink().  I am able to 
> reliably reproduce the issue using JSON formatted records and TextIO.sink(), 
> as well.  Moreover, AvroIO is affected if withWindowedWrites() option is 
> added.
> Just trying some different knobs, I went ahead and set the following option:
> {code:java}
> write = write.withNoSpilling();{code}
> This actually seemed to fix the issue, only to have it reemerge as I scaled 
> up the data set size.  The stack trace, while very similar, reads:
> {code:java}
>  java.lang.IllegalArgumentException: No filesystem found for scheme s3
>     at 
> org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456)
>     at 
> org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526)
>     at 
> org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149)
>     at 
> org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105)
>     at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159)
>     at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:82)
>     at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:36)
>     at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(Win

[jira] [Updated] (BEAM-8303) Filesystems not properly registered using FileIO.write()

2019-10-04 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8303:
--
Component/s: (was: sdk-java-core)

> Filesystems not properly registered using FileIO.write()
> 
>
> Key: BEAM-8303
> URL: https://issues.apache.org/jira/browse/BEAM-8303
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.15.0
>Reporter: Preston Koprivica
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 2.17.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> I’m getting the following error when attempting to use the FileIO apis 
> (beam-2.15.0) and integrating with AWS S3.  I have setup the PipelineOptions 
> with all the relevant AWS options, so the filesystem registry **should** be 
> properly seeded by the time the graph is compiled and executed:
> {code:java}
>  java.lang.IllegalArgumentException: No filesystem found for scheme s3
>     at 
> org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456)
>     at 
> org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526)
>     at 
> org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149)
>     at 
> org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105)
>     at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159)
>     at 
> org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:83)
>     at 
> org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:32)
>     at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:543)
>     at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:534)
>     at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:480)
>     at 
> org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.deserialize(CoderTypeSerializer.java:93)
>     at 
> org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
>     at 
> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:106)
>     at 
> org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:72)
>     at 
> org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:47)
>     at 
> org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
>     at 
> org.apache.flink.runtime.operators.FlatMapDriver.run(FlatMapDriver.java:107)
>     at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:503)
>     at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>     at java.lang.Thread.run(Thread.java:748)
>  {code}
> For reference, the write code resembles this:
> {code:java}
>  FileIO.Write write = FileIO.write()
>     .via(ParquetIO.sink(schema))
>     .to(options.getOutputDir()) // will be something like: s3://<bucket>/<path>
>     .withSuffix(".parquet");
> records.apply(String.format("Write(%s)", options.getOutputDir()), write);{code}
> The issue does not appear to be related to ParquetIO.sink().  I am able to 
> reliably reproduce the issue using JSON formatted records and TextIO.sink() 
> as well.  Moreover, AvroIO is affected if the withWindowedWrites() option is 
> added.
> Just trying some different knobs, I went ahead and set the following option:
> {code:java}
> write = write.withNoSpilling();{code}
> This actually seemed to fix the issue, only to have it reemerge as I scaled 
> up the data set size.  The stack trace, while very similar, reads:
> {code:java}
>  java.lang.IllegalArgumentException: No filesystem found for scheme s3
>     at 
> org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456)
>     at 
> org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526)
>     at 
> org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149)
>     at 
> org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105)
>     at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159)
>     at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:82)
>     at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:36)
>     at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:543)
>     at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:534)
>     at 
> o
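The failure mode in the stack trace above — a coder needing a scheme-to-filesystem registry on a worker where the seeding step never ran — can be illustrated with a self-contained sketch. This is not Beam's actual implementation (the real logic lives in org.apache.beam.sdk.io.FileSystems); it only models the pattern of a static registry that throws when a scheme was never registered:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal stand-in for a static scheme -> filesystem registry. If nothing
// registers a handler for "s3" on the worker before a coder resolves a path,
// resolution fails exactly like the quoted IllegalArgumentException.
final class SchemeRegistry {
  private static final Map<String, Function<String, String>> FILESYSTEMS =
      new ConcurrentHashMap<>();

  // Analogous to seeding the registry from PipelineOptions during worker setup.
  static void register(String scheme, Function<String, String> fs) {
    FILESYSTEMS.put(scheme, fs);
  }

  static String resolve(String path) {
    String scheme = path.substring(0, path.indexOf("://"));
    Function<String, String> fs = FILESYSTEMS.get(scheme);
    if (fs == null) {
      // Mirrors the "No filesystem found for scheme s3" error in the report.
      throw new IllegalArgumentException("No filesystem found for scheme " + scheme);
    }
    return fs.apply(path);
  }
}
```

In Beam terms the fix direction is making sure the registry seeding (driven by the pipeline options) happens during worker initialization on the runner, before any FileResultCoder decodes a path.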

[jira] [Updated] (BEAM-8303) Filesystems not properly registered using FileIO.write() on FlinkRunner

2019-10-04 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8303:
--
Component/s: sdk-java-core

> Filesystems not properly registered using FileIO.write() on FlinkRunner
> ---
>
> Key: BEAM-8303
> URL: https://issues.apache.org/jira/browse/BEAM-8303
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Preston Koprivica
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 2.17.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>

[jira] [Updated] (BEAM-8303) Filesystems not properly registered using FileIO.write()

2019-10-04 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8303:
--
Component/s: runner-flink

> Filesystems not properly registered using FileIO.write()
> 
>
> Key: BEAM-8303
> URL: https://issues.apache.org/jira/browse/BEAM-8303
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Preston Koprivica
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 2.17.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=323607&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323607
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:41
Start Date: 04/Oct/19 17:41
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9731: [BEAM-8343] 
Added necessary methods to BeamSqlTable to enable support for predicate/project 
push-down
URL: https://github.com/apache/beam/pull/9731
 
 
   - Added methods needed for predicate/project push-down to BeamSqlTable 
interface.
   - Create a sql.meta.BeamSqlTableFilter interface and a default 
implementation for it sql.meta.DefaultTableFilter.
   
   Based on top of #9718 
   
   Design doc 
[link](https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing).
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   

[jira] [Work logged] (BEAM-8092) Switch to java Optional in DirectRunner

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8092?focusedWorklogId=323605&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323605
 ]

ASF GitHub Bot logged work on BEAM-8092:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:41
Start Date: 04/Oct/19 17:41
Worklog Time Spent: 10m 
  Work Description: je-ik commented on issue #9431: [BEAM-8092] change 
guava's Optional to java.util in DirectRunner
URL: https://github.com/apache/beam/pull/9431#issuecomment-538494089
 
 
   @apilloud thanks
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323605)
Time Spent: 50m  (was: 40m)

> Switch to java Optional in DirectRunner
> ---
>
> Key: BEAM-8092
> URL: https://issues.apache.org/jira/browse/BEAM-8092
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Affects Versions: 2.15.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
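The migration tracked by this issue is largely mechanical renaming. The sketch below (not the actual PR diff) lists the typical Guava-to-java.util correspondences involved in such a switch:

```java
import java.util.Optional;

// Typical renames when moving from com.google.common.base.Optional
// to java.util.Optional:
//   Optional.of(v)              -> Optional.of(v)         (unchanged)
//   Optional.fromNullable(v)    -> Optional.ofNullable(v)
//   Optional.absent()           -> Optional.empty()
//   opt.or(fallback)            -> opt.orElse(fallback)
//   opt.isPresent()             -> opt.isPresent()        (unchanged)
final class OptionalMigration {
  static String describe(Optional<String> name) {
    return name.map(n -> "hello " + n).orElse("hello anonymous");
  }
}
```

One subtlety worth noting: Guava's Optional is Serializable while java.util.Optional is not, so fields of serialized objects need care during such a migration.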




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8092) Switch to java Optional in DirectRunner

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8092?focusedWorklogId=323608&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323608
 ]

ASF GitHub Bot logged work on BEAM-8092:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:41
Start Date: 04/Oct/19 17:41
Worklog Time Spent: 10m 
  Work Description: je-ik commented on pull request #9431: [BEAM-8092] 
change guava's Optional to java.util in DirectRunner
URL: https://github.com/apache/beam/pull/9431
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 323608)
Time Spent: 1h  (was: 50m)

> Switch to java Optional in DirectRunner
> ---
>
> Key: BEAM-8092
> URL: https://issues.apache.org/jira/browse/BEAM-8092
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Affects Versions: 2.15.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=323609&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323609
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:41
Start Date: 04/Oct/19 17:41
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9731: [BEAM-8343] Added 
necessary methods to BeamSqlTable to enable support for predicate/project 
push-down
URL: https://github.com/apache/beam/pull/9731#issuecomment-538494247
 
 
   R: @apilloud 
 



Issue Time Tracking
---

Worklog Id: (was: 323609)
Time Spent: 1h  (was: 50m)

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to the IO layer. 
> Also, adding the following methods to the BeamSqlTable interface to pass 
> necessary parameters to IO APIs:
>  - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
>  - Boolean supportsProjects()
>  - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
> List<String> fieldNames)
>   
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].
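The push-down contract described above can be sketched in simplified form. The Calcite types (RexProgram, RexNode) are replaced here with plain String predicates so the example is self-contained; the real interfaces belong to org.apache.beam.sdk.extensions.sql.meta, and these names are illustrative stand-ins, not the final API:

```java
import java.util.List;

// Which predicates a table could NOT evaluate at the IO layer; the planner
// keeps these in the Calc node above the scan.
interface TableFilter {
  List<String> getNotSupported();
}

// Default behavior: nothing is pushed down, every predicate stays in the Calc.
final class DefaultTableFilter implements TableFilter {
  private final List<String> predicates;

  DefaultTableFilter(List<String> predicates) {
    this.predicates = predicates;
  }

  @Override
  public List<String> getNotSupported() {
    return predicates; // unsupported == all of them
  }
}

// Simplified stand-in for the proposed BeamSqlTable additions.
interface SqlTable {
  // Report which parts of a filter this table can evaluate itself.
  default TableFilter supportsFilter(List<String> predicates) {
    return new DefaultTableFilter(predicates);
  }

  // Whether the table can read only the requested fields (project push-down).
  default boolean supportsProjects() {
    return false;
  }
}
```

With these defaults an existing table implementation keeps its current behavior, and an IO that supports push-down overrides the two methods to claim the predicates and projections it can handle.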





[jira] [Work logged] (BEAM-7765) Add test for snippet accessing_valueprovider_info_after_run

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7765?focusedWorklogId=323606&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323606
 ]

ASF GitHub Bot logged work on BEAM-7765:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:41
Start Date: 04/Oct/19 17:41
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9685: [BEAM-7765] - Add 
test for snippet accessing_valueprovider_info_after_run
URL: https://github.com/apache/beam/pull/9685#issuecomment-538494124
 
 
   Run Portable_Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 323606)
Time Spent: 1.5h  (was: 1h 20m)

> Add test for snippet accessing_valueprovider_info_after_run
> ---
>
> Key: BEAM-7765
> URL: https://issues.apache.org/jira/browse/BEAM-7765
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: John Patoch
>Priority: Major
>  Labels: easy
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This snippet needs a unit test.
> It has bugs. For example:
> - apache_beam.utils.value_provider doesn't exist
> - beam.combiners.Sum doesn't exist
> - unused import of: WriteToText
> cc: [~pabloem]





[jira] [Commented] (BEAM-8303) Filesystems not properly registered using FileIO.write() on FlinkRunner

2019-10-04 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944708#comment-16944708
 ] 

Kenneth Knowles commented on BEAM-8303:
---

Is the pain mild, given the presence of a workaround?

> Filesystems not properly registered using FileIO.write() on FlinkRunner
> ---
>
> Key: BEAM-8303
> URL: https://issues.apache.org/jira/browse/BEAM-8303
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Preston Koprivica
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 2.17.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>

[jira] [Resolved] (BEAM-8092) Switch to java Optional in DirectRunner

2019-10-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Lukavský resolved BEAM-8092.

Resolution: Fixed

> Switch to java Optional in DirectRunner
> ---
>
> Key: BEAM-8092
> URL: https://issues.apache.org/jira/browse/BEAM-8092
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Affects Versions: 2.15.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Minor
> Fix For: 2.17.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-8092) Switch to java Optional in DirectRunner

2019-10-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Lukavský updated BEAM-8092:
---
Fix Version/s: 2.17.0

> Switch to java Optional in DirectRunner
> ---
>
> Key: BEAM-8092
> URL: https://issues.apache.org/jira/browse/BEAM-8092
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Affects Versions: 2.15.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Minor
> Fix For: 2.17.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=323611&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323611
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:45
Start Date: 04/Oct/19 17:45
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9718: [BEAM-8343] Moved 
BeamTable classes to a more appropriate location
URL: https://github.com/apache/beam/pull/9718#issuecomment-538495582
 
 
   > I think this should be broken up so the refactoring is separate from the 
new features.
   > 
   > PR 1:
   > 
   > * Moved sql.BeamSqlTable → sql.meta.BeamSqlTable.
   > * Renamed sql.imp.schema.BaseBeamTable → 
sql.imp.schema.SchemaBaseBeamTable.
   > * Created a sql.meta.BaseBeamTable to be a default implementation of 
BeamSqlTable interface.
   > * Updated existing IO Table classes to extend sql.meta.BaseBeamTable.
   > 
   > PR 2:
   > 
   > * Added methods needed for predicate/project push-down to BeamSqlTable 
interface.
   > * Create a sql.meta.BeamSqlTableFilter interface and a default 
implementation for it sql.meta.DefaultTableFilter.
   
   Makes sense, done.
 



Issue Time Tracking
---

Worklog Id: (was: 323611)
Time Spent: 1h 10m  (was: 1h)

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to IO layer. 
> Also, adding following methods to a BeamSqlTable interface to pass necessary 
> parameters to IO APIs:
>  - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
>  - Boolean supportsProjects()
>  - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
> List<String> fieldNames)
>   
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].
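The proposed contract above can be made concrete with a small stand-alone sketch. This is illustrative Python, not the Beam API: dicts stand in for Calcite RexNode predicates, and `FilterAwareTable`/`TableFilter` are hypothetical names mirroring the proposed BeamSqlTable/BeamSqlTableFilter methods.

```python
class TableFilter:
    """Splits the top-level AND operands of a predicate into those the
    table can push down and those the planner must keep in the Calc."""

    def __init__(self, predicates):
        self.supported = [p for p in predicates if p["pushable"]]
        self.not_supported = [p for p in predicates if not p["pushable"]]


class FilterAwareTable:
    def supports_projects(self):
        # Field selection can be pushed into the read.
        return True

    def supports_filter(self, predicates):
        # Classify each operand; the result tells the planner what remains.
        return TableFilter(predicates)

    def build_io_reader(self, table_filter, field_names):
        # A real IO would compile the supported predicates into its own
        # query language; here we just render a WHERE clause.
        where = " AND ".join(p["expr"] for p in table_filter.supported)
        cols = ", ".join(field_names)
        return "SELECT %s FROM t WHERE %s" % (cols, where or "TRUE")


table = FilterAwareTable()
tf = table.supports_filter([
    {"expr": "id > 5", "pushable": True},
    {"expr": "custom_udf(name)", "pushable": False},  # stays in the Calc
])
query = table.build_io_reader(tf, ["id", "name"])
```

Only `id > 5` reaches the source; `custom_udf(name)` is reported back via `not_supported`, so the planner keeps evaluating it after the read.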



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-8092) Switch to java Optional in DirectRunner

2019-10-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Lukavský closed BEAM-8092.
--

> Switch to java Optional in DirectRunner
> ---
>
> Key: BEAM-8092
> URL: https://issues.apache.org/jira/browse/BEAM-8092
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Affects Versions: 2.15.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Minor
> Fix For: 2.17.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=323614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323614
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:47
Start Date: 04/Oct/19 17:47
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9718: [BEAM-8343] 
Moved BeamTable classes to a more appropriate location
URL: https://github.com/apache/beam/pull/9718#discussion_r331615203
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BaseBeamTable.java
 ##
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta;
+
+import java.util.List;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+
+/** Basic implementation of {@link BeamSqlTable} methods used by predicate and filter push-down. */
+public abstract class BaseBeamTable implements BeamSqlTable {
+
+  @Override
+  public PCollection<Row> buildIOReader(
+      PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames) {
+    return buildIOReader(begin);
 
 Review comment:
   Modified default implementation for this method to throw an exception in PR2.
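The behavior described in the comment — a base class whose filter-aware read overload refuses to silently drop filters — can be sketched in Python (hypothetical names; the real class is the Java BaseBeamTable shown above):

```python
class BaseTable:
    """Base class: the filter-aware read defaults to an error so that only
    subclasses that actually implement push-down accept filter arguments."""

    def build_io_reader(self, begin):
        raise NotImplementedError  # each concrete table provides a plain read

    def build_io_reader_with_filter(self, begin, filters, field_names):
        # Failing loudly is safer than ignoring `filters`, which would
        # silently return rows the query was supposed to exclude.
        raise RuntimeError(
            "%s does not support predicate/project push-down"
            % type(self).__name__)


class SimpleTable(BaseTable):
    def build_io_reader(self, begin):
        return list(begin)


table = SimpleTable()
rows = table.build_io_reader([1, 2, 3])
```

Raising from the default keeps the contract explicit: a table that ignores the filter argument would produce correct-looking but unfiltered reads.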
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323614)
Time Spent: 1h 20m  (was: 1h 10m)

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to IO layer. 
> Also, adding following methods to a BeamSqlTable interface to pass necessary 
> parameters to IO APIs:
>  - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
>  - Boolean supportsProjects()
>  - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
> List<String> fieldNames)
>   
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7765) Add test for snippet accessing_valueprovider_info_after_run

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7765?focusedWorklogId=323617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323617
 ]

ASF GitHub Bot logged work on BEAM-7765:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:48
Start Date: 04/Oct/19 17:48
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9685: [BEAM-7765] - 
Add test for snippet accessing_valueprovider_info_after_run
URL: https://github.com/apache/beam/pull/9685#discussion_r331615148
 
 

 ##
 File path: sdks/python/apache_beam/examples/snippets/snippets_test.py
 ##
 @@ -1266,6 +1266,59 @@ def expand(self, pcoll):
   lengths = p | beam.Create(["a", "ab", "abc"]) | ComputeWordLengths()
   assert_that(lengths, equal_to([1, 2, 3]))
 
+class AccessingValueProviderInfoAfterRunTest(unittest.TestCase):
+  """Tests for accessing value provider info after run."""
+
+  def test_accessing_valueprovider_info_after_run(self):
+    # [START AccessingValueProviderInfoAfterRunSnip1]
+
+    class MyOptions(PipelineOptions):
+      @classmethod
+      def _add_argparse_args(cls, parser):
+        # Use add_value_provider_argument for arguments to be templatable
+        # Use add_argument as usual for non-templatable arguments
+        parser.add_value_provider_argument(
+            '--string_value', type=str,
+            default='the quick brown fox jumps over the lazy dog')
+
+    class LogValueProvidersFn(beam.DoFn):
+      def __init__(self, string_vp):
+        self.string_vp = string_vp
+
+      # Define the DoFn that logs the ValueProvider value.
+      # The DoFn is called when creating the pipeline branch.
+      # This example logs the ValueProvider value, but
+      # you could store it by pushing it to an external database.
+      def process(self, an_int, **kwargs):
 
 Review comment:
   You don't need the `kwargs`.
   
   ```suggestion
 def process(self, an_int):
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323617)
Time Spent: 1h 40m  (was: 1.5h)

> Add test for snippet accessing_valueprovider_info_after_run
> ---
>
> Key: BEAM-7765
> URL: https://issues.apache.org/jira/browse/BEAM-7765
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: John Patoch
>Priority: Major
>  Labels: easy
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This snippet needs a unit test.
> It has bugs. For example:
> - apache_beam.utils.value_provider doesn't exist
> - beam.combiners.Sum doesn't exist
> - unused import of: WriteToText
> cc: [~pabloem]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7765) Add test for snippet accessing_valueprovider_info_after_run

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7765?focusedWorklogId=323618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323618
 ]

ASF GitHub Bot logged work on BEAM-7765:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:48
Start Date: 04/Oct/19 17:48
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9685: [BEAM-7765] - 
Add test for snippet accessing_valueprovider_info_after_run
URL: https://github.com/apache/beam/pull/9685#discussion_r331615477
 
 

 ##
 File path: sdks/python/apache_beam/examples/snippets/snippets_test.py
 ##
 @@ -1266,6 +1266,59 @@ def expand(self, pcoll):
   lengths = p | beam.Create(["a", "ab", "abc"]) | ComputeWordLengths()
   assert_that(lengths, equal_to([1, 2, 3]))
 
+class AccessingValueProviderInfoAfterRunTest(unittest.TestCase):
+  """Tests for accessing value provider info after run."""
+
+  def test_accessing_valueprovider_info_after_run(self):
+    # [START AccessingValueProviderInfoAfterRunSnip1]
+
+    class MyOptions(PipelineOptions):
+      @classmethod
+      def _add_argparse_args(cls, parser):
+        # Use add_value_provider_argument for arguments to be templatable
+        # Use add_argument as usual for non-templatable arguments
+        parser.add_value_provider_argument(
+            '--string_value', type=str,
+            default='the quick brown fox jumps over the lazy dog')
+
+    class LogValueProvidersFn(beam.DoFn):
+      def __init__(self, string_vp):
+        self.string_vp = string_vp
+
+      # Define the DoFn that logs the ValueProvider value.
+      # The DoFn is called when creating the pipeline branch.
+      # This example logs the ValueProvider value, but
+      # you could store it by pushing it to an external database.
+      def process(self, an_int, **kwargs):
+        logging.info('The string_value is %s' % self.string_vp.get())
+
+        yield self.string_vp.get()
+
+    pipeline_options = PipelineOptions()
+
+    my_options = pipeline_options.view_as(MyOptions)
+
+    # Create pipeline.
+    with beam.Pipeline(options=my_options) as p:
+      # Add a branch for logging the ValueProvider value.
+      vp_output = (p
+                   | beam.Create([None])
+                   | 'LogValueProvider' >> beam.ParDo(
+                       LogValueProvidersFn(my_options.string_value)))
+
+      # Test value provider argument: is equal to given string 'the quick brown fox jumps over the lazy dog'
+      assert_that(vp_output, equal_to(['the quick brown fox jumps over the lazy dog']),
 
 Review comment:
   If you move this to the end, you can end the snippet before the assertion
   ```
   # [END AccessingValueProviderInfoAfterRunSnip1]
   
   assert_that(main_pipeline_output, equal_to([6]), label='assert_main_output')
   assert_that(vp_output, equal_to(['the quick brown fox jumps over the lazy 
dog']),
   ```
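The reordering suggested above works because the `[START ...]`/`[END ...]` markers delimit what is published in the docs, so assertions placed after `[END ...]` stay out of the rendered snippet. The ValueProvider idea the snippet demonstrates — a value resolved only once the pipeline actually runs — can be sketched without Beam (a toy stand-in, not Beam's real `ValueProvider` class):

```python
class ToyValueProvider:
    """Toy stand-in for Beam's ValueProvider: the value may be supplied by
    the runner after pipeline construction, so .get() is deferred."""

    _runtime_values = {}  # filled in when the "runner" starts the job

    def __init__(self, name, default):
        self.name = name
        self.default = default

    def get(self):
        # Real Beam raises if .get() is called before a runtime value
        # exists; this sketch just falls back to the default.
        return self._runtime_values.get(self.name, self.default)


vp = ToyValueProvider(
    "string_value", "the quick brown fox jumps over the lazy dog")
before = vp.get()
ToyValueProvider._runtime_values["string_value"] = "value set at run time"
after = vp.get()
```

This is why the snippet logs the value from inside `process()`: only there is the pipeline guaranteed to be running, so `.get()` is safe to call.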
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323618)
Time Spent: 1h 40m  (was: 1.5h)

> Add test for snippet accessing_valueprovider_info_after_run
> ---
>
> Key: BEAM-7765
> URL: https://issues.apache.org/jira/browse/BEAM-7765
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: John Patoch
>Priority: Major
>  Labels: easy
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This snippet needs a unit test.
> It has bugs. For example:
> - apache_beam.utils.value_provider doesn't exist
> - beam.combiners.Sum doesn't exist
> - unused import of: WriteToText
> cc: [~pabloem]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7765) Add test for snippet accessing_valueprovider_info_after_run

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7765?focusedWorklogId=323619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323619
 ]

ASF GitHub Bot logged work on BEAM-7765:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:48
Start Date: 04/Oct/19 17:48
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9685: [BEAM-7765] - 
Add test for snippet accessing_valueprovider_info_after_run
URL: https://github.com/apache/beam/pull/9685#discussion_r331614775
 
 

 ##
 File path: sdks/python/apache_beam/examples/snippets/snippets.py
 ##
 @@ -1394,28 +1397,24 @@ def __init__(self, string_vp):
     # The DoFn is called when creating the pipeline branch.
     # This example logs the ValueProvider value, but
     # you could store it by pushing it to an external database.
-    def process(self, an_int):
+    def process(self, an_int, **kwargs):
       logging.info('The string_value is %s' % self.string_vp.get())
-      # Another option (where you don't need to pass the value at all) is:
-      logging.info('The string value is %s' %
-                   RuntimeValueProvider.get_value('string_value', str, ''))
 
   pipeline_options = PipelineOptions()
-  # Create pipeline.
-  p = beam.Pipeline(options=pipeline_options)
 
   my_options = pipeline_options.view_as(MyOptions)
-  # Add a branch for logging the ValueProvider value.
-  _ = (p
-       | beam.Create([None])
-       | 'LogValueProvs' >> beam.ParDo(
-           LogValueProvidersFn(my_options.string_value)))
-
-  # The main pipeline.
-  result_pc = (p
-               | "main_pc" >> beam.Create([1, 2, 3])
-               | beam.combiners.Sum.Globally())
 
-  p.run().wait_until_finish()
-
-  # [END AccessingValueProviderInfoAfterRunSnip1]
+  # Create pipeline.
+  with beam.Pipeline(options=my_options) as p:
+    # Add a branch for logging the ValueProvider value.
+    _ = (p
+         | beam.Create([None])
+         | 'LogValueProvider' >> beam.ParDo(
+             LogValueProvidersFn(my_options.string_value)))
+
+    # The main pipeline.
 
 Review comment:
   Does it make sense to completely move the snippet to the `snippets_test.py` 
file, and remove it from snippets.py?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323619)
Time Spent: 1h 40m  (was: 1.5h)

> Add test for snippet accessing_valueprovider_info_after_run
> ---
>
> Key: BEAM-7765
> URL: https://issues.apache.org/jira/browse/BEAM-7765
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: John Patoch
>Priority: Major
>  Labels: easy
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This snippet needs a unit test.
> It has bugs. For example:
> - apache_beam.utils.value_provider doesn't exist
> - beam.combiners.Sum doesn't exist
> - unused import of: WriteToText
> cc: [~pabloem]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7977) DataflowRunner writes to System.out instead of logging

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7977?focusedWorklogId=323623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323623
 ]

ASF GitHub Bot logged work on BEAM-7977:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:50
Start Date: 04/Oct/19 17:50
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9337: [BEAM-7977] 
use LOG.info instead of System.out.println in DataflowRunner
URL: https://github.com/apache/beam/pull/9337
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323623)
Time Spent: 50m  (was: 40m)

> DataflowRunner writes to System.out instead of logging
> --
>
> Key: BEAM-7977
> URL: https://issues.apache.org/jira/browse/BEAM-7977
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sebastian Jambor
>Assignee: Sebastian Jambor
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> DataflowRunner writes two lines to stdout for every job, which bypasses the 
> logger setup. This is slightly annoying, especially if many jobs are started 
> programmatically.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-6228) Website build from source release fails due to git dependency

2019-10-04 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-6228:
-

Assignee: Kenneth Knowles

> Website build from source release fails due to git dependency
> -
>
> Key: BEAM-6228
> URL: https://issues.apache.org/jira/browse/BEAM-6228
> Project: Beam
>  Issue Type: New Feature
>  Components: website
>Reporter: Scott Wegner
>Assignee: Kenneth Knowles
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The website build assumes a git environment, and will fail if built outside 
> of git. As a result, we cannot build the website from the source release.
> [~kenn] noticed this during 2.9.0 release validation. See: 
> https://lists.apache.org/thread.html/dc816a5f8c82dd4d19e82666209305ec67d46440fe8edba4534f9b63@%3Cdev.beam.apache.org%3E
> {code}
> > Configure project :beam-website
> No git repository found for :beam-website. Accessing grgit will cause an NPE.
> FAILURE: Build failed with an exception.
> * Where:
> Build file 'website/build.gradle' line: 143
> * What went wrong:
> A problem occurred evaluating project ':beam-website'.
> > Cannot get property 'branch' on null object
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7977) DataflowRunner writes to System.out instead of logging

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7977?focusedWorklogId=323622&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323622
 ]

ASF GitHub Bot logged work on BEAM-7977:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:50
Start Date: 04/Oct/19 17:50
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9337: [BEAM-7977] use 
LOG.info instead of System.out.println in DataflowRunner
URL: https://github.com/apache/beam/pull/9337#issuecomment-538497321
 
 
   Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323622)
Time Spent: 40m  (was: 0.5h)

> DataflowRunner writes to System.out instead of logging
> --
>
> Key: BEAM-7977
> URL: https://issues.apache.org/jira/browse/BEAM-7977
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sebastian Jambor
>Assignee: Sebastian Jambor
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> DataflowRunner writes two lines to stdout for every job, which bypasses the 
> logger setup. This is slightly annoying, especially if many jobs are started 
> programmatically.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-6228) Website build from source release fails due to git dependency

2019-10-04 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-6228:
--
Priority: Critical  (was: Minor)

> Website build from source release fails due to git dependency
> -
>
> Key: BEAM-6228
> URL: https://issues.apache.org/jira/browse/BEAM-6228
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Kenneth Knowles
>Priority: Critical
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The website build assumes a git environment, and will fail if built outside 
> of git. As a result, we cannot build the website from the source release.
> [~kenn] noticed this during 2.9.0 release validation. See: 
> https://lists.apache.org/thread.html/dc816a5f8c82dd4d19e82666209305ec67d46440fe8edba4534f9b63@%3Cdev.beam.apache.org%3E
> {code}
> > Configure project :beam-website
> No git repository found for :beam-website. Accessing grgit will cause an NPE.
> FAILURE: Build failed with an exception.
> * Where:
> Build file 'website/build.gradle' line: 143
> * What went wrong:
> A problem occurred evaluating project ':beam-website'.
> > Cannot get property 'branch' on null object
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-6228) Website build from source release fails due to git dependency

2019-10-04 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-6228:
--
Issue Type: Bug  (was: New Feature)

> Website build from source release fails due to git dependency
> -
>
> Key: BEAM-6228
> URL: https://issues.apache.org/jira/browse/BEAM-6228
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Scott Wegner
>Assignee: Kenneth Knowles
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The website build assumes a git environment, and will fail if built outside 
> of git. As a result, we cannot build the website from the source release.
> [~kenn] noticed this during 2.9.0 release validation. See: 
> https://lists.apache.org/thread.html/dc816a5f8c82dd4d19e82666209305ec67d46440fe8edba4534f9b63@%3Cdev.beam.apache.org%3E
> {code}
> > Configure project :beam-website
> No git repository found for :beam-website. Accessing grgit will cause an NPE.
> FAILURE: Build failed with an exception.
> * Where:
> Build file 'website/build.gradle' line: 143
> * What went wrong:
> A problem occurred evaluating project ':beam-website'.
> > Cannot get property 'branch' on null object
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-6228) Website build from source release fails due to git dependency

2019-10-04 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-6228:
--
Component/s: build-system

> Website build from source release fails due to git dependency
> -
>
> Key: BEAM-6228
> URL: https://issues.apache.org/jira/browse/BEAM-6228
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Kenneth Knowles
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The website build assumes a git environment, and will fail if built outside 
> of git. As a result, we cannot build the website from the source release.
> [~kenn] noticed this during 2.9.0 release validation. See: 
> https://lists.apache.org/thread.html/dc816a5f8c82dd4d19e82666209305ec67d46440fe8edba4534f9b63@%3Cdev.beam.apache.org%3E
> {code}
> > Configure project :beam-website
> No git repository found for :beam-website. Accessing grgit will cause an NPE.
> FAILURE: Build failed with an exception.
> * Where:
> Build file 'website/build.gradle' line: 143
> * What went wrong:
> A problem occurred evaluating project ':beam-website'.
> > Cannot get property 'branch' on null object
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-6228) Website build from source release fails due to git dependency

2019-10-04 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-6228:
--
Fix Version/s: 2.17.0

> Website build from source release fails due to git dependency
> -
>
> Key: BEAM-6228
> URL: https://issues.apache.org/jira/browse/BEAM-6228
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Kenneth Knowles
>Priority: Critical
> Fix For: 2.17.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The website build assumes a git environment, and will fail if built outside 
> of git. As a result, we cannot build the website from the source release.
> [~kenn] noticed this during 2.9.0 release validation. See: 
> https://lists.apache.org/thread.html/dc816a5f8c82dd4d19e82666209305ec67d46440fe8edba4534f9b63@%3Cdev.beam.apache.org%3E
> {code}
> > Configure project :beam-website
> No git repository found for :beam-website. Accessing grgit will cause an NPE.
> FAILURE: Build failed with an exception.
> * Where:
> Build file 'website/build.gradle' line: 143
> * What went wrong:
> A problem occurred evaluating project ':beam-website'.
> > Cannot get property 'branch' on null object
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=323627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323627
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:53
Start Date: 04/Oct/19 17:53
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9718: [BEAM-8343] 
Moved BeamTable classes to a more appropriate location
URL: https://github.com/apache/beam/pull/9718#discussion_r331617476
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/DefaultTableFilter.java
 ##
 @@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta;
+
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+
+/**
+ * This default implementation of {@link BeamSqlTableFilter} interface. Assumes that predicate
+ * push-down is not supported.
+ */
+public class DefaultTableFilter implements BeamSqlTableFilter {
+  private final RexProgram program;
+  private final RexNode filter;
+
+  public DefaultTableFilter(RexProgram program, RexNode filter) {
+    this.program = program;
+    this.filter = filter;
+  }
+
+  /**
+   * Since predicate push-down is assumed not to be supported by default - return an unchanged
+   * filter to be preserved.
+   *
+   * @return Predicate {@code RexNode} which is not supported
+   */
+  @Override
+  public RexNode getNotSupported() {
 
 Review comment:
   Yes, I think updating it to a List of RexNodes, where each node is an 
operand for a top-level AND, is a better approach here. Addressed in #9731.
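Representing the unsupported part as a list of top-level AND operands, as agreed above, can be sketched with a toy expression tree (tuples standing in for Calcite RexNodes; the flattening is analogous to what Calcite's RelOptUtil.conjunctions does for real nodes):

```python
def conjunctions(node):
    """Flatten a predicate tree into the operands of its top-level AND,
    recursing through nested ANDs."""
    if isinstance(node, tuple) and node and node[0] == "AND":
        out = []
        for child in node[1:]:
            out.extend(conjunctions(child))
        return out
    return [node]  # a leaf predicate is its own single operand


# ((a > 1 AND b = 2) AND c < 3) flattens to three operands.
pred = ("AND", ("AND", "a > 1", "b = 2"), "c < 3")
operands = conjunctions(pred)
```

Each operand can then be classified independently, and the unsupported ones re-combined into the Calc that stays above the IO read.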
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323627)
Time Spent: 1h 50m  (was: 1h 40m)

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to IO layer. 
> Also, adding following methods to a BeamSqlTable interface to pass necessary 
> parameters to IO APIs:
>  - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
>  - Boolean supportsProjects()
>  - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
> List<String> fieldNames)
>   
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=323625&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323625
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:53
Start Date: 04/Oct/19 17:53
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9718: [BEAM-8343] 
Moved BeamTable classes to a more appropriate location
URL: https://github.com/apache/beam/pull/9718#discussion_r331617476
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/DefaultTableFilter.java
 ##
 @@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta;
+
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+
+/**
+ * This default implementation of {@link BeamSqlTableFilter} interface. Assumes that predicate
+ * push-down is not supported.
+ */
+public class DefaultTableFilter implements BeamSqlTableFilter {
+  private final RexProgram program;
+  private final RexNode filter;
+
+  public DefaultTableFilter(RexProgram program, RexNode filter) {
+    this.program = program;
+    this.filter = filter;
+  }
+
+  /**
+   * Since predicate push-down is assumed not to be supported by default - return an unchanged
+   * filter to be preserved.
+   *
+   * @return Predicate {@code RexNode} which is not supported
+   */
+  @Override
+  public RexNode getNotSupported() {
 
 Review comment:
   Yes, I think updating it to a List of RexNodes, where each node is an 
operand for a top-level AND, is a better approach here. Addressed in #9731.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323625)
Time Spent: 1.5h  (was: 1h 20m)

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to IO layer. 
> Also, adding following methods to a BeamSqlTable interface to pass necessary 
> parameters to IO APIs:
>  - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
>  - Boolean supportsProjects()
>  - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames)
>   
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].
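As a rough illustration of the contract proposed above, the default (no push-down) behavior might be sketched as follows. All types here (RexNode, SqlTableFilter, DefaultFilter) are simplified stand-ins for illustration only, not Beam's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Calcite's RexNode predicate expression.
interface RexNode {}

// Stand-in for the proposed BeamSqlTableFilter contract.
interface SqlTableFilter {
  // Predicates the table could NOT push down; the planner keeps
  // these in a Calc on top of the table scan.
  List<RexNode> getNotSupported();
}

// Default behavior when a table does not support push-down: every
// predicate is reported back as unsupported, so the plan is unchanged.
class DefaultFilter implements SqlTableFilter {
  private final List<RexNode> predicates;

  DefaultFilter(List<RexNode> predicates) {
    this.predicates = new ArrayList<>(predicates);
  }

  @Override
  public List<RexNode> getNotSupported() {
    return predicates;
  }
}

public class PushDownSketch {
  public static void main(String[] args) {
    RexNode gt = new RexNode() {};
    RexNode lt = new RexNode() {};
    SqlTableFilter filter = new DefaultFilter(List.of(gt, lt));
    // Nothing was pushed down, so both predicates survive.
    System.out.println(filter.getNotSupported().size()); // prints 2
  }
}
```

An IO that does support push-down would return only the predicates it cannot evaluate itself, letting the planner drop the rest from the Calc.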



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=323626&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323626
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:53
Start Date: 04/Oct/19 17:53
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9718: [BEAM-8343] 
Moved BeamTable classes to a more appropriate location
URL: https://github.com/apache/beam/pull/9718#discussion_r331615203
 
 

 ##
 File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BaseBeamTable.java
 ##
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta;
+
+import java.util.List;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+
+/** Basic implementation of {@link BeamSqlTable} methods used by predicate and filter push-down. */
+public abstract class BaseBeamTable implements BeamSqlTable {
+
+  @Override
+  public PCollection<Row> buildIOReader(
+      PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames) {
+    return buildIOReader(begin);
 
 Review comment:
   Modified default implementation for this method to throw an exception in #9731.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323626)
Time Spent: 1h 40m  (was: 1.5h)

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to IO layer. 
> Also, adding following methods to a BeamSqlTable interface to pass necessary 
> parameters to IO APIs:
>  - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
>  - Boolean supportsProjects()
>  - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames)
>   
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=323628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323628
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:53
Start Date: 04/Oct/19 17:53
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on issue #9692: [BEAM-7389] 
Update docs with matching code files
URL: https://github.com/apache/beam/pull/9692#issuecomment-538498541
 
 
   * Rebased to latest version
   * Added buttons for every snippet
   
   TODO (until #9664 merges)
   * Generate notebooks
   * Test and stage website
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323628)
Time Spent: 63h 40m  (was: 63.5h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 63h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-7049) Merge multiple input to one BeamUnionRel

2019-10-04 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944063#comment-16944063
 ] 

Rui Wang edited comment on BEAM-7049 at 10/4/19 5:54 PM:
-

Ah, OK. I found a tricky problem: given the merged cost-based optimization, with 
the UnionMerge rule enabled, the Calcite planner falls into an infinite loop 
without choosing a plan. 

It is a very tricky problem. I will need to spend many hours understanding the 
Calcite planner, Beam SQL's CBO implementation, and more to find the root 
cause. 


To answer your last question, what I was thinking was to have two rules, for 
UNION ALL and UNION respectively, and each rule should override [1]. So the 
UNION ALL rule will fire only for UNION ALL queries; UNION works the same way. 
By doing so you can separate the implementations of the underlying PTransforms.


[1]: 
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/plan/RelOptRule.java#L495


was (Author: amaliujia):
Ah ok. I found a tricky problem: given the merged cost based optimization, with 
enabled UnionMerge rule, the calcite planner falls into an infinite loop 
without choosing a plan. 

It becomes a very tricky problem. I will need to spend many hours to understand 
Calcite planner, BeamSQL's CBO implementation and others to understand the root 
cause. 


To answer your last question, what I was thinking was to have two rules for 
UNION ALL and UNION respectively and each rule should overwrite [1]. So UNION 
ALL rule will fire only for UNION ALL queries. UNION is the same. By doing so 
you can separate implementation of underlying PTransform.


[1]: 
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/plan/RelOptRule.java#L511
 

> Merge multiple input to one BeamUnionRel
> 
>
> Key: BEAM-7049
> URL: https://issues.apache.org/jira/browse/BEAM-7049
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: sridhar Reddy
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> BeamUnionRel assumes there are exactly two inputs and rejects more. So `a UNION b UNION c` 
> will have to be created as UNION(a, UNION(b, c)) and have two shuffles. If 
> BeamUnionRel can handle multiple inputs, we will have only one shuffle.
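The plan shape this issue asks for can be sketched with a toy tree flattener; the Rel/Scan/Union types below are illustrative stand-ins, not Beam's or Calcite's actual RelNode classes:

```java
import java.util.ArrayList;
import java.util.List;

// Toy relational tree: flattening UNION(a, UNION(b, c)) into one
// n-ary UNION(a, b, c) means a single shuffle instead of one per
// binary UNION.
interface Rel {}

class Scan implements Rel {
  final String name;
  Scan(String name) { this.name = name; }
}

class Union implements Rel {
  final List<Rel> inputs;
  Union(List<Rel> inputs) { this.inputs = inputs; }
}

public class UnionFlatten {
  // Collect all non-union leaves of a nested binary-union tree
  // into one n-ary union.
  static Union flatten(Rel rel) {
    List<Rel> out = new ArrayList<>();
    collect(rel, out);
    return new Union(out);
  }

  static void collect(Rel rel, List<Rel> out) {
    if (rel instanceof Union) {
      for (Rel in : ((Union) rel).inputs) {
        collect(in, out);
      }
    } else {
      out.add(rel);
    }
  }

  public static void main(String[] args) {
    Rel nested =
        new Union(List.of(new Scan("a"), new Union(List.of(new Scan("b"), new Scan("c")))));
    System.out.println(flatten(nested).inputs.size()); // prints 3
  }
}
```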



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
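The two-rule suggestion in the comment above (separate rules for UNION ALL and UNION, each overriding RelOptRule's matches) can be gated roughly as follows. The types are simplified stand-ins, not Calcite's real API, which passes a RelOptRuleCall:

```java
// Stand-in for Calcite's RelOptRuleCall; the real matches(...)
// inspects the matched RelNode, here reduced to a single flag.
interface RuleCall {
  boolean isUnionAll();
}

abstract class PlannerRule {
  // Planner rules override matches(...) to gate when the rule fires.
  boolean matches(RuleCall call) {
    return true;
  }
}

class UnionAllRule extends PlannerRule {
  @Override
  boolean matches(RuleCall call) {
    return call.isUnionAll(); // fires only for UNION ALL
  }
}

class UnionDistinctRule extends PlannerRule {
  @Override
  boolean matches(RuleCall call) {
    return !call.isUnionAll(); // fires only for UNION (distinct)
  }
}

public class RuleGate {
  public static void main(String[] args) {
    RuleCall unionAll = () -> true;
    System.out.println(new UnionAllRule().matches(unionAll));      // true
    System.out.println(new UnionDistinctRule().matches(unionAll)); // false
  }
}
```

Splitting the rules this way keeps each PTransform implementation independent, at the cost of two rule registrations.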


[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=323629&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323629
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:54
Start Date: 04/Oct/19 17:54
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9718: [BEAM-8343] 
Moved BeamTable classes to a more appropriate location
URL: https://github.com/apache/beam/pull/9718#discussion_r331617476
 
 

 ##
 File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/DefaultTableFilter.java
 ##
 @@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta;
+
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+
+/**
+ * This is the default implementation of the {@link BeamSqlTableFilter} interface. Assumes that
+ * predicate push-down is not supported.
+ */
+public class DefaultTableFilter implements BeamSqlTableFilter {
+  private final RexProgram program;
+  private final RexNode filter;
+
+  public DefaultTableFilter(RexProgram program, RexNode filter) {
+    this.program = program;
+    this.filter = filter;
+  }
+
+  /**
+   * Since predicate push-down is assumed not to be supported by default - return an
+   * unchanged filter to be preserved.
+   *
+   * @return Predicate {@code RexNode} which is not supported
+   */
+  @Override
+  public RexNode getNotSupported() {
 
 Review comment:
   Yes, I think updating it to a List of `RexNode`s, where each node is an operand for a top-level AND, is a better approach here. Addressed in #9731.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323629)
Time Spent: 2h  (was: 1h 50m)

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to IO layer. 
> Also, adding following methods to a BeamSqlTable interface to pass necessary 
> parameters to IO APIs:
>  - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
>  - Boolean supportsProjects()
>  - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames)
>   
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=323630&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323630
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:55
Start Date: 04/Oct/19 17:55
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9718: [BEAM-8343] 
Moved BeamTable classes to a more appropriate location
URL: https://github.com/apache/beam/pull/9718#discussion_r331618465
 
 

 ##
 File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTable.java
 ##
 @@ -15,24 +15,36 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-package org.apache.beam.sdk.extensions.sql;
+package org.apache.beam.sdk.extensions.sql.meta;
 
+import java.util.List;
 import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics;
 import org.apache.beam.sdk.options.PipelineOptions;
 import org.apache.beam.sdk.schemas.Schema;
 import org.apache.beam.sdk.values.PBegin;
 import org.apache.beam.sdk.values.PCollection;
 import org.apache.beam.sdk.values.POutput;
 import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
 
 /** This interface defines a Beam Sql Table. */
 public interface BeamSqlTable {
   /** create a {@code PCollection<Row>} from source. */
   PCollection<Row> buildIOReader(PBegin begin);
 
+  /** create a {@code PCollection<Row>} from source with predicate and/or project pushed-down. */
+  PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);
+
   /** create a {@code IO.write()} instance to write to target. */
   POutput buildIOWriter(PCollection input);
 
+  /** Generate an IO implementation of {@code BeamSqlTableFilter} for predicate push-down. */
+  BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter);
 
 Review comment:
   Yes, I believe `List<RexNode>` can be used as an argument here, addressed in #9731.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323630)
Time Spent: 2h 10m  (was: 2h)

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to IO layer. 
> Also, adding following methods to a BeamSqlTable interface to pass necessary 
> parameters to IO APIs:
>  - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
>  - Boolean supportsProjects()
>  - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames)
>   
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6995) SQL aggregation with where clause fails to plan

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6995?focusedWorklogId=323631&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323631
 ]

ASF GitHub Bot logged work on BEAM-6995:


Author: ASF GitHub Bot
Created on: 04/Oct/19 17:59
Start Date: 04/Oct/19 17:59
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #9703: [BEAM-6995] 
Beam basic aggregation rule only when not windowed
URL: https://github.com/apache/beam/pull/9703#discussion_r331620109
 
 

 ##
 File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java
 ##
 @@ -84,6 +85,31 @@ public BeamAggregationRel(
 this.windowFieldIndex = windowFieldIndex;
   }
 
+  public BeamAggregationRel(
+      RelOptCluster cluster,
+      RelTraitSet traits,
+      RelNode child,
+      RelDataType rowType,
+      boolean indicator,
+      ImmutableBitSet groupSet,
+      List<ImmutableBitSet> groupSets,
+      List<AggregateCall> aggCalls,
+      @Nullable WindowFn windowFn,
+      int windowFieldIndex) {
+    this(
+        cluster,
+        traits,
+        child,
+        indicator,
+        groupSet,
+        groupSets,
+        aggCalls,
+        windowFn,
+        windowFieldIndex);
+
+    this.rowType = rowType;
 
 Review comment:
   Where is this rowType used?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323631)
Time Spent: 2.5h  (was: 2h 20m)

> SQL aggregation with where clause fails to plan
> ---
>
> Key: BEAM-6995
> URL: https://issues.apache.org/jira/browse/BEAM-6995
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.11.0
>Reporter: David McIntosh
>Assignee: Kirill Kozlov
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> I'm finding that this code fails with the CannotPlanException listed below.
> {code:java}
> Schema schema = Schema.builder()
> .addInt32Field("id")
>     .addInt32Field("val")
>     .build();
> Row row = Row.withSchema(schema).addValues(1, 2).build();
> PCollection<Row> inputData = p.apply("row input", 
> Create.of(row).withRowSchema(schema));
> inputData.apply("sql",
> SqlTransform.query(
> "SELECT id, SUM(val) "
> + "FROM PCOLLECTION "
> + "WHERE val > 0 "
> + "GROUP BY id"));{code}
> If the WHERE clause is removed the code runs successfully.
> This may be similar to BEAM-5384 since I was able to work around this by 
> adding an extra column to the input that isn't referenced in the SQL.
> {code:java}
> Schema schema = Schema.builder()
> .addInt32Field("id")
>     .addInt32Field("val")
>     .addInt32Field("extra")
>     .build();{code}
>  
> {code:java}
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.RelOptPlanner$CannotPlanException:
>  Node [rel#100:Subset#2.BEAM_LOGICAL] could not be implemented; planner state:
> Root: rel#100:Subset#2.BEAM_LOGICAL
> Original rel:
> LogicalAggregate(subset=[rel#100:Subset#2.BEAM_LOGICAL], group=[{0}], 
> EXPR$1=[SUM($1)]): rowcount = 5.0, cumulative cost = {5.687500238418579 rows, 
> 0.0 cpu, 0.0 io}, id = 98
>   LogicalFilter(subset=[rel#97:Subset#1.NONE], condition=[>($1, 0)]): 
> rowcount = 50.0, cumulative cost = {50.0 rows, 100.0 cpu, 0.0 io}, id = 96
> BeamIOSourceRel(subset=[rel#95:Subset#0.BEAM_LOGICAL], table=[[beam, 
> PCOLLECTION]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 
> 0.0 io}, id = 92
> Sets:
> Set#0, type: RecordType(INTEGER id, INTEGER val)
> rel#95:Subset#0.BEAM_LOGICAL, best=rel#92, 
> importance=0.7291
> rel#92:BeamIOSourceRel.BEAM_LOGICAL(table=[beam, 
> PCOLLECTION]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> rel#110:Subset#0.ENUMERABLE, best=rel#109, 
> importance=0.36455
> 
> rel#109:BeamEnumerableConverter.ENUMERABLE(input=rel#95:Subset#0.BEAM_LOGICAL),
>  rowcount=100.0, cumulative cost={1.7976931348623157E308 rows, 
> 1.7976931348623157E308 cpu, 1.7976931348623157E308 io}
> Set#1, type: RecordType(INTEGER id, INTEGER val)
> rel#97:Subset#1.NONE, best=null, importance=0.81
> 
> rel#96:LogicalFilter.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,condition=>($1, 
> 0)), rowcount=50.0, cumulative cost={inf}
> 
> rel#102:LogicalCalc.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,expr#0..1={inputs},expr#2=0,expr#3=>($t1,
>  $t2),i

[jira] [Commented] (BEAM-8319) Errorprone 0.0.13 fails during JDK11 build

2019-10-04 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944719#comment-16944719
 ] 

Pablo Estrada commented on BEAM-8319:
-

I've seen on Guava that the version they call out as having official JDK 11 
support is 27.1-jre:

https://github.com/google/guava/issues/3272
https://github.com/google/guava/issues/3288

> Errorprone 0.0.13 fails during JDK11 build
> --
>
> Key: BEAM-8319
> URL: https://issues.apache.org/jira/browse/BEAM-8319
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I'm using openjdk 1.11.02. After switching the version to:
> {code:java}
> javaVersion = 11 {code}
> in BeamModulePlugin and running
> {code:java}
> ./gradlew clean build -p sdks/java/code -xtest {code}
> building fails. I was able to run errorprone after upgrading it but had 
> problems with conflicting guava version. See more here: 
> https://issues.apache.org/jira/browse/BEAM-5085
>  
> Stacktrace:
> {code:java}
> org.gradle.api.tasks.TaskExecutionException: Execution failed for task 
> ':model:pipeline:compileJava'.
> at 
> org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$2.accept(ExecuteActionsTaskExecuter.java:121)
> at 
> org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$2.accept(ExecuteActionsTaskExecuter.java:117)
> at org.gradle.internal.Try$Failure.ifSuccessfulOrElse(Try.java:184)
> at 
> org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.execute(ExecuteActionsTaskExecuter.java:110)
> at 
> org.gradle.api.internal.tasks.execution.ResolveIncrementalChangesTaskExecuter.execute(ResolveIncrementalChangesTaskExecuter.java:84)
> at 
> org.gradle.api.internal.tasks.execution.ResolveTaskOutputCachingStateExecuter.execute(ResolveTaskOutputCachingStateExecuter.java:91)
> at 
> org.gradle.api.internal.tasks.execution.FinishSnapshotTaskInputsBuildOperationTaskExecuter.execute(FinishSnapshotTaskInputsBuildOperationTaskExecuter.java:51)
> at 
> org.gradle.api.internal.tasks.execution.ResolveBuildCacheKeyExecuter.execute(ResolveBuildCacheKeyExecuter.java:102)
> at 
> org.gradle.api.internal.tasks.execution.ResolveBeforeExecutionStateTaskExecuter.execute(ResolveBeforeExecutionStateTaskExecuter.java:74)
> at 
> org.gradle.api.internal.tasks.execution.ValidatingTaskExecuter.execute(ValidatingTaskExecuter.java:58)
> at 
> org.gradle.api.internal.tasks.execution.SkipEmptySourceFilesTaskExecuter.execute(SkipEmptySourceFilesTaskExecuter.java:109)
> at 
> org.gradle.api.internal.tasks.execution.ResolveBeforeExecutionOutputsTaskExecuter.execute(ResolveBeforeExecutionOutputsTaskExecuter.java:67)
> at 
> org.gradle.api.internal.tasks.execution.StartSnapshotTaskInputsBuildOperationTaskExecuter.execute(StartSnapshotTaskInputsBuildOperationTaskExecuter.java:52)
> at 
> org.gradle.api.internal.tasks.execution.ResolveAfterPreviousExecutionStateTaskExecuter.execute(ResolveAfterPreviousExecutionStateTaskExecuter.java:46)
> at 
> org.gradle.api.internal.tasks.execution.CleanupStaleOutputsExecuter.execute(CleanupStaleOutputsExecuter.java:93)
> at 
> org.gradle.api.internal.tasks.execution.FinalizePropertiesTaskExecuter.execute(FinalizePropertiesTaskExecuter.java:45)
> at 
> org.gradle.api.internal.tasks.execution.ResolveTaskExecutionModeExecuter.execute(ResolveTaskExecutionModeExecuter.java:94)
> at 
> org.gradle.api.internal.tasks.execution.SkipTaskWithNoActionsExecuter.execute(SkipTaskWithNoActionsExecuter.java:57)
> at 
> org.gradle.api.internal.tasks.execution.SkipOnlyIfTaskExecuter.execute(SkipOnlyIfTaskExecuter.java:56)
> at 
> org.gradle.api.internal.tasks.execution.CatchExceptionTaskExecuter.execute(CatchExceptionTaskExecuter.java:36)
> at 
> org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.executeTask(EventFiringTaskExecuter.java:63)
> at 
> org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:49)
> at 
> org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:46)
> at 
> org.gradle.internal.operations.DefaultBuildOperationExecutor$CallableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:416)
> at 
> org.gradle.internal.operations.DefaultBuildOperationExecutor$CallableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:406)
> at 
> org.gradle.internal.operations.DefaultBuildOperationExecutor$1.execute(DefaultBuildOperationExecutor.java:165)
> at 
> or

[jira] [Commented] (BEAM-8319) Errorprone 0.0.13 fails during JDK11 build

2019-10-04 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944728#comment-16944728
 ] 

Lukasz Gajowy commented on BEAM-8319:
-

Thanks [~pabloem] - good to know this. In that case, I think we should update 
vendored guava too (it's currently in version 26.0-jre). 

> Errorprone 0.0.13 fails during JDK11 build
> --
>
> Key: BEAM-8319
> URL: https://issues.apache.org/jira/browse/BEAM-8319
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I'm using openjdk 1.11.02. After switching the version to:
> {code:java}
> javaVersion = 11 {code}
> in BeamModulePlugin and running
> {code:java}
> ./gradlew clean build -p sdks/java/code -xtest {code}
> building fails. I was able to run errorprone after upgrading it but had 
> problems with conflicting guava version. See more here: 
> https://issues.apache.org/jira/browse/BEAM-5085
>  
> Stacktrace:
> {code:java}
> org.gradle.api.tasks.TaskExecutionException: Execution failed for task 
> ':model:pipeline:compileJava'.
> at 
> org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$2.accept(ExecuteActionsTaskExecuter.java:121)
> at 
> org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$2.accept(ExecuteActionsTaskExecuter.java:117)
> at org.gradle.internal.Try$Failure.ifSuccessfulOrElse(Try.java:184)
> at 
> org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.execute(ExecuteActionsTaskExecuter.java:110)
> at 
> org.gradle.api.internal.tasks.execution.ResolveIncrementalChangesTaskExecuter.execute(ResolveIncrementalChangesTaskExecuter.java:84)
> at 
> org.gradle.api.internal.tasks.execution.ResolveTaskOutputCachingStateExecuter.execute(ResolveTaskOutputCachingStateExecuter.java:91)
> at 
> org.gradle.api.internal.tasks.execution.FinishSnapshotTaskInputsBuildOperationTaskExecuter.execute(FinishSnapshotTaskInputsBuildOperationTaskExecuter.java:51)
> at 
> org.gradle.api.internal.tasks.execution.ResolveBuildCacheKeyExecuter.execute(ResolveBuildCacheKeyExecuter.java:102)
> at 
> org.gradle.api.internal.tasks.execution.ResolveBeforeExecutionStateTaskExecuter.execute(ResolveBeforeExecutionStateTaskExecuter.java:74)
> at 
> org.gradle.api.internal.tasks.execution.ValidatingTaskExecuter.execute(ValidatingTaskExecuter.java:58)
> at 
> org.gradle.api.internal.tasks.execution.SkipEmptySourceFilesTaskExecuter.execute(SkipEmptySourceFilesTaskExecuter.java:109)
> at 
> org.gradle.api.internal.tasks.execution.ResolveBeforeExecutionOutputsTaskExecuter.execute(ResolveBeforeExecutionOutputsTaskExecuter.java:67)
> at 
> org.gradle.api.internal.tasks.execution.StartSnapshotTaskInputsBuildOperationTaskExecuter.execute(StartSnapshotTaskInputsBuildOperationTaskExecuter.java:52)
> at 
> org.gradle.api.internal.tasks.execution.ResolveAfterPreviousExecutionStateTaskExecuter.execute(ResolveAfterPreviousExecutionStateTaskExecuter.java:46)
> at 
> org.gradle.api.internal.tasks.execution.CleanupStaleOutputsExecuter.execute(CleanupStaleOutputsExecuter.java:93)
> at 
> org.gradle.api.internal.tasks.execution.FinalizePropertiesTaskExecuter.execute(FinalizePropertiesTaskExecuter.java:45)
> at 
> org.gradle.api.internal.tasks.execution.ResolveTaskExecutionModeExecuter.execute(ResolveTaskExecutionModeExecuter.java:94)
> at 
> org.gradle.api.internal.tasks.execution.SkipTaskWithNoActionsExecuter.execute(SkipTaskWithNoActionsExecuter.java:57)
> at 
> org.gradle.api.internal.tasks.execution.SkipOnlyIfTaskExecuter.execute(SkipOnlyIfTaskExecuter.java:56)
> at 
> org.gradle.api.internal.tasks.execution.CatchExceptionTaskExecuter.execute(CatchExceptionTaskExecuter.java:36)
> at 
> org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.executeTask(EventFiringTaskExecuter.java:63)
> at 
> org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:49)
> at 
> org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:46)
> at 
> org.gradle.internal.operations.DefaultBuildOperationExecutor$CallableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:416)
> at 
> org.gradle.internal.operations.DefaultBuildOperationExecutor$CallableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:406)
> at 
> org.gradle.internal.operations.DefaultBuildOperationExecutor$1.execute(DefaultBuildOperationExecutor.java:165)
> at 
> org.gradle.internal.operations.DefaultBuildOperation

[jira] [Comment Edited] (BEAM-8319) Errorprone 0.0.13 fails during JDK11 build

2019-10-04 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944728#comment-16944728
 ] 

Lukasz Gajowy edited comment on BEAM-8319 at 10/4/19 6:11 PM:
--

Thanks [~pabloem] - good to know this. In that case, I think we may need to 
update vendored guava too (it's currently in version 26.0-jre). 


was (Author: łukaszg):
Thanks [~pabloem] - good to know this. In that case, I think we should update 
vendored guava too (it's currently in version 26.0-jre). 

> Errorprone 0.0.13 fails during JDK11 build
> --
>
> Key: BEAM-8319
> URL: https://issues.apache.org/jira/browse/BEAM-8319
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I'm using openjdk 1.11.02. After switching the version to:
> {code:java}
> javaVersion = 11 {code}
> in BeamModulePlugin and running
> {code:java}
> ./gradlew clean build -p sdks/java/code -xtest {code}
> building fails. I was able to run errorprone after upgrading it but had 
> problems with conflicting guava version. See more here: 
> https://issues.apache.org/jira/browse/BEAM-5085
>  
> Stacktrace:
> {code:java}
> org.gradle.api.tasks.TaskExecutionException: Execution failed for task 
> ':model:pipeline:compileJava'.
> at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$2.accept(ExecuteActionsTaskExecuter.java:121)
> at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$2.accept(ExecuteActionsTaskExecuter.java:117)
> at org.gradle.internal.Try$Failure.ifSuccessfulOrElse(Try.java:184)
> at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.execute(ExecuteActionsTaskExecuter.java:110)
> at org.gradle.api.internal.tasks.execution.ResolveIncrementalChangesTaskExecuter.execute(ResolveIncrementalChangesTaskExecuter.java:84)
> at org.gradle.api.internal.tasks.execution.ResolveTaskOutputCachingStateExecuter.execute(ResolveTaskOutputCachingStateExecuter.java:91)
> at org.gradle.api.internal.tasks.execution.FinishSnapshotTaskInputsBuildOperationTaskExecuter.execute(FinishSnapshotTaskInputsBuildOperationTaskExecuter.java:51)
> at org.gradle.api.internal.tasks.execution.ResolveBuildCacheKeyExecuter.execute(ResolveBuildCacheKeyExecuter.java:102)
> at org.gradle.api.internal.tasks.execution.ResolveBeforeExecutionStateTaskExecuter.execute(ResolveBeforeExecutionStateTaskExecuter.java:74)
> at org.gradle.api.internal.tasks.execution.ValidatingTaskExecuter.execute(ValidatingTaskExecuter.java:58)
> at org.gradle.api.internal.tasks.execution.SkipEmptySourceFilesTaskExecuter.execute(SkipEmptySourceFilesTaskExecuter.java:109)
> at org.gradle.api.internal.tasks.execution.ResolveBeforeExecutionOutputsTaskExecuter.execute(ResolveBeforeExecutionOutputsTaskExecuter.java:67)
> at org.gradle.api.internal.tasks.execution.StartSnapshotTaskInputsBuildOperationTaskExecuter.execute(StartSnapshotTaskInputsBuildOperationTaskExecuter.java:52)
> at org.gradle.api.internal.tasks.execution.ResolveAfterPreviousExecutionStateTaskExecuter.execute(ResolveAfterPreviousExecutionStateTaskExecuter.java:46)
> at org.gradle.api.internal.tasks.execution.CleanupStaleOutputsExecuter.execute(CleanupStaleOutputsExecuter.java:93)
> at org.gradle.api.internal.tasks.execution.FinalizePropertiesTaskExecuter.execute(FinalizePropertiesTaskExecuter.java:45)
> at org.gradle.api.internal.tasks.execution.ResolveTaskExecutionModeExecuter.execute(ResolveTaskExecutionModeExecuter.java:94)
> at org.gradle.api.internal.tasks.execution.SkipTaskWithNoActionsExecuter.execute(SkipTaskWithNoActionsExecuter.java:57)
> at org.gradle.api.internal.tasks.execution.SkipOnlyIfTaskExecuter.execute(SkipOnlyIfTaskExecuter.java:56)
> at org.gradle.api.internal.tasks.execution.CatchExceptionTaskExecuter.execute(CatchExceptionTaskExecuter.java:36)
> at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.executeTask(EventFiringTaskExecuter.java:63)
> at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:49)
> at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:46)
> at org.gradle.internal.operations.DefaultBuildOperationExecutor$CallableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:416)
> at org.gradle.internal.operations.DefaultBuildOperationExecutor$CallableBuildOperationWorker.execute(DefaultBuildOperati

[jira] [Work logged] (BEAM-8344) Add infer schema support in ParquetIO and refactor ParquetTableProvider

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8344?focusedWorklogId=323638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323638
 ]

ASF GitHub Bot logged work on BEAM-8344:


Author: ASF GitHub Bot
Created on: 04/Oct/19 18:11
Start Date: 04/Oct/19 18:11
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #9721: [BEAM-8344] 
Add inferSchema support in ParquetIO and refactor ParquetTableProvider
URL: https://github.com/apache/beam/pull/9721#discussion_r331624690
 
 

 ##
 File path: 
sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java
 ##
 @@ -122,15 +122,34 @@
* pattern).
*/
   public static Read read(Schema schema) {
-return new AutoValue_ParquetIO_Read.Builder().setSchema(schema).build();
+return new AutoValue_ParquetIO_Read.Builder()
+.setSchema(schema)
+.setInferBeamSchema(false)
+.build();
   }
 
   /**
* Like {@link #read(Schema)}, but reads each file in a {@link PCollection} 
of {@link
* org.apache.beam.sdk.io.FileIO.ReadableFile}, which allows more flexible 
usage.
*/
   public static ReadFiles readFiles(Schema schema) {
-return new AutoValue_ParquetIO_ReadFiles.Builder().setSchema(schema).build();
+return new AutoValue_ParquetIO_ReadFiles.Builder()
+.setSchema(schema)
+.setInferBeamSchema(false)
+.build();
+  }
+
+  private static <T> PCollection<T> setBeamSchema(
+  PCollection<T> pc, Class<T> clazz, @Nullable Schema schema) {
+org.apache.beam.sdk.schemas.Schema beamSchema =
+org.apache.beam.sdk.schemas.utils.AvroUtils.getSchema(clazz, schema);
+if (beamSchema != null) {
 
 Review comment:
   Because you won't throw an exception here if there is no beamSchema, why not 
just make "inferSchema" a non-optional action, so we don't need a boolean to 
control it?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323638)
Time Spent: 20m  (was: 10m)

> Add infer schema support in ParquetIO and refactor ParquetTableProvider
> ---
>
> Key: BEAM-8344
> URL: https://issues.apache.org/jira/browse/BEAM-8344
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql, io-java-parquet
>Reporter: Vishwas
>Assignee: Vishwas
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add support for inferring Beam Schema in ParquetIO.
> Refactor ParquetTable code to use Convert.rows().
> Remove unnecessary java class GenericRecordReadConverter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6995) SQL aggregation with where clause fails to plan

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6995?focusedWorklogId=323640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323640
 ]

ASF GitHub Bot logged work on BEAM-6995:


Author: ASF GitHub Bot
Created on: 04/Oct/19 18:12
Start Date: 04/Oct/19 18:12
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #9703: [BEAM-6995] 
Beam basic aggregation rule only when not windowed
URL: https://github.com/apache/beam/pull/9703#discussion_r331625043
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java
 ##
 @@ -84,6 +85,31 @@ public BeamAggregationRel(
 this.windowFieldIndex = windowFieldIndex;
   }
 
+  public BeamAggregationRel(
+  RelOptCluster cluster,
+  RelTraitSet traits,
+  RelNode child,
+  RelDataType rowType,
+  boolean indicator,
+  ImmutableBitSet groupSet,
+  List<ImmutableBitSet> groupSets,
+  List<AggregateCall> aggCalls,
+  @Nullable WindowFn windowFn,
+  int windowFieldIndex) {
+this(
+cluster,
+traits,
+child,
+indicator,
+groupSet,
+groupSets,
+aggCalls,
+windowFn,
+windowFieldIndex);
+
+this.rowType = rowType;
 
 Review comment:
   The RelNode row type is already inferred from the input nodes. When you need 
it, call the getRowType() function rather than saving it as a class 
member.
 



Issue Time Tracking
---

Worklog Id: (was: 323640)
Time Spent: 2h 40m  (was: 2.5h)

> SQL aggregation with where clause fails to plan
> ---
>
> Key: BEAM-6995
> URL: https://issues.apache.org/jira/browse/BEAM-6995
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.11.0
>Reporter: David McIntosh
>Assignee: Kirill Kozlov
>Priority: Minor
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> I'm finding that this code fails with a CannotPlanException listed below.
> {code:java}
> Schema schema = Schema.builder()
> .addInt32Field("id")
>     .addInt32Field("val")
>     .build();
> Row row = Row.withSchema(schema).addValues(1, 2).build();
> PCollection<Row> inputData = p.apply("row input", 
> Create.of(row).withRowSchema(schema));
> inputData.apply("sql",
> SqlTransform.query(
> "SELECT id, SUM(val) "
> + "FROM PCOLLECTION "
> + "WHERE val > 0 "
> + "GROUP BY id"));{code}
> If the WHERE clause is removed the code runs successfully.
> This may be similar to BEAM-5384 since I was able to work around this by 
> adding an extra column to the input that isn't referenced in the SQL.
> {code:java}
> Schema schema = Schema.builder()
> .addInt32Field("id")
>     .addInt32Field("val")
>     .addInt32Field("extra")
>     .build();{code}
>  
> {code:java}
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.RelOptPlanner$CannotPlanException:
>  Node [rel#100:Subset#2.BEAM_LOGICAL] could not be implemented; planner state:
> Root: rel#100:Subset#2.BEAM_LOGICAL
> Original rel:
> LogicalAggregate(subset=[rel#100:Subset#2.BEAM_LOGICAL], group=[{0}], 
> EXPR$1=[SUM($1)]): rowcount = 5.0, cumulative cost = {5.687500238418579 rows, 
> 0.0 cpu, 0.0 io}, id = 98
>   LogicalFilter(subset=[rel#97:Subset#1.NONE], condition=[>($1, 0)]): 
> rowcount = 50.0, cumulative cost = {50.0 rows, 100.0 cpu, 0.0 io}, id = 96
> BeamIOSourceRel(subset=[rel#95:Subset#0.BEAM_LOGICAL], table=[[beam, 
> PCOLLECTION]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 
> 0.0 io}, id = 92
> Sets:
> Set#0, type: RecordType(INTEGER id, INTEGER val)
> rel#95:Subset#0.BEAM_LOGICAL, best=rel#92, 
> importance=0.7291
> rel#92:BeamIOSourceRel.BEAM_LOGICAL(table=[beam, 
> PCOLLECTION]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> rel#110:Subset#0.ENUMERABLE, best=rel#109, 
> importance=0.36455
> 
> rel#109:BeamEnumerableConverter.ENUMERABLE(input=rel#95:Subset#0.BEAM_LOGICAL),
>  rowcount=100.0, cumulative cost={1.7976931348623157E308 rows, 
> 1.7976931348623157E308 cpu, 1.7976931348623157E308 io}
> Set#1, type: RecordType(INTEGER id, INTEGER val)
> rel#97:Subset#1.NONE, best=null, importance=0.81
> 
> rel#96:LogicalFilter.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,condition=>($1, 
> 0)), rowcount=50.0, cumulative cost={i

[jira] [Work logged] (BEAM-6995) SQL aggregation with where clause fails to plan

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6995?focusedWorklogId=323642&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323642
 ]

ASF GitHub Bot logged work on BEAM-6995:


Author: ASF GitHub Bot
Created on: 04/Oct/19 18:15
Start Date: 04/Oct/19 18:15
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9703: [BEAM-6995] 
Beam basic aggregation rule only when not windowed
URL: https://github.com/apache/beam/pull/9703#discussion_r331626110
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java
 ##
 @@ -84,6 +85,31 @@ public BeamAggregationRel(
 this.windowFieldIndex = windowFieldIndex;
   }
 
+  public BeamAggregationRel(
+  RelOptCluster cluster,
+  RelTraitSet traits,
+  RelNode child,
+  RelDataType rowType,
+  boolean indicator,
+  ImmutableBitSet groupSet,
+  List<ImmutableBitSet> groupSets,
+  List<AggregateCall> aggCalls,
+  @Nullable WindowFn windowFn,
+  int windowFieldIndex) {
+this(
+cluster,
+traits,
+child,
+indicator,
+groupSet,
+groupSets,
+aggCalls,
+windowFn,
+windowFieldIndex);
+
+this.rowType = rowType;
 
 Review comment:
   There is a JIRA issue: https://jira.apache.org/jira/browse/BEAM-7609: when 
running queries with "SELECT DISTINCT + JOIN", the resulting fields are not 
assigned proper names.
   Even though it does not solve this particular issue, preserving the rowType 
should not hurt (previously it was simply ignored and set to null).
 



Issue Time Tracking
---

Worklog Id: (was: 323642)
Time Spent: 2h 50m  (was: 2h 40m)

> SQL aggregation with where clause fails to plan
> ---
>
> Key: BEAM-6995
> URL: https://issues.apache.org/jira/browse/BEAM-6995
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.11.0
>Reporter: David McIntosh
>Assignee: Kirill Kozlov
>Priority: Minor
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> I'm finding that this code fails with a CannotPlanException listed below.
> {code:java}
> Schema schema = Schema.builder()
> .addInt32Field("id")
>     .addInt32Field("val")
>     .build();
> Row row = Row.withSchema(schema).addValues(1, 2).build();
> PCollection<Row> inputData = p.apply("row input", 
> Create.of(row).withRowSchema(schema));
> inputData.apply("sql",
> SqlTransform.query(
> "SELECT id, SUM(val) "
> + "FROM PCOLLECTION "
> + "WHERE val > 0 "
> + "GROUP BY id"));{code}
> If the WHERE clause is removed the code runs successfully.
> This may be similar to BEAM-5384 since I was able to work around this by 
> adding an extra column to the input that isn't referenced in the SQL.
> {code:java}
> Schema schema = Schema.builder()
> .addInt32Field("id")
>     .addInt32Field("val")
>     .addInt32Field("extra")
>     .build();{code}
>  
> {code:java}
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.RelOptPlanner$CannotPlanException:
>  Node [rel#100:Subset#2.BEAM_LOGICAL] could not be implemented; planner state:
> Root: rel#100:Subset#2.BEAM_LOGICAL
> Original rel:
> LogicalAggregate(subset=[rel#100:Subset#2.BEAM_LOGICAL], group=[{0}], 
> EXPR$1=[SUM($1)]): rowcount = 5.0, cumulative cost = {5.687500238418579 rows, 
> 0.0 cpu, 0.0 io}, id = 98
>   LogicalFilter(subset=[rel#97:Subset#1.NONE], condition=[>($1, 0)]): 
> rowcount = 50.0, cumulative cost = {50.0 rows, 100.0 cpu, 0.0 io}, id = 96
> BeamIOSourceRel(subset=[rel#95:Subset#0.BEAM_LOGICAL], table=[[beam, 
> PCOLLECTION]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 
> 0.0 io}, id = 92
> Sets:
> Set#0, type: RecordType(INTEGER id, INTEGER val)
> rel#95:Subset#0.BEAM_LOGICAL, best=rel#92, 
> importance=0.7291
> rel#92:BeamIOSourceRel.BEAM_LOGICAL(table=[beam, 
> PCOLLECTION]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> rel#110:Subset#0.ENUMERABLE, best=rel#109, 
> importance=0.36455
> 
> rel#109:BeamEnumerableConverter.ENUMERABLE(input=rel#95:Subset#0.BEAM_LOGICAL),
>  rowcount=100.0, cumulative cost={1.7976931348623157E308 rows, 
> 1.7976931348623157E308 cpu, 1.7976931348623157E308 io}
> Set#1, type: RecordType(INTEGER id, INTEGER val)
> rel#97:Subset#

[jira] [Resolved] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-10-04 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-6923.
-
Fix Version/s: 2.17.0
   Resolution: Fixed

I think this has been fixed, huh? :)

> OOM errors in jobServer when using GCS artifactDir
> --
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Lukasz Gajowy
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.17.0
>
> Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump 
> size-sorted.png
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> When starting jobServer with artifactDir pointing to a GCS bucket: 
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code}
> and running a Java portable pipeline with the following, portability related 
> pipeline options: 
> {code:java}
> --runner=PortableRunner --jobEndpoint=localhost:8099 
> --defaultEnvironmentType=DOCKER 
> --defaultEnvironmentConfig=gcr.io//java:latest'{code}
>  
> I'm facing a series of OOM errors, like this: 
> {code:java}
> Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: Java heap space
> at com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
> at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
> at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
> at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
> at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
> at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
> at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
>  
> This does not happen when I'm using a local filesystem for the artifact 
> staging location. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8351?focusedWorklogId=323643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323643
 ]

ASF GitHub Bot logged work on BEAM-8351:


Author: ASF GitHub Bot
Created on: 04/Oct/19 18:19
Start Date: 04/Oct/19 18:19
Worklog Time Spent: 10m 
  Work Description: violalyu commented on pull request #9730: [BEAM-8351] 
Support passing in arbitrary KV pairs to sdk worker via external environment 
config
URL: https://github.com/apache/beam/pull/9730#discussion_r331627588
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/portable_runner.py
 ##
 @@ -129,11 +129,25 @@ def _create_environment(options):
   env=(config.get('env') or '')
   ).SerializeToString())
 elif environment_urn == common_urns.environments.EXTERNAL.urn:
+  def _looks_like_json(environment_config):
+import re
+return re.match(r'\{.+\}', environment_config)
 
 Review comment:
   Thanks Chad! Updated for now!
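   The heuristic under review decides whether the EXTERNAL environment config is a
plain worker-pool URL or a JSON object of extra key/value params. A minimal
stdlib sketch of that idea (the function name and the returned dict shape are
illustrative assumptions, not Beam's actual API):

```python
import json
import re


def parse_external_environment_config(environment_config):
    # Hypothetical helper (assumption, not Beam's exact implementation):
    # if the config looks like a JSON object, parse it as arbitrary KV
    # params for StartWorkerRequest.params; otherwise treat it as a URL.
    if re.match(r'\{.+\}', environment_config):
        return json.loads(environment_config)
    return {'url': environment_config}


print(parse_external_environment_config('localhost:50000'))
print(parse_external_environment_config('{"k1": "v1", "url": "localhost:50000"}'))
```

Note the regex is only a cheap "looks like JSON" check; malformed input that
matches it would still raise from `json.loads`, which is one reason the check
was being discussed in this thread.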
 



Issue Time Tracking
---

Worklog Id: (was: 323643)
Time Spent: 1h 20m  (was: 1h 10m)

> Support passing in arbitrary KV pairs to sdk worker via external environment 
> config
> ---
>
> Key: BEAM-8351
> URL: https://issues.apache.org/jira/browse/BEAM-8351
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Wanqi Lyu
>Assignee: Wanqi Lyu
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Originally, the environment config for the EXTERNAL environment type only 
> supported passing in a URL for the external worker pool; we want to support 
> passing in arbitrary KV pairs to the SDK worker via the external environment config, 
> so that when starting the SDK harness we can get the values from 
> `StartWorkerRequest.params`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8351?focusedWorklogId=323644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323644
 ]

ASF GitHub Bot logged work on BEAM-8351:


Author: ASF GitHub Bot
Created on: 04/Oct/19 18:19
Start Date: 04/Oct/19 18:19
Worklog Time Spent: 10m 
  Work Description: violalyu commented on pull request #9730: [BEAM-8351] 
Support passing in arbitrary KV pairs to sdk worker via external environment 
config
URL: https://github.com/apache/beam/pull/9730#discussion_r331627588
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/portable_runner.py
 ##
 @@ -129,11 +129,25 @@ def _create_environment(options):
   env=(config.get('env') or '')
   ).SerializeToString())
 elif environment_urn == common_urns.environments.EXTERNAL.urn:
+  def _looks_like_json(environment_config):
+import re
+return re.match(r'\{.+\}', environment_config)
 
 Review comment:
   Thanks Chad! Updated for now in 
https://github.com/apache/beam/pull/9730/commits/7fed22b5ad4c845770bac84944c4743b77495367
 



Issue Time Tracking
---

Worklog Id: (was: 323644)
Time Spent: 1.5h  (was: 1h 20m)

> Support passing in arbitrary KV pairs to sdk worker via external environment 
> config
> ---
>
> Key: BEAM-8351
> URL: https://issues.apache.org/jira/browse/BEAM-8351
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Wanqi Lyu
>Assignee: Wanqi Lyu
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Originally, the environment config for the EXTERNAL environment type only 
> supported passing in a URL for the external worker pool; we want to support 
> passing in arbitrary KV pairs to the SDK worker via the external environment config, 
> so that when starting the SDK harness we can get the values from 
> `StartWorkerRequest.params`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6995) SQL aggregation with where clause fails to plan

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6995?focusedWorklogId=323647&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323647
 ]

ASF GitHub Bot logged work on BEAM-6995:


Author: ASF GitHub Bot
Created on: 04/Oct/19 18:24
Start Date: 04/Oct/19 18:24
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9703: [BEAM-6995] 
Beam basic aggregation rule only when not windowed
URL: https://github.com/apache/beam/pull/9703#discussion_r331296410
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationTest.java
 ##
 @@ -701,7 +700,6 @@ public void testSupportsAggregationWithoutProjection() 
throws Exception {
   }
 
   @Test
-  @Ignore("https://issues.apache.org/jira/browse/BEAM-8317";)
   public void testSupportsAggregationWithFilterWithoutProjection() throws 
Exception {
 
 Review comment:
   Found a useful reference link with examples: 
https://github.com/Pragmatists/JUnitParams/blob/master/src/test/java/junitparams/usage/SamplesOfUsageTest.java
   And this one: https://github.com/junit-team/junit4/wiki/parameterized-tests
 



Issue Time Tracking
---

Worklog Id: (was: 323647)
Time Spent: 3h  (was: 2h 50m)

> SQL aggregation with where clause fails to plan
> ---
>
> Key: BEAM-6995
> URL: https://issues.apache.org/jira/browse/BEAM-6995
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.11.0
>Reporter: David McIntosh
>Assignee: Kirill Kozlov
>Priority: Minor
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> I'm finding that this code fails with a CannotPlanException listed below.
> {code:java}
> Schema schema = Schema.builder()
> .addInt32Field("id")
>     .addInt32Field("val")
>     .build();
> Row row = Row.withSchema(schema).addValues(1, 2).build();
> PCollection<Row> inputData = p.apply("row input", 
> Create.of(row).withRowSchema(schema));
> inputData.apply("sql",
> SqlTransform.query(
> "SELECT id, SUM(val) "
> + "FROM PCOLLECTION "
> + "WHERE val > 0 "
> + "GROUP BY id"));{code}
> If the WHERE clause is removed the code runs successfully.
> This may be similar to BEAM-5384 since I was able to work around this by 
> adding an extra column to the input that isn't referenced in the SQL.
> {code:java}
> Schema schema = Schema.builder()
> .addInt32Field("id")
>     .addInt32Field("val")
>     .addInt32Field("extra")
>     .build();{code}
>  
> {code:java}
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.RelOptPlanner$CannotPlanException:
>  Node [rel#100:Subset#2.BEAM_LOGICAL] could not be implemented; planner state:
> Root: rel#100:Subset#2.BEAM_LOGICAL
> Original rel:
> LogicalAggregate(subset=[rel#100:Subset#2.BEAM_LOGICAL], group=[{0}], 
> EXPR$1=[SUM($1)]): rowcount = 5.0, cumulative cost = {5.687500238418579 rows, 
> 0.0 cpu, 0.0 io}, id = 98
>   LogicalFilter(subset=[rel#97:Subset#1.NONE], condition=[>($1, 0)]): 
> rowcount = 50.0, cumulative cost = {50.0 rows, 100.0 cpu, 0.0 io}, id = 96
> BeamIOSourceRel(subset=[rel#95:Subset#0.BEAM_LOGICAL], table=[[beam, 
> PCOLLECTION]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 
> 0.0 io}, id = 92
> Sets:
> Set#0, type: RecordType(INTEGER id, INTEGER val)
> rel#95:Subset#0.BEAM_LOGICAL, best=rel#92, 
> importance=0.7291
> rel#92:BeamIOSourceRel.BEAM_LOGICAL(table=[beam, 
> PCOLLECTION]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> rel#110:Subset#0.ENUMERABLE, best=rel#109, 
> importance=0.36455
> 
> rel#109:BeamEnumerableConverter.ENUMERABLE(input=rel#95:Subset#0.BEAM_LOGICAL),
>  rowcount=100.0, cumulative cost={1.7976931348623157E308 rows, 
> 1.7976931348623157E308 cpu, 1.7976931348623157E308 io}
> Set#1, type: RecordType(INTEGER id, INTEGER val)
> rel#97:Subset#1.NONE, best=null, importance=0.81
> 
> rel#96:LogicalFilter.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,condition=>($1, 
> 0)), rowcount=50.0, cumulative cost={inf}
> 
> rel#102:LogicalCalc.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,expr#0..1={inputs},expr#2=0,expr#3=>($t1,
>  $t2),id=$t0,val=$t1,$condition=$t3), rowcount=50.0, cumulative cost={inf}
> rel#104:Subset#1.BEAM_LOGICAL, best=rel#103, importance=0.405
> 
> re

[jira] [Work logged] (BEAM-876) Support schemaUpdateOption in BigQueryIO

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-876?focusedWorklogId=323671&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323671
 ]

ASF GitHub Bot logged work on BEAM-876:
---

Author: ASF GitHub Bot
Created on: 04/Oct/19 19:45
Start Date: 04/Oct/19 19:45
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9524: [BEAM-876] Support 
schemaUpdateOption in BigQueryIO
URL: https://github.com/apache/beam/pull/9524#issuecomment-538535933
 
 
   Yes, I'll be glad to review.
 



Issue Time Tracking
---

Worklog Id: (was: 323671)
Time Spent: 50m  (was: 40m)

> Support schemaUpdateOption in BigQueryIO
> 
>
> Key: BEAM-876
> URL: https://issues.apache.org/jira/browse/BEAM-876
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Eugene Kirpichov
>Assignee: canaan silberberg
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> BigQuery recently added support for updating the schema as a side effect of 
> the load job.
> Here is the relevant API method in JobConfigurationLoad: 
> https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/JobConfigurationLoad.html#setSchemaUpdateOptions(java.util.List)
> BigQueryIO should support this too. See user request for this: 
> http://stackoverflow.com/questions/40333245/is-it-possible-to-update-schema-while-doing-a-load-into-an-existing-bigquery-tab



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6995) SQL aggregation with where clause fails to plan

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6995?focusedWorklogId=323672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323672
 ]

ASF GitHub Bot logged work on BEAM-6995:


Author: ASF GitHub Bot
Created on: 04/Oct/19 19:46
Start Date: 04/Oct/19 19:46
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9703: [BEAM-6995] 
Beam basic aggregation rule only when not windowed
URL: https://github.com/apache/beam/pull/9703#discussion_r331659215
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java
 ##
 @@ -84,6 +85,31 @@ public BeamAggregationRel(
 this.windowFieldIndex = windowFieldIndex;
   }
 
+  public BeamAggregationRel(
+  RelOptCluster cluster,
+  RelTraitSet traits,
+  RelNode child,
+  RelDataType rowType,
+  boolean indicator,
+  ImmutableBitSet groupSet,
+  List<ImmutableBitSet> groupSets,
+  List<AggregateCall> aggCalls,
+  @Nullable WindowFn windowFn,
+  int windowFieldIndex) {
+this(
+cluster,
+traits,
+child,
+indicator,
+groupSet,
+groupSets,
+aggCalls,
+windowFn,
+windowFieldIndex);
+
+this.rowType = rowType;
 
 Review comment:
   I see, in that case I will remove this constructor. Do you think adding 
`deriveRowType();` to the original constructor makes sense, or would it be 
redundant?
 



Issue Time Tracking
---

Worklog Id: (was: 323672)
Time Spent: 3h 10m  (was: 3h)

> SQL aggregation with where clause fails to plan
> ---
>
> Key: BEAM-6995
> URL: https://issues.apache.org/jira/browse/BEAM-6995
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.11.0
>Reporter: David McIntosh
>Assignee: Kirill Kozlov
>Priority: Minor
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> I'm finding that this code fails with a CannotPlanException listed below.
> {code:java}
> Schema schema = Schema.builder()
> .addInt32Field("id")
>     .addInt32Field("val")
>     .build();
> Row row = Row.withSchema(schema).addValues(1, 2).build();
> PCollection<Row> inputData = p.apply("row input", 
> Create.of(row).withRowSchema(schema));
> inputData.apply("sql",
> SqlTransform.query(
> "SELECT id, SUM(val) "
> + "FROM PCOLLECTION "
> + "WHERE val > 0 "
> + "GROUP BY id"));{code}
> If the WHERE clause is removed the code runs successfully.
> This may be similar to BEAM-5384 since I was able to work around this by 
> adding an extra column to the input that isn't reference in the sql.
> {code:java}
> Schema schema = Schema.builder()
> .addInt32Field("id")
>     .addInt32Field("val")
>     .addInt32Field("extra")
>     .build();{code}
>  
> {code:java}
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.RelOptPlanner$CannotPlanException:
>  Node [rel#100:Subset#2.BEAM_LOGICAL] could not be implemented; planner state:
> Root: rel#100:Subset#2.BEAM_LOGICAL
> Original rel:
> LogicalAggregate(subset=[rel#100:Subset#2.BEAM_LOGICAL], group=[{0}], 
> EXPR$1=[SUM($1)]): rowcount = 5.0, cumulative cost = {5.687500238418579 rows, 
> 0.0 cpu, 0.0 io}, id = 98
>   LogicalFilter(subset=[rel#97:Subset#1.NONE], condition=[>($1, 0)]): 
> rowcount = 50.0, cumulative cost = {50.0 rows, 100.0 cpu, 0.0 io}, id = 96
> BeamIOSourceRel(subset=[rel#95:Subset#0.BEAM_LOGICAL], table=[[beam, 
> PCOLLECTION]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 
> 0.0 io}, id = 92
> Sets:
> Set#0, type: RecordType(INTEGER id, INTEGER val)
> rel#95:Subset#0.BEAM_LOGICAL, best=rel#92, 
> importance=0.7291
> rel#92:BeamIOSourceRel.BEAM_LOGICAL(table=[beam, 
> PCOLLECTION]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> rel#110:Subset#0.ENUMERABLE, best=rel#109, 
> importance=0.36455
> 
> rel#109:BeamEnumerableConverter.ENUMERABLE(input=rel#95:Subset#0.BEAM_LOGICAL),
>  rowcount=100.0, cumulative cost={1.7976931348623157E308 rows, 
> 1.7976931348623157E308 cpu, 1.7976931348623157E308 io}
> Set#1, type: RecordType(INTEGER id, INTEGER val)
> rel#97:Subset#1.NONE, best=null, importance=0.81
> 
> rel#96:LogicalFilter.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,condition=>($1, 
> 0)), rowcount=50.0, cumulative cost={inf}
> 
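Stripped of the Calcite planner machinery, the aggregation the failing query asks for is just a filter followed by a grouped sum. A plain-Java sketch of those semantics (the `Row` record below is a simplified stand-in for the Beam schema row above, not the Beam class):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FilteredAggregationSketch {
    // Simplified stand-in for a two-column (id, val) schema row.
    record Row(int id, int val) {}

    // Equivalent of: SELECT id, SUM(val) FROM rows WHERE val > 0 GROUP BY id
    static Map<Integer, Integer> sumPositiveValsById(List<Row> rows) {
        return rows.stream()
                .filter(r -> r.val() > 0)                  // WHERE val > 0
                .collect(Collectors.groupingBy(            // GROUP BY id
                        Row::id,
                        Collectors.summingInt(Row::val))); // SUM(val)
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row(1, 2), new Row(1, 3), new Row(2, -1));
        System.out.println(sumPositiveValsById(rows)); // {1=5}
    }
}
```

With or without the filter, the grouped sum itself is well defined; the bug report is about the planner failing to produce a physical plan for this shape, not about the semantics.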

[jira] [Work logged] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8351?focusedWorklogId=323677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323677
 ]

ASF GitHub Bot logged work on BEAM-8351:


Author: ASF GitHub Bot
Created on: 04/Oct/19 20:06
Start Date: 04/Oct/19 20:06
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9730: [BEAM-8351] Support 
passing in arbitrary KV pairs to sdk worker via external environment config
URL: https://github.com/apache/beam/pull/9730#issuecomment-538542170
 
 
   ```
   11:58:18 * Module 
apache_beam.runners.portability.portable_runner_test
   11:58:18 W:328, 0: Bad indentation. Found 8 spaces, expected 6 
(bad-indentation)
   11:58:18 W:333, 0: Bad indentation. Found 8 spaces, expected 6 
(bad-indentation)
   11:58:18 C:337, 0: Line too long (82/80) (line-too-long)
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323677)
Time Spent: 1h 40m  (was: 1.5h)

> Support passing in arbitrary KV pairs to sdk worker via external environment 
> config
> ---
>
> Key: BEAM-8351
> URL: https://issues.apache.org/jira/browse/BEAM-8351
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Wanqi Lyu
>Assignee: Wanqi Lyu
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Originally, the environment config for the EXTERNAL environment type only 
> supported passing in a URL for the external worker pool. We want to support 
> passing arbitrary KV pairs to the SDK worker via the external environment 
> config, so that when starting the SDK harness we can read the values from 
> `StartWorkerRequest.params`.
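As a rough illustration of the idea (not the actual Beam implementation, and assuming a simple `key=value,key=value` encoding that may differ from whatever the PR settles on), an environment-config parser that keeps the legacy bare-URL behavior could look like:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ExternalEnvironmentConfig {
    // Hypothetical parser: if the config string contains '=', treat it as
    // comma-separated key=value pairs; otherwise fall back to the legacy
    // behavior of interpreting the whole string as a worker-pool URL.
    static Map<String, String> parse(String config) {
        Map<String, String> params = new LinkedHashMap<>();
        if (config.indexOf('=') < 0) {
            params.put("url", config); // legacy: bare URL
            return params;
        }
        for (String pair : config.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2) {
                params.put(kv[0].trim(), kv[1].trim());
            }
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(parse("localhost:50000"));
        // {url=localhost:50000}
        System.out.println(parse("url=localhost:50000,worker_id=w-1"));
        // {url=localhost:50000, worker_id=w-1}
    }
}
```

The resulting map is what would then be surfaced to the harness via `StartWorkerRequest.params`.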



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-5559) Beam Dependency Update Request: com.google.guava:guava

2019-10-04 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-5559:
-

Assignee: Lukasz Gajowy

> Beam Dependency Update Request: com.google.guava:guava
> --
>
> Key: BEAM-5559
> URL: https://issues.apache.org/jira/browse/BEAM-5559
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: 2.15.0
>
>
>  - 2018-10-01 19:30:53.471497 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 26.0-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-08 12:18:05.174889 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 26.0-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-04-15 12:32:27.737694 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 27.1-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-04-22 12:10:18.539470 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 27.1-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-5559) Beam Dependency Update Request: com.google.guava:guava

2019-10-04 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-5559:
--
Fix Version/s: (was: 2.15.0)

> Beam Dependency Update Request: com.google.guava:guava
> --
>
> Key: BEAM-5559
> URL: https://issues.apache.org/jira/browse/BEAM-5559
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Lukasz Gajowy
>Priority: Major
>
>  - 2018-10-01 19:30:53.471497 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 26.0-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-08 12:18:05.174889 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 26.0-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-04-15 12:32:27.737694 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 27.1-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-04-22 12:10:18.539470 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 27.1-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8329) AvroCoder ReflectData doesn't use TimestampConversion

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8329?focusedWorklogId=323693&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323693
 ]

ASF GitHub Bot logged work on BEAM-8329:


Author: ASF GitHub Bot
Created on: 04/Oct/19 20:41
Start Date: 04/Oct/19 20:41
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9702: 
[BEAM-8329] Add TimestampConversion to AvroCoder ReflectData
URL: https://github.com/apache/beam/pull/9702
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323693)
Time Spent: 0.5h  (was: 20m)

> AvroCoder ReflectData doesn't use TimestampConversion
> -
>
> Key: BEAM-8329
> URL: https://issues.apache.org/jira/browse/BEAM-8329
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.16.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The ReflectData created by AvroCoder doesn't have 
> {{TimeConversions.TimestampConversion}} registered, which prevents it from 
> working with millis-instant joda DateTime instances. 
> AvroUtils does this statically 
> [here|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java#L78].
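The conversion at issue maps a timestamp object to Avro's `timestamp-millis` logical type, i.e. a `long` of epoch milliseconds, and back. Stripped of the Avro and joda-time machinery, the round trip reduces to the following sketch (`java.time.Instant` stands in here for joda `DateTime`):

```java
import java.time.Instant;

class TimestampMillisRoundTrip {
    // Encode: what a timestamp-millis conversion does when writing a field.
    static long toMillis(Instant ts) {
        return ts.toEpochMilli();
    }

    // Decode: what the conversion does when reading the field back.
    static Instant fromMillis(long millis) {
        return Instant.ofEpochMilli(millis);
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2019-10-04T20:41:00Z");
        long encoded = toMillis(ts);
        System.out.println(fromMillis(encoded).equals(ts)); // true
    }
}
```

Without the conversion registered on the model, Avro has no way to perform this mapping for reflective records, hence the failure described above.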



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=323710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323710
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 04/Oct/19 20:44
Start Date: 04/Oct/19 20:44
Worklog Time Spent: 10m 
  Work Description: soyrice commented on pull request #9607: [BEAM-8209] 
Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r331596572
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
+  - /documentation/execution-model/
+---
+
+
+# Runtime environments
+
+Any execution engine can run the Beam SDK because the SDK runtime environment 
is [containerized](https://s.apache.org/beam-fn-api-container-contract) with 
[Docker](https://www.docker.com/) and isolated from other runtime systems. This 
page describes how to build, customize, and push Beam SDK container images.
+
+## Building container images
+
+Before building Beam SDK container images:
+* Register a [Bintray](https://bintray.com/) account with a Docker repository 
named `apache`.
+* Install [Docker](https://www.docker.com/) on your workstation.
+
+To build Beam SDK container images:
+
+1. Navigate to your local copy of the [beam](https://github.com/apache/beam) repository.
+2. Run Gradle with the `docker` target: `./gradlew docker`
+
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
+>
+> ```
+> ./gradlew -p sdks/java/container docker
+> ./gradlew -p sdks/python/container docker
+> ./gradlew -p sdks/go/container docker
+> ```
+
+Run `docker images` to examine the containers. For example, if you 
successfully built the container images, the command prompt displays a response 
like:
+
+```
+REPOSITORY                                   TAG      IMAGE ID      CREATED          SIZE
+$USER-docker-apache.bintray.io/beam/python   latest   4ea515403a1a  3 minutes ago    1.27GB
+$USER-docker-apache.bintray.io/beam/java     latest   0103512f1d8f  34 minutes ago   780MB
+$USER-docker-apache.bintray.io/beam/go       latest   ce055985808a  35 minutes ago   121MB
+```
+
+Although the repository names look like URLs, the container images are 
stored locally on your workstation. After building the container images 
locally, you can [push](#pushing-container-images) them to a remote 
repository of the same name.
+
+### Overriding default Docker targets
+
+The default SDK version is `latest` and the default Docker repository is the 
following Bintray location:
+
+```
+$USER-docker-apache.bintray.io/beam
+```
+
+When you [build SDK container images](#building-container-images), you can 
override the default version and location.
+
+To specify an older Python SDK version, like 2.3.0, build the container with 
the `docker-tag` option:
+
+```
+./gradlew docker -Pdocker-tag=2.3.0
+```
+
+To change the `docker` target, build the container with the 
`docker-repository-root` option:
+
+```
+./gradlew docker -Pdocker-repository-root=$LOCATION
+```
+
+## Customizing container images
+
+You can add extra dependencies or serialization files to container images so 
that the execution engine does not have to supply them at runtime.
+
+To customize a container image, either:
+* [Write a new](#writing-new-dockerfiles) 
[Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the 
original
+* [Modify](#modifying-dockerfiles) the [original 
Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile)
 and reimage the container
+
+It's often easier to write a new Dockerfile, but you can customize anything, 
including the base OS, by modifying the original.
+
+### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles}
+
+1. Pull a [prebuilt SDK container image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release) for your target language and version.
+2. [Write](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/) a new Dockerfile that [designates](https://docs.docker.com/engine/reference/builder/#from) the original as its [parent](https://docs.docker.com/glossary/?term=parent%20image).
+3. Build a child image: `docker build -f /path/to/new/Dockerfile`
+
+
+### Modifying the original Dockerfile {#modifying-dockerfiles}
+
+1. Pull the [prebuilt SDK container 
image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release)
 for your target language and version
+2. Customize the 
[Dockerfile](https:/
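The "write a new Dockerfile on top of the original" approach quoted above amounts to something like this minimal child Dockerfile (the parent image path and the added dependency are illustrative, not taken from the docs):

```dockerfile
# Hypothetical child image: extend a prebuilt Beam Python SDK image
# (the exact image path/tag comes from the prebuilt-image registry linked above)
FROM gcr.io/apache-beam-testing/beam/sdks/release/python:latest

# Add an extra dependency so workers don't need to install it at runtime
RUN pip install --no-cache-dir numpy
```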

[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=323702&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323702
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 04/Oct/19 20:44
Start Date: 04/Oct/19 20:44
Worklog Time Spent: 10m 
  Work Description: soyrice commented on pull request #9607: [BEAM-8209] 
Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r331574098
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
+>
+> ```
+> ./gradlew -p sdks/java/container docker
 
 Review comment:
   Done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323702)
Time Spent: 6h 50m  (was: 6h 40m)

> Document custom docker containers
> -
>
> Key: BEAM-8209
> URL: https://issues.apache.org/jira/browse/BEAM-8209
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
> Fix For: 2.17.0
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=323719&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323719
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 04/Oct/19 20:44
Start Date: 04/Oct/19 20:44
Worklog Time Spent: 10m 
  Work Description: soyrice commented on pull request #9607: [BEAM-8209] 
Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r331590943
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+### Modifying the original Dockerfile {#modifying-dockerfiles}
+
+1. Pull the [prebuilt SDK container 
image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release)
 for your target language and version
+2. Customize the 
[Dockerfile](https:/

[jira] [Work logged] (BEAM-8209) Document custom docker containers

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=323700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323700
 ]

ASF GitHub Bot logged work on BEAM-8209:


Author: ASF GitHub Bot
Created on: 04/Oct/19 20:44
Start Date: 04/Oct/19 20:44
Worklog Time Spent: 10m 
  Work Description: soyrice commented on pull request #9607: [BEAM-8209] 
Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r331569705
 
 

 ##
 File path: website/src/documentation/runtime/environments.md
 ##
 @@ -0,0 +1,187 @@
+### Overriding default Docker targets
+
+The default SDK version is `latest` and the default Docker repository is the 
following Bintray location:
+
+```
+$USER-docker-apache.bintray.io/beam
+```
+
+When you [build SDK container images](#building-container-images), you can 
override the default version and location.
 
 Review comment:
   Done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323700)
Time Spent: 6.5h  (was: 6h 20m)

> Document custom docker containers
> -
>
> Key: BEAM-8209
> URL: https://issues.apache.org/jira/browse/BEAM-8209
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
> Fix For: 2.17.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   >