[jira] [Work logged] (BEAM-6855) Side inputs are not supported when using the state API
[ https://issues.apache.org/jira/browse/BEAM-6855?focusedWorklogId=323322&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323322 ] ASF GitHub Bot logged work on BEAM-6855: Author: ASF GitHub Bot Created on: 04/Oct/19 10:43 Start Date: 04/Oct/19 10:43 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #9612: [BEAM-6855] Side inputs are not supported when using the state API URL: https://github.com/apache/beam/pull/9612 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323322) Time Spent: 6h 40m (was: 6.5h) > Side inputs are not supported when using the state API > -- > > Key: BEAM-6855 > URL: https://issues.apache.org/jira/browse/BEAM-6855 > Project: Beam > Issue Type: Bug > Components: runner-core, runner-dataflow, runner-direct >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 6h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-8162) NPE error when add flink 1.9 runner
[ https://issues.apache.org/jira/browse/BEAM-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed BEAM-8162. Fix Version/s: Not applicable Resolution: Duplicate This has been fixed and merged for the 2.17.0 release. We decided not to backport it because the fix only becomes relevant with Flink 1.9, which will also be introduced in 2.17.0. > NPE error when add flink 1.9 runner > --- > > Key: BEAM-8162 > URL: https://issues.apache.org/jira/browse/BEAM-8162 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: Not applicable > > Time Spent: 1h 50m > Remaining Estimate: 0h > > While adding the Flink 1.9 runner in https://github.com/apache/beam/pull/9296, we > found an NPE when running the `PortableTimersExecutionTest`. > The details can be found here: > https://github.com/apache/beam/pull/9296#issuecomment-525262607 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8353) Replace Flink StreamingImpulseSource with generic timer-based source
Maximilian Michels created BEAM-8353: Summary: Replace Flink StreamingImpulseSource with generic timer-based source Key: BEAM-8353 URL: https://issues.apache.org/jira/browse/BEAM-8353 Project: Beam Issue Type: Improvement Components: runner-flink Reporter: Maximilian Michels The {{StreamingImpulseSource}} is a Flink Runner specific transform for load tests. Now that timers are supported, we should look into whether a timer-based generator could generate the same load as the current source. This would allow us to remove the proprietary implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
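If timers can generate the load, the replacement might look roughly like a looping processing-time timer in a stateful, keyed DoFn. The sketch below targets the Beam Java SDK; the class name, key type, and 100 ms interval are illustrative assumptions, not the proposed implementation:

```java
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

// Hypothetical sketch: a keyed DoFn that emits an empty byte[] on a
// processing-time timer and immediately re-arms it, approximating the
// load pattern of the Flink-specific StreamingImpulseSource.
class TimerImpulseFn extends DoFn<KV<Integer, byte[]>, byte[]> {

  @TimerId("impulse")
  private final TimerSpec impulseSpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

  @ProcessElement
  public void processElement(@TimerId("impulse") Timer impulse) {
    // Seed the loop: the first element seen on each key arms the timer.
    impulse.offset(Duration.millis(100)).setRelative();
  }

  @OnTimer("impulse")
  public void onImpulse(OnTimerContext context, @TimerId("impulse") Timer impulse) {
    context.output(new byte[0]);                        // one impulse per firing
    impulse.offset(Duration.millis(100)).setRelative(); // re-arm: looping timer
  }
}
```

Timers require keyed input, so a real replacement would first assign each desired "partition" its own key; the per-partition counter could be kept in a ValueState in the same DoFn.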
[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing
[ https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=323338&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323338 ] ASF GitHub Bot logged work on BEAM-5707: Author: ASF GitHub Bot Created on: 04/Oct/19 11:25 Start Date: 04/Oct/19 11:25 Worklog Time Spent: 10m Work Description: mxm commented on pull request #9728: [BEAM-5707] Modify Flink streaming impulse function to include a counter URL: https://github.com/apache/beam/pull/9728 Flink's streaming impulse function is used for load tests that previously could not be implemented with timers. This change extends the functionality to send a per-partition counter with each impulse, instead of just an empty byte array. I've also filed https://jira.apache.org/jira/browse/BEAM-8353 to start the process of removing the proprietary source. [Post-Commit Tests Status badge table omitted]
[jira] [Comment Edited] (BEAM-8319) Errorprone 0.0.13 fails during JDK11 build
[ https://issues.apache.org/jira/browse/BEAM-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942969#comment-16942969 ] Lukasz Gajowy edited comment on BEAM-8319 at 10/4/19 1:47 PM: -- I've found out that we're fine as long as guava's version is greater or equal to 23.5-jre. EDIT: version 23.1-jre is also fine. Versions 23.0 and 23.1-android fail with the following reason: {code:java} java.lang.NoSuchMethodError: com.google.common.collect.ImmutableSet.toImmutableSet()Ljava/util/stream/Collector; at com.google.errorprone.BugCheckerInfo.(BugCheckerInfo.java:119) at com.google.errorprone.BugCheckerInfo.create(BugCheckerInfo.java:105) at com.google.errorprone.scanner.BuiltInCheckerSuppliers.getSuppliers(BuiltInCheckerSuppliers.java:420) at com.google.errorprone.scanner.BuiltInCheckerSuppliers.getSuppliers(BuiltInCheckerSuppliers.java:413) at com.google.errorprone.scanner.BuiltInCheckerSuppliers.(BuiltInCheckerSuppliers.java:449) at com.google.errorprone.ErrorProneJavacPlugin.init(ErrorProneJavacPlugin.java:44) at com.sun.tools.javac.api.BasicJavacTask.initPlugins(BasicJavacTask.java:214) at com.sun.tools.javac.api.JavacTaskImpl.prepareCompiler(JavacTaskImpl.java:192) at com.sun.tools.javac.api.JavacTaskImpl.lambda$doCall$0(JavacTaskImpl.java:97) at com.sun.tools.javac.api.JavacTaskImpl.handleExceptions(JavacTaskImpl.java:142) at com.sun.tools.javac.api.JavacTaskImpl.doCall(JavacTaskImpl.java:96) at com.sun.tools.javac.api.JavacTaskImpl.call(JavacTaskImpl.java:90) at org.gradle.api.internal.tasks.compile.AnnotationProcessingCompileTask.call(AnnotationProcessingCompileTask.java:92) at org.gradle.api.internal.tasks.compile.ResourceCleaningCompilationTask.call(ResourceCleaningCompilationTask.java:57) at org.gradle.api.internal.tasks.compile.JdkJavaCompiler.execute(JdkJavaCompiler.java:50) at org.gradle.api.internal.tasks.compile.JdkJavaCompiler.execute(JdkJavaCompiler.java:36) at 
org.gradle.api.internal.tasks.compile.daemon.AbstractDaemonCompiler$CompilerCallable.call(AbstractDaemonCompiler.java:86) at org.gradle.api.internal.tasks.compile.daemon.AbstractDaemonCompiler$CompilerCallable.call(AbstractDaemonCompiler.java:74) at org.gradle.workers.internal.DefaultWorkerServer.execute(DefaultWorkerServer.java:42) at org.gradle.workers.internal.WorkerDaemonServer.execute(WorkerDaemonServer.java:38) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.gradle.process.internal.worker.request.WorkerAction.run(WorkerAction.java:111) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175) at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157) at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404) at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63) at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at 
org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55) at java.lang.Thread.run(Thread.java:745) {code} was (Author: łukaszg): I've found out that we're fine as long as guava's version is greater or equal to 23.5-jre > Errorprone 0.0.13 fails during JDK11 build > -- > > Key: BEAM-8319 > URL: https://issues.apache.org/jira/browse/BEAM-8319 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Lukasz Gajowy >Assignee: Lukasz
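The NoSuchMethodError above indicates that errorprone is resolving against a guava version/flavor that lacks ImmutableSet.toImmutableSet(). One common Gradle remedy is to force a single known-good guava across all configurations - a hypothetical build.gradle fragment, not Beam's actual build script:

```groovy
// Hypothetical fragment: force every configuration to resolve guava to
// 23.1-jre, the lowest version reported to work with errorprone here.
configurations.all {
    resolutionStrategy {
        force 'com.google.guava:guava:23.1-jre'
    }
}
```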
[jira] [Work logged] (BEAM-8319) Errorprone 0.0.13 fails during JDK11 build
[ https://issues.apache.org/jira/browse/BEAM-8319?focusedWorklogId=323490&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323490 ] ASF GitHub Bot logged work on BEAM-8319: Author: ASF GitHub Bot Created on: 04/Oct/19 15:52 Start Date: 04/Oct/19 15:52 Worklog Time Spent: 10m Work Description: lgajowy commented on pull request #9729: [BEAM-8319] Errorprone + java11 prototype, not to be merged URL: https://github.com/apache/beam/pull/9729 This configuration was tested only on the `sdks/java/core` module. The build on openjdk11.0.2 succeeds when configured as follows: ``` ./gradlew clean build -p sdks/java/core/ -xtest -xspotbugsMain ``` Sharing this so that more people can experiment, spot potential errors, or point out new directions (if any). [Pull request template and Post-Commit Tests Status badge table omitted]
[jira] [Commented] (BEAM-8319) Errorprone 0.0.13 fails during JDK11 build
[ https://issues.apache.org/jira/browse/BEAM-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944618#comment-16944618 ] Lukasz Gajowy commented on BEAM-8319: - Thank you for your help. When guava is not forced, the version is 27.0.1 (I'm compiling only sdks/java/core to get quick feedback). This is Gradle's default behavior: it uses the newest of the conflicting versions, which in sdks/java/core is 27.0.1. I double-checked once again, and to sum up: I got errorprone working on versions greater than or equal to: - errorprone plugin version: 0.8.1 - com.google.errorprone:error_prone_annotations:2.3.3 - com.google.errorprone:error_prone_core:2.3.3 - com.google.errorprone:javac:9+181-r4173-1 - guava version 23.1-jre I used openjdk11.0.2. Link to a pull request draft where I'm currently working on errorprone: [https://github.com/apache/beam/pull/9729] And the command I used for testing it on core: {code:java} ./gradlew clean build -p sdks/java/core/ -xtest -xspotbugsMain {code} Spotbugs and tests failed, so I skipped them for now. Surprisingly, errorprone in this configuration shows lots of warnings - is errorprone working properly now? My plan for now is to try to upgrade guava to the desired version (at least 23.1-jre). It may require bumping other dependencies too. I will track progress here: https://issues.apache.org/jira/browse/BEAM-5559 > Errorprone 0.0.13 fails during JDK11 build > -- > > Key: BEAM-8319 > URL: https://issues.apache.org/jira/browse/BEAM-8319 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > I'm using openjdk 11.0.2. After switching the version to: > {code:java} > javaVersion = 11 {code} > in the BeamModulePlugin and running > {code:java} > ./gradlew clean build -p sdks/java/core -xtest {code} > building fails. 
I was able to run errorprone after upgrading it but had > problems with conflicting guava version. See more here: > https://issues.apache.org/jira/browse/BEAM-5085 > > Stacktrace: > {code:java} > org.gradle.api.tasks.TaskExecutionException: Execution failed for task > ':model:pipeline:compileJava'. > at > org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$2.accept(ExecuteActionsTaskExecuter.java:121) > at > org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$2.accept(ExecuteActionsTaskExecuter.java:117) > at org.gradle.internal.Try$Failure.ifSuccessfulOrElse(Try.java:184) > at > org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.execute(ExecuteActionsTaskExecuter.java:110) > at > org.gradle.api.internal.tasks.execution.ResolveIncrementalChangesTaskExecuter.execute(ResolveIncrementalChangesTaskExecuter.java:84) > at > org.gradle.api.internal.tasks.execution.ResolveTaskOutputCachingStateExecuter.execute(ResolveTaskOutputCachingStateExecuter.java:91) > at > org.gradle.api.internal.tasks.execution.FinishSnapshotTaskInputsBuildOperationTaskExecuter.execute(FinishSnapshotTaskInputsBuildOperationTaskExecuter.java:51) > at > org.gradle.api.internal.tasks.execution.ResolveBuildCacheKeyExecuter.execute(ResolveBuildCacheKeyExecuter.java:102) > at > org.gradle.api.internal.tasks.execution.ResolveBeforeExecutionStateTaskExecuter.execute(ResolveBeforeExecutionStateTaskExecuter.java:74) > at > org.gradle.api.internal.tasks.execution.ValidatingTaskExecuter.execute(ValidatingTaskExecuter.java:58) > at > org.gradle.api.internal.tasks.execution.SkipEmptySourceFilesTaskExecuter.execute(SkipEmptySourceFilesTaskExecuter.java:109) > at > org.gradle.api.internal.tasks.execution.ResolveBeforeExecutionOutputsTaskExecuter.execute(ResolveBeforeExecutionOutputsTaskExecuter.java:67) > at > org.gradle.api.internal.tasks.execution.StartSnapshotTaskInputsBuildOperationTaskExecuter.execute(StartSnapshotTaskInputsBuildOperationTaskExecuter.java:52) > at 
> org.gradle.api.internal.tasks.execution.ResolveAfterPreviousExecutionStateTaskExecuter.execute(ResolveAfterPreviousExecutionStateTaskExecuter.java:46) > at > org.gradle.api.internal.tasks.execution.CleanupStaleOutputsExecuter.execute(CleanupStaleOutputsExecuter.java:93) > at > org.gradle.api.internal.tasks.execution.FinalizePropertiesTaskExecuter.execute(FinalizePropertiesTaskExecuter.java:45) > at > org.gradle.api.internal.tasks.execution.ResolveTaskExecutionModeExecuter.execute(ResolveTaskExecutionModeExecuter.java:94) > at > org.gradle.api.internal.tasks.execution.SkipTaskWithNoActionsExecuter.execute(SkipTaskWithNoActionsExecuter.java:57)
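The version set reported to work (errorprone plugin 0.8.1, error_prone_core 2.3.3, the errorprone javac 9+181-r4173-1) could be pinned in a Gradle build roughly as follows. This is a hypothetical sketch that assumes the net.ltgt.errorprone Gradle plugin, whose `errorprone` and `errorproneJavac` configurations are used here - it is not Beam's actual build configuration:

```groovy
plugins {
    // 0.8.1 is the plugin version reported to work on JDK 11
    id 'net.ltgt.errorprone' version '0.8.1'
}

dependencies {
    // The check implementations that run during compilation
    errorprone 'com.google.errorprone:error_prone_core:2.3.3'
    // Annotations such as @SuppressWarnings keys used in source
    compileOnly 'com.google.errorprone:error_prone_annotations:2.3.3'
    // The patched javac, needed on some JDK setups
    errorproneJavac 'com.google.errorprone:javac:9+181-r4173-1'
}
```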
[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4
[ https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=323497&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323497 ] ASF GitHub Bot logged work on BEAM-8350: Author: ASF GitHub Bot Created on: 04/Oct/19 15:58 Start Date: 04/Oct/19 15:58 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade to Pylint 2.4 URL: https://github.com/apache/beam/pull/9725#issuecomment-538457947 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323497) Time Spent: 50m (was: 40m) > Upgrade to pylint 2.4 > - > > Key: BEAM-8350 > URL: https://issues.apache.org/jira/browse/BEAM-8350 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > pylint 2.4 provides a number of new features and fixes, but the most > important/pressing one for me is that 2.4 adds support for understanding > python type annotations, which fixes a bunch of spurious unused import errors > in the PR I'm working on for BEAM-7746. > As of 2.0, pylint dropped support for running tests in python2, so to make > the upgrade we have to move our lint jobs to python3. Doing so will put > pylint into "python3-mode" and there is not an option to run in > python2-compatible mode. That said, the beam code is intended to be python3 > compatible, so in practice, performing a python3 lint on the Beam code-base > is perfectly safe. The primary risk of doing this is that someone introduces > a python-3 only change that breaks python2, but these would largely be syntax > errors that would be immediately caught by the unit and integration tests. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4
[ https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=323498&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323498 ] ASF GitHub Bot logged work on BEAM-8350: Author: ASF GitHub Bot Created on: 04/Oct/19 15:58 Start Date: 04/Oct/19 15:58 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade to Pylint 2.4 URL: https://github.com/apache/beam/pull/9725#issuecomment-538458010 Run Python2_PVR_Flink PreCommit Issue Time Tracking --- Worklog Id: (was: 323498) Time Spent: 1h (was: 50m)
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4
[ https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=323503&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323503 ] ASF GitHub Bot logged work on BEAM-8350: Author: ASF GitHub Bot Created on: 04/Oct/19 15:59 Start Date: 04/Oct/19 15:59 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade to Pylint 2.4 URL: https://github.com/apache/beam/pull/9725#issuecomment-538169256 Here's a breakdown of the changes required to get to pylint 2.4: - fix a bunch of warnings about deprecated methods. mostly `logger.warn` and various unittest methods - update the names of a few error codes: `disable=unused-import` and `possibly-unused-variable` - ignore a bunch of newly introduced style warnings that did not seem important - run the lint on python-3.7: this ensures that it can run on test files that only work on python-37 due to syntax features - merge the lint tests into one test: - `run_pylint_2to3.sh` was just testing futurization. seems fine to do this all the time now that our code is python2 compliant - there was a "mini" test just for python3-compatibility. not needed anymore now that everything is running on python3 - stop running `pycodestyle`: it's run as part of `flake8` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323503) Time Spent: 1h 20m (was: 1h 10m) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-5559) Beam Dependency Update Request: com.google.guava:guava
[ https://issues.apache.org/jira/browse/BEAM-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944623#comment-16944623 ] Lukasz Gajowy commented on BEAM-5559: - Thank you Luke - this is super helpful and I learned a lot. I tried to find a higher consistent guava version and bump it as high as possible, but I'm currently blocked on: https://issues.apache.org/jira/browse/HADOOP-14891. We use a version that is affected by the issue - for example, when sdks/java/io/hadoop-file-system/ is built, guava is at version 21.0. I will try to update the hadoop version as well and see what happens. I will keep Jira posted. > Beam Dependency Update Request: com.google.guava:guava > -- > > Key: BEAM-5559 > URL: https://issues.apache.org/jira/browse/BEAM-5559 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Priority: Major > Fix For: 2.15.0 > > > - 2018-10-01 19:30:53.471497 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 20.0. The latest version is 26.0-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-08 12:18:05.174889 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 20.0. The latest version is 26.0-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-04-15 12:32:27.737694 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 20.0. The latest version is 27.1-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-04-22 12:10:18.539470 > - > Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 27.1-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3845) Avoid calling Class#newInstance
[ https://issues.apache.org/jira/browse/BEAM-3845?focusedWorklogId=323511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323511 ] ASF GitHub Bot logged work on BEAM-3845: Author: ASF GitHub Bot Created on: 04/Oct/19 16:04 Start Date: 04/Oct/19 16:04 Worklog Time Spent: 10m Work Description: lgajowy commented on issue #9613: [BEAM-3845] Remove deprecated Class.newInstance() method usage URL: https://github.com/apache/beam/pull/9613#issuecomment-538460375 @iemejia does this PR LGTY? :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323511) Time Spent: 1h 10m (was: 1h) > Avoid calling Class#newInstance > --- > > Key: BEAM-3845 > URL: https://issues.apache.org/jira/browse/BEAM-3845 > Project: Beam > Issue Type: Task > Components: sdk-java-core >Reporter: Ted Yu >Assignee: Lukasz Gajowy >Priority: Minor > Labels: triaged > Time Spent: 1h 10m > Remaining Estimate: 0h > > Class#newInstance is deprecated starting in Java 9 - > https://bugs.openjdk.java.net/browse/JDK-6850612 - because it may throw > undeclared checked exceptions. > The suggested replacement is getDeclaredConstructor().newInstance(), which > wraps the checked exceptions in InvocationTargetException. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7596) List is not parallel: Make it parallel
[ https://issues.apache.org/jira/browse/BEAM-7596?focusedWorklogId=323528&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323528 ] ASF GitHub Bot logged work on BEAM-7596: Author: ASF GitHub Bot Created on: 04/Oct/19 16:16 Start Date: 04/Oct/19 16:16 Worklog Time Spent: 10m Work Description: aaltay commented on issue #8912: [BEAM-7596] Align wording of list in web docs URL: https://github.com/apache/beam/pull/8912#issuecomment-538464772 Should this be re-based and merged? Or closed? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323528) Time Spent: 2h 20m (was: 2h 10m) > List is not parallel: Make it parallel > -- > > Key: BEAM-7596 > URL: https://issues.apache.org/jira/browse/BEAM-7596 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Riona MacNamara >Assignee: Jonas Grabber >Priority: Trivial > Labels: Starter, ccoss2019 > Time Spent: 2h 20m > Remaining Estimate: 0h > > In [https://beam.apache.org/documentation/], the list under *Concepts* is not > parallel: > * The [Programming > Guide|https://beam.apache.org/documentation/programming-guide/] introduces > all the key Beam concepts. > * Learn about Beam’s [execution > model|https://beam.apache.org/documentation/execution-model/] to better > understand how pipelines execute. > * Visit [Learning > Resources|https://beam.apache.org/documentation/resources/learning-resources] > for some of our favorite articles and talks about Beam. > The first item should either begin with a verb, or the second two items > should be framed as a description of the doc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7917) Python datastore v1new fails on retry
[ https://issues.apache.org/jira/browse/BEAM-7917?focusedWorklogId=323538&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323538 ] ASF GitHub Bot logged work on BEAM-7917: Author: ASF GitHub Bot Created on: 04/Oct/19 16:23 Start Date: 04/Oct/19 16:23 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9294: [BEAM-7917] Fix datastore writes failing on retry URL: https://github.com/apache/beam/pull/9294#issuecomment-538467447 What are the next steps for this PR? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323538) Time Spent: 2h 50m (was: 2h 40m) > Python datastore v1new fails on retry > - > > Key: BEAM-7917 > URL: https://issues.apache.org/jira/browse/BEAM-7917 > Project: Beam > Issue Type: Bug > Components: io-py-gcp, runner-dataflow >Affects Versions: 2.14.0 > Environment: Python 3.7 on Dataflow >Reporter: Dmytro Sadovnychyi >Assignee: Dmytro Sadovnychyi >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > Traceback (most recent call last): > File "apache_beam/runners/common.py", line 782, in > apache_beam.runners.common.DoFnRunner.process > File "apache_beam/runners/common.py", line 454, in > apache_beam.runners.common.SimpleInvoker.invoke_process > File > "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py", > line 334, in process > self._flush_batch() > File > "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py", > line 349, in _flush_batch > throttle_delay=util.WRITE_BATCH_TARGET_LATENCY_MS // 1000) > File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", > line 197, in wrapper > return fun(*args, **kwargs) > File > 
"/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/helper.py", > line 99, in write_mutations > batch.commit() > File > "/usr/local/lib/python3.7/site-packages/google/cloud/datastore/batch.py", > line 271, in commit > raise ValueError("Batch must be in progress to commit()") > ValueError: Batch must be in progress to commit() -- This message was sent by Atlassian Jira (v8.3.4#803005)
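The traceback above boils down to a batch-lifecycle problem: a google.cloud.datastore batch can only be committed while it is "in progress", and a failed commit attempt ends the batch, so a retry wrapper that re-invokes `commit()` on the same batch object hits the `ValueError`. The following toy sketch (purely illustrative stand-in classes, not Beam's or google-cloud-datastore's actual API) shows why the retry fails and why rebuilding the batch inside each attempt avoids it:

```python
class ToyBatch:
    """Toy stand-in for google.cloud.datastore Batch: commit() only works
    while the batch is 'in progress', and any commit attempt ends it."""

    def __init__(self, client):
        self._client = client
        self._in_progress = True

    def commit(self):
        if not self._in_progress:
            raise ValueError("Batch must be in progress to commit()")
        self._in_progress = False  # the attempt ends the batch either way
        if self._client._failures > 0:
            self._client._failures -= 1
            raise ConnectionError("transient RPC failure")
        return "committed"


class FlakyDatastore:
    """Toy client whose first N commit attempts fail transiently."""

    def __init__(self, failures=1):
        self._failures = failures

    def batch(self):
        return ToyBatch(self)


def commit_with_retry_buggy(client, attempts=3):
    # Buggy pattern: the retry loop reuses the spent batch, so attempt 2
    # raises "Batch must be in progress to commit()".
    batch = client.batch()
    for _ in range(attempts):
        try:
            return batch.commit()
        except ConnectionError:
            pass  # retry -> hits the ValueError


def commit_with_retry_fixed(client, attempts=3):
    # Fix sketch: build a fresh batch inside each attempt.
    for _ in range(attempts):
        batch = client.batch()
        try:
            return batch.commit()
        except ConnectionError:
            pass
    raise RuntimeError("out of retries")


try:
    commit_with_retry_buggy(FlakyDatastore(failures=1))
except ValueError as e:
    buggy_error = str(e)  # "Batch must be in progress to commit()"

ok = commit_with_retry_fixed(FlakyDatastore(failures=1))  # "committed"
```

The real fix in PR #9294 operates on the actual datastore client, but the shape of the problem is the same: anything the retry decorator re-invokes must construct its own batch.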
[jira] [Work logged] (BEAM-7495) Add support for dynamic worker re-balancing when reading BigQuery data using Cloud Dataflow
[ https://issues.apache.org/jira/browse/BEAM-7495?focusedWorklogId=323547&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323547 ] ASF GitHub Bot logged work on BEAM-7495: Author: ASF GitHub Bot Created on: 04/Oct/19 16:30 Start Date: 04/Oct/19 16:30 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9083: [BEAM-7495] Improve the test that compares EXPORT and DIRECT_READ URL: https://github.com/apache/beam/pull/9083#issuecomment-538469590 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323547) Remaining Estimate: 490h 40m (was: 490h 50m) Time Spent: 13h 20m (was: 13h 10m) > Add support for dynamic worker re-balancing when reading BigQuery data using > Cloud Dataflow > --- > > Key: BEAM-7495 > URL: https://issues.apache.org/jira/browse/BEAM-7495 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Aryan Naraghi >Assignee: Aryan Naraghi >Priority: Major > Original Estimate: 504h > Time Spent: 13h 20m > Remaining Estimate: 490h 40m > > Currently, the BigQuery connector for reading data using the BigQuery Storage > API does not support any of the facilities on the source for Dataflow to > split streams. > > On the server side, the BigQuery Storage API supports splitting streams at a > fraction. By adding support to the connector, we enable Dataflow to split > streams, which unlocks dynamic worker re-balancing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config
[ https://issues.apache.org/jira/browse/BEAM-8351?focusedWorklogId=323549&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323549 ] ASF GitHub Bot logged work on BEAM-8351: Author: ASF GitHub Bot Created on: 04/Oct/19 16:32 Start Date: 04/Oct/19 16:32 Worklog Time Spent: 10m Work Description: violalyu commented on pull request #9730: [BEAM-8351] Support passing in arbitrary KV pairs to sdk worker via external environment config URL: https://github.com/apache/beam/pull/9730 Originally, the environment config for the EXTERNAL environment type only supported passing in a URL for the external worker pool. We want to support passing arbitrary KV pairs to the SDK worker via the external environment config, so that when starting the SDK harness we can get the values from `StartWorkerRequest.params`. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
Post-Commit Tests Status (on master branch): [standard PR-template table of Jenkins build-status badges for the Go, Java, and Python SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners]
[jira] [Work logged] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config
[ https://issues.apache.org/jira/browse/BEAM-8351?focusedWorklogId=323550&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323550 ] ASF GitHub Bot logged work on BEAM-8351: Author: ASF GitHub Bot Created on: 04/Oct/19 16:33 Start Date: 04/Oct/19 16:33 Worklog Time Spent: 10m Work Description: violalyu commented on issue #9730: [BEAM-8351] Support passing in arbitrary KV pairs to sdk worker via external environment config URL: https://github.com/apache/beam/pull/9730#issuecomment-538470626 R: @robertwb R: @chadrik This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323550) Time Spent: 20m (was: 10m) > Support passing in arbitrary KV pairs to sdk worker via external environment > config > --- > > Key: BEAM-8351 > URL: https://issues.apache.org/jira/browse/BEAM-8351 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Wanqi Lyu >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > Originally, the environment config for environment type of EXTERNAL only > support passing in an url for the external worker pool; We want to support > passing in arbitrary KV pairs to sdk worker via external environment config, > so that the when starting the sdk harness we could get the values from > `StartWorkerRequest.params`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8209) Document custom docker containers
[ https://issues.apache.org/jira/browse/BEAM-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu updated BEAM-8209: --- Fix Version/s: (was: 2.16.0) 2.17.0 > Document custom docker containers > - > > Key: BEAM-8209 > URL: https://issues.apache.org/jira/browse/BEAM-8209 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Cyrus Maden >Assignee: Cyrus Maden >Priority: Minor > Fix For: 2.17.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8209) Document custom docker containers
[ https://issues.apache.org/jira/browse/BEAM-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944654#comment-16944654 ] Mark Liu commented on BEAM-8209: Moving this issue to 2.17 since it's a documentation change, which should not block the release itself. > Document custom docker containers > - > > Key: BEAM-8209 > URL: https://issues.apache.org/jira/browse/BEAM-8209 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Cyrus Maden >Assignee: Cyrus Maden >Priority: Minor > Fix For: 2.17.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7917) Python datastore v1new fails on retry
[ https://issues.apache.org/jira/browse/BEAM-7917?focusedWorklogId=323553&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323553 ] ASF GitHub Bot logged work on BEAM-7917: Author: ASF GitHub Bot Created on: 04/Oct/19 16:38 Start Date: 04/Oct/19 16:38 Worklog Time Spent: 10m Work Description: sadovnychyi commented on issue #9294: [BEAM-7917] Fix datastore writes failing on retry URL: https://github.com/apache/beam/pull/9294#issuecomment-538472297 Another review pass would be helpful. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323553) Time Spent: 3h (was: 2h 50m) > Python datastore v1new fails on retry > - > > Key: BEAM-7917 > URL: https://issues.apache.org/jira/browse/BEAM-7917 > Project: Beam > Issue Type: Bug > Components: io-py-gcp, runner-dataflow >Affects Versions: 2.14.0 > Environment: Python 3.7 on Dataflow >Reporter: Dmytro Sadovnychyi >Assignee: Dmytro Sadovnychyi >Priority: Major > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > > Traceback (most recent call last): > File "apache_beam/runners/common.py", line 782, in > apache_beam.runners.common.DoFnRunner.process > File "apache_beam/runners/common.py", line 454, in > apache_beam.runners.common.SimpleInvoker.invoke_process > File > "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py", > line 334, in process > self._flush_batch() > File > "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py", > line 349, in _flush_batch > throttle_delay=util.WRITE_BATCH_TARGET_LATENCY_MS // 1000) > File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", > line 197, in wrapper > return fun(*args, **kwargs) > File > 
"/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/helper.py", > line 99, in write_mutations > batch.commit() > File > "/usr/local/lib/python3.7/site-packages/google/cloud/datastore/batch.py", > line 271, in commit > raise ValueError("Batch must be in progress to commit()") > ValueError: Batch must be in progress to commit() -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-7907) Support customized container
[ https://issues.apache.org/jira/browse/BEAM-7907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu resolved BEAM-7907. Resolution: Fixed > Support customized container > > > Key: BEAM-7907 > URL: https://issues.apache.org/jira/browse/BEAM-7907 > Project: Beam > Issue Type: New Feature > Components: build-system, sdk-py-harness >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Labels: portability > Fix For: 2.16.0 > > > Support customized container. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7907) Support customized container
[ https://issues.apache.org/jira/browse/BEAM-7907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944655#comment-16944655 ] Mark Liu commented on BEAM-7907: The major tasks are closed in 2.16. The only one left is documentation, which has moved to 2.17. I think it's okay to close this as the 2.16 release tracker. > Support customized container > > > Key: BEAM-7907 > URL: https://issues.apache.org/jira/browse/BEAM-7907 > Project: Beam > Issue Type: New Feature > Components: build-system, sdk-py-harness >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Labels: portability > Fix For: 2.16.0 > > > Support customized container. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config
[ https://issues.apache.org/jira/browse/BEAM-8351?focusedWorklogId=323559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323559 ] ASF GitHub Bot logged work on BEAM-8351: Author: ASF GitHub Bot Created on: 04/Oct/19 16:45 Start Date: 04/Oct/19 16:45 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9730: [BEAM-8351] Support passing in arbitrary KV pairs to sdk worker via external environment config URL: https://github.com/apache/beam/pull/9730#issuecomment-538474791 R: @mxm This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323559) Time Spent: 0.5h (was: 20m) > Support passing in arbitrary KV pairs to sdk worker via external environment > config > --- > > Key: BEAM-8351 > URL: https://issues.apache.org/jira/browse/BEAM-8351 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Wanqi Lyu >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > Originally, the environment config for environment type of EXTERNAL only > support passing in an url for the external worker pool; We want to support > passing in arbitrary KV pairs to sdk worker via external environment config, > so that the when starting the sdk harness we could get the values from > `StartWorkerRequest.params`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines
[ https://issues.apache.org/jira/browse/BEAM-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8343: Description: The objective is to create a universal way for Beam SQL IO APIs to support predicate/project push-down. A proposed way to achieve that is by introducing an interface responsible for identifying what portion(s) of a Calc can be moved down to IO layer. Also, adding following methods to a BeamSqlTable interface to pass necessary parameters to IO APIs: - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter) - Boolean supportsProjects() - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, List fieldNames) Design doc [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing]. was: The objective is to create a universal way for Beam SQL IO APIs to support predicate/project push-down. A proposed way to achieve that is by introducing an interface responsible for identifying what portion(s) of a Calc can be moved down to IO layer. Also, adding following methods to a BeamSqlTable interface to pass necessary parameters to IO APIs: - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter) - Boolean supportsProjects() - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, List fieldNames) Design doc [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing]. > Add means for IO APIs to support predicate and/or project push-down when > running SQL pipelines > -- > > Key: BEAM-8343 > URL: https://issues.apache.org/jira/browse/BEAM-8343 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > The objective is to create a universal way for Beam SQL IO APIs to support > predicate/project push-down. 
> A proposed way to achieve that is by introducing an interface responsible > for identifying what portion(s) of a Calc can be moved down to IO layer. > Also, adding following methods to a BeamSqlTable interface to pass necessary > parameters to IO APIs: > - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter) > - Boolean supportsProjects() > - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, > List fieldNames) > > Design doc > [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
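The BEAM-8343 proposal is a Java interface on BeamSqlTable, but the planner-side idea — ask the table which filter predicates it can evaluate, push those down to the IO layer, and keep the rest in the Calc — can be sketched in a few lines. This is an illustrative Python sketch only; `split_filters` and the tuple predicate encoding are hypothetical, not part of the proposed API:

```python
def split_filters(table_supports, predicates):
    """Partition predicates into (pushed_down, kept_in_calc).

    table_supports: callable deciding whether the table's IO layer can
    evaluate a given predicate; predicates the table rejects must stay
    in the Calc so the query still returns correct results.
    """
    pushed, kept = [], []
    for p in predicates:
        (pushed if table_supports(p) else kept).append(p)
    return pushed, kept


# Toy table that can only push down equality comparisons on the 'id' field.
def supports(pred):
    field, op, _value = pred
    return field == 'id' and op == '='


pushed, kept = split_filters(supports, [('id', '=', 5), ('name', 'LIKE', 'a%')])
# pushed -> [('id', '=', 5)]; kept -> [('name', 'LIKE', 'a%')]
```

The design doc linked above describes the real mechanism, where the supported portion is expressed through `BeamSqlTableFilter` and handed to `buildIOReader`.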
[jira] [Work logged] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config
[ https://issues.apache.org/jira/browse/BEAM-8351?focusedWorklogId=323564&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323564 ] ASF GitHub Bot logged work on BEAM-8351: Author: ASF GitHub Bot Created on: 04/Oct/19 16:49 Start Date: 04/Oct/19 16:49 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9730: [BEAM-8351] Support passing in arbitrary KV pairs to sdk worker via external environment config URL: https://github.com/apache/beam/pull/9730#discussion_r331592994 ## File path: sdks/python/apache_beam/runners/portability/portable_runner.py ## @@ -129,11 +129,25 @@ def _create_environment(options): env=(config.get('env') or '') ).SerializeToString()) elif environment_urn == common_urns.environments.EXTERNAL.urn: + def _looks_like_json(environment_config): +import re +return re.match(r'\{.+\}', environment_config) Review comment: use `re.search()`. also, I don't think this needs to be private, so remove the leading underscore. maybe add a comment like: "we don't use json.loads to test validity because we don't want to propagate json syntax errors downstream to the runner" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323564) Time Spent: 40m (was: 0.5h) > Support passing in arbitrary KV pairs to sdk worker via external environment > config > --- > > Key: BEAM-8351 > URL: https://issues.apache.org/jira/browse/BEAM-8351 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Wanqi Lyu >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > Originally, the environment config for environment type of EXTERNAL only > support passing in an url for the external worker pool; We want to support > passing in arbitrary KV pairs to sdk worker via external environment config, > so that the when starting the sdk harness we could get the values from > `StartWorkerRequest.params`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config
[ https://issues.apache.org/jira/browse/BEAM-8351?focusedWorklogId=323570&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323570 ] ASF GitHub Bot logged work on BEAM-8351: Author: ASF GitHub Bot Created on: 04/Oct/19 17:00 Start Date: 04/Oct/19 17:00 Worklog Time Spent: 10m Work Description: violalyu commented on pull request #9730: [BEAM-8351] Support passing in arbitrary KV pairs to sdk worker via external environment config URL: https://github.com/apache/beam/pull/9730#discussion_r331597120 ## File path: sdks/python/apache_beam/runners/portability/portable_runner.py ## @@ -129,11 +129,25 @@ def _create_environment(options): env=(config.get('env') or '') ).SerializeToString()) elif environment_urn == common_urns.environments.EXTERNAL.urn: + def _looks_like_json(environment_config): +import re +return re.match(r'\{.+\}', environment_config) Review comment: Thanks! For the `re.search()` part, do we want it to be not searching from the start? I was thinking that if it is valid json string it should start and end with '{}', maybe `re.match(r'\{.+\}$')` or `re.search(r'^\{.+\}$')`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323570) Time Spent: 50m (was: 40m) > Support passing in arbitrary KV pairs to sdk worker via external environment > config > --- > > Key: BEAM-8351 > URL: https://issues.apache.org/jira/browse/BEAM-8351 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Wanqi Lyu >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > Originally, the environment config for environment type of EXTERNAL only > support passing in an url for the external worker pool; We want to support > passing in arbitrary KV pairs to sdk worker via external environment config, > so that the when starting the sdk harness we could get the values from > `StartWorkerRequest.params`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
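The review thread above hinges on the anchoring semantics of the standard-library `re` module: `re.match` anchors only at the start of the string, `re.search` scans anywhere, and neither anchors the end unless the pattern does. A quick illustration of the behaviors being discussed (this demonstrates stdlib semantics only, not the final code merged for BEAM-8351):

```python
import re

PATTERN = r'\{.+\}'  # the pattern from the PR under review

# re.match anchors at the start of the string only.
assert re.match(PATTERN, '{"k": "v"}')               # starts with '{' -> match
assert re.match(PATTERN, '{"k": "v"} trailing')      # still matches: no end anchor
assert not re.match(PATTERN, 'url=localhost:50000')  # no brace at start -> None

# re.search scans the whole string, so a brace-delimited chunk anywhere matches.
assert re.search(PATTERN, 'prefix {"k": "v"}')

# Anchoring both ends, as suggested in the review, rejects surrounding junk.
STRICT = r'^\{.+\}$'
assert re.search(STRICT, '{"k": "v"}')
assert not re.search(STRICT, '{"k": "v"} trailing')
```

One subtlety worth remembering when choosing between `\}$` and a full-string check: `$` also matches just before a trailing newline, so `re.fullmatch(PATTERN, s)` is the strictest option if that matters.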
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=323579&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323579 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 04/Oct/19 17:01 Start Date: 04/Oct/19 17:01 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r331594582 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -0,0 +1,123 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from apache_beam.portability.api.beam_interactive_api_pb2 import InteractiveStreamRecord +from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload +from apache_beam.utils import timestamp + +from google.protobuf import timestamp_pb2 + + +def to_timestamp(timestamp_secs): + """Converts seconds since epoch to an apache_beam.util.Timestamp. 
+ """ + return timestamp.Timestamp.of(timestamp_secs) + +def from_timestamp_proto(timestamp_proto): + return timestamp.Timestamp(seconds=timestamp_proto.seconds, + micros=timestamp_proto.nanos * 1000) + +def to_timestamp_usecs(ts): + """Converts a google.protobuf.Timestamp and + apache_beam.util.timestamp.Timestamp to seconds since epoch. + """ + if isinstance(ts, timestamp_pb2.Timestamp): +return (ts.seconds * 10**6) + (ts.nanos * 10**-3) + if isinstance(ts, timestamp.Timestamp): +return ts.micros + +class StreamingCache(object): + """Abstraction that holds the logic for reading and writing to cache. + """ + def __init__(self, readers): +self._readers = readers + + class Reader(object): +"""Abstraction that reads from PCollection readers. + +This class is an Abstraction layer over multiple PCollection readers to be +used for supplying the Interactive Service with TestStream events. + +This class is also responsible for holding the state of the clock, injecting +clock advancement events, and watermark advancement events. +""" +def __init__(self, readers): + self._readers = [reader.read() for reader in readers] Review comment: Is `self._readers` actually `events` returned from passed in readers? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323579) Time Spent: 1.5h (was: 1h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=323575&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323575 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 04/Oct/19 17:01 Start Date: 04/Oct/19 17:01 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r331592783 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -0,0 +1,123 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from apache_beam.portability.api.beam_interactive_api_pb2 import InteractiveStreamRecord +from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload +from apache_beam.utils import timestamp + +from google.protobuf import timestamp_pb2 + + +def to_timestamp(timestamp_secs): + """Converts seconds since epoch to an apache_beam.util.Timestamp. 
+  """
+  return timestamp.Timestamp.of(timestamp_secs)
+
+def from_timestamp_proto(timestamp_proto):
+  return timestamp.Timestamp(seconds=timestamp_proto.seconds,
+                             micros=timestamp_proto.nanos * 1000)
+
+def to_timestamp_usecs(ts):
Review comment: We have a very similar method in monitoring_infos.py. Should we move them to a common place?
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323575) Time Spent: 1h 10m (was: 1h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=323571&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323571 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 04/Oct/19 17:01 Start Date: 04/Oct/19 17:01 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r331589935 ## File path: model/fn-execution/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.fn_execution.v1; + +option go_package = "fnexecution_v1"; +option java_package = "org.apache.beam.model.fnexecution.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; + +service InteractiveService { Review comment: Do we need a Reset request? 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323571) Time Spent: 50m (was: 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=323574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323574 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 04/Oct/19 17:01 Start Date: 04/Oct/19 17:01 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r331594849 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -0,0 +1,123 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from apache_beam.portability.api.beam_interactive_api_pb2 import InteractiveStreamRecord +from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload +from apache_beam.utils import timestamp + +from google.protobuf import timestamp_pb2 + + +def to_timestamp(timestamp_secs): + """Converts seconds since epoch to an apache_beam.util.Timestamp. 
+  """
+  return timestamp.Timestamp.of(timestamp_secs)
+
+def from_timestamp_proto(timestamp_proto):
+  return timestamp.Timestamp(seconds=timestamp_proto.seconds,
+                             micros=timestamp_proto.nanos * 1000)
+
+def to_timestamp_usecs(ts):
+  """Converts a google.protobuf.Timestamp or an
+  apache_beam.utils.timestamp.Timestamp to microseconds since epoch.
+  """
+  if isinstance(ts, timestamp_pb2.Timestamp):
+    return (ts.seconds * 10**6) + (ts.nanos * 10**-3)
+  if isinstance(ts, timestamp.Timestamp):
+    return ts.micros
+
+class StreamingCache(object):
+  """Abstraction that holds the logic for reading and writing to cache.
+  """
+  def __init__(self, readers):
+    self._readers = readers
+
+  class Reader(object):
+    """Abstraction that reads from PCollection readers.
+
+    This class is an abstraction layer over multiple PCollection readers to be
+    used for supplying the Interactive Service with TestStream events.
+
+    This class is also responsible for holding the state of the clock and for
+    injecting clock advancement events and watermark advancement events.
+    """
+    def __init__(self, readers):
+      self._readers = [reader.read() for reader in readers]
+      self._watermark = timestamp.MIN_TIMESTAMP
Review comment: Should the defaults for watermark and timestamp reflect something from the `readers`?
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323574) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=323572&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323572 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 04/Oct/19 17:01 Start Date: 04/Oct/19 17:01 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r331596793 ## File path: sdks/python/apache_beam/testing/interactive_stream.py ## @@ -0,0 +1,123 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import grpc +import time + +from apache_beam.portability.api import beam_interactive_api_pb2 +from apache_beam.portability.api import beam_interactive_api_pb2_grpc +from apache_beam.portability.api.beam_interactive_api_pb2_grpc import InteractiveServiceServicer +from concurrent.futures import ThreadPoolExecutor + + +def to_api_state(state): + if state == 'STOPPED': Review comment: Suggest a dict from strings -> proto states and return directly from dictionary.get(). That way if a bogus state is passed, this can throw an error instead of defaulting to running. 
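The dictionary-based lookup suggested in this review could look like the following sketch. The enum constants here are placeholder integers standing in for the real `beam_interactive_api_pb2.StatusResponse` values, which are assumed rather than taken from the patch.

```python
# Placeholder values standing in for the generated
# beam_interactive_api_pb2.StatusResponse enum constants.
STOPPED, PAUSED, RUNNING = 0, 1, 2

_API_STATES = {
    'STOPPED': STOPPED,
    'PAUSED': PAUSED,
    'RUNNING': RUNNING,
}


def to_api_state(state):
    """Map a state name to its API enum value.

    Unlike the if-chain in the patch, an unknown state name raises an error
    here instead of silently defaulting to RUNNING.
    """
    try:
        return _API_STATES[state]
    except KeyError:
        raise ValueError('unknown state: %r' % (state,))
```

The design point of the suggestion is fail-fast behavior: a typo such as `'STOPED'` surfaces immediately as an exception rather than being reported as a running service.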
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323572) Time Spent: 1h (was: 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=323576&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323576 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 04/Oct/19 17:01 Start Date: 04/Oct/19 17:01 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r331592276 ## File path: model/fn-execution/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,106 @@ +/* Review comment: I am not sure if mode/fn-execution is the right place. Maybe model/interactive. Also this file is very streaming related not generically usable for controlling all interactive pipelines. If that is accurate, the name could reflect that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323576) Time Spent: 1h 10m (was: 1h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=323577&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323577 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 04/Oct/19 17:01 Start Date: 04/Oct/19 17:01 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r331596932 ## File path: sdks/python/apache_beam/testing/interactive_stream.py ## @@ -0,0 +1,123 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+#
+
+import grpc
+import time
+
+from apache_beam.portability.api import beam_interactive_api_pb2
+from apache_beam.portability.api import beam_interactive_api_pb2_grpc
+from apache_beam.portability.api.beam_interactive_api_pb2_grpc import InteractiveServiceServicer
+from concurrent.futures import ThreadPoolExecutor
+
+
+def to_api_state(state):
+  if state == 'STOPPED':
+    return beam_interactive_api_pb2.StatusResponse.STOPPED
+  if state == 'PAUSED':
+    return beam_interactive_api_pb2.StatusResponse.PAUSED
+  return beam_interactive_api_pb2.StatusResponse.RUNNING
+
+class InteractiveStreamController(InteractiveServiceServicer):
+  def __init__(self, endpoint, streaming_cache):
+    self._endpoint = endpoint
+    self._server = grpc.server(ThreadPoolExecutor(max_workers=10))
Review comment: do we need max 10 workers for this controller?
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323577) Time Spent: 1h 10m (was: 1h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=323578&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323578 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 04/Oct/19 17:01 Start Date: 04/Oct/19 17:01 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r331591095 ## File path: sdks/python/apache_beam/testing/interactive_stream.py ## @@ -0,0 +1,123 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+#
+
+import grpc
+import time
+
+from apache_beam.portability.api import beam_interactive_api_pb2
+from apache_beam.portability.api import beam_interactive_api_pb2_grpc
+from apache_beam.portability.api.beam_interactive_api_pb2_grpc import InteractiveServiceServicer
+from concurrent.futures import ThreadPoolExecutor
+
+
+def to_api_state(state):
+  if state == 'STOPPED':
+    return beam_interactive_api_pb2.StatusResponse.STOPPED
+  if state == 'PAUSED':
+    return beam_interactive_api_pb2.StatusResponse.PAUSED
+  return beam_interactive_api_pb2.StatusResponse.RUNNING
+
+class InteractiveStreamController(InteractiveServiceServicer):
+  def __init__(self, endpoint, streaming_cache):
+    self._endpoint = endpoint
+    self._server = grpc.server(ThreadPoolExecutor(max_workers=10))
+    beam_interactive_api_pb2_grpc.add_InteractiveServiceServicer_to_server(
+        self, self._server)
+    self._server.add_insecure_port(self._endpoint)
+    self._server.start()
+
+    self._streaming_cache = streaming_cache
+    self._state = 'STOPPED'
+    self._playback_speed = 1.0
+
+  def Start(self, request, context):
+    """Requests that the Service starts emitting elements.
+    """
+
+    self._next_state('RUNNING')
+    self._playback_speed = request.playback_speed or 1.0
+    self._playback_speed = max(min(self._playback_speed, 100.0), 0.001)
+    return beam_interactive_api_pb2.StartResponse()
+
+  def Stop(self, request, context):
+    """Requests that the Service stop emitting elements.
+    """
+    self._next_state('STOPPED')
+    return beam_interactive_api_pb2.StartResponse()
+
+  def Pause(self, request, context):
+    """Requests that the Service pause emitting elements.
+    """
+    self._next_state('PAUSED')
+    return beam_interactive_api_pb2.PauseResponse()
+
+  def Step(self, request, context):
+    """Requests that the Service emit a single element from each cached source.
+    """
+    self._next_state('STEP')
+    return beam_interactive_api_pb2.StepResponse()
+
+  def Status(self, request, context):
+    """Returns the status of the service.
+    """
+    resp = beam_interactive_api_pb2.StatusResponse()
+    resp.stream_time.GetCurrentTime()
+    resp.state = to_api_state(self._state)
+    return resp
+
+  def _reset_state(self):
+    self._reader = None
+    self._playback_speed = 1.0
+    self._state = 'STOPPED'
+
+  def _next_state(self, state):
+    if not self._state or self._state == 'STOPPED':
Review comment: When would `not self._state` be true? It seems to always be set to something.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323578) Time Spent: 1h 20m (was: 1h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
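The `Start()` handler in the patch above clamps the requested playback speed into the range [0.001, 100.0], falling back to 1.0 when the request leaves it unset (proto numeric fields default to 0.0). A standalone restatement of that logic, with a function name of our choosing:

```python
def clamp_playback_speed(requested, default=1.0, lo=0.001, hi=100.0):
    """Mirror of the Start() handler's clamping.

    Falls back to the default when the request leaves the speed unset
    (0.0 or None is falsy), then clamps the result into [lo, hi].
    """
    speed = requested or default
    return max(min(speed, hi), lo)
```

The `requested or default` idiom means a caller cannot actually request a speed of exactly 0.0; it is treated as "unset" and replaced with the default.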
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=323573&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323573 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 04/Oct/19 17:01 Start Date: 04/Oct/19 17:01 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r331596241 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -0,0 +1,123 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from apache_beam.portability.api.beam_interactive_api_pb2 import InteractiveStreamRecord +from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload +from apache_beam.utils import timestamp + +from google.protobuf import timestamp_pb2 + + +def to_timestamp(timestamp_secs): + """Converts seconds since epoch to an apache_beam.util.Timestamp. 
+  """
+  return timestamp.Timestamp.of(timestamp_secs)
+
+def from_timestamp_proto(timestamp_proto):
+  return timestamp.Timestamp(seconds=timestamp_proto.seconds,
+                             micros=timestamp_proto.nanos * 1000)
+
+def to_timestamp_usecs(ts):
+  """Converts a google.protobuf.Timestamp or an
+  apache_beam.utils.timestamp.Timestamp to microseconds since epoch.
+  """
+  if isinstance(ts, timestamp_pb2.Timestamp):
+    return (ts.seconds * 10**6) + (ts.nanos * 10**-3)
+  if isinstance(ts, timestamp.Timestamp):
+    return ts.micros
+
+class StreamingCache(object):
+  """Abstraction that holds the logic for reading and writing to cache.
+  """
+  def __init__(self, readers):
+    self._readers = readers
+
+  class Reader(object):
+    """Abstraction that reads from PCollection readers.
+
+    This class is an abstraction layer over multiple PCollection readers to be
+    used for supplying the Interactive Service with TestStream events.
+
+    This class is also responsible for holding the state of the clock and for
+    injecting clock advancement events and watermark advancement events.
+    """
+    def __init__(self, readers):
+      self._readers = [reader.read() for reader in readers]
+      self._watermark = timestamp.MIN_TIMESTAMP
+      self._timestamp = timestamp.MIN_TIMESTAMP
+
+    def read(self):
+      """Reads records from PCollection readers.
+      """
+      records = []
+      for r in self._readers:
+        try:
+          record = InteractiveStreamRecord()
+          record.ParseFromString(next(r))
+          records.append(record)
+        except StopIteration:
+          pass
+
+      events = []
+      if not records:
+        self.advance_watermark(timestamp.MAX_TIMESTAMP, events)
+
+      records.sort(key=lambda x: x.processing_time)
+      for r in records:
+        self.advance_processing_time(
+            from_timestamp_proto(r.processing_time), events)
+        self.advance_watermark(from_timestamp_proto(r.watermark), events)
+
+        events.append(TestStreamPayload.Event(
+            element_event=TestStreamPayload.Event.AddElements(
+                elements=[r.element])))
+      return events
+
+    def advance_processing_time(self, processing_time, events):
+      """Advances the internal clock state and injects an
+      AdvanceProcessingTime event.
+      """
+      if self._timestamp != processing_time:
+        duration = timestamp.Duration(
+            micros=processing_time.micros - self._timestamp.micros)
+        if self._timestamp == timestamp.MIN_TIMESTAMP:
+          duration = timestamp.Duration(micros=processing_time.micros)
Review comment: For my learning, is this special case required for MIN_TIMESTAMP? Would MIN_TIMESTAMP not also have some non-zero self._timestamp.micros that needs to be accounted for in the duration calculation?
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
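The MIN_TIMESTAMP question in the review above can be illustrated with plain integers. In Beam's Python SDK, `timestamp.MIN_TIMESTAMP` is a very large negative sentinel, so on the first advancement `processing_time.micros - self._timestamp.micros` would produce an enormous, meaningless duration; the patch instead uses the absolute processing time. The sentinel value below is a stand-in for illustration, not the SDK's exact constant.

```python
# Stand-in sentinel; the real apache_beam.utils.timestamp.MIN_TIMESTAMP is a
# similarly large negative number of microseconds.
MIN_TIMESTAMP_MICROS = -(2**63)


def advance_duration_micros(clock_micros, processing_time_micros):
    """Duration to inject when advancing the clock, mirroring the special
    case in advance_processing_time: starting from the sentinel, use the
    absolute processing time rather than the difference from the sentinel.
    """
    if clock_micros == MIN_TIMESTAMP_MICROS:
        return processing_time_micros
    return processing_time_micros - clock_micros
```

Without the special case, the first call would return `processing_time_micros + 2**63` under these stand-in values, which is why the patch branches on the sentinel.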
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=323580&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323580 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 04/Oct/19 17:04 Start Date: 04/Oct/19 17:04 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9664: [BEAM-7389] Created elementwise for consistency with docs URL: https://github.com/apache/beam/pull/9664#issuecomment-538481052 I do not see any deleted files or doc changes. Am I missing something? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323580) Time Spent: 63h (was: 62h 50m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 63h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=323581&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323581 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 04/Oct/19 17:05 Start Date: 04/Oct/19 17:05 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9669: [BEAM-7389] Update include buttons to support multiple languages URL: https://github.com/apache/beam/pull/9669 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323581) Time Spent: 63h 10m (was: 63h) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 63h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=323584&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323584 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 04/Oct/19 17:05 Start Date: 04/Oct/19 17:05 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9692: [BEAM-7389] Update docs with matching code files URL: https://github.com/apache/beam/pull/9692#issuecomment-538481357 Please rebase this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323584) Time Spent: 63h 20m (was: 63h 10m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 63h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config
[ https://issues.apache.org/jira/browse/BEAM-8351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Weise reassigned BEAM-8351: -- Assignee: Wanqi Lyu > Support passing in arbitrary KV pairs to sdk worker via external environment > config > --- > > Key: BEAM-8351 > URL: https://issues.apache.org/jira/browse/BEAM-8351 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Wanqi Lyu >Assignee: Wanqi Lyu >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > Originally, the environment config for the EXTERNAL environment type only > supported passing a URL for the external worker pool. We want to support > passing arbitrary KV pairs to the SDK worker via the external environment config, > so that when starting the SDK harness we can read the values from > `StartWorkerRequest.params`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing
[ https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=323586&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323586 ] ASF GitHub Bot logged work on BEAM-5707: Author: ASF GitHub Bot Created on: 04/Oct/19 17:15 Start Date: 04/Oct/19 17:15 Worklog Time Spent: 10m Work Description: tweise commented on pull request #9728: [BEAM-5707] Modify Flink streaming impulse function to include a counter URL: https://github.com/apache/beam/pull/9728 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323586) Time Spent: 8h (was: 7h 50m) > Add a portable Flink streaming synthetic source for testing > --- > > Key: BEAM-5707 > URL: https://issues.apache.org/jira/browse/BEAM-5707 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Micah Wylde >Assignee: Micah Wylde >Priority: Minor > Fix For: 2.9.0 > > Time Spent: 8h > Remaining Estimate: 0h > > Currently there are no built-in streaming sources for portable pipelines. > This makes it hard to test streaming functionality in the Python SDK. > It would be very useful to add a periodic impulse source that (with some > configurable frequency) outputs an empty byte array, which can then be > transformed as desired inside the python pipeline. More context in this > [mailing list > discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config
[ https://issues.apache.org/jira/browse/BEAM-8351?focusedWorklogId=323587&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323587 ] ASF GitHub Bot logged work on BEAM-8351: Author: ASF GitHub Bot Created on: 04/Oct/19 17:16 Start Date: 04/Oct/19 17:16 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9730: [BEAM-8351] Support passing in arbitrary KV pairs to sdk worker via external environment config URL: https://github.com/apache/beam/pull/9730#discussion_r331603033 ## File path: sdks/python/apache_beam/runners/portability/portable_runner.py ## @@ -129,11 +129,25 @@ def _create_environment(options): env=(config.get('env') or '') ).SerializeToString()) elif environment_urn == common_urns.environments.EXTERNAL.urn: + def _looks_like_json(environment_config): +import re +return re.match(r'\{.+\}', environment_config) Review comment: There's also whitespace to consider. So if we're using `re.match` we have to do `re.match(r'\s*\{.*\}\s*$')`. We just need a simple heuristic that tells us that it looks like JSON and not a URL. The most important thing is that we don't end up with a regex that incorrectly returns False for something that is *actually* JSON, so I suggested the `re.search` option because I think it's sufficient for detecting something that's *not a URL* and thus probably JSON. That said, I think the `re.match` regex above looks pretty safe. Obviously, we're only talking about JSON objects (i.e. dict) and not arrays and other scalars. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323587) Time Spent: 1h (was: 50m) > Support passing in arbitrary KV pairs to sdk worker via external environment > config > --- > > Key: BEAM-8351 > URL: https://issues.apache.org/jira/browse/BEAM-8351 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Wanqi Lyu >Assignee: Wanqi Lyu >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
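A minimal, self-contained sketch of the whitespace-tolerant heuristic discussed in this review thread (the function name mirrors the PR's `_looks_like_json`, but this is an illustration, not the merged Beam code):

```python
import re

def looks_like_json(environment_config):
    # Heuristic from the review discussion: a config string wrapped in
    # curly braces (allowing surrounding whitespace) is treated as a JSON
    # object; anything else is assumed to be a worker pool URL.
    return re.match(r'\s*\{.*\}\s*$', environment_config) is not None

print(looks_like_json('{"env": {"FOO": "bar"}}'))  # True: a JSON map
print(looks_like_json('localhost:50000'))          # False: a plain URL
```

As the comment notes, the priority is never returning False for real JSON; a stray URL that happens to contain braces would simply fail JSON parsing with a clear error.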
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=323588&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323588 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 04/Oct/19 17:18 Start Date: 04/Oct/19 17:18 Worklog Time Spent: 10m Work Description: davidcavazos commented on issue #9664: [BEAM-7389] Created elementwise for consistency with docs URL: https://github.com/apache/beam/pull/9664#issuecomment-538486069 That would break the staged versions and the notebooks would have to be generated in a second pass anyways. I'm doing that here #9692, that way we don't temporarily break things in the website. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323588) Time Spent: 63.5h (was: 63h 20m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 63.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config
[ https://issues.apache.org/jira/browse/BEAM-8351?focusedWorklogId=323589&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323589 ] ASF GitHub Bot logged work on BEAM-8351: Author: ASF GitHub Bot Created on: 04/Oct/19 17:19 Start Date: 04/Oct/19 17:19 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9730: [BEAM-8351] Support passing in arbitrary KV pairs to sdk worker via external environment config URL: https://github.com/apache/beam/pull/9730#discussion_r331604295 ## File path: sdks/python/apache_beam/runners/portability/portable_runner.py ## @@ -129,11 +129,25 @@ def _create_environment(options): env=(config.get('env') or '') ).SerializeToString()) elif environment_urn == common_urns.environments.EXTERNAL.urn: + def _looks_like_json(environment_config): +import re +return re.match(r'\{.+\}', environment_config) Review comment: Just to reiterate, we want json syntax errors to occur here, at submission time, and not later on, so we want to answer "did the user attempt to pass a json map or a url?" So I was even considering `return '{' in config or '}' in config`, since curly braces should not be in a url. I'll defer the final answer on this to the reviewer. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323589) Time Spent: 1h 10m (was: 1h) > Support passing in arbitrary KV pairs to sdk worker via external environment > config > --- > > Key: BEAM-8351 > URL: https://issues.apache.org/jira/browse/BEAM-8351 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Wanqi Lyu >Assignee: Wanqi Lyu >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
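The fail-fast behavior argued for in this comment can be sketched as follows (illustrative only; the real branching lives in `portable_runner.py` and may differ in names and return shape):

```python
import json

def parse_external_config(environment_config):
    # Curly braces should not appear in a URL, so their presence means the
    # user attempted to pass a JSON map. Parsing it here makes JSON syntax
    # errors surface at submission time, not later on the worker.
    if '{' in environment_config or '}' in environment_config:
        return json.loads(environment_config)  # raises ValueError on bad JSON
    # Otherwise treat the whole string as the external worker pool URL.
    return {'url': environment_config}
```

A malformed map such as `'{not json}'` then fails immediately with a `ValueError` at pipeline submission instead of producing a worker pool that cannot start.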
[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4
[ https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=323592&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323592 ] ASF GitHub Bot logged work on BEAM-8350: Author: ASF GitHub Bot Created on: 04/Oct/19 17:24 Start Date: 04/Oct/19 17:24 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade to Pylint 2.4 URL: https://github.com/apache/beam/pull/9725#issuecomment-538488002 Run Python2_PVR_Flink PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323592) Time Spent: 1.5h (was: 1h 20m) > Upgrade to pylint 2.4 > - > > Key: BEAM-8350 > URL: https://issues.apache.org/jira/browse/BEAM-8350 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > pylint 2.4 provides a number of new features and fixes, but the most > important/pressing one for me is that 2.4 adds support for understanding > python type annotations, which fixes a bunch of spurious unused import errors > in the PR I'm working on for BEAM-7746. > As of 2.0, pylint dropped support for running tests in python2, so to make > the upgrade we have to move our lint jobs to python3. Doing so will put > pylint into "python3-mode" and there is not an option to run in > python2-compatible mode. That said, the beam code is intended to be python3 > compatible, so in practice, performing a python3 lint on the Beam code-base > is perfectly safe. The primary risk of doing this is that someone introduces > a python-3 only change that breaks python2, but these would largely be syntax > errors that would be immediately caught by the unit and integration tests. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4
[ https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=323593&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323593 ] ASF GitHub Bot logged work on BEAM-8350: Author: ASF GitHub Bot Created on: 04/Oct/19 17:24 Start Date: 04/Oct/19 17:24 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade to Pylint 2.4 URL: https://github.com/apache/beam/pull/9725#issuecomment-538488071 Python2_PVR_Flink seems incredibly flaky This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323593) Time Spent: 1h 40m (was: 1.5h) > Upgrade to pylint 2.4 > - > > Key: BEAM-8350 > URL: https://issues.apache.org/jira/browse/BEAM-8350 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4
[ https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=323602&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323602 ] ASF GitHub Bot logged work on BEAM-8350: Author: ASF GitHub Bot Created on: 04/Oct/19 17:38 Start Date: 04/Oct/19 17:38 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade to Pylint 2.4 URL: https://github.com/apache/beam/pull/9725#issuecomment-538169256 Here's a breakdown of the changes required to get to pylint 2.4:
- fix a bunch of warnings about deprecated methods, mostly `logger.warn` and various unittest methods
- update the names of a few error codes: `disable=unused-import` and `possibly-unused-variable`
- ignore a bunch of newly introduced style warnings that did not seem important
- run the lint using Python 3.7: this ensures that it can run on test files that only work on Python 3.7 due to syntax features
- merge the lint tests into one test:
  - `run_pylint_2to3.sh` was just testing futurization; it seems fine to do this all the time now that our code is python2 compliant
  - there was a "mini" test just for python3 compatibility, which is not needed anymore now that everything is running on python3
- stop running `pycodestyle`: it's run as part of `flake8`
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323602) Time Spent: 1h 50m (was: 1h 40m) > Upgrade to pylint 2.4 > - > > Key: BEAM-8350 > URL: https://issues.apache.org/jira/browse/BEAM-8350 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
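Most of the deprecated-method warnings mentioned in the breakdown above are mechanical renames. A hedged illustration of the typical before/after (not the actual Beam diff):

```python
import logging
import unittest

logger = logging.getLogger(__name__)

def process(records):
    if not records:
        # pylint 2.x flags logger.warn() as deprecated-method;
        # logging.Logger.warning() is the supported spelling.
        logger.warning("no records to process")  # was: logger.warn(...)
    return len(records)

class ProcessTest(unittest.TestCase):
    def test_counts_records(self):
        # Deprecated unittest aliases (assertEquals, failUnless, ...) are
        # also reported; the modern names behave identically.
        self.assertEqual(process([1, 2, 3]), 3)  # was: self.assertEquals(...)
        self.assertTrue(process([]) == 0)        # was: self.failUnless(...)
```

Renames like these change no behavior, which is why they could be landed alongside the lint-job move to Python 3 without affecting Python 2 compatibility.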
[jira] [Updated] (BEAM-8303) Filesystems not properly registered using FileIO.write() on FlinkRunner
[ https://issues.apache.org/jira/browse/BEAM-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-8303: -- Summary: Filesystems not properly registered using FileIO.write() on FlinkRunner (was: Filesystems not properly registered using FileIO.write()) > Filesystems not properly registered using FileIO.write() on FlinkRunner > --- > > Key: BEAM-8303 > URL: https://issues.apache.org/jira/browse/BEAM-8303 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.15.0 >Reporter: Preston Koprivica >Assignee: Maximilian Michels >Priority: Critical > Fix For: 2.17.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > I’m getting the following error when attempting to use the FileIO apis > (beam-2.15.0) and integrating with AWS S3. I have setup the PipelineOptions > with all the relevant AWS options, so the filesystem registry **should** be > properly seeded by the time the graph is compiled and executed: > {code:java} > java.lang.IllegalArgumentException: No filesystem found for scheme s3 > at > org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456) > at > org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526) > at > org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149) > at > org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105) > at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159) > at > org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:83) > at > org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:32) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:543) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:534) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:480) > at > 
org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.deserialize(CoderTypeSerializer.java:93) > at > org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55) > at > org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:106) > at > org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:72) > at > org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:47) > at > org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) > at > org.apache.flink.runtime.operators.FlatMapDriver.run(FlatMapDriver.java:107) > at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:503) > at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) > at java.lang.Thread.run(Thread.java:748) > {code} > For reference, the write code resembles this: > {code:java} > FileIO.Write write = FileIO.write() > .via(ParquetIO.sink(schema)) > .to(options.getOutputDir()). // will be something like: > s3:/// > .withSuffix(".parquet"); > records.apply(String.format("Write(%s)", options.getOutputDir()), > write);{code} > The issue does not appear to be related to ParquetIO.sink(). I am able to > reliably reproduce the issue using JSON formatted records and TextIO.sink(), > as well. Moreover, AvroIO is affected if withWindowedWrites() option is > added. > Just trying some different knobs, I went ahead and set the following option: > {code:java} > write = write.withNoSpilling();{code} > This actually seemed to fix the issue, only to have it reemerge as I scaled > up the data set size. 
The stack trace, while very similar, reads: > {code:java} > java.lang.IllegalArgumentException: No filesystem found for scheme s3 > at > org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456) > at > org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526) > at > org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149) > at > org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105) > at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159) > at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:82) > at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:36) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(Win
[jira] [Updated] (BEAM-8303) Filesystems not properly registered using FileIO.write()
[ https://issues.apache.org/jira/browse/BEAM-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-8303: -- Component/s: (was: sdk-java-core) > Filesystems not properly registered using FileIO.write() > > > Key: BEAM-8303 > URL: https://issues.apache.org/jira/browse/BEAM-8303 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.15.0 >Reporter: Preston Koprivica >Assignee: Maximilian Michels >Priority: Critical > Fix For: 2.17.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8303) Filesystems not properly registered using FileIO.write() on FlinkRunner
[ https://issues.apache.org/jira/browse/BEAM-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-8303: -- Component/s: sdk-java-core > Filesystems not properly registered using FileIO.write() on FlinkRunner > --- > > Key: BEAM-8303 > URL: https://issues.apache.org/jira/browse/BEAM-8303 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-java-core >Affects Versions: 2.15.0 >Reporter: Preston Koprivica >Assignee: Maximilian Michels >Priority: Critical > Fix For: 2.17.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8303) Filesystems not properly registered using FileIO.write()
[ https://issues.apache.org/jira/browse/BEAM-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-8303: -- Component/s: runner-flink > Filesystems not properly registered using FileIO.write() > > > Key: BEAM-8303 > URL: https://issues.apache.org/jira/browse/BEAM-8303 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-java-core >Affects Versions: 2.15.0 >Reporter: Preston Koprivica >Assignee: Maximilian Michels >Priority: Critical > Fix For: 2.17.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > I’m getting the following error when attempting to use the FileIO apis > (beam-2.15.0) and integrating with AWS S3. I have setup the PipelineOptions > with all the relevant AWS options, so the filesystem registry **should** be > properly seeded by the time the graph is compiled and executed: > {code:java} > java.lang.IllegalArgumentException: No filesystem found for scheme s3 > at > org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456) > at > org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526) > at > org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149) > at > org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105) > at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159) > at > org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:83) > at > org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:32) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:543) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:534) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:480) > at > org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.deserialize(CoderTypeSerializer.java:93) > at > 
org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55) > at > org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:106) > at > org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:72) > at > org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:47) > at > org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) > at > org.apache.flink.runtime.operators.FlatMapDriver.run(FlatMapDriver.java:107) > at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:503) > at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) > at java.lang.Thread.run(Thread.java:748) > {code} > For reference, the write code resembles this: > {code:java} > FileIO.Write write = FileIO.write() > .via(ParquetIO.sink(schema)) > .to(options.getOutputDir()). // will be something like: > s3:/// > .withSuffix(".parquet"); > records.apply(String.format("Write(%s)", options.getOutputDir()), > write);{code} > The issue does not appear to be related to ParquetIO.sink(). I am able to > reliably reproduce the issue using JSON formatted records and TextIO.sink(), > as well. Moreover, AvroIO is affected if withWindowedWrites() option is > added. > Just trying some different knobs, I went ahead and set the following option: > {code:java} > write = write.withNoSpilling();{code} > This actually seemed to fix the issue, only to have it reemerge as I scaled > up the data set size. 
The stack trace, while very similar, reads: > {code:java} > java.lang.IllegalArgumentException: No filesystem found for scheme s3 > at > org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456) > at > org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526) > at > org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149) > at > org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105) > at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159) > at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:82) > at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:36) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:543) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:534) > at
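The failure mode in the report above is easier to see with a toy model. Beam's FileSystems class keeps a per-JVM map from URI scheme to filesystem implementation, seeded from PipelineOptions; a Flink task manager that deserializes FileResult records before that seeding happens looks up an empty map. The sketch below is an illustration of that ordering problem only — all names in it are hypothetical stand-ins, not Beam's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a per-JVM scheme -> filesystem registry (hypothetical names,
// not Beam's real code). The point: the map is JVM-local state, so seeding it
// on the submitting JVM does nothing for a worker JVM that decodes records.
public class SchemeRegistryDemo {
    static final Map<String, String> REGISTRY = new HashMap<>();

    // Stand-in for seeding the registry from PipelineOptions.
    static void registerFromOptions() {
        REGISTRY.put("s3", "S3FileSystem");
    }

    static String lookup(String scheme) {
        String fs = REGISTRY.get(scheme);
        if (fs == null) {
            // Matches the exception in the stack trace above.
            throw new IllegalArgumentException("No filesystem found for scheme " + scheme);
        }
        return fs;
    }

    public static void main(String[] args) {
        boolean failedBeforeSeeding = false;
        try {
            lookup("s3"); // a worker decoding records before seeding fails here
        } catch (IllegalArgumentException e) {
            failedBeforeSeeding = true;
        }
        registerFromOptions(); // once seeded (as on the submitting JVM), lookup succeeds
        System.out.println(failedBeforeSeeding + " " + lookup("s3")); // prints "true S3FileSystem"
    }
}
```

This also suggests why withNoSpilling() only appeared to help: it changes which deserialization path runs on the worker, not whether the worker's registry has been seeded.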
[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines
[ https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=323607&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323607 ] ASF GitHub Bot logged work on BEAM-8343: Author: ASF GitHub Bot Created on: 04/Oct/19 17:41 Start Date: 04/Oct/19 17:41 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #9731: [BEAM-8343] Added nessesary methods to BeamSqlTable to enable support for predicate/project push-down URL: https://github.com/apache/beam/pull/9731 - Added methods needed for predicate/project push-down to BeamSqlTable interface. - Create a sql.meta.BeamSqlTableFilter interface and a default implementation for it sql.meta.DefaultTableFilter. Based on top of #9718 Design doc [link](https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing). Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
[jira] [Work logged] (BEAM-8092) Switch to java Optional in DirectRunner
[ https://issues.apache.org/jira/browse/BEAM-8092?focusedWorklogId=323605&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323605 ] ASF GitHub Bot logged work on BEAM-8092: Author: ASF GitHub Bot Created on: 04/Oct/19 17:41 Start Date: 04/Oct/19 17:41 Worklog Time Spent: 10m Work Description: je-ik commented on issue #9431: [BEAM-8092] change guava's Optional to java.util in DirectRunner URL: https://github.com/apache/beam/pull/9431#issuecomment-538494089 @apilloud thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323605) Time Spent: 50m (was: 40m) > Switch to java Optional in DirectRunner > --- > > Key: BEAM-8092 > URL: https://issues.apache.org/jira/browse/BEAM-8092 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Affects Versions: 2.15.0 >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8092) Switch to java Optional in DirectRunner
[ https://issues.apache.org/jira/browse/BEAM-8092?focusedWorklogId=323608&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323608 ] ASF GitHub Bot logged work on BEAM-8092: Author: ASF GitHub Bot Created on: 04/Oct/19 17:41 Start Date: 04/Oct/19 17:41 Worklog Time Spent: 10m Work Description: je-ik commented on pull request #9431: [BEAM-8092] change guava's Optional to java.util in DirectRunner URL: https://github.com/apache/beam/pull/9431 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323608) Time Spent: 1h (was: 50m) > Switch to java Optional in DirectRunner > --- > > Key: BEAM-8092 > URL: https://issues.apache.org/jira/browse/BEAM-8092 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Affects Versions: 2.15.0 >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
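For reference, the mechanical part of a guava-to-java.util Optional migration like this one is small. The sketch below uses only the standard library and is not taken from the PR; it just shows the usual renames such a change involves:

```java
import java.util.Optional;

// Typical renames when moving from Guava's Optional to java.util.Optional
// (illustrative; the actual edits in the DirectRunner PR are not reproduced here):
//   Optional.fromNullable(x) -> Optional.ofNullable(x)
//   Optional.absent()        -> Optional.empty()
//   opt.or(fallback)         -> opt.orElse(fallback)
//   opt.transform(fn)        -> opt.map(fn)
public class OptionalMigration {
    public static void main(String[] args) {
        // was: Optional.fromNullable(null).or("fallback")
        Optional<String> absent = Optional.ofNullable(null);
        System.out.println(absent.orElse("fallback")); // prints "fallback"

        // was: Optional.of("runner").transform(...).or("")
        Optional<String> present = Optional.of("runner");
        System.out.println(present.map(String::toUpperCase).orElse("")); // prints "RUNNER"
    }
}
```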
[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines
[ https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=323609&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323609 ] ASF GitHub Bot logged work on BEAM-8343: Author: ASF GitHub Bot Created on: 04/Oct/19 17:41 Start Date: 04/Oct/19 17:41 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #9731: [BEAM-8343] Added nessesary methods to BeamSqlTable to enable support for predicate/project push-down URL: https://github.com/apache/beam/pull/9731#issuecomment-538494247 R: @apilloud This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323609) Time Spent: 1h (was: 50m) > Add means for IO APIs to support predicate and/or project push-down when > running SQL pipelines > -- > > Key: BEAM-8343 > URL: https://issues.apache.org/jira/browse/BEAM-8343 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > The objective is to create a universal way for Beam SQL IO APIs to support > predicate/project push-down. > A proposed way to achieve that is by introducing an interface responsible > for identifying what portion(s) of a Calc can be moved down to IO layer. > Also, adding following methods to a BeamSqlTable interface to pass necessary > parameters to IO APIs: > - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter) > - Boolean supportsProjects() > - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, > List fieldNames) > > Design doc > [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7765) Add test for snippet accessing_valueprovider_info_after_run
[ https://issues.apache.org/jira/browse/BEAM-7765?focusedWorklogId=323606&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323606 ] ASF GitHub Bot logged work on BEAM-7765: Author: ASF GitHub Bot Created on: 04/Oct/19 17:41 Start Date: 04/Oct/19 17:41 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9685: [BEAM-7765] - Add test for snippet accessing_valueprovider_info_after_run URL: https://github.com/apache/beam/pull/9685#issuecomment-538494124 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323606) Time Spent: 1.5h (was: 1h 20m) > Add test for snippet accessing_valueprovider_info_after_run > --- > > Key: BEAM-7765 > URL: https://issues.apache.org/jira/browse/BEAM-7765 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: John Patoch >Priority: Major > Labels: easy > Time Spent: 1.5h > Remaining Estimate: 0h > > This snippet needs a unit test. > It has bugs. For example: > - apache_beam.utils.value_provider doesn't exist > - beam.combiners.Sum doesn't exist > - unused import of: WriteToText > cc: [~pabloem] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8303) Filesystems not properly registered using FileIO.write() on FlinkRunner
[ https://issues.apache.org/jira/browse/BEAM-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944708#comment-16944708 ] Kenneth Knowles commented on BEAM-8303: --- Is the pain mild, given the presence of a workaround? > Filesystems not properly registered using FileIO.write() on FlinkRunner > --- > > Key: BEAM-8303 > URL: https://issues.apache.org/jira/browse/BEAM-8303 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-java-core >Affects Versions: 2.15.0 >Reporter: Preston Koprivica >Assignee: Maximilian Michels >Priority: Critical > Fix For: 2.17.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > I’m getting the following error when attempting to use the FileIO apis > (beam-2.15.0) and integrating with AWS S3. I have setup the PipelineOptions > with all the relevant AWS options, so the filesystem registry **should** be > properly seeded by the time the graph is compiled and executed: > {code:java} > java.lang.IllegalArgumentException: No filesystem found for scheme s3 > at > org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456) > at > org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526) > at > org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149) > at > org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105) > at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159) > at > org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:83) > at > org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:32) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:543) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:534) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:480) > at > 
org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.deserialize(CoderTypeSerializer.java:93) > at > org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55) > at > org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:106) > at > org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:72) > at > org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:47) > at > org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) > at > org.apache.flink.runtime.operators.FlatMapDriver.run(FlatMapDriver.java:107) > at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:503) > at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) > at java.lang.Thread.run(Thread.java:748) > {code} > For reference, the write code resembles this: > {code:java} > FileIO.Write write = FileIO.write() > .via(ParquetIO.sink(schema)) > .to(options.getOutputDir()). // will be something like: > s3:/// > .withSuffix(".parquet"); > records.apply(String.format("Write(%s)", options.getOutputDir()), > write);{code} > The issue does not appear to be related to ParquetIO.sink(). I am able to > reliably reproduce the issue using JSON formatted records and TextIO.sink(), > as well. Moreover, AvroIO is affected if withWindowedWrites() option is > added. > Just trying some different knobs, I went ahead and set the following option: > {code:java} > write = write.withNoSpilling();{code} > This actually seemed to fix the issue, only to have it reemerge as I scaled > up the data set size. 
The stack trace, while very similar, reads: > {code:java} > java.lang.IllegalArgumentException: No filesystem found for scheme s3 > at > org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456) > at > org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526) > at > org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149) > at > org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105) > at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159) > at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:82) > at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:36) > at > org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:543) >
[jira] [Resolved] (BEAM-8092) Switch to java Optional in DirectRunner
[ https://issues.apache.org/jira/browse/BEAM-8092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Lukavský resolved BEAM-8092. Resolution: Fixed > Switch to java Optional in DirectRunner > --- > > Key: BEAM-8092 > URL: https://issues.apache.org/jira/browse/BEAM-8092 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Affects Versions: 2.15.0 >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Minor > Fix For: 2.17.0 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8092) Switch to java Optional in DirectRunner
[ https://issues.apache.org/jira/browse/BEAM-8092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Lukavský updated BEAM-8092: --- Fix Version/s: 2.17.0 > Switch to java Optional in DirectRunner > --- > > Key: BEAM-8092 > URL: https://issues.apache.org/jira/browse/BEAM-8092 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Affects Versions: 2.15.0 >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Minor > Fix For: 2.17.0 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines
[ https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=323611&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323611 ] ASF GitHub Bot logged work on BEAM-8343: Author: ASF GitHub Bot Created on: 04/Oct/19 17:45 Start Date: 04/Oct/19 17:45 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #9718: [BEAM-8343] Moved BeamTable classes to a more appropriate location URL: https://github.com/apache/beam/pull/9718#issuecomment-538495582 > I think this should be broken up so the refactoring is separate from the new features. > > PR 1: > > * Moved sql.BeamSqlTable → sql.meta.BeamSqlTable. > * Renamed sql.imp.schema.BaseBeamTable → sql.imp.schema.SchemaBaseBeamTable. > * Created a sql.meta.BaseBeamTable to be a default implementation of BeamSqlTable interface. > * Updated existing IO Table classes to extend sql.meta.BaseBeamTable. > > PR 2: > > * Added methods needed for predicate/project push-down to BeamSqlTable interface. > * Create a sql.meta.BeamSqlTableFilter interface and a default implementation for it sql.meta.DefaultTableFilter. Makes sense, done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323611) Time Spent: 1h 10m (was: 1h) > Add means for IO APIs to support predicate and/or project push-down when > running SQL pipelines > -- > > Key: BEAM-8343 > URL: https://issues.apache.org/jira/browse/BEAM-8343 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > The objective is to create a universal way for Beam SQL IO APIs to support > predicate/project push-down. 
> A proposed way to achieve that is by introducing an interface responsible > for identifying what portion(s) of a Calc can be moved down to IO layer. > Also, adding following methods to a BeamSqlTable interface to pass necessary > parameters to IO APIs: > - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter) > - Boolean supportsProjects() > - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, > List fieldNames) > > Design doc > [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
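The method list quoted above has lost its generic type parameters to the mail archive's HTML escaping (e.g. the element types of PCollection and List). The sketch below restores plausible shapes; the Row and String type arguments are assumptions, and the stub classes standing in for Beam and Calcite types exist only to keep the sketch self-contained — this is not the interface as merged:

```java
import java.util.List;

// Stub stand-ins for Beam/Calcite types so the sketch compiles on its own.
// These are NOT the real classes; they only carry the method signatures.
class RexProgram {}
class RexNode {}
class PBegin {}
class PCollection<T> {}
class Row {}

/** Which predicates (if any) a table can absorb at read time. */
interface BeamSqlTableFilter {
    /** Predicates the IO could NOT push down; the Calc must still apply these. */
    List<RexNode> getNotSupported();
}

/** Default implementation: nothing is pushed down, every predicate stays in the Calc. */
class DefaultTableFilter implements BeamSqlTableFilter {
    private final List<RexNode> unsupported;
    DefaultTableFilter(List<RexNode> unsupported) { this.unsupported = unsupported; }
    public List<RexNode> getNotSupported() { return unsupported; }
}

interface BeamSqlTable {
    // Identify which portion(s) of a Calc's condition the IO layer can handle.
    BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter);
    // Whether the table can drop unused columns at read time.
    boolean supportsProjects();
    // Read with pushed-down filters and the projected field names applied.
    PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);
}

public class PushDownSketch {
    public static void main(String[] args) {
        BeamSqlTableFilter none = new DefaultTableFilter(java.util.Collections.emptyList());
        System.out.println("predicates left in Calc: " + none.getNotSupported().size()); // prints 0
    }
}
```

The split into a filter interface plus a default no-op implementation matches the PR description: tables that cannot push anything down simply return every predicate as unsupported.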
[jira] [Closed] (BEAM-8092) Switch to java Optional in DirectRunner
[ https://issues.apache.org/jira/browse/BEAM-8092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Lukavský closed BEAM-8092. -- > Switch to java Optional in DirectRunner > --- > > Key: BEAM-8092 > URL: https://issues.apache.org/jira/browse/BEAM-8092 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Affects Versions: 2.15.0 >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Minor > Fix For: 2.17.0 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines
[ https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=323614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323614 ] ASF GitHub Bot logged work on BEAM-8343: Author: ASF GitHub Bot Created on: 04/Oct/19 17:47 Start Date: 04/Oct/19 17:47 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #9718: [BEAM-8343] Moved BeamTable classes to a more appropriate location URL: https://github.com/apache/beam/pull/9718#discussion_r331615203 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BaseBeamTable.java ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.meta; + +import java.util.List; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram; + +/** Basic implementation of {@link BeamSqlTable} methods used by predicate and filter push-down. 
*/ +public abstract class BaseBeamTable implements BeamSqlTable { + + @Override + public PCollection buildIOReader( + PBegin begin, BeamSqlTableFilter filters, List fieldNames) { +return buildIOReader(begin); Review comment: Modified default implementation for this method to throw an exception in PR2. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323614) Time Spent: 1h 20m (was: 1h 10m) > Add means for IO APIs to support predicate and/or project push-down when > running SQL pipelines > -- > > Key: BEAM-8343 > URL: https://issues.apache.org/jira/browse/BEAM-8343 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > The objective is to create a universal way for Beam SQL IO APIs to support > predicate/project push-down. > A proposed way to achieve that is by introducing an interface responsible > for identifying what portion(s) of a Calc can be moved down to IO layer. > Also, adding following methods to a BeamSqlTable interface to pass necessary > parameters to IO APIs: > - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter) > - Boolean supportsProjects() > - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, > List fieldNames) > > Design doc > [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7765) Add test for snippet accessing_valueprovider_info_after_run
[ https://issues.apache.org/jira/browse/BEAM-7765?focusedWorklogId=323617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323617 ] ASF GitHub Bot logged work on BEAM-7765: Author: ASF GitHub Bot Created on: 04/Oct/19 17:48 Start Date: 04/Oct/19 17:48 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9685: [BEAM-7765] - Add test for snippet accessing_valueprovider_info_after_run URL: https://github.com/apache/beam/pull/9685#discussion_r331615148 ## File path: sdks/python/apache_beam/examples/snippets/snippets_test.py ## @@ -1266,6 +1266,59 @@ def expand(self, pcoll): lengths = p | beam.Create(["a", "ab", "abc"]) | ComputeWordLengths() assert_that(lengths, equal_to([1, 2, 3])) +class AccessingValueProviderInfoAfterRunTest(unittest.TestCase): + """Tests for accessing value provider info after run.""" + + def test_accessing_valueprovider_info_after_run(self): +# [START AccessingValueProviderInfoAfterRunSnip1] + +class MyOptions(PipelineOptions): + @classmethod + def _add_argparse_args(cls, parser): +# Use add_value_provider_argument for arguments to be templatable +# Use add_argument as usual for non-templatable arguments +parser.add_value_provider_argument('--string_value', type=str, + default='the quick brown fox jumps over the lazy dog') + +class LogValueProvidersFn(beam.DoFn): + def __init__(self, string_vp): +self.string_vp = string_vp + + # Define the DoFn that logs the ValueProvider value. + # The DoFn is called when creating the pipeline branch. + # This example logs the ValueProvider value, but + # you could store it by pushing it to an external database. + def process(self, an_int, **kwargs): Review comment: You don't need the `kwargs`. ```suggestion def process(self, an_int): ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323617) Time Spent: 1h 40m (was: 1.5h) > Add test for snippet accessing_valueprovider_info_after_run > --- > > Key: BEAM-7765 > URL: https://issues.apache.org/jira/browse/BEAM-7765 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: John Patoch >Priority: Major > Labels: easy > Time Spent: 1h 40m > Remaining Estimate: 0h > > This snippet needs a unit test. > It has bugs. For example: > - apache_beam.utils.value_provider doesn't exist > - beam.combiners.Sum doesn't exist > - unused import of: WriteToText > cc: [~pabloem] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7765) Add test for snippet accessing_valueprovider_info_after_run
[ https://issues.apache.org/jira/browse/BEAM-7765?focusedWorklogId=323618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323618 ] ASF GitHub Bot logged work on BEAM-7765: Author: ASF GitHub Bot Created on: 04/Oct/19 17:48 Start Date: 04/Oct/19 17:48 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9685: [BEAM-7765] - Add test for snippet accessing_valueprovider_info_after_run URL: https://github.com/apache/beam/pull/9685#discussion_r331615477 ## File path: sdks/python/apache_beam/examples/snippets/snippets_test.py ## @@ -1266,6 +1266,59 @@ def expand(self, pcoll): lengths = p | beam.Create(["a", "ab", "abc"]) | ComputeWordLengths() assert_that(lengths, equal_to([1, 2, 3])) +class AccessingValueProviderInfoAfterRunTest(unittest.TestCase): + """Tests for accessing value provider info after run.""" + + def test_accessing_valueprovider_info_after_run(self): +# [START AccessingValueProviderInfoAfterRunSnip1] + +class MyOptions(PipelineOptions): + @classmethod + def _add_argparse_args(cls, parser): +# Use add_value_provider_argument for arguments to be templatable +# Use add_argument as usual for non-templatable arguments +parser.add_value_provider_argument('--string_value', type=str, + default='the quick brown fox jumps over the lazy dog') + +class LogValueProvidersFn(beam.DoFn): + def __init__(self, string_vp): +self.string_vp = string_vp + + # Define the DoFn that logs the ValueProvider value. + # The DoFn is called when creating the pipeline branch. + # This example logs the ValueProvider value, but + # you could store it by pushing it to an external database. + def process(self, an_int, **kwargs): +logging.info('The string_value is %s' % self.string_vp.get()) + +yield self.string_vp.get() + +pipeline_options = PipelineOptions() + +my_options = pipeline_options.view_as(MyOptions) + +# Create pipeline. +with beam.Pipeline(options=my_options) as p: + # Add a branch for logging the ValueProvider value. 
+ vp_output = (p + | beam.Create([None]) + | 'LogValueProvider' >> beam.ParDo( +LogValueProvidersFn(my_options.string_value))) + + # Test value provider argument : is equal to given string 'the quick brown fox jumps over the lazy dog' + assert_that(vp_output, equal_to(['the quick brown fox jumps over the lazy dog']), Review comment: If you move this to the end, you can end the snippet before the assertion ``` # [END AccessingValueProviderInfoAfterRunSnip1] assert_that(main_pipeline_output, equal_to([6]), label='assert_main_output') assert_that(vp_output, equal_to(['the quick brown fox jumps over the lazy dog']), ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323618) Time Spent: 1h 40m (was: 1.5h) > Add test for snippet accessing_valueprovider_info_after_run > --- > > Key: BEAM-7765 > URL: https://issues.apache.org/jira/browse/BEAM-7765 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: John Patoch >Priority: Major > Labels: easy > Time Spent: 1h 40m > Remaining Estimate: 0h > > This snippet needs a unit test. > It has bugs. For example: > - apache_beam.utils.value_provider doesn't exist > - beam.combiners.Sum doesn't exist > - unused import of: WriteToText > cc: [~pabloem] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7765) Add test for snippet accessing_valueprovider_info_after_run
[ https://issues.apache.org/jira/browse/BEAM-7765?focusedWorklogId=323619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323619 ] ASF GitHub Bot logged work on BEAM-7765: Author: ASF GitHub Bot Created on: 04/Oct/19 17:48 Start Date: 04/Oct/19 17:48 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9685: [BEAM-7765] - Add test for snippet accessing_valueprovider_info_after_run URL: https://github.com/apache/beam/pull/9685#discussion_r331614775 ## File path: sdks/python/apache_beam/examples/snippets/snippets.py ## @@ -1394,28 +1397,24 @@ def __init__(self, string_vp): # The DoFn is called when creating the pipeline branch. # This example logs the ValueProvider value, but # you could store it by pushing it to an external database. -def process(self, an_int): +def process(self, an_int, **kwargs): logging.info('The string_value is %s' % self.string_vp.get()) - # Another option (where you don't need to pass the value at all) is: - logging.info('The string value is %s' % - RuntimeValueProvider.get_value('string_value', str, '')) pipeline_options = PipelineOptions() - # Create pipeline. - p = beam.Pipeline(options=pipeline_options) my_options = pipeline_options.view_as(MyOptions) - # Add a branch for logging the ValueProvider value. - _ = (p - | beam.Create([None]) - | 'LogValueProvs' >> beam.ParDo( - LogValueProvidersFn(my_options.string_value))) - - # The main pipeline. - result_pc = (p - | "main_pc" >> beam.Create([1, 2, 3]) - | beam.combiners.Sum.Globally()) - p.run().wait_until_finish() - - # [END AccessingValueProviderInfoAfterRunSnip1] + # Create pipeline. + with beam.Pipeline(options=my_options) as p: +# Add a branch for logging the ValueProvider value. +_ = (p + | beam.Create([None]) + | 'LogValueProvider' >> beam.ParDo( + LogValueProvidersFn(my_options.string_value))) + +# The main pipeline. 
Review comment: Does it make sense to completely move the snippet to the `snippets_test.py` file, and remove it from snippets.py? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323619) Time Spent: 1h 40m (was: 1.5h) > Add test for snippet accessing_valueprovider_info_after_run > --- > > Key: BEAM-7765 > URL: https://issues.apache.org/jira/browse/BEAM-7765 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: John Patoch >Priority: Major > Labels: easy > Time Spent: 1h 40m > Remaining Estimate: 0h > > This snippet needs a unit test. > It has bugs. For example: > - apache_beam.utils.value_provider doesn't exist > - beam.combiners.Sum doesn't exist > - unused import of: WriteToText > cc: [~pabloem] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7977) DataflowRunner writes to System.out instead of logging
[ https://issues.apache.org/jira/browse/BEAM-7977?focusedWorklogId=323623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323623 ] ASF GitHub Bot logged work on BEAM-7977: Author: ASF GitHub Bot Created on: 04/Oct/19 17:50 Start Date: 04/Oct/19 17:50 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9337: [BEAM-7977] use LOG.info instead of System.out.println in DataflowRunner URL: https://github.com/apache/beam/pull/9337 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323623) Time Spent: 50m (was: 40m) > DataflowRunner writes to System.out instead of logging > -- > > Key: BEAM-7977 > URL: https://issues.apache.org/jira/browse/BEAM-7977 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Sebastian Jambor >Assignee: Sebastian Jambor >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > DataflowRunner writes two lines to stdout for every job, which bypasses the > logger setup. This is slightly annoying, especially if many jobs are started > programmatically. -- This message was sent by Atlassian Jira (v8.3.4#803005)
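The fix replaces `System.out.println` with `LOG.info` so output respects the application's logging configuration. The concern is language-agnostic; a small Python sketch of why a logger beats writing to stdout directly (the logger name `runner_demo` is illustrative):

```python
import io
import logging

# Route log records through a configurable handler instead of stdout, so
# callers that start many jobs programmatically can redirect or silence them.
buffer = io.StringIO()
logger = logging.getLogger('runner_demo')
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(buffer))

logger.info('Submitted job %s', 'job-123')

# The message went to the handler we configured, not to sys.stdout.
assert 'Submitted job job-123' in buffer.getvalue()
```

With `print` (or `System.out.println`), the output bypasses every handler, filter, and level the caller set up, which is exactly the annoyance the issue describes.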
[jira] [Assigned] (BEAM-6228) Website build from source release fails due to git dependency
[ https://issues.apache.org/jira/browse/BEAM-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-6228: - Assignee: Kenneth Knowles > Website build from source release fails due to git dependency > - > > Key: BEAM-6228 > URL: https://issues.apache.org/jira/browse/BEAM-6228 > Project: Beam > Issue Type: New Feature > Components: website >Reporter: Scott Wegner >Assignee: Kenneth Knowles >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > The website build assumes a git environment, and will fail if built outside > of git. As a result, we cannot build the website from the source release. > [~kenn] noticed this during 2.9.0 release validation. See: > https://lists.apache.org/thread.html/dc816a5f8c82dd4d19e82666209305ec67d46440fe8edba4534f9b63@%3Cdev.beam.apache.org%3E > {code} > > Configure project :beam-website > No git repository found for :beam-website. Accessing grgit will cause an NPE. > FAILURE: Build failed with an exception. > * Where: > Build file 'website/build.gradle' line: 143 > * What went wrong: > A problem occurred evaluating project ':beam-website'. > > Cannot get property 'branch' on null object > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7977) DataflowRunner writes to System.out instead of logging
[ https://issues.apache.org/jira/browse/BEAM-7977?focusedWorklogId=323622&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323622 ] ASF GitHub Bot logged work on BEAM-7977: Author: ASF GitHub Bot Created on: 04/Oct/19 17:50 Start Date: 04/Oct/19 17:50 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9337: [BEAM-7977] use LOG.info instead of System.out.println in DataflowRunner URL: https://github.com/apache/beam/pull/9337#issuecomment-538497321 Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323622) Time Spent: 40m (was: 0.5h) > DataflowRunner writes to System.out instead of logging > -- > > Key: BEAM-7977 > URL: https://issues.apache.org/jira/browse/BEAM-7977 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Sebastian Jambor >Assignee: Sebastian Jambor >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > DataflowRunner writes two lines to stdout for every job, which bypasses the > logger setup. This is slightly annoying, especially if many jobs are started > programmatically. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-6228) Website build from source release fails due to git dependency
[ https://issues.apache.org/jira/browse/BEAM-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-6228: -- Priority: Critical (was: Minor) > Website build from source release fails due to git dependency > - > > Key: BEAM-6228 > URL: https://issues.apache.org/jira/browse/BEAM-6228 > Project: Beam > Issue Type: Bug > Components: build-system, website >Reporter: Scott Wegner >Assignee: Kenneth Knowles >Priority: Critical > Time Spent: 1h 10m > Remaining Estimate: 0h > > The website build assumes a git environment, and will fail if built outside > of git. As a result, we cannot build the website from the source release. > [~kenn] noticed this during 2.9.0 release validation. See: > https://lists.apache.org/thread.html/dc816a5f8c82dd4d19e82666209305ec67d46440fe8edba4534f9b63@%3Cdev.beam.apache.org%3E > {code} > > Configure project :beam-website > No git repository found for :beam-website. Accessing grgit will cause an NPE. > FAILURE: Build failed with an exception. > * Where: > Build file 'website/build.gradle' line: 143 > * What went wrong: > A problem occurred evaluating project ':beam-website'. > > Cannot get property 'branch' on null object > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-6228) Website build from source release fails due to git dependency
[ https://issues.apache.org/jira/browse/BEAM-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-6228: -- Issue Type: Bug (was: New Feature) > Website build from source release fails due to git dependency > - > > Key: BEAM-6228 > URL: https://issues.apache.org/jira/browse/BEAM-6228 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Scott Wegner >Assignee: Kenneth Knowles >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > The website build assumes a git environment, and will fail if built outside > of git. As a result, we cannot build the website from the source release. > [~kenn] noticed this during 2.9.0 release validation. See: > https://lists.apache.org/thread.html/dc816a5f8c82dd4d19e82666209305ec67d46440fe8edba4534f9b63@%3Cdev.beam.apache.org%3E > {code} > > Configure project :beam-website > No git repository found for :beam-website. Accessing grgit will cause an NPE. > FAILURE: Build failed with an exception. > * Where: > Build file 'website/build.gradle' line: 143 > * What went wrong: > A problem occurred evaluating project ':beam-website'. > > Cannot get property 'branch' on null object > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-6228) Website build from source release fails due to git dependency
[ https://issues.apache.org/jira/browse/BEAM-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-6228: -- Component/s: build-system > Website build from source release fails due to git dependency > - > > Key: BEAM-6228 > URL: https://issues.apache.org/jira/browse/BEAM-6228 > Project: Beam > Issue Type: Bug > Components: build-system, website >Reporter: Scott Wegner >Assignee: Kenneth Knowles >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > The website build assumes a git environment, and will fail if built outside > of git. As a result, we cannot build the website from the source release. > [~kenn] noticed this during 2.9.0 release validation. See: > https://lists.apache.org/thread.html/dc816a5f8c82dd4d19e82666209305ec67d46440fe8edba4534f9b63@%3Cdev.beam.apache.org%3E > {code} > > Configure project :beam-website > No git repository found for :beam-website. Accessing grgit will cause an NPE. > FAILURE: Build failed with an exception. > * Where: > Build file 'website/build.gradle' line: 143 > * What went wrong: > A problem occurred evaluating project ':beam-website'. > > Cannot get property 'branch' on null object > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-6228) Website build from source release fails due to git dependency
[ https://issues.apache.org/jira/browse/BEAM-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-6228: -- Fix Version/s: 2.17.0 > Website build from source release fails due to git dependency > - > > Key: BEAM-6228 > URL: https://issues.apache.org/jira/browse/BEAM-6228 > Project: Beam > Issue Type: Bug > Components: build-system, website >Reporter: Scott Wegner >Assignee: Kenneth Knowles >Priority: Critical > Fix For: 2.17.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > The website build assumes a git environment, and will fail if built outside > of git. As a result, we cannot build the website from the source release. > [~kenn] noticed this during 2.9.0 release validation. See: > https://lists.apache.org/thread.html/dc816a5f8c82dd4d19e82666209305ec67d46440fe8edba4534f9b63@%3Cdev.beam.apache.org%3E > {code} > > Configure project :beam-website > No git repository found for :beam-website. Accessing grgit will cause an NPE. > FAILURE: Build failed with an exception. > * Where: > Build file 'website/build.gradle' line: 143 > * What went wrong: > A problem occurred evaluating project ':beam-website'. > > Cannot get property 'branch' on null object > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
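The underlying problem is a build script that unconditionally reads `grgit.branch`, which is null outside a git checkout, such as a source release tarball. The defensive pattern (probe for the repository, fall back to a default) sketched in Python; the actual Beam fix lives in `website/build.gradle`, and the function below is only an illustration:

```python
import os
import tempfile

def current_branch(project_dir, fallback='unknown'):
    """Return a branch label, tolerating trees that are not git checkouts."""
    # A source release has no .git directory, so any git metadata lookup
    # must be guarded rather than assumed to succeed.
    if not os.path.isdir(os.path.join(project_dir, '.git')):
        return fallback
    with open(os.path.join(project_dir, '.git', 'HEAD')) as f:
        ref = f.read().strip()
    # HEAD is either "ref: refs/heads/<branch>" or a detached commit hash.
    return ref.rsplit('/', 1)[-1] if ref.startswith('ref:') else fallback

# In a directory without .git, the fallback is used instead of raising,
# which is the behavior the website build was missing.
release_dir = tempfile.mkdtemp()
assert current_branch(release_dir) == 'unknown'
```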
[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines
[ https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=323627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323627 ] ASF GitHub Bot logged work on BEAM-8343: Author: ASF GitHub Bot Created on: 04/Oct/19 17:53 Start Date: 04/Oct/19 17:53 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #9718: [BEAM-8343] Moved BeamTable classes to a more appropriate location URL: https://github.com/apache/beam/pull/9718#discussion_r331617476 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/DefaultTableFilter.java ## @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.meta; + +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram; + +/** + * This default implementation of {@link BeamSqlTableFilter} interface. Assumes that predicate + * push-down is not supported. 
+ */ +public class DefaultTableFilter implements BeamSqlTableFilter { + private final RexProgram program; + private final RexNode filter; + + public DefaultTableFilter(RexProgram program, RexNode filter) { +this.program = program; +this.filter = filter; + } + + /** + * Since predicate push-down is assumed not to be supported by default - return an unchanged + * filter to be preserved. + * + * @return Predicate {@code RexNode} which is not supported + */ + @Override + public RexNode getNotSupported() { Review comment: Yes, I think updating it to a List of RexNodes, where each node is an operand for a top-level AND, is a better approach here. Addressed in #9731. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323627) Time Spent: 1h 50m (was: 1h 40m) > Add means for IO APIs to support predicate and/or project push-down when > running SQL pipelines > -- > > Key: BEAM-8343 > URL: https://issues.apache.org/jira/browse/BEAM-8343 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > The objective is to create a universal way for Beam SQL IO APIs to support > predicate/project push-down. > A proposed way to achieve that is by introducing an interface responsible > for identifying what portion(s) of a Calc can be moved down to IO layer. 
> Also, adding following methods to a BeamSqlTable interface to pass necessary > parameters to IO APIs: > - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter) > - Boolean supportsProjects() > - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, > List fieldNames) > > Design doc > [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
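The proposed `BeamSqlTableFilter` splits a predicate into the conjuncts an IO can evaluate and those that must remain in the `Calc` above the table scan, following the review suggestion to treat the filter as a list of operands of a top-level AND. Modeling that split in plain Python (the tuple filter representation and `supported_ops` set are illustrative assumptions, not the Calcite `RexNode` API):

```python
# Sketch of predicate push-down: a top-level AND is split into conjuncts
# the table's IO can evaluate and conjuncts the SQL layer must keep.
# Field names and the `supported_ops` set are illustrative assumptions.

def split_predicate(conjuncts, supported_ops):
    """Partition AND-ed conjuncts into (pushed_down, not_supported)."""
    pushed, kept = [], []
    for field, op, value in conjuncts:
        (pushed if op in supported_ops else kept).append((field, op, value))
    return pushed, kept

predicate = [('id', '=', 42), ('name', 'LIKE', 'a%'), ('score', '>', 3)]
# Suppose this table's source only understands equality and comparisons.
pushed, kept = split_predicate(predicate, supported_ops={'=', '>', '<'})

assert pushed == [('id', '=', 42), ('score', '>', 3)]
assert kept == [('name', 'LIKE', 'a%')]  # stays in the Calc above the scan
```

A `DefaultTableFilter`, in these terms, is the degenerate case where every conjunct lands in `kept`, i.e. no push-down at all.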
[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines
[ https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=323625&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323625 ] ASF GitHub Bot logged work on BEAM-8343: Author: ASF GitHub Bot Created on: 04/Oct/19 17:53 Start Date: 04/Oct/19 17:53 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #9718: [BEAM-8343] Moved BeamTable classes to a more appropriate location URL: https://github.com/apache/beam/pull/9718#discussion_r331617476 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/DefaultTableFilter.java ## @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.meta; + +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram; + +/** + * This default implementation of {@link BeamSqlTableFilter} interface. Assumes that predicate + * push-down is not supported. 
+ */ +public class DefaultTableFilter implements BeamSqlTableFilter { + private final RexProgram program; + private final RexNode filter; + + public DefaultTableFilter(RexProgram program, RexNode filter) { +this.program = program; +this.filter = filter; + } + + /** + * Since predicate push-down is assumed not to be supported by default - return an unchanged + * filter to be preserved. + * + * @return Predicate {@code RexNode} which is not supported + */ + @Override + public RexNode getNotSupported() { Review comment: Yes, I think updating it to a List of RexNodes, where each node is an operand for a top-level AND, is a better approach here. Addressed in #9731 . This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323625) Time Spent: 1.5h (was: 1h 20m) > Add means for IO APIs to support predicate and/or project push-down when > running SQL pipelines > -- > > Key: BEAM-8343 > URL: https://issues.apache.org/jira/browse/BEAM-8343 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > The objective is to create a universal way for Beam SQL IO APIs to support > predicate/project push-down. > A proposed way to achieve that is by introducing an interface responsible > for identifying what portion(s) of a Calc can be moved down to IO layer. 
> Also, adding following methods to a BeamSqlTable interface to pass necessary > parameters to IO APIs: > - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter) > - Boolean supportsProjects() > - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, > List fieldNames) > > Design doc > [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines
[ https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=323626&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323626 ] ASF GitHub Bot logged work on BEAM-8343: Author: ASF GitHub Bot Created on: 04/Oct/19 17:53 Start Date: 04/Oct/19 17:53 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #9718: [BEAM-8343] Moved BeamTable classes to a more appropriate location URL: https://github.com/apache/beam/pull/9718#discussion_r331615203 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BaseBeamTable.java ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.meta; + +import java.util.List; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram; + +/** Basic implementation of {@link BeamSqlTable} methods used by predicate and filter push-down. 
*/ +public abstract class BaseBeamTable implements BeamSqlTable { + + @Override + public PCollection buildIOReader( + PBegin begin, BeamSqlTableFilter filters, List fieldNames) { +return buildIOReader(begin); Review comment: Modified default implementation for this method to throw an exception in #9731. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323626) Time Spent: 1h 40m (was: 1.5h) > Add means for IO APIs to support predicate and/or project push-down when > running SQL pipelines > -- > > Key: BEAM-8343 > URL: https://issues.apache.org/jira/browse/BEAM-8343 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > The objective is to create a universal way for Beam SQL IO APIs to support > predicate/project push-down. > A proposed way to achieve that is by introducing an interface responsible > for identifying what portion(s) of a Calc can be moved down to IO layer. > Also, adding following methods to a BeamSqlTable interface to pass necessary > parameters to IO APIs: > - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter) > - Boolean supportsProjects() > - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, > List fieldNames) > > Design doc > [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=323628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323628 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 04/Oct/19 17:53 Start Date: 04/Oct/19 17:53 Worklog Time Spent: 10m Work Description: davidcavazos commented on issue #9692: [BEAM-7389] Update docs with matching code files URL: https://github.com/apache/beam/pull/9692#issuecomment-538498541 * Rebased to latest version * Added buttons for every snippet TODO (until #9664 merges) * Generate notebooks * Test and stage website This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323628) Time Spent: 63h 40m (was: 63.5h) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 63h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-7049) Merge multiple input to one BeamUnionRel
[ https://issues.apache.org/jira/browse/BEAM-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944063#comment-16944063 ] Rui Wang edited comment on BEAM-7049 at 10/4/19 5:54 PM: - Ah ok. I found a tricky problem: given the merged cost based optimization, with enabled UnionMerge rule, the calcite planner falls into an infinite loop without choosing a plan. It becomes a very tricky problem. I will need to spend many hours to understand Calcite planner, BeamSQL's CBO implementation and others to understand the root cause. To answer your last question, what I was thinking was to have two rules for UNION ALL and UNION respectively and each rule should overwrite [1]. So UNION ALL rule will fire only for UNION ALL queries. UNION is the same. By doing so you can separate implementation of underlying PTransform. [1]: https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/plan/RelOptRule.java#L495 was (Author: amaliujia): Ah ok. I found a tricky problem: given the merged cost based optimization, with enabled UnionMerge rule, the calcite planner falls into an infinite loop without choosing a plan. It becomes a very tricky problem. I will need to spend many hours to understand Calcite planner, BeamSQL's CBO implementation and others to understand the root cause. To answer your last question, what I was thinking was to have two rules for UNION ALL and UNION respectively and each rule should overwrite [1]. So UNION ALL rule will fire only for UNION ALL queries. UNION is the same. By doing so you can separate implementation of underlying PTransform. 
[1]: https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/plan/RelOptRule.java#L511 > Merge multiple input to one BeamUnionRel > > > Key: BEAM-7049 > URL: https://issues.apache.org/jira/browse/BEAM-7049 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: sridhar Reddy >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > BeamUnionRel assumes inputs are two and rejects more. So `a UNION b UNION c` > will have to be created as UNION(a, UNION(b, c)) and have two shuffles. If > BeamUnionRel can handle multiple shuffles, we will have only one shuffle -- This message was sent by Atlassian Jira (v8.3.4#803005)
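The issue notes that `a UNION b UNION c` is planned as `UNION(a, UNION(b, c))`, costing two shuffles. The rewrite a UnionMerge-style rule performs, collapsing nested binary unions into one n-ary union, can be sketched over a toy plan tree (the tuple node shape here is an assumption, not Calcite's `RelNode`):

```python
# Toy relational node: ('union', [inputs...]) or a leaf table name.
# Flattening nested binary unions yields one n-ary union, i.e. one shuffle.

def flatten_union(node):
    if not (isinstance(node, tuple) and node[0] == 'union'):
        return node  # leaf: a table scan
    inputs = []
    for child in node[1]:
        child = flatten_union(child)
        if isinstance(child, tuple) and child[0] == 'union':
            inputs.extend(child[1])  # splice the nested union's inputs in
        else:
            inputs.append(child)
    return ('union', inputs)

plan = ('union', ['a', ('union', ['b', 'c'])])
assert flatten_union(plan) == ('union', ['a', 'b', 'c'])
```

Note this flattening is only valid when the nested nodes agree on set semantics, which is one reason the comment above suggests separate rules for UNION and UNION ALL.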
[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines
[ https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=323629&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323629 ] ASF GitHub Bot logged work on BEAM-8343: Author: ASF GitHub Bot Created on: 04/Oct/19 17:54 Start Date: 04/Oct/19 17:54 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #9718: [BEAM-8343] Moved BeamTable classes to a more appropriate location URL: https://github.com/apache/beam/pull/9718#discussion_r331617476 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/DefaultTableFilter.java ## @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.meta; + +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram; + +/** + * This default implementation of {@link BeamSqlTableFilter} interface. Assumes that predicate + * push-down is not supported. 
+ */ +public class DefaultTableFilter implements BeamSqlTableFilter { + private final RexProgram program; + private final RexNode filter; + + public DefaultTableFilter(RexProgram program, RexNode filter) { +this.program = program; +this.filter = filter; + } + + /** + * Since predicate push-down is assumed not to be supported by default - return an unchanged + * filter to be preserved. + * + * @return Predicate {@code RexNode} which is not supported + */ + @Override + public RexNode getNotSupported() { Review comment: Yes, I think updating it to a List of `RexNode`s, where each node is an operand for a top-level AND, is a better approach here. Addressed in #9731. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323629) Time Spent: 2h (was: 1h 50m) > Add means for IO APIs to support predicate and/or project push-down when > running SQL pipelines > -- > > Key: BEAM-8343 > URL: https://issues.apache.org/jira/browse/BEAM-8343 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > The objective is to create a universal way for Beam SQL IO APIs to support > predicate/project push-down. > A proposed way to achieve that is by introducing an interface responsible > for identifying what portion(s) of a Calc can be moved down to IO layer. 
> Also, adding following methods to a BeamSqlTable interface to pass necessary > parameters to IO APIs: > - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter) > - Boolean supportsProjects() > - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, > List fieldNames) > > Design doc > [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines
[ https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=323630&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323630 ] ASF GitHub Bot logged work on BEAM-8343: Author: ASF GitHub Bot Created on: 04/Oct/19 17:55 Start Date: 04/Oct/19 17:55 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #9718: [BEAM-8343] Moved BeamTable classes to a more appropriate location URL: https://github.com/apache/beam/pull/9718#discussion_r331618465 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTable.java ## @@ -15,24 +15,36 @@ * See the License for the specific language governing permissions and * limitations under the License. */ -package org.apache.beam.sdk.extensions.sql; +package org.apache.beam.sdk.extensions.sql.meta; +import java.util.List; import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; import org.apache.beam.sdk.values.POutput; import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram; /** This interface defines a Beam Sql Table. */ public interface BeamSqlTable { /** create a {@code PCollection} from source. */ PCollection buildIOReader(PBegin begin); + /** create a {@code PCollection} from source with predicate and/or project pushed-down. */ + PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, List fieldNames); + /** create a {@code IO.write()} instance to write to target. */ POutput buildIOWriter(PCollection input); + /** Generate an IO implementation of {@code BeamSqlTableFilter} for predicate push-down. 
*/ + BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter); Review comment: Yes, I believe `List` can be used as an argument here, addressed in #9731. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323630) Time Spent: 2h 10m (was: 2h) > Add means for IO APIs to support predicate and/or project push-down when > running SQL pipelines > -- > > Key: BEAM-8343 > URL: https://issues.apache.org/jira/browse/BEAM-8343 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > The objective is to create a universal way for Beam SQL IO APIs to support > predicate/project push-down. > A proposed way to achieve that is by introducing an interface responsible > for identifying what portion(s) of a Calc can be moved down to IO layer. > Also, adding following methods to a BeamSqlTable interface to pass necessary > parameters to IO APIs: > - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter) > - Boolean supportsProjects() > - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, > List fieldNames) > > Design doc > [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-6995) SQL aggregation with where clause fails to plan
[ https://issues.apache.org/jira/browse/BEAM-6995?focusedWorklogId=323631&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323631 ] ASF GitHub Bot logged work on BEAM-6995: Author: ASF GitHub Bot Created on: 04/Oct/19 17:59 Start Date: 04/Oct/19 17:59 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #9703: [BEAM-6995] Beam basic aggregation rule only when not windowed URL: https://github.com/apache/beam/pull/9703#discussion_r331620109 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java ## @@ -84,6 +85,31 @@ public BeamAggregationRel( this.windowFieldIndex = windowFieldIndex; } + public BeamAggregationRel( + RelOptCluster cluster, + RelTraitSet traits, + RelNode child, + RelDataType rowType, + boolean indicator, + ImmutableBitSet groupSet, + List groupSets, + List aggCalls, + @Nullable WindowFn windowFn, + int windowFieldIndex) { +this( +cluster, +traits, +child, +indicator, +groupSet, +groupSets, +aggCalls, +windowFn, +windowFieldIndex); + +this.rowType = rowType; Review comment: Where is this rowType used? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323631) Time Spent: 2.5h (was: 2h 20m) > SQL aggregation with where clause fails to plan > --- > > Key: BEAM-6995 > URL: https://issues.apache.org/jira/browse/BEAM-6995 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.11.0 >Reporter: David McIntosh >Assignee: Kirill Kozlov >Priority: Minor > Time Spent: 2.5h > Remaining Estimate: 0h > > I'm finding that this code fails with a CannotPlanException listed below. 
> {code:java} > Schema schema = Schema.builder() > .addInt32Field("id") > .addInt32Field("val") > .build(); > Row row = Row.withSchema(schema).addValues(1, 2).build(); > PCollection inputData = p.apply("row input", > Create.of(row).withRowSchema(schema)); > inputData.apply("sql", > SqlTransform.query( > "SELECT id, SUM(val) " > + "FROM PCOLLECTION " > + "WHERE val > 0 " > + "GROUP BY id"));{code} > If the WHERE clause is removed the code runs successfully. > This may be similar to BEAM-5384 since I was able to work around this by > adding an extra column to the input that isn't reference in the sql. > {code:java} > Schema schema = Schema.builder() > .addInt32Field("id") > .addInt32Field("val") > .addInt32Field("extra") > .build();{code} > > {code:java} > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.RelOptPlanner$CannotPlanException: > Node [rel#100:Subset#2.BEAM_LOGICAL] could not be implemented; planner state: > Root: rel#100:Subset#2.BEAM_LOGICAL > Original rel: > LogicalAggregate(subset=[rel#100:Subset#2.BEAM_LOGICAL], group=[{0}], > EXPR$1=[SUM($1)]): rowcount = 5.0, cumulative cost = {5.687500238418579 rows, > 0.0 cpu, 0.0 io}, id = 98 > LogicalFilter(subset=[rel#97:Subset#1.NONE], condition=[>($1, 0)]): > rowcount = 50.0, cumulative cost = {50.0 rows, 100.0 cpu, 0.0 io}, id = 96 > BeamIOSourceRel(subset=[rel#95:Subset#0.BEAM_LOGICAL], table=[[beam, > PCOLLECTION]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, > 0.0 io}, id = 92 > Sets: > Set#0, type: RecordType(INTEGER id, INTEGER val) > rel#95:Subset#0.BEAM_LOGICAL, best=rel#92, > importance=0.7291 > rel#92:BeamIOSourceRel.BEAM_LOGICAL(table=[beam, > PCOLLECTION]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io} > rel#110:Subset#0.ENUMERABLE, best=rel#109, > importance=0.36455 > > rel#109:BeamEnumerableConverter.ENUMERABLE(input=rel#95:Subset#0.BEAM_LOGICAL), > rowcount=100.0, cumulative cost={1.7976931348623157E308 rows, > 
1.7976931348623157E308 cpu, 1.7976931348623157E308 io} > Set#1, type: RecordType(INTEGER id, INTEGER val) > rel#97:Subset#1.NONE, best=null, importance=0.81 > > rel#96:LogicalFilter.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,condition=>($1, > 0)), rowcount=50.0, cumulative cost={inf} > > rel#102:LogicalCalc.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,expr#0..1={inputs},expr#2=0,expr#3=>($t1, > $t2),i
[jira] [Commented] (BEAM-8319) Errorprone 0.0.13 fails during JDK11 build
[ https://issues.apache.org/jira/browse/BEAM-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944719#comment-16944719 ] Pablo Estrada commented on BEAM-8319: - I've seen on Guava that the version they call out as having official JDK 11 support is 27.1-jre: https://github.com/google/guava/issues/3272 https://github.com/google/guava/issues/3288 > Errorprone 0.0.13 fails during JDK11 build > -- > > Key: BEAM-8319 > URL: https://issues.apache.org/jira/browse/BEAM-8319 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > I'm using openjdk 1.11.02. After switching version to; > {code:java} > javaVersion = 11 {code} > in BeamModule Plugin and running > {code:java} > ./gradlew clean build -p sdks/java/code -xtest {code} > building fails. I was able to run errorprone after upgrading it but had > problems with conflicting guava version. See more here: > https://issues.apache.org/jira/browse/BEAM-5085 > > Stacktrace: > {code:java} > org.gradle.api.tasks.TaskExecutionException: Execution failed for task > ':model:pipeline:compileJava'. 
> at > org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$2.accept(ExecuteActionsTaskExecuter.java:121) > at > org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$2.accept(ExecuteActionsTaskExecuter.java:117) > at org.gradle.internal.Try$Failure.ifSuccessfulOrElse(Try.java:184) > at > org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.execute(ExecuteActionsTaskExecuter.java:110) > at > org.gradle.api.internal.tasks.execution.ResolveIncrementalChangesTaskExecuter.execute(ResolveIncrementalChangesTaskExecuter.java:84) > at > org.gradle.api.internal.tasks.execution.ResolveTaskOutputCachingStateExecuter.execute(ResolveTaskOutputCachingStateExecuter.java:91) > at > org.gradle.api.internal.tasks.execution.FinishSnapshotTaskInputsBuildOperationTaskExecuter.execute(FinishSnapshotTaskInputsBuildOperationTaskExecuter.java:51) > at > org.gradle.api.internal.tasks.execution.ResolveBuildCacheKeyExecuter.execute(ResolveBuildCacheKeyExecuter.java:102) > at > org.gradle.api.internal.tasks.execution.ResolveBeforeExecutionStateTaskExecuter.execute(ResolveBeforeExecutionStateTaskExecuter.java:74) > at > org.gradle.api.internal.tasks.execution.ValidatingTaskExecuter.execute(ValidatingTaskExecuter.java:58) > at > org.gradle.api.internal.tasks.execution.SkipEmptySourceFilesTaskExecuter.execute(SkipEmptySourceFilesTaskExecuter.java:109) > at > org.gradle.api.internal.tasks.execution.ResolveBeforeExecutionOutputsTaskExecuter.execute(ResolveBeforeExecutionOutputsTaskExecuter.java:67) > at > org.gradle.api.internal.tasks.execution.StartSnapshotTaskInputsBuildOperationTaskExecuter.execute(StartSnapshotTaskInputsBuildOperationTaskExecuter.java:52) > at > org.gradle.api.internal.tasks.execution.ResolveAfterPreviousExecutionStateTaskExecuter.execute(ResolveAfterPreviousExecutionStateTaskExecuter.java:46) > at > org.gradle.api.internal.tasks.execution.CleanupStaleOutputsExecuter.execute(CleanupStaleOutputsExecuter.java:93) > at > 
org.gradle.api.internal.tasks.execution.FinalizePropertiesTaskExecuter.execute(FinalizePropertiesTaskExecuter.java:45) > at > org.gradle.api.internal.tasks.execution.ResolveTaskExecutionModeExecuter.execute(ResolveTaskExecutionModeExecuter.java:94) > at > org.gradle.api.internal.tasks.execution.SkipTaskWithNoActionsExecuter.execute(SkipTaskWithNoActionsExecuter.java:57) > at > org.gradle.api.internal.tasks.execution.SkipOnlyIfTaskExecuter.execute(SkipOnlyIfTaskExecuter.java:56) > at > org.gradle.api.internal.tasks.execution.CatchExceptionTaskExecuter.execute(CatchExceptionTaskExecuter.java:36) > at > org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.executeTask(EventFiringTaskExecuter.java:63) > at > org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:49) > at > org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:46) > at > org.gradle.internal.operations.DefaultBuildOperationExecutor$CallableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:416) > at > org.gradle.internal.operations.DefaultBuildOperationExecutor$CallableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:406) > at > org.gradle.internal.operations.DefaultBuildOperationExecutor$1.execute(DefaultBuildOperationExecutor.java:165) > at > or
[jira] [Commented] (BEAM-8319) Errorprone 0.0.13 fails during JDK11 build
[ https://issues.apache.org/jira/browse/BEAM-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944728#comment-16944728 ] Lukasz Gajowy commented on BEAM-8319: - Thanks [~pabloem] - good to know this. In that case, I think we should update vendored guava too (it's currently in version 26.0-jre). > Errorprone 0.0.13 fails during JDK11 build > -- > > Key: BEAM-8319 > URL: https://issues.apache.org/jira/browse/BEAM-8319 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > I'm using openjdk 1.11.02. After switching version to; > {code:java} > javaVersion = 11 {code} > in BeamModule Plugin and running > {code:java} > ./gradlew clean build -p sdks/java/code -xtest {code} > building fails. I was able to run errorprone after upgrading it but had > problems with conflicting guava version. See more here: > https://issues.apache.org/jira/browse/BEAM-5085 > > Stacktrace: > {code:java} > org.gradle.api.tasks.TaskExecutionException: Execution failed for task > ':model:pipeline:compileJava'. 
[jira] [Comment Edited] (BEAM-8319) Errorprone 0.0.13 fails during JDK11 build
[ https://issues.apache.org/jira/browse/BEAM-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944728#comment-16944728 ] Lukasz Gajowy edited comment on BEAM-8319 at 10/4/19 6:11 PM: -- Thanks [~pabloem] - good to know this. In that case, I think we may need to update vendored guava too (it's currently in version 26.0-jre). was (Author: łukaszg): Thanks [~pabloem] - good to know this. In that case, I think we should update vendored guava too (it's currently in version 26.0-jre). > Errorprone 0.0.13 fails during JDK11 build > -- > > Key: BEAM-8319 > URL: https://issues.apache.org/jira/browse/BEAM-8319 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > I'm using openjdk 1.11.02. After switching version to; > {code:java} > javaVersion = 11 {code} > in BeamModule Plugin and running > {code:java} > ./gradlew clean build -p sdks/java/code -xtest {code} > building fails. I was able to run errorprone after upgrading it but had > problems with conflicting guava version. See more here: > https://issues.apache.org/jira/browse/BEAM-5085 > > Stacktrace: > {code:java} > org.gradle.api.tasks.TaskExecutionException: Execution failed for task > ':model:pipeline:compileJava'. 
[jira] [Work logged] (BEAM-8344) Add infer schema support in ParquetIO and refactor ParquetTableProvider
[ https://issues.apache.org/jira/browse/BEAM-8344?focusedWorklogId=323638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323638 ] ASF GitHub Bot logged work on BEAM-8344: Author: ASF GitHub Bot Created on: 04/Oct/19 18:11 Start Date: 04/Oct/19 18:11 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #9721: [BEAM-8344] Add inferSchema support in ParquetIO and refactor ParquetTableProvider URL: https://github.com/apache/beam/pull/9721#discussion_r331624690 ## File path: sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java ## @@ -122,15 +122,34 @@ * pattern). */ public static Read read(Schema schema) { -return new AutoValue_ParquetIO_Read.Builder().setSchema(schema).build(); +return new AutoValue_ParquetIO_Read.Builder() +.setSchema(schema) +.setInferBeamSchema(false) +.build(); } /** * Like {@link #read(Schema)}, but reads each file in a {@link PCollection} of {@link * org.apache.beam.sdk.io.FileIO.ReadableFile}, which allows more flexible usage. */ public static ReadFiles readFiles(Schema schema) { -return new AutoValue_ParquetIO_ReadFiles.Builder().setSchema(schema).build(); +return new AutoValue_ParquetIO_ReadFiles.Builder() +.setSchema(schema) +.setInferBeamSchema(false) +.build(); + } + + private static PCollection setBeamSchema( + PCollection pc, Class clazz, @Nullable Schema schema) { +org.apache.beam.sdk.schemas.Schema beamSchema = +org.apache.beam.sdk.schemas.utils.AvroUtils.getSchema(clazz, schema); +if (beamSchema != null) { Review comment: Because here you won't throw exception if there is not beamSchema, why not just make "inforSchema" as a non-optional action so we don't need set the boolean to control it? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323638) Time Spent: 20m (was: 10m) > Add infer schema support in ParquetIO and refactor ParquetTableProvider > --- > > Key: BEAM-8344 > URL: https://issues.apache.org/jira/browse/BEAM-8344 > Project: Beam > Issue Type: Improvement > Components: dsl-sql, io-java-parquet >Reporter: Vishwas >Assignee: Vishwas >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Add support for inferring Beam Schema in ParquetIO. > Refactor ParquetTable code to use Convert.rows(). > Remove unnecessary java class GenericRecordReadConverter. -- This message was sent by Atlassian Jira (v8.3.4#803005)
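The diff above makes schema inference an explicit opt-in: `read(schema)` sets `inferBeamSchema` to false, and the Beam schema is only attached when the flag is set and a schema can actually be derived. A rough self-contained sketch of that builder shape follows; `ReadConfig`, `withBeamSchemas`, and `effectiveSchema` are illustrative names only, not the ParquetIO API:

```java
// Toy model of the opt-in flag under discussion: the factory defaults the flag to false
// (mirroring setInferBeamSchema(false) in the PR), and a with-style method flips it.
public class ReadConfig {
    private final String schema;
    private final boolean inferBeamSchema;

    private ReadConfig(String schema, boolean inferBeamSchema) {
        this.schema = schema;
        this.inferBeamSchema = inferBeamSchema;
    }

    static ReadConfig read(String schema) {
        return new ReadConfig(schema, false); // inference stays off unless requested
    }

    ReadConfig withBeamSchemas(boolean infer) {
        return new ReadConfig(schema, infer);
    }

    String effectiveSchema() {
        // Only decorate with a Beam schema when inference was requested; otherwise the
        // raw schema passes through unchanged, which is why no exception is needed.
        return inferBeamSchema ? "beam:" + schema : schema;
    }
}
```

The reviewer's question — why keep the boolean at all if a missing Beam schema is tolerated silently — is visible here: dropping the flag would make `effectiveSchema` unconditional, changing behavior for every existing caller of `read`.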
[jira] [Work logged] (BEAM-6995) SQL aggregation with where clause fails to plan
[ https://issues.apache.org/jira/browse/BEAM-6995?focusedWorklogId=323640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323640 ] ASF GitHub Bot logged work on BEAM-6995: Author: ASF GitHub Bot Created on: 04/Oct/19 18:12 Start Date: 04/Oct/19 18:12 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #9703: [BEAM-6995] Beam basic aggregation rule only when not windowed URL: https://github.com/apache/beam/pull/9703#discussion_r331625043 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java ## @@ -84,6 +85,31 @@ public BeamAggregationRel( this.windowFieldIndex = windowFieldIndex; } + public BeamAggregationRel( + RelOptCluster cluster, + RelTraitSet traits, + RelNode child, + RelDataType rowType, + boolean indicator, + ImmutableBitSet groupSet, + List groupSets, + List aggCalls, + @Nullable WindowFn windowFn, + int windowFieldIndex) { +this( +cluster, +traits, +child, +indicator, +groupSet, +groupSets, +aggCalls, +windowFn, +windowFieldIndex); + +this.rowType = rowType; Review comment: RelNode type is already inferred from input nodes. Usually when you need to use it, you can use getRowType() function to get it than save it as a class member. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323640) Time Spent: 2h 40m (was: 2.5h) > SQL aggregation with where clause fails to plan > --- > > Key: BEAM-6995 > URL: https://issues.apache.org/jira/browse/BEAM-6995 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.11.0 >Reporter: David McIntosh >Assignee: Kirill Kozlov >Priority: Minor > Time Spent: 2h 40m > Remaining Estimate: 0h > > I'm finding that this code fails with a CannotPlanException listed below. > {code:java} > Schema schema = Schema.builder() > .addInt32Field("id") > .addInt32Field("val") > .build(); > Row row = Row.withSchema(schema).addValues(1, 2).build(); > PCollection inputData = p.apply("row input", > Create.of(row).withRowSchema(schema)); > inputData.apply("sql", > SqlTransform.query( > "SELECT id, SUM(val) " > + "FROM PCOLLECTION " > + "WHERE val > 0 " > + "GROUP BY id"));{code} > If the WHERE clause is removed the code runs successfully. > This may be similar to BEAM-5384 since I was able to work around this by > adding an extra column to the input that isn't reference in the sql. 
> {code:java} > Schema schema = Schema.builder() > .addInt32Field("id") > .addInt32Field("val") > .addInt32Field("extra") > .build();{code} > > {code:java} > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.RelOptPlanner$CannotPlanException: > Node [rel#100:Subset#2.BEAM_LOGICAL] could not be implemented; planner state: > Root: rel#100:Subset#2.BEAM_LOGICAL > Original rel: > LogicalAggregate(subset=[rel#100:Subset#2.BEAM_LOGICAL], group=[{0}], > EXPR$1=[SUM($1)]): rowcount = 5.0, cumulative cost = {5.687500238418579 rows, > 0.0 cpu, 0.0 io}, id = 98 > LogicalFilter(subset=[rel#97:Subset#1.NONE], condition=[>($1, 0)]): > rowcount = 50.0, cumulative cost = {50.0 rows, 100.0 cpu, 0.0 io}, id = 96 > BeamIOSourceRel(subset=[rel#95:Subset#0.BEAM_LOGICAL], table=[[beam, > PCOLLECTION]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, > 0.0 io}, id = 92 > Sets: > Set#0, type: RecordType(INTEGER id, INTEGER val) > rel#95:Subset#0.BEAM_LOGICAL, best=rel#92, > importance=0.7291 > rel#92:BeamIOSourceRel.BEAM_LOGICAL(table=[beam, > PCOLLECTION]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io} > rel#110:Subset#0.ENUMERABLE, best=rel#109, > importance=0.36455 > > rel#109:BeamEnumerableConverter.ENUMERABLE(input=rel#95:Subset#0.BEAM_LOGICAL), > rowcount=100.0, cumulative cost={1.7976931348623157E308 rows, > 1.7976931348623157E308 cpu, 1.7976931348623157E308 io} > Set#1, type: RecordType(INTEGER id, INTEGER val) > rel#97:Subset#1.NONE, best=null, importance=0.81 > > rel#96:LogicalFilter.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,condition=>($1, > 0)), rowcount=50.0, cumulative cost={i
[jira] [Work logged] (BEAM-6995) SQL aggregation with where clause fails to plan
[ https://issues.apache.org/jira/browse/BEAM-6995?focusedWorklogId=323642&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323642 ] ASF GitHub Bot logged work on BEAM-6995: Author: ASF GitHub Bot Created on: 04/Oct/19 18:15 Start Date: 04/Oct/19 18:15 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #9703: [BEAM-6995] Beam basic aggregation rule only when not windowed URL: https://github.com/apache/beam/pull/9703#discussion_r331626110 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java ## @@ -84,6 +85,31 @@ public BeamAggregationRel( this.windowFieldIndex = windowFieldIndex; } + public BeamAggregationRel( + RelOptCluster cluster, + RelTraitSet traits, + RelNode child, + RelDataType rowType, + boolean indicator, + ImmutableBitSet groupSet, + List groupSets, + List aggCalls, + @Nullable WindowFn windowFn, + int windowFieldIndex) { +this( +cluster, +traits, +child, +indicator, +groupSet, +groupSets, +aggCalls, +windowFn, +windowFieldIndex); + +this.rowType = rowType; Review comment: There is a JIRA issue: https://jira.apache.org/jira/browse/BEAM-7609, when running queries with "SELECT DISTINCT + JOIN", resulting field names are not assigned proper name. Even though it does not solve this particular issue, preserving the rowType should not hurt (where previously it would just ignore it and set it to null). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323642) Time Spent: 2h 50m (was: 2h 40m) > SQL aggregation with where clause fails to plan > --- > > Key: BEAM-6995 > URL: https://issues.apache.org/jira/browse/BEAM-6995 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.11.0 >Reporter: David McIntosh >Assignee: Kirill Kozlov >Priority: Minor > Time Spent: 2h 50m > Remaining Estimate: 0h > > I'm finding that this code fails with a CannotPlanException listed below. > {code:java} > Schema schema = Schema.builder() > .addInt32Field("id") > .addInt32Field("val") > .build(); > Row row = Row.withSchema(schema).addValues(1, 2).build(); > PCollection inputData = p.apply("row input", > Create.of(row).withRowSchema(schema)); > inputData.apply("sql", > SqlTransform.query( > "SELECT id, SUM(val) " > + "FROM PCOLLECTION " > + "WHERE val > 0 " > + "GROUP BY id"));{code} > If the WHERE clause is removed the code runs successfully. > This may be similar to BEAM-5384 since I was able to work around this by > adding an extra column to the input that isn't reference in the sql. 
> {code:java} > Schema schema = Schema.builder() > .addInt32Field("id") > .addInt32Field("val") > .addInt32Field("extra") > .build();{code} > > {code:java} > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.RelOptPlanner$CannotPlanException: > Node [rel#100:Subset#2.BEAM_LOGICAL] could not be implemented; planner state: > Root: rel#100:Subset#2.BEAM_LOGICAL > Original rel: > LogicalAggregate(subset=[rel#100:Subset#2.BEAM_LOGICAL], group=[{0}], > EXPR$1=[SUM($1)]): rowcount = 5.0, cumulative cost = {5.687500238418579 rows, > 0.0 cpu, 0.0 io}, id = 98 > LogicalFilter(subset=[rel#97:Subset#1.NONE], condition=[>($1, 0)]): > rowcount = 50.0, cumulative cost = {50.0 rows, 100.0 cpu, 0.0 io}, id = 96 > BeamIOSourceRel(subset=[rel#95:Subset#0.BEAM_LOGICAL], table=[[beam, > PCOLLECTION]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, > 0.0 io}, id = 92 > Sets: > Set#0, type: RecordType(INTEGER id, INTEGER val) > rel#95:Subset#0.BEAM_LOGICAL, best=rel#92, > importance=0.7291 > rel#92:BeamIOSourceRel.BEAM_LOGICAL(table=[beam, > PCOLLECTION]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io} > rel#110:Subset#0.ENUMERABLE, best=rel#109, > importance=0.36455 > > rel#109:BeamEnumerableConverter.ENUMERABLE(input=rel#95:Subset#0.BEAM_LOGICAL), > rowcount=100.0, cumulative cost={1.7976931348623157E308 rows, > 1.7976931348623157E308 cpu, 1.7976931348623157E308 io} > Set#1, type: RecordType(INTEGER id, INTEGER val) > rel#97:Subset#
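The exchange above contrasts two ways to obtain a relational node's row type: deriving it on demand from the inputs (the reviewer's suggestion) versus pinning an explicit type through the extra constructor (the author's approach, aimed at preserving field names for cases like BEAM-7609). A small sketch of the two paths, using made-up names rather than Calcite's `RelNode`/`deriveRowType` machinery:

```java
// Toy model of derived vs. explicitly pinned row types. `Node`, `deriveRowType`, and the
// string-valued "types" are hypothetical; Calcite's real RelDataType handling differs.
class Node {
    protected String rowType;        // overridden/cached type; stays null unless pinned
    private final String inputType;

    Node(String inputType) { this.inputType = inputType; }

    Node(String inputType, String explicitRowType) {
        this(inputType);
        this.rowType = explicitRowType; // mirrors `this.rowType = rowType` in the PR
    }

    String deriveRowType() {
        // Normally inferred from the input node(s), as the reviewer points out.
        return "derived(" + inputType + ")";
    }

    String getRowType() {
        if (rowType == null) {
            rowType = deriveRowType(); // derive lazily when no override was given
        }
        return rowType;
    }
}
```

The trade-off: deriving keeps the node's type consistent with its inputs by construction, while pinning lets a caller preserve names the derivation would lose, at the cost of a second source of truth.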
[jira] [Resolved] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir
[ https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada resolved BEAM-6923. - Fix Version/s: 2.17.0 Resolution: Fixed I think this has been fixed huh ? : ) > OOM errors in jobServer when using GCS artifactDir > -- > > Key: BEAM-6923 > URL: https://issues.apache.org/jira/browse/BEAM-6923 > Project: Beam > Issue Type: Bug > Components: sdk-java-harness >Reporter: Lukasz Gajowy >Assignee: Ankur Goenka >Priority: Major > Fix For: 2.17.0 > > Attachments: Instance counts.png, Paths to GC root.png, > Telemetries.png, beam6923-flink156.m4v, beam6923flink182.m4v, heapdump > size-sorted.png > > Time Spent: 3h > Remaining Estimate: 0h > > When starting jobServer with artifactDir pointing to a GCS bucket: > {code:java} > ./gradlew :beam-runners-flink_2.11-job-server:runShadow > -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code} > and running a Java portable pipeline with the following, portability related > pipeline options: > {code:java} > --runner=PortableRunner --jobEndpoint=localhost:8099 > --defaultEnvironmentType=DOCKER > --defaultEnvironmentConfig=gcr.io//java:latest'{code} > > I'm facing a series of OOM errors, like this: > {code:java} > Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: > Java heap space > at > com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606) > at > com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408) > at > com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432) > at > 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549) > at > com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745){code} > > This does not happen when I'm using a local filesystem for the artifact > staging location. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config
[ https://issues.apache.org/jira/browse/BEAM-8351?focusedWorklogId=323643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323643 ] ASF GitHub Bot logged work on BEAM-8351: Author: ASF GitHub Bot Created on: 04/Oct/19 18:19 Start Date: 04/Oct/19 18:19 Worklog Time Spent: 10m Work Description: violalyu commented on pull request #9730: [BEAM-8351] Support passing in arbitrary KV pairs to sdk worker via external environment config URL: https://github.com/apache/beam/pull/9730#discussion_r331627588 ## File path: sdks/python/apache_beam/runners/portability/portable_runner.py ## @@ -129,11 +129,25 @@ def _create_environment(options): env=(config.get('env') or '') ).SerializeToString()) elif environment_urn == common_urns.environments.EXTERNAL.urn: + def _looks_like_json(environment_config): +import re +return re.match(r'\{.+\}', environment_config) Review comment: Thanks Chad! Updated for now! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323643) Time Spent: 1h 20m (was: 1h 10m) > Support passing in arbitrary KV pairs to sdk worker via external environment > config > --- > > Key: BEAM-8351 > URL: https://issues.apache.org/jira/browse/BEAM-8351 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Wanqi Lyu >Assignee: Wanqi Lyu >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > Originally, the environment config for environment type of EXTERNAL only > supported passing in a URL for the external worker pool; we want to support > passing in arbitrary KV pairs to the sdk worker via the external environment config, > so that when starting the sdk harness we can get the values from > `StartWorkerRequest.params`. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config
[ https://issues.apache.org/jira/browse/BEAM-8351?focusedWorklogId=323644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323644 ] ASF GitHub Bot logged work on BEAM-8351: Author: ASF GitHub Bot Created on: 04/Oct/19 18:19 Start Date: 04/Oct/19 18:19 Worklog Time Spent: 10m Work Description: violalyu commented on pull request #9730: [BEAM-8351] Support passing in arbitrary KV pairs to sdk worker via external environment config URL: https://github.com/apache/beam/pull/9730#discussion_r331627588 ## File path: sdks/python/apache_beam/runners/portability/portable_runner.py ## @@ -129,11 +129,25 @@ def _create_environment(options): env=(config.get('env') or '') ).SerializeToString()) elif environment_urn == common_urns.environments.EXTERNAL.urn: + def _looks_like_json(environment_config): +import re +return re.match(r'\{.+\}', environment_config) Review comment: Thanks Chad! Updated for now in https://github.com/apache/beam/pull/9730/commits/7fed22b5ad4c845770bac84944c4743b77495367! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323644) Time Spent: 1.5h (was: 1h 20m) > Support passing in arbitrary KV pairs to sdk worker via external environment > config > --- > > Key: BEAM-8351 > URL: https://issues.apache.org/jira/browse/BEAM-8351 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Wanqi Lyu >Assignee: Wanqi Lyu >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > > Originally, the environment config for environment type of EXTERNAL only > supported passing in a URL for the external worker pool; we want to support > passing in arbitrary KV pairs to the sdk worker via the external environment config, > so that when starting the sdk harness we can get the values from > `StartWorkerRequest.params`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
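The review thread above centers on one heuristic: deciding whether the EXTERNAL environment config string is a plain worker-pool URL or a JSON object carrying extra key/value params. The following standalone sketch mirrors the `_looks_like_json` check from the diff; the `parse_external_config` helper and the `url` JSON key are illustrative assumptions, not the final Beam implementation.

```python
import json
import re


def looks_like_json(environment_config):
    # Heuristic from the diff: a string shaped like '{...}' is treated as JSON.
    return re.match(r'\{.+\}', environment_config) is not None


def parse_external_config(environment_config):
    """Return (url, params) where params is a dict of extra KV pairs or None.

    Illustrative only: assumes a JSON config keeps the worker-pool address
    under a 'url' key and passes every other key through as a param.
    """
    if looks_like_json(environment_config):
        config = json.loads(environment_config)
        url = config.pop('url')  # hypothetical key for the worker-pool address
        return url, config or None
    # A non-JSON config is just the worker-pool URL, with no extra params.
    return environment_config, None
```

For example, `parse_external_config('{"url": "localhost:50000", "MEMORY_LIMIT": "2g"}')` splits the address from the extra param, while a bare `'localhost:50000'` yields no params.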
[jira] [Work logged] (BEAM-6995) SQL aggregation with where clause fails to plan
[ https://issues.apache.org/jira/browse/BEAM-6995?focusedWorklogId=323647&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323647 ] ASF GitHub Bot logged work on BEAM-6995: Author: ASF GitHub Bot Created on: 04/Oct/19 18:24 Start Date: 04/Oct/19 18:24 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #9703: [BEAM-6995] Beam basic aggregation rule only when not windowed URL: https://github.com/apache/beam/pull/9703#discussion_r331296410 ## File path: sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationTest.java ## @@ -701,7 +700,6 @@ public void testSupportsAggregationWithoutProjection() throws Exception { } @Test - @Ignore("https://issues.apache.org/jira/browse/BEAM-8317";) public void testSupportsAggregationWithFilterWithoutProjection() throws Exception { Review comment: Found a useful reference link with examples: https://github.com/Pragmatists/JUnitParams/blob/master/src/test/java/junitparams/usage/SamplesOfUsageTest.java And this one: https://github.com/junit-team/junit4/wiki/parameterized-tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323647) Time Spent: 3h (was: 2h 50m) > SQL aggregation with where clause fails to plan > --- > > Key: BEAM-6995 > URL: https://issues.apache.org/jira/browse/BEAM-6995 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.11.0 >Reporter: David McIntosh >Assignee: Kirill Kozlov >Priority: Minor > Time Spent: 3h > Remaining Estimate: 0h > > I'm finding that this code fails with a CannotPlanException listed below. 
> {code:java} > Schema schema = Schema.builder() > .addInt32Field("id") > .addInt32Field("val") > .build(); > Row row = Row.withSchema(schema).addValues(1, 2).build(); > PCollection<Row> inputData = p.apply("row input", > Create.of(row).withRowSchema(schema)); > inputData.apply("sql", > SqlTransform.query( > "SELECT id, SUM(val) " > + "FROM PCOLLECTION " > + "WHERE val > 0 " > + "GROUP BY id"));{code} > If the WHERE clause is removed, the code runs successfully. > This may be similar to BEAM-5384, since I was able to work around this by > adding an extra column to the input that isn't referenced in the SQL. > {code:java} > Schema schema = Schema.builder() > .addInt32Field("id") > .addInt32Field("val") > .addInt32Field("extra") > .build();{code} > > {code:java} > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.RelOptPlanner$CannotPlanException: > Node [rel#100:Subset#2.BEAM_LOGICAL] could not be implemented; planner state: > Root: rel#100:Subset#2.BEAM_LOGICAL > Original rel: > LogicalAggregate(subset=[rel#100:Subset#2.BEAM_LOGICAL], group=[{0}], > EXPR$1=[SUM($1)]): rowcount = 5.0, cumulative cost = {5.687500238418579 rows, > 0.0 cpu, 0.0 io}, id = 98 > LogicalFilter(subset=[rel#97:Subset#1.NONE], condition=[>($1, 0)]): > rowcount = 50.0, cumulative cost = {50.0 rows, 100.0 cpu, 0.0 io}, id = 96 > BeamIOSourceRel(subset=[rel#95:Subset#0.BEAM_LOGICAL], table=[[beam, > PCOLLECTION]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, > 0.0 io}, id = 92 > Sets: > Set#0, type: RecordType(INTEGER id, INTEGER val) > rel#95:Subset#0.BEAM_LOGICAL, best=rel#92, > importance=0.7291 > rel#92:BeamIOSourceRel.BEAM_LOGICAL(table=[beam, > PCOLLECTION]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io} > rel#110:Subset#0.ENUMERABLE, best=rel#109, > importance=0.36455 > > rel#109:BeamEnumerableConverter.ENUMERABLE(input=rel#95:Subset#0.BEAM_LOGICAL), > rowcount=100.0, cumulative cost={1.7976931348623157E308 rows, > 
1.7976931348623157E308 cpu, 1.7976931348623157E308 io} > Set#1, type: RecordType(INTEGER id, INTEGER val) > rel#97:Subset#1.NONE, best=null, importance=0.81 > > rel#96:LogicalFilter.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,condition=>($1, > 0)), rowcount=50.0, cumulative cost={inf} > > rel#102:LogicalCalc.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,expr#0..1={inputs},expr#2=0,expr#3=>($t1, > $t2),id=$t0,val=$t1,$condition=$t3), rowcount=50.0, cumulative cost={inf} > rel#104:Subset#1.BEAM_LOGICAL, best=rel#103, importance=0.405 > > re
[jira] [Work logged] (BEAM-876) Support schemaUpdateOption in BigQueryIO
[ https://issues.apache.org/jira/browse/BEAM-876?focusedWorklogId=323671&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323671 ] ASF GitHub Bot logged work on BEAM-876: --- Author: ASF GitHub Bot Created on: 04/Oct/19 19:45 Start Date: 04/Oct/19 19:45 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9524: [BEAM-876] Support schemaUpdateOption in BigQueryIO URL: https://github.com/apache/beam/pull/9524#issuecomment-538535933 Yes, I'll be glad to review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323671) Time Spent: 50m (was: 40m) > Support schemaUpdateOption in BigQueryIO > > > Key: BEAM-876 > URL: https://issues.apache.org/jira/browse/BEAM-876 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Eugene Kirpichov >Assignee: canaan silberberg >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > BigQuery recently added support for updating the schema as a side effect of > the load job. > Here is the relevant API method in JobConfigurationLoad: > https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/JobConfigurationLoad.html#setSchemaUpdateOptions(java.util.List) > BigQueryIO should support this too. See user request for this: > http://stackoverflow.com/questions/40333245/is-it-possible-to-update-schema-while-doing-a-load-into-an-existing-bigquery-tab -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-6995) SQL aggregation with where clause fails to plan
[ https://issues.apache.org/jira/browse/BEAM-6995?focusedWorklogId=323672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323672 ] ASF GitHub Bot logged work on BEAM-6995: Author: ASF GitHub Bot Created on: 04/Oct/19 19:46 Start Date: 04/Oct/19 19:46 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #9703: [BEAM-6995] Beam basic aggregation rule only when not windowed URL: https://github.com/apache/beam/pull/9703#discussion_r331659215 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java ## @@ -84,6 +85,31 @@ public BeamAggregationRel( this.windowFieldIndex = windowFieldIndex; } + public BeamAggregationRel( + RelOptCluster cluster, + RelTraitSet traits, + RelNode child, + RelDataType rowType, + boolean indicator, + ImmutableBitSet groupSet, + List groupSets, + List aggCalls, + @Nullable WindowFn windowFn, + int windowFieldIndex) { +this( +cluster, +traits, +child, +indicator, +groupSet, +groupSets, +aggCalls, +windowFn, +windowFieldIndex); + +this.rowType = rowType; Review comment: I see, in that case I will remove this constructor. Do you think adding `deriveRowType();` to the original constructor makes sense or it would be redundant? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323672) Time Spent: 3h 10m (was: 3h) > SQL aggregation with where clause fails to plan > --- > > Key: BEAM-6995 > URL: https://issues.apache.org/jira/browse/BEAM-6995 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.11.0 >Reporter: David McIntosh >Assignee: Kirill Kozlov >Priority: Minor > Time Spent: 3h 10m > Remaining Estimate: 0h > > I'm finding that this code fails with the CannotPlanException listed below. > {code:java} > Schema schema = Schema.builder() > .addInt32Field("id") > .addInt32Field("val") > .build(); > Row row = Row.withSchema(schema).addValues(1, 2).build(); > PCollection<Row> inputData = p.apply("row input", > Create.of(row).withRowSchema(schema)); > inputData.apply("sql", > SqlTransform.query( > "SELECT id, SUM(val) " > + "FROM PCOLLECTION " > + "WHERE val > 0 " > + "GROUP BY id"));{code} > If the WHERE clause is removed, the code runs successfully. > This may be similar to BEAM-5384, since I was able to work around this by > adding an extra column to the input that isn't referenced in the SQL. 
> {code:java} > Schema schema = Schema.builder() > .addInt32Field("id") > .addInt32Field("val") > .addInt32Field("extra") > .build();{code} > > {code:java} > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.RelOptPlanner$CannotPlanException: > Node [rel#100:Subset#2.BEAM_LOGICAL] could not be implemented; planner state: > Root: rel#100:Subset#2.BEAM_LOGICAL > Original rel: > LogicalAggregate(subset=[rel#100:Subset#2.BEAM_LOGICAL], group=[{0}], > EXPR$1=[SUM($1)]): rowcount = 5.0, cumulative cost = {5.687500238418579 rows, > 0.0 cpu, 0.0 io}, id = 98 > LogicalFilter(subset=[rel#97:Subset#1.NONE], condition=[>($1, 0)]): > rowcount = 50.0, cumulative cost = {50.0 rows, 100.0 cpu, 0.0 io}, id = 96 > BeamIOSourceRel(subset=[rel#95:Subset#0.BEAM_LOGICAL], table=[[beam, > PCOLLECTION]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, > 0.0 io}, id = 92 > Sets: > Set#0, type: RecordType(INTEGER id, INTEGER val) > rel#95:Subset#0.BEAM_LOGICAL, best=rel#92, > importance=0.7291 > rel#92:BeamIOSourceRel.BEAM_LOGICAL(table=[beam, > PCOLLECTION]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io} > rel#110:Subset#0.ENUMERABLE, best=rel#109, > importance=0.36455 > > rel#109:BeamEnumerableConverter.ENUMERABLE(input=rel#95:Subset#0.BEAM_LOGICAL), > rowcount=100.0, cumulative cost={1.7976931348623157E308 rows, > 1.7976931348623157E308 cpu, 1.7976931348623157E308 io} > Set#1, type: RecordType(INTEGER id, INTEGER val) > rel#97:Subset#1.NONE, best=null, importance=0.81 > > rel#96:LogicalFilter.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,condition=>($1, > 0)), rowcount=50.0, cumulative cost={inf} >
[jira] [Work logged] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config
[ https://issues.apache.org/jira/browse/BEAM-8351?focusedWorklogId=323677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323677 ] ASF GitHub Bot logged work on BEAM-8351: Author: ASF GitHub Bot Created on: 04/Oct/19 20:06 Start Date: 04/Oct/19 20:06 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9730: [BEAM-8351] Support passing in arbitrary KV pairs to sdk worker via external environment config URL: https://github.com/apache/beam/pull/9730#issuecomment-538542170 ``` 11:58:18 * Module apache_beam.runners.portability.portable_runner_test 11:58:18 W:328, 0: Bad indentation. Found 8 spaces, expected 6 (bad-indentation) 11:58:18 W:333, 0: Bad indentation. Found 8 spaces, expected 6 (bad-indentation) 11:58:18 C:337, 0: Line too long (82/80) (line-too-long) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323677) Time Spent: 1h 40m (was: 1.5h) > Support passing in arbitrary KV pairs to sdk worker via external environment > config > --- > > Key: BEAM-8351 > URL: https://issues.apache.org/jira/browse/BEAM-8351 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Wanqi Lyu >Assignee: Wanqi Lyu >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > Originally, the environment config for environment type of EXTERNAL only > supported passing in a URL for the external worker pool; we want to support > passing in arbitrary KV pairs to the sdk worker via the external environment config, > so that when starting the sdk harness we can get the values from > `StartWorkerRequest.params`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-5559) Beam Dependency Update Request: com.google.guava:guava
[ https://issues.apache.org/jira/browse/BEAM-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-5559: - Assignee: Lukasz Gajowy > Beam Dependency Update Request: com.google.guava:guava > -- > > Key: BEAM-5559 > URL: https://issues.apache.org/jira/browse/BEAM-5559 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Lukasz Gajowy >Priority: Major > Fix For: 2.15.0 > > > - 2018-10-01 19:30:53.471497 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 20.0. The latest version is 26.0-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-08 12:18:05.174889 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 20.0. The latest version is 26.0-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-04-15 12:32:27.737694 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 20.0. The latest version is 27.1-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-04-22 12:10:18.539470 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 20.0. The latest version is 27.1-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-5559) Beam Dependency Update Request: com.google.guava:guava
[ https://issues.apache.org/jira/browse/BEAM-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-5559: -- Fix Version/s: (was: 2.15.0) > Beam Dependency Update Request: com.google.guava:guava > -- > > Key: BEAM-5559 > URL: https://issues.apache.org/jira/browse/BEAM-5559 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Lukasz Gajowy >Priority: Major > > - 2018-10-01 19:30:53.471497 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 20.0. The latest version is 26.0-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-08 12:18:05.174889 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 20.0. The latest version is 26.0-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-04-15 12:32:27.737694 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 20.0. The latest version is 27.1-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-04-22 12:10:18.539470 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 20.0. The latest version is 27.1-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8329) AvroCoder ReflectData doesn't use TimestampConversion
[ https://issues.apache.org/jira/browse/BEAM-8329?focusedWorklogId=323693&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323693 ] ASF GitHub Bot logged work on BEAM-8329: Author: ASF GitHub Bot Created on: 04/Oct/19 20:41 Start Date: 04/Oct/19 20:41 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #9702: [BEAM-8329] Add TimestampConversion to AvroCoder ReflectData URL: https://github.com/apache/beam/pull/9702 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323693) Time Spent: 0.5h (was: 20m) > AvroCoder ReflectData doesn't use TimestampConversion > - > > Key: BEAM-8329 > URL: https://issues.apache.org/jira/browse/BEAM-8329 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.16.0 >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 2.17.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > The ReflectData created by AvroCoder doesn't have > {{TimeConversions.TimestampConversion}} registered which prevents it from > working with millis-instant joda DateTime instances. > AvroUtils does this statically > [here|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java#L78]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8209) Document custom docker containers
[ https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=323710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323710 ] ASF GitHub Bot logged work on BEAM-8209: Author: ASF GitHub Bot Created on: 04/Oct/19 20:44 Start Date: 04/Oct/19 20:44 Worklog Time Spent: 10m Work Description: soyrice commented on pull request #9607: [BEAM-8209] Custom container docs URL: https://github.com/apache/beam/pull/9607#discussion_r331596572 ## File path: website/src/documentation/runtime/environments.md ## @@ -0,0 +1,187 @@ +--- +layout: section +title: "Runtime environments" +section_menu: section-menu/documentation.html +permalink: /documentation/runtime/environments/ +redirect_from: + - /documentation/execution-model/ +--- + + +# Runtime environments + +Any execution engine can run the Beam SDK because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/) and isolated from other runtime systems. This page describes how to build, customize, and push Beam SDK container images. + +## Building container images + +Before building Beam SDK container images: +* Register a [Bintray](https://bintray.com/) account with a Docker repository named `apache`. +* Install [Docker](https://www.docker.com/) on your workstation. + +To build Beam SDK container images: + +1. Navigate to your local copy of [beam](https://github.com/apache/beam) +2. Run Gradle with the docker target: `./gradlew docker` + +> **Note**: It may take a long time to build all of the container images. You can instead build the images for specific SDKs: +> +> ``` +> ./gradlew -p sdks/java/container docker +> ./gradlew -p sdks/python/container docker +> ./gradlew -p sdks/go/container docker +> ``` + +Run `docker images` to examine the containers. 
For example, if you successfully built the container images, the command prompt displays a response like: + +``` +REPOSITORY TAG IMAGE ID CREATED SIZE +$USER-docker-apache.bintray.io/beam/python latest 4ea515403a1a 3 minutes ago 1.27GB +$USER-docker-apache.bintray.io/beam/java latest 0103512f1d8f 34 minutes ago 780MB +$USER-docker-apache.bintray.io/beam/go latest ce055985808a 35 minutes ago 121MB +``` + +Although the repository names look like URLs, the container images are stored locally on your workstation. After building the container images locally, you can [push](#pushing-container-images) them to an eponymous repository online. + +### Overriding default Docker targets + +The default SDK version is `latest` and the default Docker repository is the following Bintray location: + +``` +$USER-docker-apache.bintray.io/beam +``` + +When you [build SDK container images](#building-container-images), you can override the default version and location. + +To specify an older Python SDK version, like 2.3.0, build the container with the `docker-tag` option: + +``` +./gradlew docker -Pdocker-tag=2.3.0 +``` + +To change the `docker` target, build the container with the `docker-repository-root` option: + +``` +./gradlew docker -Pdocker-repository-root=$LOCATION +``` + +## Customizing container images + +You can add extra dependencies or serialization files to container images so the execution engine doesn't need them. + +To customize a container image, either: +* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original +* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container + +It's often easier to write a new Dockerfile, but you can customize anything, including the base OS, by modifying the original. 
+ +### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} + +1. Pull a [prebuilt SDK container image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release) for your target language and version. +2. [Write a new Dockerfile](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/) that [designates](https://docs.docker.com/engine/reference/builder/#from) the original as its [parent](https://docs.docker.com/glossary/?term=parent%20image) +3. Build a child image: `docker build -f /path/to/new/Dockerfile` + +### Modifying the original Dockerfile {#modifying-dockerfiles} + +1. Pull the [prebuilt SDK container image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release) for your target language and version +2. Customize the [Dockerfile](https:/
[jira] [Work logged] (BEAM-8209) Document custom docker containers
[ https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=323702&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323702 ] ASF GitHub Bot logged work on BEAM-8209: Author: ASF GitHub Bot Created on: 04/Oct/19 20:44 Start Date: 04/Oct/19 20:44 Worklog Time Spent: 10m Work Description: soyrice commented on pull request #9607: [BEAM-8209] Custom container docs URL: https://github.com/apache/beam/pull/9607#discussion_r331574098 ## File path: website/src/documentation/runtime/environments.md ## @@ -0,0 +1,187 @@ +--- +layout: section +title: "Runtime environments" +section_menu: section-menu/documentation.html +permalink: /documentation/runtime/environments/ +redirect_from: + - /documentation/execution-model/ +--- + + +# Runtime environments + +Any execution engine can run the Beam SDK because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/) and isolated from other runtime systems. This page describes how to build, customize, and push Beam SDK container images. + +## Building container images + +Before building Beam SDK container images: +* Register a [Bintray](https://bintray.com/) account with a Docker repository named `apache`. +* Install [Docker](https://www.docker.com/) on your workstation. + +To build Beam SDK container images: + +1. Navigate to your local copy of [beam](https://github.com/apache/beam) +2. Run Gradle with the docker target: `./gradlew docker` + +> **Note**: It may take a long time to build all of the container images. You can instead build the images for specific SDKs: +> +> ``` +> ./gradlew -p sdks/java/container docker Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 323702) Time Spent: 6h 50m (was: 6h 40m) > Document custom docker containers > - > > Key: BEAM-8209 > URL: https://issues.apache.org/jira/browse/BEAM-8209 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Cyrus Maden >Assignee: Cyrus Maden >Priority: Minor > Fix For: 2.17.0 > > Time Spent: 6h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8209) Document custom docker containers
[ https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=323719&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323719 ] ASF GitHub Bot logged work on BEAM-8209: Author: ASF GitHub Bot Created on: 04/Oct/19 20:44 Start Date: 04/Oct/19 20:44 Worklog Time Spent: 10m Work Description: soyrice commented on pull request #9607: [BEAM-8209] Custom container docs URL: https://github.com/apache/beam/pull/9607#discussion_r331590943 ## File path: website/src/documentation/runtime/environments.md ## @@ -0,0 +1,187 @@ +--- +layout: section +title: "Runtime environments" +section_menu: section-menu/documentation.html +permalink: /documentation/runtime/environments/ +redirect_from: + - /documentation/execution-model/ +--- + + +# Runtime environments + +Any execution engine can run the Beam SDK because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/) and isolated from other runtime systems. This page describes how to build, customize, and push Beam SDK container images. + +## Building container images + +Before building Beam SDK container images: +* Register a [Bintray](https://bintray.com/) account with a Docker repository named `apache`. +* Install [Docker](https://www.docker.com/) on your workstation. + +To build Beam SDK container images: + +1. Navigate to your local copy of the [beam](https://github.com/apache/beam) repository. +2. Run Gradle with the `docker` target: `./gradlew docker` + +> **Note**: It may take a long time to build all of the container images. You can instead build the images for specific SDKs: +> +> ``` +> ./gradlew -p sdks/java/container docker +> ./gradlew -p sdks/python/container docker +> ./gradlew -p sdks/go/container docker +> ``` + +Run `docker images` to examine the containers. 
For example, if you successfully built the container images, the command prompt displays a response like: + +``` +REPOSITORY TAG IMAGE ID CREATED SIZE +$USER-docker-apache.bintray.io/beam/python latest 4ea515403a1a 3 minutes ago 1.27GB +$USER-docker-apache.bintray.io/beam/java latest 0103512f1d8f 34 minutes ago 780MB +$USER-docker-apache.bintray.io/beam/go latest ce055985808a 35 minutes ago 121MB +``` + +Although the repository names look like URLs, the container images are stored locally on your workstation. After building the container images locally, you can [push](#pushing-container-images) them to an eponymous repository online. + +### Overriding default Docker targets + +The default SDK version is `latest` and the default Docker repository is the following Bintray location: + +``` +$USER-docker-apache.bintray.io/beam +``` + +When you [build SDK container images](#building-container-images), you can override the default version and location. + +To specify an older Python SDK version, like 2.3.0, build the container with the `docker-tag` option: + +``` +./gradlew docker -Pdocker-tag=2.3.0 +``` + +To change the Docker repository root, build the container with the `docker-repository-root` option: + +``` +./gradlew docker -Pdocker-repository-root=$LOCATION +``` + +## Customizing container images + +You can add extra dependencies or serialization files to container images so that the execution engine doesn't have to fetch them at runtime. + +To customize a container image, either: +* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original +* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container + +It's often easier to write a new Dockerfile, but you can customize anything, including the base OS, by modifying the original. 
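The default image naming scheme described above (`$USER-docker-apache.bintray.io/beam/<sdk>:latest`, overridable via `docker-repository-root` and `docker-tag`) can be sketched as a shell helper. The `image_name` function and the `DOCKER_REPOSITORY_ROOT`/`DOCKER_TAG` variable names are illustrative assumptions mirroring the Gradle `-P` options; only the naming convention itself comes from the quoted docs.

```shell
#!/bin/sh
# Assemble the local image name per the quoted naming scheme.
# DOCKER_REPOSITORY_ROOT and DOCKER_TAG stand in for the
# -Pdocker-repository-root and -Pdocker-tag Gradle overrides.
image_name() {
  sdk="$1"
  root="${DOCKER_REPOSITORY_ROOT:-${USER}-docker-apache.bintray.io/beam}"
  tag="${DOCKER_TAG:-latest}"
  echo "${root}/${sdk}:${tag}"
}

USER=jane
image_name python   # jane-docker-apache.bintray.io/beam/python:latest
```

Setting `DOCKER_TAG=2.3.0` before the call would mirror building with `-Pdocker-tag=2.3.0`.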
+ +### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} + +1. Pull a [prebuilt SDK container image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release) for your target language and version. +2. [Write a new Dockerfile](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/) that [designates](https://docs.docker.com/engine/reference/builder/#from) the original as its [parent image](https://docs.docker.com/glossary/?term=parent%20image). +3. Build a child image: `docker build -f /path/to/new/Dockerfile` + +### Modifying the original Dockerfile {#modifying-dockerfiles} + +1. Pull the [prebuilt SDK container image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release) for your target language and version +2. Customize the [Dockerfile](https:/
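The "write a new Dockerfile on top of the original" steps above can be sketched as follows. The parent image name (`apache-beam-testing/beam/sdks/release/python:latest`) and the added `numpy` dependency are hypothetical placeholders, not values from the docs, and the actual `docker build` is left commented out because it requires Docker and a pulled parent image.

```shell
#!/bin/sh
# Step 2 of the quoted instructions: create a child Dockerfile whose
# FROM line designates a prebuilt SDK image as its parent.
mkdir -p /tmp/beam-custom
cat > /tmp/beam-custom/Dockerfile <<'EOF'
# Hypothetical parent image name; substitute the image you pulled in step 1.
FROM apache-beam-testing/beam/sdks/release/python:latest
# Example customization: bake an extra dependency into the image.
RUN pip install numpy
EOF

# Step 3 (requires Docker; shown for completeness, not executed here):
# docker build -f /tmp/beam-custom/Dockerfile /tmp/beam-custom
```

Baking dependencies into the image this way avoids installing them on every worker at pipeline startup.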
[jira] [Work logged] (BEAM-8209) Document custom docker containers
[ https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=323700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323700 ] ASF GitHub Bot logged work on BEAM-8209: Author: ASF GitHub Bot Created on: 04/Oct/19 20:44 Start Date: 04/Oct/19 20:44 Worklog Time Spent: 10m Work Description: soyrice commented on pull request #9607: [BEAM-8209] Custom container docs URL: https://github.com/apache/beam/pull/9607#discussion_r331569705 ## File path: website/src/documentation/runtime/environments.md ## @@ -0,0 +1,187 @@ +--- +layout: section +title: "Runtime environments" +section_menu: section-menu/documentation.html +permalink: /documentation/runtime/environments/ +redirect_from: + - /documentation/execution-model/ +--- + + +# Runtime environments + +Any execution engine can run the Beam SDK because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/) and isolated from other runtime systems. This page describes how to build, customize, and push Beam SDK container images. + +## Building container images + +Before building Beam SDK container images: +* Register a [Bintray](https://bintray.com/) account with a Docker repository named `apache`. +* Install [Docker](https://www.docker.com/) on your workstation. + +To build Beam SDK container images: + +1. Navigate to your local copy of the [beam](https://github.com/apache/beam) repository. +2. Run Gradle with the `docker` target: `./gradlew docker` + +> **Note**: It may take a long time to build all of the container images. You can instead build the images for specific SDKs: +> +> ``` +> ./gradlew -p sdks/java/container docker +> ./gradlew -p sdks/python/container docker +> ./gradlew -p sdks/go/container docker +> ``` + +Run `docker images` to examine the containers. 
For example, if you successfully built the container images, the command prompt displays a response like: + +``` +REPOSITORY TAG IMAGE ID CREATED SIZE +$USER-docker-apache.bintray.io/beam/python latest 4ea515403a1a 3 minutes ago 1.27GB +$USER-docker-apache.bintray.io/beam/java latest 0103512f1d8f 34 minutes ago 780MB +$USER-docker-apache.bintray.io/beam/go latest ce055985808a 35 minutes ago 121MB +``` + +Although the repository names look like URLs, the container images are stored locally on your workstation. After building the container images locally, you can [push](#pushing-container-images) them to an eponymous repository online. + +### Overriding default Docker targets + +The default SDK version is `latest` and the default Docker repository is the following Bintray location: + +``` +$USER-docker-apache.bintray.io/beam +``` + +When you [build SDK container images](#building-container-images), you can override the default version and location. Review comment: Done Issue Time Tracking --- Worklog Id: (was: 323700) Time Spent: 6.5h (was: 6h 20m) > Document custom docker containers > - > > Key: BEAM-8209 > URL: https://issues.apache.org/jira/browse/BEAM-8209 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Cyrus Maden >Assignee: Cyrus Maden >Priority: Minor > Fix For: 2.17.0 > > Time Spent: 6.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)