[jira] [Work logged] (BEAM-4176) Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes
[ https://issues.apache.org/jira/browse/BEAM-4176?focusedWorklogId=154120=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154120 ] ASF GitHub Bot logged work on BEAM-4176: Author: ASF GitHub Bot Created on: 13/Oct/18 18:40 Start Date: 13/Oct/18 18:40 Worklog Time Spent: 10m Work Description: tweise commented on a change in pull request #6592: [BEAM-4176] Enable Post Commit JAVA PVR tests for Flink URL: https://github.com/apache/beam/pull/6592#discussion_r224968563 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -1498,6 +1498,7 @@ artifactId=${project.name} testClassesDirs = project.files(project.project(":beam-sdks-java-core").sourceSets.test.output.classesDirs, project.project(":beam-runners-core-java").sourceSets.test.output.classesDirs) maxParallelForks config.parallelism useJUnit(config.testCategories) +dependsOn ':beam-sdks-java-container:docker' Review comment: Is this later going to change to use the process environment? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154120) Time Spent: 33h 40m (was: 33.5h) > Java: Portable batch runner passes all ValidatesRunner tests that > non-portable runner passes > > > Key: BEAM-4176 > URL: https://issues.apache.org/jira/browse/BEAM-4176 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Ben Sidhom >Assignee: Ankur Goenka >Priority: Major > Attachments: 81VxNWtFtke.png, Screen Shot 2018-08-14 at 4.18.31 > PM.png, Screen Shot 2018-09-03 at 11.07.38 AM.png > > Time Spent: 33h 40m > Remaining Estimate: 0h > > We need this as a sanity check that runner execution is correct. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5708) Support caching of SDKHarness environments in flink
[ https://issues.apache.org/jira/browse/BEAM-5708?focusedWorklogId=154119=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154119 ] ASF GitHub Bot logged work on BEAM-5708: Author: ASF GitHub Bot Created on: 13/Oct/18 18:21 Start Date: 13/Oct/18 18:21 Worklog Time Spent: 10m Work Description: tweise closed pull request #6638: [BEAM-5708] Cache environment in portable flink runner URL: https://github.com/apache/beam/pull/6638 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy index 290f72399db..0ba980b27e5 100644 --- a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy +++ b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy @@ -1484,6 +1484,7 @@ artifactId=${project.name} def beamTestPipelineOptions = [ "--runner=org.apache.beam.runners.reference.testing.TestPortableRunner", "--jobServerDriver=${config.jobServerDriver}", +"--environmentCacheMillis=1", ] if (config.jobServerConfig) { beamTestPipelineOptions.add("--jobServerConfig=${config.jobServerConfig}") diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java index 988a94826fb..bb2b9dcbe16 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java @@ -24,12 +24,17 @@ import java.util.concurrent.ConcurrentHashMap; import java.util.concurrent.Executors; import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicInteger; +import org.apache.beam.runners.core.construction.PipelineOptionsTranslation; import org.apache.beam.runners.core.construction.graph.ExecutableStage; import org.apache.beam.runners.fnexecution.control.StageBundleFactory; import org.apache.beam.runners.fnexecution.provisioning.JobInfo; import org.apache.beam.sdk.fn.function.ThrowingFunction; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.PortablePipelineOptions; import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.api.java.ExecutionEnvironment; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -104,9 +109,30 @@ private void scheduleRelease(JobInfo jobInfo) { WrappedContext wrapper = getCache().get(jobInfo.jobId()); Preconditions.checkState( wrapper != null, "Releasing context for unknown job: " + jobInfo.jobId()); -// Do not release this asynchronously, as the releasing could fail due to the classloader not being -// available anymore after the tasks have been removed from the execution engine. -release(wrapper); + +PipelineOptions pipelineOptions = +PipelineOptionsTranslation.fromProto(jobInfo.pipelineOptions()); +int environmentCacheTTLMillis = + pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis(); +if (environmentCacheTTLMillis > 0) { + if (this.getClass().getClassLoader() != ExecutionEnvironment.class.getClassLoader()) { +LOG.warn( +"{} is not loaded on parent Flink classloader. " ++ "Falling back to synchronous environment release for job {}.", +this.getClass(), +jobInfo.jobId()); +release(wrapper); + } else { +// Schedule task to clean the container later. +// Ensure that this class is loaded in the parent Flink classloader. +getExecutor() +.schedule(() -> release(wrapper), environmentCacheTTLMillis, TimeUnit.MILLISECONDS); + } +} else { + // Do not release this asynchronously, as the releasing could fail due to the classloader not + // being available anymore after the tasks have been removed from the execution engine. + release(wrapper); +} } private ConcurrentHashMap getCache() { diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java b/sdks/java/core/src/main/java/org/apache/beam/sd
[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner
[ https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=154104=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154104 ] ASF GitHub Bot logged work on BEAM-5442: Author: ASF GitHub Bot Created on: 13/Oct/18 17:07 Start Date: 13/Oct/18 17:07 Worklog Time Spent: 10m Work Description: tweise closed pull request #6683: [BEAM-5442] Revert #6675 "Revert PRs #6557 #6589 #6600" URL: https://github.com/apache/beam/pull/6683 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154104) Time Spent: 9h 50m (was: 9h 40m) > PortableRunner swallows custom options for Runner > - > > Key: BEAM-5442 > URL: https://issues.apache.org/jira/browse/BEAM-5442 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Fix For: 2.9.0 > > Time Spent: 9h 50m > Remaining Estimate: 0h > > The PortableRunner doesn't pass custom PipelineOptions to the executing > Runner. > Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner. > (The option is just removed during proto translation without any warning) > We should allow some form of customization through the options, even for the > PortableRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5097) Increment counter for "small words" in go SDK example
[ https://issues.apache.org/jira/browse/BEAM-5097?focusedWorklogId=154100=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154100 ] ASF GitHub Bot logged work on BEAM-5097: Author: ASF GitHub Bot Created on: 13/Oct/18 14:07 Start Date: 13/Oct/18 14:07 Worklog Time Spent: 10m Work Description: stale[bot] closed pull request #6157: [BEAM-5097][WIP] Add counter to combine example in go sdk URL: https://github.com/apache/beam/pull/6157 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/go/examples/cookbook/combine/combine.go b/sdks/go/examples/cookbook/combine/combine.go index 7e24aa1fb30..1950a687d24 100644 --- a/sdks/go/examples/cookbook/combine/combine.go +++ b/sdks/go/examples/cookbook/combine/combine.go @@ -63,11 +63,16 @@ type extractFn struct { MinLength int `json:"min_length"` } +// A global context for simplicity. +var ctx = context.Background() + func (f *extractFn) ProcessElement(row WordRow, emit func(string, string)) { + small_words := beam.NewCounter("example.namespace", "small_words") if len(row.Word) >= f.MinLength { emit(row.Word, row.Corpus) + } else { + small_words.Inc(ctx, 1) } - // TODO(herohde) 7/14/2017: increment counter for "small words" } // TODO(herohde) 7/14/2017: the choice of a string (instead of []string) for the This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154100) Time Spent: 1h 40m (was: 1.5h) > Increment counter for "small words" in go SDK example > - > > Key: BEAM-5097 > URL: https://issues.apache.org/jira/browse/BEAM-5097 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: holdenk >Assignee: holdenk >Priority: Trivial > Time Spent: 1h 40m > Remaining Estimate: 0h > > Increment counter for "small words" in go SDK example -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5097) Increment counter for "small words" in go SDK example
[ https://issues.apache.org/jira/browse/BEAM-5097?focusedWorklogId=154099=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154099 ] ASF GitHub Bot logged work on BEAM-5097: Author: ASF GitHub Bot Created on: 13/Oct/18 14:07 Start Date: 13/Oct/18 14:07 Worklog Time Spent: 10m Work Description: stale[bot] commented on issue #6157: [BEAM-5097][WIP] Add counter to combine example in go sdk URL: https://github.com/apache/beam/pull/6157#issuecomment-429545012 This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154099) Time Spent: 1.5h (was: 1h 20m) > Increment counter for "small words" in go SDK example > - > > Key: BEAM-5097 > URL: https://issues.apache.org/jira/browse/BEAM-5097 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: holdenk >Assignee: holdenk >Priority: Trivial > Time Spent: 1.5h > Remaining Estimate: 0h > > Increment counter for "small words" in go SDK example -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5626) Several IO tests fail in Python 3 with RuntimeError('dictionary changed size during iteration',)}
[ https://issues.apache.org/jira/browse/BEAM-5626?focusedWorklogId=154082=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154082 ] ASF GitHub Bot logged work on BEAM-5626: Author: ASF GitHub Bot Created on: 13/Oct/18 04:25 Start Date: 13/Oct/18 04:25 Worklog Time Spent: 10m Work Description: HuangLED commented on a change in pull request #6587: [BEAM-5626] Fix hadoop filesystem test for py3. URL: https://github.com/apache/beam/pull/6587#discussion_r224949182 ## File path: sdks/python/apache_beam/io/hadoopfilesystem_test.py ## @@ -214,6 +214,11 @@ def setUp(self): url = self.fs.join(self.tmpdir, filename) self.fs.create(url).close() +try:# Python 2 Review comment: @tvalentynI came across this python2->python3 doc from python.org, LINK: python-2-3 Difference . Interesting read. Section "Use feature detection instead of version detection" talks about replying on sys.version_info. figured maybe we can weigh in the point this doc brought up before we apply the change to every place. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154082) Time Spent: 5h 40m (was: 5.5h) > Several IO tests fail in Python 3 with RuntimeError('dictionary changed size > during iteration',)} > - > > Key: BEAM-5626 > URL: https://issues.apache.org/jira/browse/BEAM-5626 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Ruoyun Huang >Priority: Major > Fix For: 2.8.0 > > Time Spent: 5h 40m > Remaining Estimate: 0h > > ERROR: test_delete_dir > (apache_beam.io.hadoopfilesystem_test.HadoopFileSystemTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/hadoopfilesystem_test.py", > line 506, in test_delete_dir > self.fs.delete([url_t1]) >File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/hadoopfilesystem.py", > line 370, in delete > raise BeamIOError("Delete operation failed", exceptions) > apache_beam.io.filesystem.BeamIOError: Delete operation failed with > exceptions {'hdfs://test_dir/new_dir1': RuntimeError('dictionary changed size > during iteration', )} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5626) Several IO tests fail in Python 3 with RuntimeError('dictionary changed size during iteration',)}
[ https://issues.apache.org/jira/browse/BEAM-5626?focusedWorklogId=154083=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154083 ] ASF GitHub Bot logged work on BEAM-5626: Author: ASF GitHub Bot Created on: 13/Oct/18 04:28 Start Date: 13/Oct/18 04:28 Worklog Time Spent: 10m Work Description: HuangLED commented on a change in pull request #6587: [BEAM-5626] Fix hadoop filesystem test for py3. URL: https://github.com/apache/beam/pull/6587#discussion_r224949182 ## File path: sdks/python/apache_beam/io/hadoopfilesystem_test.py ## @@ -214,6 +214,11 @@ def setUp(self): url = self.fs.join(self.tmpdir, filename) self.fs.create(url).close() +try:# Python 2 Review comment: @tvalentynI came across this python2->python3 doc from python.org, LINK: python-2-3 Difference . Interesting read. Section "Use feature detection instead of version detection" talks about replying on sys.version_info. Though the point does makes our use here into a negative. just FYI. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154083) Time Spent: 5h 50m (was: 5h 40m) > Several IO tests fail in Python 3 with RuntimeError('dictionary changed size > during iteration',)} > - > > Key: BEAM-5626 > URL: https://issues.apache.org/jira/browse/BEAM-5626 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Ruoyun Huang >Priority: Major > Fix For: 2.8.0 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > ERROR: test_delete_dir > (apache_beam.io.hadoopfilesystem_test.HadoopFileSystemTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/hadoopfilesystem_test.py", > line 506, in test_delete_dir > self.fs.delete([url_t1]) >File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/hadoopfilesystem.py", > line 370, in delete > raise BeamIOError("Delete operation failed", exceptions) > apache_beam.io.filesystem.BeamIOError: Delete operation failed with > exceptions {'hdfs://test_dir/new_dir1': RuntimeError('dictionary changed size > during iteration', )} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5626) Several IO tests fail in Python 3 with RuntimeError('dictionary changed size during iteration',)}
[ https://issues.apache.org/jira/browse/BEAM-5626?focusedWorklogId=154081=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154081 ] ASF GitHub Bot logged work on BEAM-5626: Author: ASF GitHub Bot Created on: 13/Oct/18 04:24 Start Date: 13/Oct/18 04:24 Worklog Time Spent: 10m Work Description: HuangLED commented on a change in pull request #6587: [BEAM-5626] Fix hadoop filesystem test for py3. URL: https://github.com/apache/beam/pull/6587#discussion_r224949182 ## File path: sdks/python/apache_beam/io/hadoopfilesystem_test.py ## @@ -214,6 +214,11 @@ def setUp(self): url = self.fs.join(self.tmpdir, filename) self.fs.create(url).close() +try:# Python 2 Review comment: I came across this python2->python3 doc from python.org. python-2-3 Difference Interesting read. Section "Use feature detection instead of version detection" talks about replying on sys.version_info, figured maybe we can weigh in the point this doc brought up before we apply the change to every place. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154081) Time Spent: 5.5h (was: 5h 20m) > Several IO tests fail in Python 3 with RuntimeError('dictionary changed size > during iteration',)} > - > > Key: BEAM-5626 > URL: https://issues.apache.org/jira/browse/BEAM-5626 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Ruoyun Huang >Priority: Major > Fix For: 2.8.0 > > Time Spent: 5.5h > Remaining Estimate: 0h > > ERROR: test_delete_dir > (apache_beam.io.hadoopfilesystem_test.HadoopFileSystemTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/hadoopfilesystem_test.py", > line 506, in test_delete_dir > self.fs.delete([url_t1]) >File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/hadoopfilesystem.py", > line 370, in delete > raise BeamIOError("Delete operation failed", exceptions) > apache_beam.io.filesystem.BeamIOError: Delete operation failed with > exceptions {'hdfs://test_dir/new_dir1': RuntimeError('dictionary changed size > during iteration', )} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154080=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154080 ] ASF GitHub Bot logged work on BEAM-5636: Author: ASF GitHub Bot Created on: 13/Oct/18 02:44 Start Date: 13/Oct/18 02:44 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #6665: [BEAM-5636] Java support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6665#issuecomment-429505015 @herohde this PR is ready to merge. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154080) Time Spent: 1h 20m (was: 1h 10m) > Java support for custom dataflow worker jar > --- > > Key: BEAM-5636 > URL: https://issues.apache.org/jira/browse/BEAM-5636 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Java jobs. That requires a change to the Java boot > code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5744) Investigate negative numbers represented as 'long' in Python SDK + Direct runner
[ https://issues.apache.org/jira/browse/BEAM-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648703#comment-16648703 ] Ahmet Altay commented on BEAM-5744: --- Should we revert this until we figure out how to fix it? > Investigate negative numbers represented as 'long' in Python SDK + Direct > runner > > > Key: BEAM-5744 > URL: https://issues.apache.org/jira/browse/BEAM-5744 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.8.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5628) Several VcfIO tests fail in Python 3 with TypeError: cannot use a string pattern on a bytes-like object
[ https://issues.apache.org/jira/browse/BEAM-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648695#comment-16648695 ] Valentyn Tymofieiev commented on BEAM-5628: --- cc [~arostami], one of VCF IO authors. > Several VcfIO tests fail in Python 3 with TypeError: cannot use a string > pattern on a bytes-like object > > > Key: BEAM-5628 > URL: https://issues.apache.org/jira/browse/BEAM-5628 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Simon >Priority: Major > > ERROR: test_read_after_splitting (apache_beam.io.vcfio_test.VcfSourceTest) > " > -- > Traceback (most recent call last): >File > ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio_test.py"", > line 336, in test_read_after_splitting > ] split_records.extend(source_test_utils.read_from_source(*source_info)) > ] File > ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils.py"", > line 101, in read_from_source > for value in reader: >File > ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py"", > line 264, in read_records > for line in record_iterator: >File > ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py"", > line 330, in __next__ > record = next(self._vcf_reader) >File > ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/vcf/parser.py"", > line 543, in __next__ > row = self._row_pattern.split(line.rstrip()) > TypeError: cannot use a string pattern on a bytes-like object > " -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-5628) Several VcfIO tests fail in Python 3 with TypeError: cannot use a string pattern on a bytes-like object
[ https://issues.apache.org/jira/browse/BEAM-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648694#comment-16648694 ] Valentyn Tymofieiev edited comment on BEAM-5628 at 10/13/18 2:20 AM: - Looks like VCF IO is using another package (PyVCF). We need to understand whether the issue is because we don't use PyVCF correctly (VCF IO issue), or because PyVCF itself is not Python3-compatible as it claims to be. was (Author: tvalentyn): Looks like VCF IO is using another package (PyVCF). We need to understand whether the issue is because we don't use PyVCF correctly (VCF IO issue), or because PyVCF itself is not Python3-compatible. > Several VcfIO tests fail in Python 3 with TypeError: cannot use a string > pattern on a bytes-like object > > > Key: BEAM-5628 > URL: https://issues.apache.org/jira/browse/BEAM-5628 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Simon >Priority: Major > > ERROR: test_read_after_splitting (apache_beam.io.vcfio_test.VcfSourceTest) > " > -- > Traceback (most recent call last): >File > ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio_test.py"", > line 336, in test_read_after_splitting > ] split_records.extend(source_test_utils.read_from_source(*source_info)) > ] File > ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils.py"", > line 101, in read_from_source > for value in reader: >File > ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py"", > line 264, in read_records > for line in record_iterator: >File > ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py"", > line 330, in __next__ > record = next(self._vcf_reader) >File > ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/vcf/parser.py"", > line 543, in __next__ > row = self._row_pattern.split(line.rstrip()) > TypeError: cannot use a string pattern on a bytes-like object > " -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5628) Several VcfIO tests fail in Python 3 with TypeError: cannot use a string pattern on a bytes-like object
[ https://issues.apache.org/jira/browse/BEAM-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648694#comment-16648694 ] Valentyn Tymofieiev commented on BEAM-5628: --- Looks like VCF IO is using another package (PyVCF). We need to understand whether the issue is because we don't use PyVCF correctly (VCF IO issue), or because PyVCF itself is not Python3-compatible. > Several VcfIO tests fail in Python 3 with TypeError: cannot use a string > pattern on a bytes-like object > > > Key: BEAM-5628 > URL: https://issues.apache.org/jira/browse/BEAM-5628 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Simon >Priority: Major > > ERROR: test_read_after_splitting (apache_beam.io.vcfio_test.VcfSourceTest) > " > -- > Traceback (most recent call last): >File > ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio_test.py"", > line 336, in test_read_after_splitting > ] split_records.extend(source_test_utils.read_from_source(*source_info)) > ] File > ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils.py"", > line 101, in read_from_source > for value in reader: >File > ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py"", > line 264, in read_records > for line in record_iterator: >File > ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py"", > line 330, in __next__ > record = next(self._vcf_reader) >File > ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/vcf/parser.py"", > line 543, in __next__ > row = self._row_pattern.split(line.rstrip()) > TypeError: cannot use a string pattern on a bytes-like object > " -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5623) Several IO tests hang indefinitely during execution on Python 3.
[ https://issues.apache.org/jira/browse/BEAM-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648693#comment-16648693 ] Valentyn Tymofieiev commented on BEAM-5623: --- What exactly happens when the test is hanging? > Several IO tests hang indefinitely during execution on Python 3. > > > Key: BEAM-5623 > URL: https://issues.apache.org/jira/browse/BEAM-5623 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > test_read_empty_single_file_no_eol_gzip > (apache_beam.io.textio_test.TextSourceTest) > Also several tests cases in tfrecordio_test, for example: > test_process_auto (apache_beam.io.tfrecordio_test.TestReadAllFromTFRecord) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5744) Investigate negative numbers represented as 'long' in Python SDK + Direct runner
[ https://issues.apache.org/jira/browse/BEAM-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648688#comment-16648688 ] Valentyn Tymofieiev commented on BEAM-5744: --- cc [~altay] [~charleschen] [~juta] [~robertwb] > Investigate negative numbers represented as 'long' in Python SDK + Direct > runner > > > Key: BEAM-5744 > URL: https://issues.apache.org/jira/browse/BEAM-5744 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.8.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5442) PortableRunner swallows custom options for Runner
[ https://issues.apache.org/jira/browse/BEAM-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Weise updated BEAM-5442: --- Fix Version/s: 2.9.0 > PortableRunner swallows custom options for Runner > - > > Key: BEAM-5442 > URL: https://issues.apache.org/jira/browse/BEAM-5442 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Fix For: 2.9.0 > > Time Spent: 9h 40m > Remaining Estimate: 0h > > The PortableRunner doesn't pass custom PipelineOptions to the executing > Runner. > Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner. > (The option is just removed during proto translation without any warning) > We should allow some form of customization through the options, even for the > PortableRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5744) Investigate negative numbers represented as 'long' in Python SDK + Direct runner
[ https://issues.apache.org/jira/browse/BEAM-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648681#comment-16648681 ] Valentyn Tymofieiev commented on BEAM-5744: --- https://github.com/apache/beam/pull/6685 illustrates the failure. 18:13:43 == 18:13:43 ERROR: test_assert_that_passes_order_does_not_matter_with_negatives (apache_beam.testing.util_test.UtilTest) 18:13:43 -- 18:13:43 Traceback (most recent call last): 18:13:43 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/testing/util_test.py", line 47, in test_assert_that_passes_order_does_not_matter_with_negatives 18:13:43 assert_that(p | Create([1, -2, 3]), equal_to([-2, 1, 3])) 18:13:43 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/pipeline.py", line 423, in __exit__ 18:13:43 self.run().wait_until_finish() 18:13:43 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/testing/test_pipeline.py", line 104, in run 18:13:43 result = super(TestPipeline, self).run(test_runner_api) 18:13:43 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/pipeline.py", line 403, in run 18:13:43 self.to_runner_api(), self.runner, self._options).run(False) 18:13:43 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/pipeline.py", line 416, in run 18:13:43 return self.runner.run_pipeline(self) 18:13:43 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/runners/direct/direct_runner.py", line 138, in run_pipeline 18:13:43 return runner.run_pipeline(pipeline) 18:13:43 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 231, in run_pipeline 18:13:43 return self.run_via_runner_api(pipeline.to_runner_api()) 18:13:43 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 234, in run_via_runner_api 18:13:43 return self.run_stages(*self.create_stages(pipeline_proto)) 18:13:43 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1017, in run_stages 18:13:43 pcoll_buffers, safe_coders) 18:13:43 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1137, in run_stage 18:13:43 self._progress_frequency).process_bundle(data_input, data_output) 18:13:43 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1393, in process_bundle 18:13:43 result_future = self._controller.control_handler.push(process_bundle) 18:13:43 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1265, in push 18:13:43 response = self.worker.do_instruction(request) 18:13:43 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/runners/worker/sdk_worker.py", line 212, in do_instruction 18:13:43 request.instruction_id) 18:13:43 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/runners/worker/sdk_worker.py", line 234, in process_bundle 18:13:43 processor.process_bundle(instruction_id) 18:13:43 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/runners/worker/bundle_processor.py", line 420, in process_bundle 18:13:43 ].process_encoded(data.data) 18:13:43 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/runners/worker/bundle_processor.py", line 125, in process_encoded 18:13:43 self.output(decoded_value) 18:13:43 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/runners/worker/operations.py", line 169, in output 18:13:43 cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value) 18:13:43 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/runners/worker/operations.py", line 89, in receive 18:13:43 cython.cast(Operation, consumer).process(windowe
[jira] [Updated] (BEAM-2953) Timeseries processing extensions using state API
[ https://issues.apache.org/jira/browse/BEAM-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reza ardeshir rokni updated BEAM-2953: -- Summary: Timeseries processing extensions using state API (was: Create more advanced Timeseries processing examples using state API) > Timeseries processing extensions using state API > > > Key: BEAM-2953 > URL: https://issues.apache.org/jira/browse/BEAM-2953 > Project: Beam > Issue Type: Improvement > Components: sdk-ideas >Affects Versions: 2.7.0 >Reporter: Reza ardeshir rokni >Assignee: Reuven Lax >Priority: Minor > Time Spent: 3h 10m > Remaining Estimate: 0h > > A general set of timeseries transforms that abstract the user from the > process of dealing with some of the common problems when dealing with > timeseries using BEAM (in stream or batch mode). > BEAM can be used to build out some very interesting pre-processing stages for > time series data. Some examples that will be useful: > - Downsampling time series based on simple MIN, MAX, COUNT, SUM, LAST, FIRST > - Creating a value for each downsampled window even if no value has been > emitted for the specific key. > - Loading the value of a downsample with the previous value (used in FX with > previous close being brought into current open value) > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154078=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154078 ] ASF GitHub Bot logged work on BEAM-5637: Author: ASF GitHub Bot Created on: 13/Oct/18 01:29 Start Date: 13/Oct/18 01:29 Worklog Time Spent: 10m Work Description: HuangLED edited a comment on issue #6680: [BEAM-5637] Python support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6680#issuecomment-429468108 R: @herohde cc: @boyuanzz @pabloem Addressed. Also, option definition moved to WorkerOptions per Pablo's suggestion. Thanks to Boyuan for pointing out the right place for error message. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154078) Time Spent: 2.5h (was: 2h 20m) > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Python jobs. That requires a change to the Python > boot code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing
[ https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=154077=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154077 ] ASF GitHub Bot logged work on BEAM-5707: Author: ASF GitHub Bot Created on: 13/Oct/18 01:03 Start Date: 13/Oct/18 01:03 Worklog Time Spent: 10m Work Description: mwylde commented on a change in pull request #6637: [BEAM-5707] Add a periodic, streaming impulse source for Flink portable pipelines URL: https://github.com/apache/beam/pull/6637#discussion_r224944539 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java ## @@ -406,6 +417,56 @@ private void translateImpulse( context.addDataStream(Iterables.getOnlyElement(pTransform.getOutputsMap().values()), source); } + @AutoService(NativeTransforms.IsNativeTransform.class) + public static class IsFlinkNativeTransform implements NativeTransforms.IsNativeTransform { +@Override +public boolean test(RunnerApi.PTransform pTransform) { + return STREAMING_IMPULSE_TRANSFORM_URL.equals(PTransformTranslation.urnForTransformOrNull(pTransform)); +} + } + + private void translateStreamingImpulse( + String id, RunnerApi.Pipeline pipeline, StreamingTranslationContext context) { +RunnerApi.PTransform pTransform = pipeline.getComponents().getTransformsOrThrow(id); + +ObjectMapper objectMapper = new ObjectMapper(); + +int intervalMillis; +int messageCount; +try { + JsonNode config = objectMapper.readTree(pTransform.getSpec().getPayload().toByteArray()); + intervalMillis = config.path("interval_ms").asInt(100); + messageCount = config.path("message_count").asInt(0); +} catch (IOException e) { +throw new RuntimeException("Failed to parse configuration for streaming impulse", e); +} + +DataStreamSource> source = +context +.getExecutionEnvironment() +.addSource( +new RichParallelSourceFunction>() { + private AtomicBoolean cancelled = new AtomicBoolean(false); + private AtomicLong count = new AtomicLong(); + + @Override + public void run(SourceContext> ctx) throws Exception { +while (!cancelled.get() && (messageCount == 0 || count.getAndIncrement() < messageCount)) { + ctx.collect(WindowedValue.valueInGlobalWindow(new byte[] {})); + Thread.sleep(intervalMillis); Review comment: This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154077) Time Spent: 2h 20m (was: 2h 10m) > Add a portable Flink streaming synthetic source for testing > --- > > Key: BEAM-5707 > URL: https://issues.apache.org/jira/browse/BEAM-5707 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Micah Wylde >Assignee: Aljoscha Krettek >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > Currently there are no built-in streaming sources for portable pipelines. > This makes it hard to test streaming functionality in the Python SDK. > It would be very useful to add a periodic impulse source that (with some > configurable frequency) outputs an empty byte array, which can then be > transformed as desired inside the python pipeline. More context in this > [mailing list > discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5744) Investigate negative numbers represented as 'long' in Python SDK + Direct runner
[ https://issues.apache.org/jira/browse/BEAM-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648659#comment-16648659 ] Valentyn Tymofieiev commented on BEAM-5744: --- It appears that sometimes negative numbers are sometimes represented as `long`, which I observed in https://github.com/apache/beam/pull/6602#issuecomment-429468709, opened this issue to investigate if this is a regression in 2.8.0, since such behavior would be incompatible with https://github.com/apache/beam/pull/6602 on Python 2, and https://github.com/apache/beam/pull/6602 is included in 2.8.0 release. > Investigate negative numbers represented as 'long' in Python SDK + Direct > runner > > > Key: BEAM-5744 > URL: https://issues.apache.org/jira/browse/BEAM-5744 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.8.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()
[ https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154075=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154075 ] ASF GitHub Bot logged work on BEAM-5621: Author: ASF GitHub Bot Created on: 13/Oct/18 00:54 Start Date: 13/Oct/18 00:54 Worklog Time Spent: 10m Work Description: tvalentyn edited a comment on issue #6602: [BEAM-5621] Fix unorderable types in python 3 URL: https://github.com/apache/beam/pull/6602#issuecomment-429497899 I was not able to reproduce the issue where negative numbers are represented as `long` type I mentioned in https://github.com/apache/beam/pull/6602#issuecomment-429468709 using a fresh installation of apache-beam==2.7.0, so I instead opened https://issues.apache.org/jira/browse/BEAM-5744, to investigate if this is a regression in 2.8.0. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154075) Time Spent: 3h (was: 2h 50m) > Several tests fail on Python 3 with TypeError: unorderable types: str() < > int() > --- > > Key: BEAM-5621 > URL: https://issues.apache.org/jira/browse/BEAM-5621 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 3h > Remaining Estimate: 0h > > == > ERROR: test_remove_duplicates > (apache_beam.transforms.ptransform_test.PTransformTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 677, in process > self.do_fn_invoker.invoke_process(windowed_value) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 414, in invoke_process > windowed_value, self.process_method(windowed_value.value)) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 1068, in > wrapper = lambda x: [fn(x)] > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py", > line 115, in _equal > sorted_expected = sorted(expected) > TypeError: unorderable types: str() < int() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing
[ https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=154074=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154074 ] ASF GitHub Bot logged work on BEAM-5707: Author: ASF GitHub Bot Created on: 13/Oct/18 00:54 Start Date: 13/Oct/18 00:54 Worklog Time Spent: 10m Work Description: mwylde commented on a change in pull request #6637: [BEAM-5707] Add a periodic, streaming impulse source for Flink portable pipelines URL: https://github.com/apache/beam/pull/6637#discussion_r224944203 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java ## @@ -406,6 +417,56 @@ private void translateImpulse( context.addDataStream(Iterables.getOnlyElement(pTransform.getOutputsMap().values()), source); } + @AutoService(NativeTransforms.IsNativeTransform.class) + public static class IsFlinkNativeTransform implements NativeTransforms.IsNativeTransform { +@Override +public boolean test(RunnerApi.PTransform pTransform) { + return STREAMING_IMPULSE_TRANSFORM_URL.equals(PTransformTranslation.urnForTransformOrNull(pTransform)); +} + } + + private void translateStreamingImpulse( + String id, RunnerApi.Pipeline pipeline, StreamingTranslationContext context) { +RunnerApi.PTransform pTransform = pipeline.getComponents().getTransformsOrThrow(id); + +ObjectMapper objectMapper = new ObjectMapper(); + +int intervalMillis; +int messageCount; +try { + JsonNode config = objectMapper.readTree(pTransform.getSpec().getPayload().toByteArray()); + intervalMillis = config.path("interval_ms").asInt(100); + messageCount = config.path("message_count").asInt(0); +} catch (IOException e) { +throw new RuntimeException("Failed to parse configuration for streaming impulse", e); +} + +DataStreamSource> source = +context +.getExecutionEnvironment() +.addSource( +new RichParallelSourceFunction>() { + private AtomicBoolean cancelled = new AtomicBoolean(false); + private AtomicLong count = new AtomicLong(); Review comment: Good to know that this will never be called concurrently. I'll change this to a long. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154074) Time Spent: 2h 10m (was: 2h) > Add a portable Flink streaming synthetic source for testing > --- > > Key: BEAM-5707 > URL: https://issues.apache.org/jira/browse/BEAM-5707 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Micah Wylde >Assignee: Aljoscha Krettek >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently there are no built-in streaming sources for portable pipelines. > This makes it hard to test streaming functionality in the Python SDK. > It would be very useful to add a periodic impulse source that (with some > configurable frequency) outputs an empty byte array, which can then be > transformed as desired inside the python pipeline. More context in this > [mailing list > discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()
[ https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154073=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154073 ] ASF GitHub Bot logged work on BEAM-5621: Author: ASF GitHub Bot Created on: 13/Oct/18 00:53 Start Date: 13/Oct/18 00:53 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #6602: [BEAM-5621] Fix unorderable types in python 3 URL: https://github.com/apache/beam/pull/6602#issuecomment-429497899 I was not able to reproduce https://github.com/apache/beam/pull/6602#issuecomment-429468709 using a fresh installation of apache-beam==2.7.0, so I instead opened https://issues.apache.org/jira/browse/BEAM-5744, to investigate if this is a regression in 2.8.0. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154073) Time Spent: 2h 50m (was: 2h 40m) > Several tests fail on Python 3 with TypeError: unorderable types: str() < > int() > --- > > Key: BEAM-5621 > URL: https://issues.apache.org/jira/browse/BEAM-5621 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 2h 50m > Remaining Estimate: 0h > > == > ERROR: test_remove_duplicates > (apache_beam.transforms.ptransform_test.PTransformTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 677, in process > self.do_fn_invoker.invoke_process(windowed_value) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 414, in invoke_process > windowed_value, self.process_method(windowed_value.value)) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 1068, in > wrapper = lambda x: [fn(x)] > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py", > line 115, in _equal > sorted_expected = sorted(expected) > TypeError: unorderable types: str() < int() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5744) Investigate negative numbers represented as 'long' in Python SDK + Direct runner
[ https://issues.apache.org/jira/browse/BEAM-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-5744: -- Summary: Investigate negative numbers represented as 'long' in Python SDK + Direct runner (was: Investigate negative numbers investigated as 'long' in Python SDK + Direct runner) > Investigate negative numbers represented as 'long' in Python SDK + Direct > runner > > > Key: BEAM-5744 > URL: https://issues.apache.org/jira/browse/BEAM-5744 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.8.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5744) Investigate negative numbers investigated as 'long' in Python SDK + Direct runner
Valentyn Tymofieiev created BEAM-5744: - Summary: Investigate negative numbers investigated as 'long' in Python SDK + Direct runner Key: BEAM-5744 URL: https://issues.apache.org/jira/browse/BEAM-5744 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Valentyn Tymofieiev Assignee: Ahmet Altay Fix For: 2.8.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing
[ https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=154072=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154072 ] ASF GitHub Bot logged work on BEAM-5707: Author: ASF GitHub Bot Created on: 13/Oct/18 00:51 Start Date: 13/Oct/18 00:51 Worklog Time Spent: 10m Work Description: mwylde commented on a change in pull request #6637: [BEAM-5707] Add a periodic, streaming impulse source for Flink portable pipelines URL: https://github.com/apache/beam/pull/6637#discussion_r224944097 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java ## @@ -406,6 +417,56 @@ private void translateImpulse( context.addDataStream(Iterables.getOnlyElement(pTransform.getOutputsMap().values()), source); } + @AutoService(NativeTransforms.IsNativeTransform.class) + public static class IsFlinkNativeTransform implements NativeTransforms.IsNativeTransform { +@Override +public boolean test(RunnerApi.PTransform pTransform) { + return STREAMING_IMPULSE_TRANSFORM_URL.equals(PTransformTranslation.urnForTransformOrNull(pTransform)); +} + } + + private void translateStreamingImpulse( + String id, RunnerApi.Pipeline pipeline, StreamingTranslationContext context) { +RunnerApi.PTransform pTransform = pipeline.getComponents().getTransformsOrThrow(id); + +ObjectMapper objectMapper = new ObjectMapper(); + +int intervalMillis; +int messageCount; +try { + JsonNode config = objectMapper.readTree(pTransform.getSpec().getPayload().toByteArray()); + intervalMillis = config.path("interval_ms").asInt(100); + messageCount = config.path("message_count").asInt(0); +} catch (IOException e) { +throw new RuntimeException("Failed to parse configuration for streaming impulse", e); +} + +DataStreamSource> source = +context +.getExecutionEnvironment() +.addSource( +new RichParallelSourceFunction>() { Review comment: I've moved it to org.apache.beam.runners.flink.translation.wrappers.streaming.io This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154072) Time Spent: 2h (was: 1h 50m) > Add a portable Flink streaming synthetic source for testing > --- > > Key: BEAM-5707 > URL: https://issues.apache.org/jira/browse/BEAM-5707 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Micah Wylde >Assignee: Aljoscha Krettek >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > Currently there are no built-in streaming sources for portable pipelines. > This makes it hard to test streaming functionality in the Python SDK. > It would be very useful to add a periodic impulse source that (with some > configurable frequency) outputs an empty byte array, which can then be > transformed as desired inside the python pipeline. More context in this > [mailing list > discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5744) Investigate negative numbers investigated as 'long' in Python SDK + Direct runner
[ https://issues.apache.org/jira/browse/BEAM-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev reassigned BEAM-5744: - Assignee: Valentyn Tymofieiev (was: Ahmet Altay) > Investigate negative numbers investigated as 'long' in Python SDK + Direct > runner > - > > Key: BEAM-5744 > URL: https://issues.apache.org/jira/browse/BEAM-5744 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.8.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()
[ https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154071=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154071 ] ASF GitHub Bot logged work on BEAM-5621: Author: ASF GitHub Bot Created on: 13/Oct/18 00:49 Start Date: 13/Oct/18 00:49 Worklog Time Spent: 10m Work Description: tvalentyn edited a comment on issue #6602: [BEAM-5621] Fix unorderable types in python 3 URL: https://github.com/apache/beam/pull/6602#issuecomment-429478105 Unfortunately I think https://github.com/apache/beam/pull/6602#issuecomment-429468709 may be a very unintuitive change, so we need to roll it back and either fix the underlying issue with typing of negative numbers or proceed with a different solution here. We would need to cherry-pick the change into the release branch, so I'll mark BEAM-5621 as release blocker until cherry-pick is in. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154071) Time Spent: 2h 40m (was: 2.5h) > Several tests fail on Python 3 with TypeError: unorderable types: str() < > int() > --- > > Key: BEAM-5621 > URL: https://issues.apache.org/jira/browse/BEAM-5621 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 2h 40m > Remaining Estimate: 0h > > == > ERROR: test_remove_duplicates > (apache_beam.transforms.ptransform_test.PTransformTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 677, in process > self.do_fn_invoker.invoke_process(windowed_value) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 414, in invoke_process > windowed_value, self.process_method(windowed_value.value)) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 1068, in > wrapper = lambda x: [fn(x)] > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py", > line 115, in _equal > sorted_expected = sorted(expected) > TypeError: unorderable types: str() < int() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5615) Several tests fail on Python 3 with TypeError: 'cmp' is an invalid keyword argument for this function
[ https://issues.apache.org/jira/browse/BEAM-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-5615: -- Affects Version/s: (was: 2.8.0) > Several tests fail on Python 3 with TypeError: 'cmp' is an invalid keyword > argument for this function > - > > Key: BEAM-5615 > URL: https://issues.apache.org/jira/browse/BEAM-5615 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 3.5h > Remaining Estimate: 0h > > ERROR: test_top (apache_beam.transforms.combiners_test.CombineTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners_test.py", > line 89, in test_top > names) # Note parameter passed to comparator. > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pvalue.py", > line 111, in __or__ > return self.pipeline.apply(ptransform, self) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py", > line 467, in apply > label or transform.label) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py", > line 477, in apply > return self.apply(transform, pvalueish) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py", > line 513, in apply > pvalueish_result = self.runner.apply(transform, pvalueish) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/ptransform.py", > line 759, in expand > return self._fn(pcoll, *args, **kwargs) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners.py", > line 185, in Of > TopCombineFn(n, compare, key, reverse), *args, **kwargs) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pvalue.py", > line 111, in __or__ > return self.pipeline.apply(ptransform, self) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py", > line 513, in apply > pvalueish_result = self.runner.apply(transform, pvalueish) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 1251, in expand > default_value = combine_fn.apply([], *self.args, **self.kwargs) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 623, in apply > *args, **kwargs) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners.py", > line 362, in extract_output > self._sort_buffer(buffer, lt) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners.py", > line 295, in _sort_buffer > key=self._key_fn) > TypeError: 'cmp' is an invalid keyword argument for this function -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5714) RedisIO emit error of EXEC without MULTI
[ https://issues.apache.org/jira/browse/BEAM-5714?focusedWorklogId=154070=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154070 ] ASF GitHub Bot logged work on BEAM-5714: Author: ASF GitHub Bot Created on: 13/Oct/18 00:47 Start Date: 13/Oct/18 00:47 Worklog Time Spent: 10m Work Description: aaltay commented on issue #6651: [BEAM-5714] Fix RedisIO EXEC without MULTI error URL: https://github.com/apache/beam/pull/6651#issuecomment-429497451 cc: @vvarma might have an opinion This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154070) Time Spent: 40m (was: 0.5h) > RedisIO emit error of EXEC without MULTI > > > Key: BEAM-5714 > URL: https://issues.apache.org/jira/browse/BEAM-5714 > Project: Beam > Issue Type: Bug > Components: io-java-redis >Affects Versions: 2.7.0 >Reporter: K.K. POON >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > RedisIO has EXEC without MULTI error after SET a batch of records. > > By looking at the source code, I guess there is missing `pipeline.multi();` > after exec() the last batch. > [https://github.com/apache/beam/blob/master/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java#L555] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing
[ https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=154069=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154069 ] ASF GitHub Bot logged work on BEAM-5707: Author: ASF GitHub Bot Created on: 13/Oct/18 00:42 Start Date: 13/Oct/18 00:42 Worklog Time Spent: 10m Work Description: mwylde commented on a change in pull request #6637: [BEAM-5707] Add a periodic, streaming impulse source for Flink portable pipelines URL: https://github.com/apache/beam/pull/6637#discussion_r224943673 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java ## @@ -63,6 +64,8 @@ public static final String GROUP_BY_KEY_TRANSFORM_URN = getUrn(StandardPTransforms.Primitives.GROUP_BY_KEY); public static final String IMPULSE_TRANSFORM_URN = getUrn(StandardPTransforms.Primitives.IMPULSE); + public static final String STREAMING_IMPULSE_TRANSFORM_URL = "flink:transform:streaming_impulse:v1"; Review comment: moved to FlinkStreamingPortablePipelineTranslator This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154069) Time Spent: 1h 50m (was: 1h 40m) > Add a portable Flink streaming synthetic source for testing > --- > > Key: BEAM-5707 > URL: https://issues.apache.org/jira/browse/BEAM-5707 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Micah Wylde >Assignee: Aljoscha Krettek >Priority: Minor > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently there are no built-in streaming sources for portable pipelines. > This makes it hard to test streaming functionality in the Python SDK. > It would be very useful to add a periodic impulse source that (with some > configurable frequency) outputs an empty byte array, which can then be > transformed as desired inside the python pipeline. More context in this > [mailing list > discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (BEAM-5539) Beam Dependency Update Request: google-cloud-pubsub
[ https://issues.apache.org/jira/browse/BEAM-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri reopened BEAM-5539: - PR was rolled back > Beam Dependency Update Request: google-cloud-pubsub > --- > > Key: BEAM-5539 > URL: https://issues.apache.org/jira/browse/BEAM-5539 > Project: Beam > Issue Type: Bug > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Udi Meiri >Priority: Major > Fix For: Not applicable > > > - 2018-10-01 19:17:59.633423 > - > Please consider upgrading the dependency google-cloud-pubsub. > The current version is 0.26.0. The latest version is 0.38.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-08 12:11:22.339342 > - > Please consider upgrading the dependency google-cloud-pubsub. > The current version is 0.35.4. The latest version is 0.38.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=154064=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154064 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 12/Oct/18 23:58 Start Date: 12/Oct/18 23:58 Worklog Time Spent: 10m Work Description: pabloem closed pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/model/fn-execution/src/main/proto/beam_fn_api.proto b/model/fn-execution/src/main/proto/beam_fn_api.proto index a0795a7c285..915686de6b3 100644 --- a/model/fn-execution/src/main/proto/beam_fn_api.proto +++ b/model/fn-execution/src/main/proto/beam_fn_api.proto @@ -40,6 +40,7 @@ option java_outer_classname = "BeamFnApi"; import "beam_runner_api.proto"; import "endpoints.proto"; +import "google/protobuf/descriptor.proto"; import "google/protobuf/timestamp.proto"; import "google/protobuf/wrappers.proto"; @@ -250,11 +251,16 @@ message ProcessBundleRequest { message ProcessBundleResponse { // (Optional) If metrics reporting is supported by the SDK, this represents // the final metrics to record for this bundle. + // DEPRECATED Metrics metrics = 1; // (Optional) Specifies that the bundle has been split since the last // ProcessBundleProgressResponse was sent. BundleSplit split = 2; + + // (Required) The list of metrics or other MonitoredState + // collected while processing this bundle. + repeated MonitoringInfo monitoring_infos = 3; } // A request to report progress information for a given bundle. @@ -275,9 +281,9 @@ message MonitoringInfo { // Sub types like field formats - int64, double, string. // Aggregation methods - SUM, LATEST, TOP-N, BOTTOM-N, DISTRIBUTION // valid values are: - // beam:metrics:[SumInt64|LatestInt64|Top-NInt64|Bottom-NInt64| - // SumDouble|LatestDouble|Top-NDouble|Bottom-NDouble|DistributionInt64| - // DistributionDouble|MonitoringDataTable] + // beam:metrics:[sum_int_64|latest_int_64|top_n_int_64|bottom_n_int_64| + // sum_double|latest_double|top_n_double|bottom_n_double| + // distribution_int_64|distribution_double|monitoring_data_table string type = 2; // The Metric or monitored state. @@ -302,6 +308,45 @@ message MonitoringInfo { // Some systems such as Stackdriver will be able to aggregate the metrics // using a subset of the provided labels map labels = 5; + + // The walltime of the most recent update. + // Useful for aggregation for Latest types such as LatestInt64. + google.protobuf.Timestamp timestamp = 6; +} + +message MonitoringInfoUrns { + enum Enum { +USER_COUNTER_URN_PREFIX = 0 [(org.apache.beam.model.pipeline.v1.beam_urn) = +"beam:metric:user"]; + +ELEMENT_COUNT = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) = +"beam:metric:element_count:v1"]; + +START_BUNDLE_MSECS = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) = + "beam:metric:pardo_execution_time:start_bundle_msecs:v1"]; + +PROCESS_BUNDLE_MSECS = 3 [(org.apache.beam.model.pipeline.v1.beam_urn) = +"beam:metric:pardo_execution_time:process_bundle_msecs:v1"]; + +FINISH_BUNDLE_MSECS = 4 [(org.apache.beam.model.pipeline.v1.beam_urn) = +"beam:metric:pardo_execution_time:finish_bundle_msecs:v1"]; + +TOTAL_MSECS = 5 [(org.apache.beam.model.pipeline.v1.beam_urn) = +"beam:metric:ptransform_execution_time:total_msecs:v1"]; + } +} + +message MonitoringInfoTypeUrns { + enum Enum { +SUM_INT64_TYPE = 0 [(org.apache.beam.model.pipeline.v1.beam_urn) = +"beam:metrics:sum_int_64"]; + +DISTRIBUTION_INT64_TYPE = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) = +"beam:metrics:distribution_int_64"]; + +LATEST_INT64_TYPE = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) = + "beam:metrics:latest_int_64"]; + } } message Metric { @@ -525,12 +570,16 @@ message Metrics { } message ProcessBundleProgressResponse { - // (Required) + // DEPRECATED (Required) Metrics metrics = 1; // (Optional) Specifies that the bundle has been split since the last // ProcessBundleProgressResponse was sent. BundleSplit split = 2; + + // (Required) The list of metrics or other MonitoredState + // collected while processing this bundle. + repeated MonitoringInfo monitoring_infos = 3; } message ProcessBundleSplitRequest { @@ -795,7 +844,6 @@ m
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=154063=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154063 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 12/Oct/18 23:58 Start Date: 12/Oct/18 23:58 Worklog Time Spent: 10m Work Description: pabloem commented on issue #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#issuecomment-429492504 Okay, as this looks good, I'll go ahead and merge. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154063) Time Spent: 8h 20m (was: 8h 10m) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 8h 20m > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154062=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154062 ] ASF GitHub Bot logged work on BEAM-5636: Author: ASF GitHub Bot Created on: 12/Oct/18 23:54 Start Date: 12/Oct/18 23:54 Worklog Time Spent: 10m Work Description: herohde commented on issue #6665: [BEAM-5636] Java support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6665#issuecomment-429492075 @boyuanzz Btw, please also update the doc describing this change to include Java. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154062) Time Spent: 1h 10m (was: 1h) > Java support for custom dataflow worker jar > --- > > Key: BEAM-5636 > URL: https://issues.apache.org/jira/browse/BEAM-5636 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Java jobs. That requires a change to the Java boot > code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5442) PortableRunner swallows custom options for Runner
[ https://issues.apache.org/jira/browse/BEAM-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Weise updated BEAM-5442: --- Fix Version/s: (was: 2.8.0) > PortableRunner swallows custom options for Runner > - > > Key: BEAM-5442 > URL: https://issues.apache.org/jira/browse/BEAM-5442 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Time Spent: 9h 40m > Remaining Estimate: 0h > > The PortableRunner doesn't pass custom PipelineOptions to the executing > Runner. > Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner. > (The option is just removed during proto translation without any warning) > We should allow some form of customization through the options, even for the > PortableRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (BEAM-5442) PortableRunner swallows custom options for Runner
[ https://issues.apache.org/jira/browse/BEAM-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Weise reopened BEAM-5442: > PortableRunner swallows custom options for Runner > - > > Key: BEAM-5442 > URL: https://issues.apache.org/jira/browse/BEAM-5442 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Time Spent: 9h 40m > Remaining Estimate: 0h > > The PortableRunner doesn't pass custom PipelineOptions to the executing > Runner. > Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner. > (The option is just removed during proto translation without any warning) > We should allow some form of customization through the options, even for the > PortableRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154061=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154061 ] ASF GitHub Bot logged work on BEAM-5636: Author: ASF GitHub Bot Created on: 12/Oct/18 23:40 Start Date: 12/Oct/18 23:40 Worklog Time Spent: 10m Work Description: herohde commented on a change in pull request #6665: [BEAM-5636] Java support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6665#discussion_r224939002 ## File path: sdks/java/container/boot.go ## @@ -103,7 +103,17 @@ func main() { filepath.Join(jarsDir, "slf4j-jdk14.jar"), filepath.Join(jarsDir, "beam-sdks-java-harness.jar"), } + + var hasWorkerExperiment = strings.Contains(options, "use_staged_dataflow_worker_jar") for _, md := range artifacts { + if hasWorkerExperiment { + if strings.HasPrefix(md.Name, "beam-runners-google-cloud-dataflow-java-fn-api-worker") { + continue + } + if strings.HasPrefix(md.Name, "dataflow-worker.jar") { Review comment: Small comment: this can be == instead. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154061) Time Spent: 1h (was: 50m) > Java support for custom dataflow worker jar > --- > > Key: BEAM-5636 > URL: https://issues.apache.org/jira/browse/BEAM-5636 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Java jobs. That requires a change to the Java boot > code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3587) User reports TextIO failure in FlinkRunner on master
[ https://issues.apache.org/jira/browse/BEAM-3587?focusedWorklogId=154059=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154059 ] ASF GitHub Bot logged work on BEAM-3587: Author: ASF GitHub Bot Created on: 12/Oct/18 23:36 Start Date: 12/Oct/18 23:36 Worklog Time Spent: 10m Work Description: swegner closed pull request #384: [BEAM-3587] Add a note to Gradle shadowJar for merge service files URL: https://github.com/apache/beam-site/pull/384 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/src/documentation/runners/flink.md b/src/documentation/runners/flink.md index 6dc6e7b69d..df64ba46f9 100644 --- a/src/documentation/runners/flink.md +++ b/src/documentation/runners/flink.md @@ -51,7 +51,7 @@ For more information, the [Flink Documentation](https://ci.apache.org/projects/f ```java org.apache.beam - beam-runners-flink_2.10 + beam-runners-flink_2.11 {{ site.release_latest }} runtime @@ -81,6 +81,62 @@ $ mvn exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \ If you have a Flink `JobManager` running on your local machine you can give `localhost:6123` for `flinkMaster`. +Behind the hood, to create your shaded jar (containing your pipeline and the Flink runner dependencies), you have to use the `maven-shade-plugin`: + +```java + +org.apache.beam +beam-runners-flink_2.10 +{{ site.release_latest }} + +``` + +```java + +org.apache.maven.plugins +maven-shade-plugin +${maven-shade-plugin.version} + + false + + +*:* + +META-INF/*.SF +META-INF/*.DSA +META-INF/*.RSA + + + + + + +package + +shade + + + true +shaded + + + + + + + +``` + +Then, Maven build will create the shaded jar. + +If you prefer to use Gradle, you can achieve the same using `shadowJar`: + +```java +shadowJar { +mergeServiceFiles() +} +``` + ## Pipeline options for the Flink Runner When executing your pipeline with the Flink Runner, you can set these pipeline options. diff --git a/src/documentation/runners/spark.md b/src/documentation/runners/spark.md index 1502f242c0..4b4479e0e3 100644 --- a/src/documentation/runners/spark.md +++ b/src/documentation/runners/spark.md @@ -37,7 +37,7 @@ You can add a dependency on the latest version of the Spark runner by adding to ### Deploying Spark with your application -In some cases, such as running in local mode/Standalone, your (self-contained) application would be required to pack Spark by explicitly adding the following dependencies in your pom.xml: +Most of the time (running in local mode/Standalone or using `spark-submit`), your (self-contained) application would be required to pack Spark by explicitly adding the following dependencies in your pom.xml: ```java org.apache.spark @@ -94,6 +94,17 @@ After running mvn package, run ls target and you shoul beam-examples-1.0.0-shaded.jar ``` +If you are using gradle, you have to use `shadowJar` to create the shaded jar enabling `mergeServiceFiles()`: +```java +shadowJar { +transform(AppendingTransformer) { +resource = 'reference.conf' +} +relocate 'com.google.protobuf', 'shaded.protobuf' +mergeServiceFiles() +} +``` + To run against a Standalone cluster simply run: ``` spark-submit --class com.beam.examples.BeamPipeline --master spark://HOST:PORT target/beam-examples-1.0.0-shaded.jar --runner=SparkRunner This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154059) Time Spent: 1h 20m (was: 1h 10m) > User reports TextIO failure in FlinkRun
[jira] [Work logged] (BEAM-3587) User reports TextIO failure in FlinkRunner on master
[ https://issues.apache.org/jira/browse/BEAM-3587?focusedWorklogId=154058=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154058 ] ASF GitHub Bot logged work on BEAM-3587: Author: ASF GitHub Bot Created on: 12/Oct/18 23:36 Start Date: 12/Oct/18 23:36 Worklog Time Spent: 10m Work Description: swegner commented on issue #384: [BEAM-3587] Add a note to Gradle shadowJar for merge service files URL: https://github.com/apache/beam-site/pull/384#issuecomment-429489867 I've migrated the changes for this pull request onto the migrated website sources in the `apache/beam` repository: https://github.com/swegner/beam/tree/migrated-pr-384 To pull the migrated changes into your local git client, run: ``` git remote add swegner g...@github.com:swegner/beam.git && git fetch swegner git checkout -B bigqueryio swegner/migrated-pr-384 ``` You can then push the changes to your own branch and [open a new pull request](https://github.com/apache/beam/compare?expand=1) against the apache/beam repository. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154058) Time Spent: 1h 10m (was: 1h) > User reports TextIO failure in FlinkRunner on master > > > Key: BEAM-3587 > URL: https://issues.apache.org/jira/browse/BEAM-3587 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Kenneth Knowles >Assignee: Jean-Baptiste Onofré >Priority: Minor > Fix For: Not applicable > > Attachments: screen1.png, screen2.png > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Reported here: > [https://lists.apache.org/thread.html/47b16c94032392782505415e010970fd2a9480891c55c2f7b5de92bd@%3Cuser.beam.apache.org%3E] > "I'm trying to run a pipeline containing just a TextIO.read() step on a Flink > cluster, using the latest Beam git revision (ff37337). The job fails to start > with the Exception: > {{java.lang.UnsupportedOperationException: The transform is currently not > supported.}} > It does work with Beam 2.2.0 though. All code, logs, and reproduction steps > [https://github.com/pelletier/beam-flink-example]; > My initial thoughts: I have a guess that this has to do with switching to > running from a portable pipeline representation, and it looks like there's a > non-composite transform with an empty URN and it threw a bad error message. > We can try to root cause but may also mitigate short-term by removing the > round-trip through pipeline proto for now. > What is curious is that the ValidatesRunner and WordCountIT are working - > they only run on a local Flink, yet this seems to be a translation issue that > would occur for local or distributed runs. > We need to certainly run this repro on the RC if we don't totally get to the > bottom of it quickly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-1081) annotations should support custom messages and classes
[ https://issues.apache.org/jira/browse/BEAM-1081?focusedWorklogId=154060=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154060 ] ASF GitHub Bot logged work on BEAM-1081: Author: ASF GitHub Bot Created on: 12/Oct/18 23:36 Start Date: 12/Oct/18 23:36 Worklog Time Spent: 10m Work Description: aaltay commented on issue #6670: [BEAM-1081] Annotations custom message support and classes tests. URL: https://github.com/apache/beam/pull/6670#issuecomment-429489882 @jglezt it looks like there are test issues, could you look at those? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154060) Time Spent: 50m (was: 40m) > annotations should support custom messages and classes > -- > > Key: BEAM-1081 > URL: https://issues.apache.org/jira/browse/BEAM-1081 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: newbie, starter > Time Spent: 50m > Remaining Estimate: 0h > > Update > https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/utils/annotations.py > to add 2 new features: > 1. ability to customize message > 2. ability to tag classes (not only functions) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154057=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154057 ] ASF GitHub Bot logged work on BEAM-5636: Author: ASF GitHub Bot Created on: 12/Oct/18 23:35 Start Date: 12/Oct/18 23:35 Worklog Time Spent: 10m Work Description: boyuanzz commented on a change in pull request #6665: [BEAM-5636] Java support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6665#discussion_r224938513 ## File path: sdks/java/container/boot.go ## @@ -103,7 +103,17 @@ func main() { filepath.Join(jarsDir, "slf4j-jdk14.jar"), filepath.Join(jarsDir, "beam-sdks-java-harness.jar"), } + + var has_worker_experiment = strings.Contains(options, "use_staged_dataflow_worker_jar") Review comment: Fixed, thanks~ This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154057) Time Spent: 50m (was: 40m) > Java support for custom dataflow worker jar > --- > > Key: BEAM-5636 > URL: https://issues.apache.org/jira/browse/BEAM-5636 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Java jobs. That requires a change to the Java boot > code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154055=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154055 ] ASF GitHub Bot logged work on BEAM-5636: Author: ASF GitHub Bot Created on: 12/Oct/18 23:17 Start Date: 12/Oct/18 23:17 Worklog Time Spent: 10m Work Description: herohde commented on a change in pull request #6665: [BEAM-5636] Java support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6665#discussion_r224936531 ## File path: sdks/java/container/boot.go ## @@ -103,7 +103,17 @@ func main() { filepath.Join(jarsDir, "slf4j-jdk14.jar"), filepath.Join(jarsDir, "beam-sdks-java-harness.jar"), } + + var has_worker_experiment = strings.Contains(options, "use_staged_dataflow_worker_jar") Review comment: nit: go uses camelCase for local variables. You also need to run "gofmt -w ." to fix the indentation. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154055) Time Spent: 40m (was: 0.5h) > Java support for custom dataflow worker jar > --- > > Key: BEAM-5636 > URL: https://issues.apache.org/jira/browse/BEAM-5636 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Java jobs. That requires a change to the Java boot > code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5735) Contributor Guide Improvements
[ https://issues.apache.org/jira/browse/BEAM-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648557#comment-16648557 ] Scott Wegner commented on BEAM-5735: I made some small boilerplate changes while we were discussing these in person. It's not nearly ready to check-in, but in case it's useful: https://github.com/swegner/beam/commit/06aee6b3903020e04b7ce4000ba71be525e04721 > Contributor Guide Improvements > -- > > Key: BEAM-5735 > URL: https://issues.apache.org/jira/browse/BEAM-5735 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Scott Wegner >Assignee: Alan Myrvold >Priority: Major > > This is a wish-list for improvements to the Beam contributor guide. > Many thanks to [~rohdesam] for the feedback which helped shape this list. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5708) Support caching of SDKHarness environments in flink
[ https://issues.apache.org/jira/browse/BEAM-5708?focusedWorklogId=154054=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154054 ] ASF GitHub Bot logged work on BEAM-5708: Author: ASF GitHub Bot Created on: 12/Oct/18 23:07 Start Date: 12/Oct/18 23:07 Worklog Time Spent: 10m Work Description: tweise commented on issue #6638: [BEAM-5708] Cache environment in portable flink runner URL: https://github.com/apache/beam/pull/6638#issuecomment-429485973 @angoenka did you test the fallback case? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154054) Time Spent: 1h 50m (was: 1h 40m) > Support caching of SDKHarness environments in flink > --- > > Key: BEAM-5708 > URL: https://issues.apache.org/jira/browse/BEAM-5708 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > Cache and reuse environment to improve performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5740) Refactor permissions section into bullet-points
[ https://issues.apache.org/jira/browse/BEAM-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner reassigned BEAM-5740: -- Assignee: Scott Wegner > Refactor permissions section into bullet-points > --- > > Key: BEAM-5740 > URL: https://issues.apache.org/jira/browse/BEAM-5740 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Major > > The permissions section has good content, but it's not easily browseable if > you're looking for a specific thing (i.e. Slack permissions). We should > refactor it into bullet points. > For permissions that require reaching out via email/Slack, we should link to > some previous example. It lowers the barrier to entry if a new contributor > can copy/paste some existing template. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5743) Move "works in progress" out of getting started guide
[ https://issues.apache.org/jira/browse/BEAM-5743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner reassigned BEAM-5743: -- Assignee: Scott Wegner > Move "works in progress" out of getting started guide > - > > Key: BEAM-5743 > URL: https://issues.apache.org/jira/browse/BEAM-5743 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Major > > These aren't relevant to a new contributor, and it makes the contributor > guide less focused. They should be moved to a separate page. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5741) Move "Contact Us" to a top-level link
[ https://issues.apache.org/jira/browse/BEAM-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner reassigned BEAM-5741: -- Assignee: (was: Melissa Pashniak) > Move "Contact Us" to a top-level link > - > > Key: BEAM-5741 > URL: https://issues.apache.org/jira/browse/BEAM-5741 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Scott Wegner >Priority: Major > > It should be very easy to figure out how to get in touch with community. > "Contact Us" should be a top-level link on the page. > The page can also be improved with: > * Some basic text on how to use subscribe / unsubscribe links > * Recommendations on how to use various communications channels (Slack for > quick questions, dev@ for longer conversations. And all decisions should make > it back to dev@) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5740) Refactor permissions section into bullet-points
[ https://issues.apache.org/jira/browse/BEAM-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner updated BEAM-5740: --- Description: The permissions section has good content, but it's not easily browseable if you're looking for a specific thing (i.e. Slack permissions). We should refactor it into bullet points. For permissions that require reaching out via email/Slack, we should link to some previous example. It lowers the barrier to entry if a new contributor can copy/paste some existing template. was:The permissions section has good content, but it's not easily browseable if you're looking for a specific thing (i.e. Slack permissions). We should refactor it into bullet points. > Refactor permissions section into bullet-points > --- > > Key: BEAM-5740 > URL: https://issues.apache.org/jira/browse/BEAM-5740 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Scott Wegner >Priority: Major > > The permissions section has good content, but it's not easily browseable if > you're looking for a specific thing (i.e. Slack permissions). We should > refactor it into bullet points. > For permissions that require reaching out via email/Slack, we should link to > some previous example. It lowers the barrier to entry if a new contributor > can copy/paste some existing template. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5743) Move "works in progress" out of getting started guide
Scott Wegner created BEAM-5743: -- Summary: Move "works in progress" out of getting started guide Key: BEAM-5743 URL: https://issues.apache.org/jira/browse/BEAM-5743 Project: Beam Issue Type: Sub-task Components: website Reporter: Scott Wegner These aren't relevant to a new contributor, and it makes the contributor guide less focused. They should be moved to a separate page. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5742) Add expectations for code reviews
Scott Wegner created BEAM-5742: -- Summary: Add expectations for code reviews Key: BEAM-5742 URL: https://issues.apache.org/jira/browse/BEAM-5742 Project: Beam Issue Type: Sub-task Components: website Reporter: Scott Wegner There should be a dedicated section about how we to code reviews, and expectations for contributors / reviewers. Things like: * PR's should have linked JIRA * Code changes should also include tests * Small changes are easier to review * How to find a reviewer, when you can expect reviewer to engage * How automatic test runs work (PreCommits run by on each commit; build / test should be passing before asking for a reviewer) * How to re-run tests, and how/when to run Post-Commits -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5741) Move "Contact Us" to a top-level link
Scott Wegner created BEAM-5741: -- Summary: Move "Contact Us" to a top-level link Key: BEAM-5741 URL: https://issues.apache.org/jira/browse/BEAM-5741 Project: Beam Issue Type: Sub-task Components: website Reporter: Scott Wegner Assignee: Melissa Pashniak It should be very easy to figure out how to get in touch with community. "Contact Us" should be a top-level link on the page. The page can also be improved with: * Some basic text on how to use subscribe / unsubscribe links * Recommendations on how to use various communications channels (Slack for quick questions, dev@ for longer conversations. And all decisions should make it back to dev@) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5740) Refactor permissions section into bullet-points
Scott Wegner created BEAM-5740: -- Summary: Refactor permissions section into bullet-points Key: BEAM-5740 URL: https://issues.apache.org/jira/browse/BEAM-5740 Project: Beam Issue Type: Sub-task Components: website Reporter: Scott Wegner The permissions section has good content, but it's not easily browseable if you're looking for a specific thing (i.e. Slack permissions). We should refactor it into bullet points. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5739) Contributor Story: "Submitting your first PR"
Scott Wegner created BEAM-5739: -- Summary: Contributor Story: "Submitting your first PR" Key: BEAM-5739 URL: https://issues.apache.org/jira/browse/BEAM-5739 Project: Beam Issue Type: Sub-task Components: website Reporter: Scott Wegner We should write the user story for "Submitting your first PR", with prescriptive steps on getting started. It should include: * Forking the repo and setting up the dev environment * How to build/test * Choosing an IDE * language / SDK-specific tips + website * "When will my changes go live?" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5738) Refactor Contributor Guide into "Stories"
Scott Wegner created BEAM-5738: -- Summary: Refactor Contributor Guide into "Stories" Key: BEAM-5738 URL: https://issues.apache.org/jira/browse/BEAM-5738 Project: Beam Issue Type: Sub-task Components: website Reporter: Scott Wegner The Contributor Guide has become a dumping ground for topics that are maybe relevant to contributors. It's long and unorganized and it doesn't really tell a contributor how to get started. We should refactor the contributor guide into "stories", like "How to submit your first PR", "How to contribute to the website", "How to propose a new feature", etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5737) Create a contributor FAQ page
Scott Wegner created BEAM-5737: -- Summary: Create a contributor FAQ page Key: BEAM-5737 URL: https://issues.apache.org/jira/browse/BEAM-5737 Project: Beam Issue Type: Sub-task Components: website Reporter: Scott Wegner This should be an easy "dumping ground" place to add documentation, pretty much about anything. It should be low-barrier to add new content or update existing. High-quality or frequently-used topics can be promoted to top level pages. It probably makes sense to put this on the wiki. But it should be linked very prominently from the top of the Contributor Guide -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154050=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154050 ] ASF GitHub Bot logged work on BEAM-5636: Author: ASF GitHub Bot Created on: 12/Oct/18 22:27 Start Date: 12/Oct/18 22:27 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #6665: [BEAM-5636] Java support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6665#issuecomment-429479809 Re: @herohde Please take another look at this. Thanks~ This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154050) Time Spent: 0.5h (was: 20m) > Java support for custom dataflow worker jar > --- > > Key: BEAM-5636 > URL: https://issues.apache.org/jira/browse/BEAM-5636 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Java jobs. That requires a change to the Java boot > code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (BEAM-5615) Several tests fail on Python 3 with TypeError: 'cmp' is an invalid keyword argument for this function
[ https://issues.apache.org/jira/browse/BEAM-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev reopened BEAM-5615: --- > Several tests fail on Python 3 with TypeError: 'cmp' is an invalid keyword > argument for this function > - > > Key: BEAM-5615 > URL: https://issues.apache.org/jira/browse/BEAM-5615 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Affects Versions: 2.8.0 >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 3.5h > Remaining Estimate: 0h > > ERROR: test_top (apache_beam.transforms.combiners_test.CombineTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners_test.py", > line 89, in test_top > names) # Note parameter passed to comparator. > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pvalue.py", > line 111, in __or__ > return self.pipeline.apply(ptransform, self) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py", > line 467, in apply > label or transform.label) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py", > line 477, in apply > return self.apply(transform, pvalueish) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py", > line 513, in apply > pvalueish_result = self.runner.apply(transform, pvalueish) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/ptransform.py", > line 759, in expand > return self._fn(pcoll, *args, **kwargs) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners.py", > line 185, in Of > TopCombineFn(n, compare, key, reverse), *args, **kwargs) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pvalue.py", > line 111, in __or__ > return self.pipeline.apply(ptransform, self) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py", > line 513, in apply > pvalueish_result = self.runner.apply(transform, pvalueish) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 1251, in expand > default_value = combine_fn.apply([], *self.args, **self.kwargs) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 623, in apply > *args, **kwargs) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners.py", > line 362, in extract_output > self._sort_buffer(buffer, lt) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners.py", > line 295, in _sort_buffer > key=self._key_fn) > TypeError: 'cmp' is an invalid keyword argument for this function -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5615) Several tests fail on Python 3 with TypeError: 'cmp' is an invalid keyword argument for this function
[ https://issues.apache.org/jira/browse/BEAM-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-5615: -- Affects Version/s: 2.8.0 > Several tests fail on Python 3 with TypeError: 'cmp' is an invalid keyword > argument for this function > - > > Key: BEAM-5615 > URL: https://issues.apache.org/jira/browse/BEAM-5615 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Affects Versions: 2.8.0 >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 3.5h > Remaining Estimate: 0h > > ERROR: test_top (apache_beam.transforms.combiners_test.CombineTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners_test.py", > line 89, in test_top > names) # Note parameter passed to comparator. > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pvalue.py", > line 111, in __or__ > return self.pipeline.apply(ptransform, self) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py", > line 467, in apply > label or transform.label) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py", > line 477, in apply > return self.apply(transform, pvalueish) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py", > line 513, in apply > pvalueish_result = self.runner.apply(transform, pvalueish) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/ptransform.py", > line 759, in expand > return self._fn(pcoll, *args, **kwargs) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners.py", > line 185, in Of > TopCombineFn(n, compare, key, reverse), *args, **kwargs) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pvalue.py", > line 111, in __or__ > return self.pipeline.apply(ptransform, self) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py", > line 513, in apply > pvalueish_result = self.runner.apply(transform, pvalueish) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 1251, in expand > default_value = combine_fn.apply([], *self.args, **self.kwargs) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 623, in apply > *args, **kwargs) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners.py", > line 362, in extract_output > self._sort_buffer(buffer, lt) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners.py", > line 295, in _sort_buffer > key=self._key_fn) > TypeError: 'cmp' is an invalid keyword argument for this function -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()
[ https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154049=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154049 ] ASF GitHub Bot logged work on BEAM-5621: Author: ASF GitHub Bot Created on: 12/Oct/18 22:17 Start Date: 12/Oct/18 22:17 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #6602: [BEAM-5621] Fix unorderable types in python 3 URL: https://github.com/apache/beam/pull/6602#issuecomment-429478105 Unfortunately I think https://github.com/apache/beam/pull/6602#issuecomment-429468709 may be a very unintuitive change, so we need to roll it back and either fix the underlying issue with typing of negative numbers or proceed with a different solution here. We would need to cherry-pick the change into the release branch, so I'll mark BEAM-5615 as release blocker until cherry-pick is in. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154049) Time Spent: 2.5h (was: 2h 20m) > Several tests fail on Python 3 with TypeError: unorderable types: str() < > int() > --- > > Key: BEAM-5621 > URL: https://issues.apache.org/jira/browse/BEAM-5621 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 2.5h > Remaining Estimate: 0h > > == > ERROR: test_remove_duplicates > (apache_beam.transforms.ptransform_test.PTransformTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 677, in process > self.do_fn_invoker.invoke_process(windowed_value) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 414, in invoke_process > windowed_value, self.process_method(windowed_value.value)) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 1068, in > wrapper = lambda x: [fn(x)] > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py", > line 115, in _equal > sorted_expected = sorted(expected) > TypeError: unorderable types: str() < int() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.
[ https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=154048=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154048 ] ASF GitHub Bot logged work on BEAM-5653: Author: ASF GitHub Bot Created on: 12/Oct/18 22:16 Start Date: 12/Oct/18 22:16 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #6649: [BEAM-5653] Fix overriding coders due to duplicate coderId generation URL: https://github.com/apache/beam/pull/6649#issuecomment-429477876 And it's green! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154048) Time Spent: 4h 10m (was: 4h) Remaining Estimate: 67h 50m (was: 68h) > Dataflow FnApi Worker overrides some of Coders due to coder ID generation > collision. > > > Key: BEAM-5653 > URL: https://issues.apache.org/jira/browse/BEAM-5653 > Project: Beam > Issue Type: Test > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Blocker > Fix For: 2.8.0 > > Original Estimate: 72h > Time Spent: 4h 10m > Remaining Estimate: 67h 50m > > Due to one of latest refactorings, we got a bug in Java FnApi Worker that it > overrides Coders in ProcessBundleDescriptor sent to SDK Harness that causes > jobs to fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5736) List prerequisite knowledge with links to external docs
Scott Wegner created BEAM-5736: -- Summary: List prerequisite knowledge with links to external docs Key: BEAM-5736 URL: https://issues.apache.org/jira/browse/BEAM-5736 Project: Beam Issue Type: Sub-task Components: website Reporter: Scott Wegner Assignee: Scott Wegner Our contributor guide makes some assumptions about prior knowledge. We should be explicit about what we expect contributors to already know, and give links to places where they can learn more if necessary. Examples: * Git & GitHub workflows * Beam (understand what it does, what it's for, how it fits into ecosystem) * JDK 8 installed * Windows / Mac / Linux dev environment (be explicit about what we support) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5735) Contributor Guide Improvements
Scott Wegner created BEAM-5735: -- Summary: Contributor Guide Improvements Key: BEAM-5735 URL: https://issues.apache.org/jira/browse/BEAM-5735 Project: Beam Issue Type: Improvement Components: website Reporter: Scott Wegner Assignee: Alan Myrvold This is a wish-list for improvements to the Beam contributor guide. Many thanks to [~rohdesam] for the feedback which helped shape this list. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=154047=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154047 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 12/Oct/18 22:10 Start Date: 12/Oct/18 22:10 Worklog Time Spent: 10m Work Description: ajamato commented on issue #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#issuecomment-429476625 Squashed all the commits, FYI I imported this PR and internal google tests are also passing. @robertwb, happy to iterate more on your suggestions but we would like to submit this PR, and finish up this iteration This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154047) Time Spent: 8h 10m (was: 8h) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 8h 10m > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (BEAM-5709) Tests in BeamFnControlServiceTest are flaky.
[ https://issues.apache.org/jira/browse/BEAM-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reopened BEAM-5709: --- > Tests in BeamFnControlServiceTest are flaky. > > > Key: BEAM-5709 > URL: https://issues.apache.org/jira/browse/BEAM-5709 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Minor > Fix For: Not applicable > > Time Spent: 1.5h > Remaining Estimate: 0h > > https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlServiceTest.java > Tests for BeamFnControlService are currently flaky. The test attempts to > verify that onCompleted was called on the mock streams, but that function > gets called on a separate thread, so occasionally the function will not have > been called yet, despite the server being shut down. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5729) Create ability to read/write database implementing database/sql contract
[ https://issues.apache.org/jira/browse/BEAM-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648516#comment-16648516 ] Adrian Witas commented on BEAM-5729: In my case I need a few dictionary as side input to be fetched from database/sql, scallable is not must have for my case, but missing this functionality is discouraging, > Create ability to read/write database implementing database/sql contract > - > > Key: BEAM-5729 > URL: https://issues.apache.org/jira/browse/BEAM-5729 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Affects Versions: 2.7.0 >Reporter: Adrian Witas >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-1251) Python 3 Support
[ https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=154046=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154046 ] ASF GitHub Bot logged work on BEAM-1251: Author: ASF GitHub Bot Created on: 12/Oct/18 22:00 Start Date: 12/Oct/18 22:00 Worklog Time Spent: 10m Work Description: swegner commented on issue #6679: [BEAM-1251] Add a link to Python 3 Conversion Quick Start Guide to the list of ongoing efforts on Beam site. URL: https://github.com/apache/beam/pull/6679#issuecomment-429474553 Yup! If you pop open the "All checks have passed" bar, you'll see a link to the "Website_Stage_GCS" job [results](https://builds.apache.org/job/beam_PreCommit_Website_Stage_GCS_Commit/79/), which contains a link to your staged changes for review: http://apache-beam-website-pull-requests.storage.googleapis.com/6679/index.html (I'm brainstorming a way to make that link more prominent on GitHub; let me know if you have ideas) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154046) Time Spent: 22h 50m (was: 22h 40m) > Python 3 Support > > > Key: BEAM-1251 > URL: https://issues.apache.org/jira/browse/BEAM-1251 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Eyad Sibai >Assignee: Robbe >Priority: Major > Time Spent: 22h 50m > Remaining Estimate: 0h > > I have been trying to use google datalab with python3. As I see there are > several packages that does not support python3 yet which google datalab > depends on. This is one of them. > https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5729) Create ability to read/write database implementing database/sql contract
[ https://issues.apache.org/jira/browse/BEAM-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648512#comment-16648512 ] Robert Burke commented on BEAM-5729: Once we have a way of defining Splitable DoFns for portable runners, it'll be easier to write a scalable implementation of this. See https://issues.apache.org/jira/browse/BEAM-3301 > Create ability to read/write database implementing database/sql contract > - > > Key: BEAM-5729 > URL: https://issues.apache.org/jira/browse/BEAM-5729 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Affects Versions: 2.7.0 >Reporter: Adrian Witas >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5734) RedisIO: finishBundle calls Jedis.exec without checking if there are operations in the pipeline
[ https://issues.apache.org/jira/browse/BEAM-5734?focusedWorklogId=154045=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154045 ] ASF GitHub Bot logged work on BEAM-5734: Author: ASF GitHub Bot Created on: 12/Oct/18 21:56 Start Date: 12/Oct/18 21:56 Worklog Time Spent: 10m Work Description: casidiablo commented on a change in pull request #6682: [BEAM-5734] RedisIO: only call Jedis.exec() on finishBundle if there is something to send URL: https://github.com/apache/beam/pull/6682#discussion_r224925261 ## File path: sdks/java/io/redis/src/test/java/org/apache/beam/sdk/io/redis/RedisIOTest.java ## @@ -86,7 +86,7 @@ public void testBulkRead() throws Exception { @Test public void testWriteReadUsingDefaultAppendMethod() throws Exception { ArrayList> data = new ArrayList<>(); -for (int i = 0; i < 100; i++) { +for (int i = 0; i < 8000; i++) { Review comment: If the test had used this value instead, unit tests would have detected the issue. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154045) Time Spent: 20m (was: 10m) > RedisIO: finishBundle calls Jedis.exec without checking if there are > operations in the pipeline > --- > > Key: BEAM-5734 > URL: https://issues.apache.org/jira/browse/BEAM-5734 > Project: Beam > Issue Type: Bug > Components: io-java-redis >Reporter: Cristian >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > It throws: > > {code:java} > org.apache.beam.sdk.Pipeline$PipelineExecutionException: > redis.clients.jedis.exceptions.JedisDataException: EXEC without MULTI > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332) > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313) > at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350) > at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331) > at > org.apache.beam.sdk.io.redis.RedisIOTest.testWriteReadUsingDefaultAppendMethod(RedisIOTest.java:100) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319) > at org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.
[jira] [Assigned] (BEAM-5729) Create ability to read/write database implementing database/sql contract
[ https://issues.apache.org/jira/browse/BEAM-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-5729: -- Assignee: (was: Robert Burke) > Create ability to read/write database implementing database/sql contract > - > > Key: BEAM-5729 > URL: https://issues.apache.org/jira/browse/BEAM-5729 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Affects Versions: 2.7.0 >Reporter: Adrian Witas >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5728) Unable to store nil value in BigQuery go sdk
[ https://issues.apache.org/jira/browse/BEAM-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-5728: -- Assignee: (was: Robert Burke) > Unable to store nil value in BigQuery go sdk > > > Key: BEAM-5728 > URL: https://issues.apache.org/jira/browse/BEAM-5728 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Affects Versions: 2.7.0 > Environment: any >Reporter: Adrian Witas >Priority: Major > > Any struct type that uses pointer type in any field is flag as invalid: > "Invalid underlying type: XXX " > With large BigQuery schema (200+ columns), and large table size, ability of > ingesting and enriching data from file system (GS, local) where many record > will have partial details with nulls is critial since it reduces cost, > BigQuery does not charge for NULL, but does for 0, and "empty" messages, -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5734) RedisIO: finishBundle calls Jedis.exec without checking if there are operations in the pipeline
[ https://issues.apache.org/jira/browse/BEAM-5734?focusedWorklogId=154044=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154044 ] ASF GitHub Bot logged work on BEAM-5734: Author: ASF GitHub Bot Created on: 12/Oct/18 21:55 Start Date: 12/Oct/18 21:55 Worklog Time Spent: 10m Work Description: casidiablo opened a new pull request #6682: [BEAM-5734] RedisIO: only call Jedis.exec() on finishBundle if there is something to send URL: https://github.com/apache/beam/pull/6682 This fixes a bug in the RedisIO.write sink. The `finishBundle()` calls Jedis' `pipeline.exec()` method without checking if there is actually something to flush. That results in this exception being thrown: ``` org.apache.beam.sdk.Pipeline$PipelineExecutionException: redis.clients.jedis.exceptions.JedisDataException: EXEC without MULTI at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332) at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302) at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197) at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64) at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313) at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350) at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331) at org.apache.beam.sdk.io.redis.RedisIOTest.testWriteReadUsingDefaultAppendMethod(RedisIOTest.java:100) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319) at org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66) at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) at com.sun.proxy.$Proxy2.processTestClass(Unknown Source
[jira] [Commented] (BEAM-5728) Unable to store nil value in BigQuery go sdk
[ https://issues.apache.org/jira/browse/BEAM-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648509#comment-16648509 ] Robert Burke commented on BEAM-5728: Part of this has to do with not having a coder registry for handling user types, and handling interfaces{} fields. technically, a runner could require encoding of types between any two pardos, so user types must be codeable. https://issues.apache.org/jira/browse/BEAM-3306 That said, there's a work around by making your schema type pretend to be a Protocol buffer which will bypass that analysis, at the expense of needing to specify your own encoding and decoding for the type in the Marshal and Unmarshal methods. You can see how to do that, demonstrated in create_test.go for testProto. [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/create_test.go#L84] > Unable to store nil value in BigQuery go sdk > > > Key: BEAM-5728 > URL: https://issues.apache.org/jira/browse/BEAM-5728 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Affects Versions: 2.7.0 > Environment: any >Reporter: Adrian Witas >Assignee: Robert Burke >Priority: Major > > Any struct type that uses pointer type in any field is flag as invalid: > "Invalid underlying type: XXX " > With large BigQuery schema (200+ columns), and large table size, ability of > ingesting and enriching data from file system (GS, local) where many record > will have partial details with nulls is critial since it reduces cost, > BigQuery does not charge for NULL, but does for 0, and "empty" messages, -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5734) RedisIO: finishBundle calls Jedis.exec without checking if there are operations in the pipeline
Cristian created BEAM-5734: -- Summary: RedisIO: finishBundle calls Jedis.exec without checking if there are operations in the pipeline Key: BEAM-5734 URL: https://issues.apache.org/jira/browse/BEAM-5734 Project: Beam Issue Type: Bug Components: io-java-redis Reporter: Cristian Assignee: Jean-Baptiste Onofré It throws: {code:java} org.apache.beam.sdk.Pipeline$PipelineExecutionException: redis.clients.jedis.exceptions.JedisDataException: EXEC without MULTI at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332) at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302) at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197) at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64) at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313) at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350) at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331) at org.apache.beam.sdk.io.redis.RedisIOTest.testWriteReadUsingDefaultAppendMethod(RedisIOTest.java:100) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319) at org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66) at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:117) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:155
[jira] [Updated] (BEAM-5734) RedisIO: finishBundle calls Jedis.exec without checking if there are operations in the pipeline
[ https://issues.apache.org/jira/browse/BEAM-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cristian updated BEAM-5734: --- Description: It throws: {code:java} org.apache.beam.sdk.Pipeline$PipelineExecutionException: redis.clients.jedis.exceptions.JedisDataException: EXEC without MULTI at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332) at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302) at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197) at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64) at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313) at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350) at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331) at org.apache.beam.sdk.io.redis.RedisIOTest.testWriteReadUsingDefaultAppendMethod(RedisIOTest.java:100) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319) at org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66) at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:117) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:155) at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:137) at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404) at org.gradle.internal.concurrent.ExecutorPolicy
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154043=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154043 ] ASF GitHub Bot logged work on BEAM-5637: Author: ASF GitHub Bot Created on: 12/Oct/18 21:52 Start Date: 12/Oct/18 21:52 Worklog Time Spent: 10m Work Description: HuangLED edited a comment on issue #6680: [BEAM-5637] Python support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6680#issuecomment-429468108 R: @herohde cc: @boyuanzz @pabloem Addressed. Also, option definition moved to WorkerOptions. Thanks to Boyuan for pointing out the right place for error message. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154043) Time Spent: 2h 20m (was: 2h 10m) > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Python jobs. That requires a change to the Python > boot code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()
[ https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154042=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154042 ] ASF GitHub Bot logged work on BEAM-5621: Author: ASF GitHub Bot Created on: 12/Oct/18 21:47 Start Date: 12/Oct/18 21:47 Worklog Time Spent: 10m Work Description: tvalentyn edited a comment on issue #6602: [BEAM-5621] Fix unorderable types in python 3 URL: https://github.com/apache/beam/pull/6602#issuecomment-429468709 Another somewhat related observation: Following snippets fails on Python 2, after this PR (in Direct Runner), but will pass in Python 3 where there is no distinction between `int` and `long`. ``` p = TestPipeline(options=PipelineOptions(pipeline_args)) input_data = p | beam.Create([1, -2]) # This becomes a [1, -2L]! (Unrelated to this PR). expected_result = [-2, 1] assert_that(input_data, equal_to(expected_result)) ``` ``` apache_beam.testing.util.BeamAssertException: Failed assert: [-2, 1] == [1, -2L] [while running 'assert_that/Match'] ``` Do we know why negatives are represented as longs? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154042) Time Spent: 2h 20m (was: 2h 10m) > Several tests fail on Python 3 with TypeError: unorderable types: str() < > int() > --- > > Key: BEAM-5621 > URL: https://issues.apache.org/jira/browse/BEAM-5621 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 2h 20m > Remaining Estimate: 0h > > == > ERROR: test_remove_duplicates > (apache_beam.transforms.ptransform_test.PTransformTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 677, in process > self.do_fn_invoker.invoke_process(windowed_value) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 414, in invoke_process > windowed_value, self.process_method(windowed_value.value)) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 1068, in > wrapper = lambda x: [fn(x)] > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py", > line 115, in _equal > sorted_expected = sorted(expected) > TypeError: unorderable types: str() < int() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5733) Pushdown filter to table scan
Rui Wang created BEAM-5733: -- Summary: Pushdown filter to table scan Key: BEAM-5733 URL: https://issues.apache.org/jira/browse/BEAM-5733 Project: Beam Issue Type: Sub-task Components: dsl-sql Reporter: Rui Wang Assignee: Rui Wang -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.
[ https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=154039=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154039 ] ASF GitHub Bot logged work on BEAM-5653: Author: ASF GitHub Bot Created on: 12/Oct/18 21:39 Start Date: 12/Oct/18 21:39 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #6649: [BEAM-5653] Fix overriding coders due to duplicate coderId generation URL: https://github.com/apache/beam/pull/6649#issuecomment-429470182 I merged the purported fix for that, so you should have better luck. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154039) Time Spent: 4h (was: 3h 50m) Remaining Estimate: 68h (was: 68h 10m) > Dataflow FnApi Worker overrides some of Coders due to coder ID generation > collision. > > > Key: BEAM-5653 > URL: https://issues.apache.org/jira/browse/BEAM-5653 > Project: Beam > Issue Type: Test > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Blocker > Fix For: 2.8.0 > > Original Estimate: 72h > Time Spent: 4h > Remaining Estimate: 68h > > Due to one of latest refactorings, we got a bug in Java FnApi Worker that it > overrides Coders in ProcessBundleDescriptor sent to SDK Harness that causes > jobs to fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-5709) Tests in BeamFnControlServiceTest are flaky.
[ https://issues.apache.org/jira/browse/BEAM-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles resolved BEAM-5709. --- Resolution: Fixed Fix Version/s: Not applicable > Tests in BeamFnControlServiceTest are flaky. > > > Key: BEAM-5709 > URL: https://issues.apache.org/jira/browse/BEAM-5709 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Minor > Fix For: Not applicable > > Time Spent: 1.5h > Remaining Estimate: 0h > > https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlServiceTest.java > Tests for BeamFnControlService are currently flaky. The test attempts to > verify that onCompleted was called on the mock streams, but that function > gets called on a separate thread, so occasionally the function will not have > been called yet, despite the server being shut down. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()
[ https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154037=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154037 ] ASF GitHub Bot logged work on BEAM-5621: Author: ASF GitHub Bot Created on: 12/Oct/18 21:33 Start Date: 12/Oct/18 21:33 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #6602: [BEAM-5621] Fix unorderable types in python 3 URL: https://github.com/apache/beam/pull/6602#issuecomment-429468709 Another somewhat related observation: Following snippets fails on Python 2, after this PR (in Direct Runner), but will pass in Python 3 where there is no distinction between `int` and `long`. ``` p = TestPipeline(options=PipelineOptions(pipeline_args)) input_data = p | beam.Create([1, -2]) # This becomes a [1, -2L]! (Unrelated to this PR). expected_result = [-2, 1] assert_that(input_data, equal_to(expected_result)) ``` Do we know why negatives are represented as longs? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154037) Time Spent: 2h 10m (was: 2h) > Several tests fail on Python 3 with TypeError: unorderable types: str() < > int() > --- > > Key: BEAM-5621 > URL: https://issues.apache.org/jira/browse/BEAM-5621 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 2h 10m > Remaining Estimate: 0h > > == > ERROR: test_remove_duplicates > (apache_beam.transforms.ptransform_test.PTransformTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 677, in process > self.do_fn_invoker.invoke_process(windowed_value) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 414, in invoke_process > windowed_value, self.process_method(windowed_value.value)) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 1068, in > wrapper = lambda x: [fn(x)] > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py", > line 115, in _equal > sorted_expected = sorted(expected) > TypeError: unorderable types: str() < int() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154036=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154036 ] ASF GitHub Bot logged work on BEAM-5637: Author: ASF GitHub Bot Created on: 12/Oct/18 21:30 Start Date: 12/Oct/18 21:30 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #6680: [BEAM-5637] Python support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6680#issuecomment-429468108 R: @boyuanzz @herohde @pabloem Addressed. Also, option definition moved to WorkerOptions. Thanks to Boyuan for pointing out the right place for error message. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154036) Time Spent: 2h 10m (was: 2h) > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Python jobs. That requires a change to the Python > boot code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5732) expose runner mode to user through samza pipeline option
Hai created BEAM-5732: - Summary: expose runner mode to user through samza pipeline option Key: BEAM-5732 URL: https://issues.apache.org/jira/browse/BEAM-5732 Project: Beam Issue Type: Improvement Components: runner-samza Reporter: Hai Assignee: Xinyu Liu We should expose runner mode to user through samza pipeline option so that user can decide whether to start samza job as local mode or remote mode. This should work consistently in both Java runner and Portable runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154028=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154028 ] ASF GitHub Bot logged work on BEAM-5636: Author: ASF GitHub Bot Created on: 12/Oct/18 21:00 Start Date: 12/Oct/18 21:00 Worklog Time Spent: 10m Work Description: boyuanzz commented on a change in pull request #6665: [BEAM-5636] Java support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6665#discussion_r224913769 ## File path: sdks/java/container/boot.go ## @@ -104,6 +104,9 @@ func main() { filepath.Join(jarsDir, "beam-sdks-java-harness.jar"), } for _, md := range artifacts { + if strings.HasPrefix(md.Name, "beam-runners-google-cloud-dataflow-java-fn-api-worker") { Review comment: The purpose here is, if there is java worker jar in artifacts, then this jar should not be included into sdk harness classpath, which seems like we don't need to check experiment. wdyt? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154028) Time Spent: 20m (was: 10m) > Java support for custom dataflow worker jar > --- > > Key: BEAM-5636 > URL: https://issues.apache.org/jira/browse/BEAM-5636 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Java jobs. That requires a change to the Java boot > code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154027=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154027 ] ASF GitHub Bot logged work on BEAM-5637: Author: ASF GitHub Bot Created on: 12/Oct/18 20:59 Start Date: 12/Oct/18 20:59 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #6680: [BEAM-5637] Python support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6680#issuecomment-429460451 Run Python PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154027) Time Spent: 2h (was: 1h 50m) > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Python jobs. That requires a change to the Python > boot code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.
[ https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=154025=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154025 ] ASF GitHub Bot logged work on BEAM-5653: Author: ASF GitHub Bot Created on: 12/Oct/18 20:49 Start Date: 12/Oct/18 20:49 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #6649: [BEAM-5653] Fix overriding coders due to duplicate coderId generation URL: https://github.com/apache/beam/pull/6649#issuecomment-429457986 Precommits fail due to flaky test see [BEAM-5709](https://issues.apache.org/jira/browse/BEAM-5709) Succeeded precommit run: https://builds.apache.org/job/beam_PreCommit_Java_Phrase/318/ Failing precommit run: https://builds.apache.org/job/beam_PreCommit_Java_Phrase/319/ I accidentally started precommit twice in a row. Can we merge this? I feel this will be safer, than to try get green precommit. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154025) Time Spent: 3h 50m (was: 3h 40m) Remaining Estimate: 68h 10m (was: 68h 20m) > Dataflow FnApi Worker overrides some of Coders due to coder ID generation > collision. > > > Key: BEAM-5653 > URL: https://issues.apache.org/jira/browse/BEAM-5653 > Project: Beam > Issue Type: Test > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Blocker > Fix For: 2.8.0 > > Original Estimate: 72h > Time Spent: 3h 50m > Remaining Estimate: 68h 10m > > Due to one of latest refactorings, we got a bug in Java FnApi Worker that it > overrides Coders in ProcessBundleDescriptor sent to SDK Harness that causes > jobs to fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154024=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154024 ] ASF GitHub Bot logged work on BEAM-5637: Author: ASF GitHub Bot Created on: 12/Oct/18 20:47 Start Date: 12/Oct/18 20:47 Worklog Time Spent: 10m Work Description: HuangLED opened a new pull request #6680: [BEAM-5637] Python support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6680 Python support for customer worker jar (as a staged file). Tested positive and negative case by starting actual jobs. PreCommit pass locally. Follow this checklist to help us incorporate your contribution quickly and easily: - [X ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154024) Time Spent: 1h 50m (was: 1h 40m) > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > Project: Bea
[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154022=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154022 ] ASF GitHub Bot logged work on BEAM-5636: Author: ASF GitHub Bot Created on: 12/Oct/18 20:46 Start Date: 12/Oct/18 20:46 Worklog Time Spent: 10m Work Description: herohde commented on a change in pull request #6665: [BEAM-5636] Java support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6665#discussion_r224910454 ## File path: sdks/java/container/boot.go ## @@ -104,6 +104,9 @@ func main() { filepath.Join(jarsDir, "beam-sdks-java-harness.jar"), } for _, md := range artifacts { + if strings.HasPrefix(md.Name, "beam-runners-google-cloud-dataflow-java-fn-api-worker") { Review comment: We should only make this check if the experiment is set. Also, the name will change to "dataflow-worker.jar" when the artifact bug is fixed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154022) Time Spent: 10m Remaining Estimate: 0h > Java support for custom dataflow worker jar > --- > > Key: BEAM-5636 > URL: https://issues.apache.org/jira/browse/BEAM-5636 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Java jobs. That requires a change to the Java boot > code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154021=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154021 ] ASF GitHub Bot logged work on BEAM-5637: Author: ASF GitHub Bot Created on: 12/Oct/18 20:44 Start Date: 12/Oct/18 20:44 Worklog Time Spent: 10m Work Description: HuangLED commented on a change in pull request #6667: [BEAM-5637] Python support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6667#discussion_r224910055 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -674,7 +674,12 @@ def _add_argparse_args(cls, parser): 'job submission, the files will be staged in the staging area ' '(--staging_location option) and the workers will install them in ' 'same order they were specified on the command line.')) - +parser.add_argument( +'--dataflow_worker_jar', +dest='dataflow_worker_jar', +type=str, +help='Dataflow worker jar.' +) Review comment: Thanks! Issue addressed but lost the status in this PR due to my sub-optional git operations. Opening another PR. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154021) Time Spent: 1h 40m (was: 1.5h) > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Python jobs. That requires a change to the Python > boot code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()
[ https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154019=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154019 ] ASF GitHub Bot logged work on BEAM-5621: Author: ASF GitHub Bot Created on: 12/Oct/18 20:42 Start Date: 12/Oct/18 20:42 Worklog Time Spent: 10m Work Description: aaltay commented on issue #6602: [BEAM-5621] Fix unorderable types in python 3 URL: https://github.com/apache/beam/pull/6602#issuecomment-429456279 This is clearly backward incompatible change, however I think this is the right behavior. In that sense I do not think it is a regression. However, we should clearly highlight this in our release notes/blog post etc. @tvalentyn Could you create a JIRA, mark it for 2.8.0, explain the change in behaviour and mark it as fixed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154019) Time Spent: 2h (was: 1h 50m) > Several tests fail on Python 3 with TypeError: unorderable types: str() < > int() > --- > > Key: BEAM-5621 > URL: https://issues.apache.org/jira/browse/BEAM-5621 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 2h > Remaining Estimate: 0h > > == > ERROR: test_remove_duplicates > (apache_beam.transforms.ptransform_test.PTransformTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 677, in process > self.do_fn_invoker.invoke_process(windowed_value) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 414, in invoke_process > windowed_value, self.process_method(windowed_value.value)) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 1068, in > wrapper = lambda x: [fn(x)] > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py", > line 115, in _equal > sorted_expected = sorted(expected) > TypeError: unorderable types: str() < int() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154020=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154020 ] ASF GitHub Bot logged work on BEAM-5637: Author: ASF GitHub Bot Created on: 12/Oct/18 20:42 Start Date: 12/Oct/18 20:42 Worklog Time Spent: 10m Work Description: HuangLED closed pull request #6667: [BEAM-5637] Python support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6667 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/python/apache_beam/options/pipeline_options.py b/sdks/python/apache_beam/options/pipeline_options.py index a172535b100..2c061e0ec52 100644 --- a/sdks/python/apache_beam/options/pipeline_options.py +++ b/sdks/python/apache_beam/options/pipeline_options.py @@ -674,7 +674,12 @@ def _add_argparse_args(cls, parser): 'job submission, the files will be staged in the staging area ' '(--staging_location option) and the workers will install them in ' 'same order they were specified on the command line.')) - +parser.add_argument( +'--dataflow_worker_jar', +dest='dataflow_worker_jar', +type=str, +help='Dataflow worker jar.' +) class PortableOptions(PipelineOptions): diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py index 1acd3488524..5be60bd701b 100644 --- a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py +++ b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py @@ -381,6 +381,12 @@ def run_pipeline(self, pipeline): self.dataflow_client = apiclient.DataflowApplicationClient( pipeline._options) +if setup_options.dataflow_worker_jar: + experiments = ["use_staged_dataflow_worker_jar"] + if debug_options.experiments is not None: +experiments = list(set(experiments + debug_options.experiments)) + debug_options.experiments = experiments + # Create the job description and send a request to the service. The result # can be None if there is no need to send a request to the service (e.g. # template creation). If a request was sent and failed then the call will diff --git a/sdks/python/apache_beam/runners/portability/stager.py b/sdks/python/apache_beam/runners/portability/stager.py index ef7401ac6aa..e336fd3f9b9 100644 --- a/sdks/python/apache_beam/runners/portability/stager.py +++ b/sdks/python/apache_beam/runners/portability/stager.py @@ -123,8 +123,7 @@ def stage_job_resources(self, Returns: A list of file names (no paths) for the resources staged. All the - files - are assumed to be staged at staging_location. + files are assumed to be staged at staging_location. Raises: RuntimeError: If files specified are not found or error encountered @@ -256,6 +255,13 @@ def stage_job_resources(self, 'The file "%s" cannot be found. Its location was specified by ' 'the --sdk_location command-line option.' % sdk_path) +if hasattr(setup_options, 'dataflow_worker_jar') and \ +setup_options.dataflow_worker_jar: + jar_staged_filename = 'dataflow-worker.jar' + staged_path = FileSystems.join(staging_location, jar_staged_filename) + self.stage_artifact(setup_options.dataflow_worker_jar, staged_path) + resources.append(jar_staged_filename) + # Delete all temp files created while staging job resources. shutil.rmtree(temp_dir) retrieval_token = self.commit_manifest() This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154020) Time Spent: 1.5h (was: 1h 20m) > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Pyth
[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.
[ https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=154016=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154016 ] ASF GitHub Bot logged work on BEAM-5653: Author: ASF GitHub Bot Created on: 12/Oct/18 20:41 Start Date: 12/Oct/18 20:41 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #6649: [BEAM-5653] Fix overriding coders due to duplicate coderId generation URL: https://github.com/apache/beam/pull/6649#issuecomment-429456093 run java precommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154016) Time Spent: 3h 20m (was: 3h 10m) Remaining Estimate: 68h 40m (was: 68h 50m) > Dataflow FnApi Worker overrides some of Coders due to coder ID generation > collision. > > > Key: BEAM-5653 > URL: https://issues.apache.org/jira/browse/BEAM-5653 > Project: Beam > Issue Type: Test > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Blocker > Fix For: 2.8.0 > > Original Estimate: 72h > Time Spent: 3h 20m > Remaining Estimate: 68h 40m > > Due to one of latest refactorings, we got a bug in Java FnApi Worker that it > overrides Coders in ProcessBundleDescriptor sent to SDK Harness that causes > jobs to fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.
[ https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=154018=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154018 ] ASF GitHub Bot logged work on BEAM-5653: Author: ASF GitHub Bot Created on: 12/Oct/18 20:41 Start Date: 12/Oct/18 20:41 Worklog Time Spent: 10m Work Description: Ardagan removed a comment on issue #6649: [BEAM-5653] Fix overriding coders due to duplicate coderId generation URL: https://github.com/apache/beam/pull/6649#issuecomment-429396509 run java precommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154018) Time Spent: 3h 40m (was: 3.5h) Remaining Estimate: 68h 20m (was: 68.5h) > Dataflow FnApi Worker overrides some of Coders due to coder ID generation > collision. > > > Key: BEAM-5653 > URL: https://issues.apache.org/jira/browse/BEAM-5653 > Project: Beam > Issue Type: Test > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Blocker > Fix For: 2.8.0 > > Original Estimate: 72h > Time Spent: 3h 40m > Remaining Estimate: 68h 20m > > Due to one of latest refactorings, we got a bug in Java FnApi Worker that it > overrides Coders in ProcessBundleDescriptor sent to SDK Harness that causes > jobs to fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.
[ https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=154017=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154017 ] ASF GitHub Bot logged work on BEAM-5653: Author: ASF GitHub Bot Created on: 12/Oct/18 20:41 Start Date: 12/Oct/18 20:41 Worklog Time Spent: 10m Work Description: Ardagan removed a comment on issue #6649: [BEAM-5653] Fix overriding coders due to duplicate coderId generation URL: https://github.com/apache/beam/pull/6649#issuecomment-429456093 run java precommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154017) Time Spent: 3.5h (was: 3h 20m) Remaining Estimate: 68.5h (was: 68h 40m) > Dataflow FnApi Worker overrides some of Coders due to coder ID generation > collision. > > > Key: BEAM-5653 > URL: https://issues.apache.org/jira/browse/BEAM-5653 > Project: Beam > Issue Type: Test > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Blocker > Fix For: 2.8.0 > > Original Estimate: 72h > Time Spent: 3.5h > Remaining Estimate: 68.5h > > Due to one of latest refactorings, we got a bug in Java FnApi Worker that it > overrides Coders in ProcessBundleDescriptor sent to SDK Harness that causes > jobs to fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-1251) Python 3 Support
[ https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=154015=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154015 ] ASF GitHub Bot logged work on BEAM-1251: Author: ASF GitHub Bot Created on: 12/Oct/18 20:37 Start Date: 12/Oct/18 20:37 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #6679: [BEAM-1251] Add a link to Python 3 Conversion Quick Start Guide to the list of ongoing efforts on Beam site. URL: https://github.com/apache/beam/pull/6679#issuecomment-429455025 Hey @swegner, am I using a correct way to change the Beam-site? Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154015) Time Spent: 22h 40m (was: 22.5h) > Python 3 Support > > > Key: BEAM-1251 > URL: https://issues.apache.org/jira/browse/BEAM-1251 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Eyad Sibai >Assignee: Robbe >Priority: Major > Time Spent: 22h 40m > Remaining Estimate: 0h > > I have been trying to use google datalab with python3. As I see there are > several packages that does not support python3 yet which google datalab > depends on. This is one of them. > https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-1251) Python 3 Support
[ https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=154014=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154014 ] ASF GitHub Bot logged work on BEAM-1251: Author: ASF GitHub Bot Created on: 12/Oct/18 20:34 Start Date: 12/Oct/18 20:34 Worklog Time Spent: 10m Work Description: tvalentyn opened a new pull request #6679: [BEAM-1251] Add a link to Python 3 Conversion Quick Start Guide to the list of ongoing efforts on Beam site. URL: https://github.com/apache/beam/pull/6679 Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154014) Time Spent: 22.5h (was: 22h 20m) > Python 3 Support > > > Key: BEAM-1251 > URL: https://issues.apache.org/jira/browse/BEAM-1251 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Eyad Sibai >Assignee: Robbe >Prio
[jira] [Work logged] (BEAM-1251) Python 3 Support
[ https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=154013=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154013 ] ASF GitHub Bot logged work on BEAM-1251: Author: ASF GitHub Bot Created on: 12/Oct/18 20:32 Start Date: 12/Oct/18 20:32 Worklog Time Spent: 10m Work Description: tvalentyn opened a new pull request #6678: [BEAM-1251] Add a link to Python 3 Conversion Quick Start Guide to the list of ongoing efforts on Beam site. URL: https://github.com/apache/beam/pull/6678 Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154013) Time Spent: 22h 20m (was: 22h 10m) > Python 3 Support > > > Key: BEAM-1251 > URL: https://issues.apache.org/jira/browse/BEAM-1251 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Eyad Sibai >Assignee: Robbe >Prio
[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()
[ https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154011=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154011 ] ASF GitHub Bot logged work on BEAM-5621: Author: ASF GitHub Bot Created on: 12/Oct/18 20:26 Start Date: 12/Oct/18 20:26 Worklog Time Spent: 10m Work Description: charlesccychen commented on issue #6602: [BEAM-5621] Fix unorderable types in python 3 URL: https://github.com/apache/beam/pull/6602#issuecomment-429452493 CC: @Juta @robertwb @aaltay This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154011) Time Spent: 1h 50m (was: 1h 40m) > Several tests fail on Python 3 with TypeError: unorderable types: str() < > int() > --- > > Key: BEAM-5621 > URL: https://issues.apache.org/jira/browse/BEAM-5621 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 1h 50m > Remaining Estimate: 0h > > == > ERROR: test_remove_duplicates > (apache_beam.transforms.ptransform_test.PTransformTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 677, in process > self.do_fn_invoker.invoke_process(windowed_value) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 414, in invoke_process > windowed_value, self.process_method(windowed_value.value)) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 1068, in > wrapper = lambda x: [fn(x)] > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py", > line 115, in _equal > sorted_expected = sorted(expected) > TypeError: unorderable types: str() < int() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()
[ https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154010=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154010 ] ASF GitHub Bot logged work on BEAM-5621: Author: ASF GitHub Bot Created on: 12/Oct/18 20:26 Start Date: 12/Oct/18 20:26 Worklog Time Spent: 10m Work Description: charlesccychen commented on issue #6602: [BEAM-5621] Fix unorderable types in python 3 URL: https://github.com/apache/beam/pull/6602#issuecomment-429452434 Because we now sort by types, we may now encounter different behavior when using different string types. For example, previously `assert_that(equal_to(['a', u'b', b'c'], ['a', 'b', 'c]))` worked, but now it may not because this sorting order now depends on the exact type (i.e. the sorting may produce `[u'b', 'a', b'c'`) even for orderable types. Should we consider this a regression? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154010) Time Spent: 1h 40m (was: 1.5h) > Several tests fail on Python 3 with TypeError: unorderable types: str() < > int() > --- > > Key: BEAM-5621 > URL: https://issues.apache.org/jira/browse/BEAM-5621 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 1h 40m > Remaining Estimate: 0h > > == > ERROR: test_remove_duplicates > (apache_beam.transforms.ptransform_test.PTransformTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 677, in process > self.do_fn_invoker.invoke_process(windowed_value) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 414, in invoke_process > windowed_value, self.process_method(windowed_value.value)) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 1068, in > wrapper = lambda x: [fn(x)] > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py", > line 115, in _equal > sorted_expected = sorted(expected) > TypeError: unorderable types: str() < int() -- This message was sent by Atlassian JIRA (v7.6.3#76005)