[jira] [Commented] (BEAM-6794) [beam_PostCommit_Java_PortabilityApi][testBigQueryStorageRead1G] coder failure
[ https://issues.apache.org/jira/browse/BEAM-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790210#comment-16790210 ] Kenneth Jung commented on BEAM-6794: It is ready to close from my perspective. [~Ardagan] can you verify? > [beam_PostCommit_Java_PortabilityApi][testBigQueryStorageRead1G] coder failure > -- > > Key: BEAM-6794 > URL: https://issues.apache.org/jira/browse/BEAM-6794 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Kenneth Jung >Priority: Critical > Labels: currently-failing, triaged > Fix For: Not applicable > > Time Spent: 1h 10m > Remaining Estimate: 0h > > First failure > [https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/1178/] > > Culprit PR: > https://github.com/apache/beam/pull/7967 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6798) Reconsider usage of gradle release plugin
[ https://issues.apache.org/jira/browse/BEAM-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790167#comment-16790167 ]

Kenneth Knowles commented on BEAM-6798:
---------------------------------------

Are you doing modifications? Should this be assigned to you?

> Reconsider usage of gradle release plugin
> -----------------------------------------
>
> Key: BEAM-6798
> URL: https://issues.apache.org/jira/browse/BEAM-6798
> Project: Beam
> Issue Type: Improvement
> Components: build-system
> Reporter: Michael Luckey
> Priority: Major
>
> Currently, we use the gradle release plugin in a way that probably does not
> match the plugin's own expectations. Some of this was discussed in [1].
> After the release branch is cut, we call [2]
> {noformat}
> ./gradlew release
> {noformat}
> Apart from doing some validations, this creates two commits that change the
> version property:
> # sets version in gradle.properties to '${RELEASE}-RC${RC_NUM}' (Commit_1)
> # sets version in gradle.properties back to '${RELEASE}-SNAPSHOT' (Commit_2)
> Commit_1 will also be tagged as (tag: v${RELEASE}-RC${RC_NUM})
> Afterwards, we continue with 'Commit_2' in testing, bundling and publishing.
> I.e., looking into the published source distribution, it is not built from
> the tagged commit but from its successor. This is probably suboptimal.
> The release plugin's expectation would probably be more along the lines of
> actually incrementing to the next version (either patch, minor or even major)
> and releasing on that Commit_1.
> Based on my current understanding, it seems easier to either
> * drop usage of the gradle release plugin and just fall back to a plain
> 'exec git tag'
> * use a beam-release task which depends on the gradle release checks, but
> makes no version changes or commits
> The former has the drawback of also dropping the checks done by the release
> plugin, e.g.
> * checkCommitNeeded
> * checkUpdateNeeded
> * checkSnapshotDependencies
> * runBuildTasks
> * createReleaseTag
> which might still be valuable.
> [1] [https://lists.apache.org/thread.html/205472bdaf3c2c5876533750d417c19b0d1078131a3dc04916082ce8@%3Cdev.beam.apache.org%3E]
> [2] [https://github.com/apache/beam/blob/master/release/src/main/scripts/build_release_candidate.sh#L92-L94]
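The commit sequence described in the issue can be illustrated with a small sketch. This is not the release plugin's implementation; the function `release_commits` and the example release number `2.12.0` are hypothetical and exist only to show which commit carries the tag and which one is actually built.

```python
def release_commits(release, rc_num):
    """Model the two commits the issue describes (hypothetical helper)."""
    rc_version = f"{release}-RC{rc_num}"
    snapshot_version = f"{release}-SNAPSHOT"
    return [
        # Commit_1: RC version in gradle.properties, and the only tagged commit.
        {"name": "Commit_1", "version": rc_version, "tag": f"v{rc_version}"},
        # Commit_2: version set back to SNAPSHOT, no tag.
        {"name": "Commit_2", "version": snapshot_version, "tag": None},
    ]


# Example values are assumptions, not taken from a real release.
commits = release_commits("2.12.0", 1)
tagged = next(c for c in commits if c["tag"] is not None)
published_from = commits[-1]  # testing, bundling and publishing continue from here

print("tagged:        ", tagged["name"], tagged["version"])
print("published from:", published_from["name"], published_from["version"])
```

The sketch makes the mismatch visible: the tag points at Commit_1, while the artifacts come from Commit_2.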
[jira] [Updated] (BEAM-3204) Coders only should have a FunctionSpec, not an SdkFunctionSpec
[ https://issues.apache.org/jira/browse/BEAM-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles updated BEAM-3204:
----------------------------------
Labels: portability triaged (was: portability)

> Coders only should have a FunctionSpec, not an SdkFunctionSpec
> --------------------------------------------------------------
>
> Key: BEAM-3204
> URL: https://issues.apache.org/jira/browse/BEAM-3204
> Project: Beam
> Issue Type: Sub-task
> Components: beam-model
> Reporter: Kenneth Knowles
> Priority: Major
> Labels: portability, triaged
>
> We added environments to coders to account for "custom" coders where it is
> only really possible for one SDK to understand them, like this:
> {code}
> Coder {
>   spec: SdkFunctionSpec {
>     environment: "java_sdk_docker_container",
>     spec: FunctionSpec {
>       urn: "beam:coder:java_custom_coder",
>       payload:
>     }
>   }
> }
> {code}
> But a coder must be understood by both the producer of a PCollection and its
> consumers. A custom coder is user-defined, but it is not the same as other
> UDFs.
> A pipeline where either the producer or consumer cannot handle the coder is
> invalid, and we will have to build our cross-language APIs to prevent
> construction of such a pipeline. So we can drop the environment.
> I think there are some folks who want to reserve the ability to add an
> environment later, perhaps, so as not to paint ourselves into a corner. In
> this case, we can just add a field to Coder.
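The proposal can be sketched as a data model in which a coder carries only a FunctionSpec, plus a check that both ends of a PCollection understand the coder's URN. This is an illustration under assumptions, not Beam's proto definitions: the classes, the `edge_is_valid` helper and the sets of understood URNs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionSpec:
    urn: str
    payload: bytes = b""


@dataclass(frozen=True)
class Coder:
    # Only a FunctionSpec: no environment is attached to the coder.
    spec: FunctionSpec


def edge_is_valid(coder, producer_urns, consumer_urns):
    """A PCollection is valid only if producer AND consumer understand the coder."""
    return coder.spec.urn in producer_urns and coder.spec.urn in consumer_urns


custom = Coder(FunctionSpec(urn="beam:coder:java_custom_coder"))

# Hypothetical capability sets for two SDKs.
java_sdk = {"beam:coder:bytes:v1", "beam:coder:java_custom_coder"}
python_sdk = {"beam:coder:bytes:v1"}

print(edge_is_valid(custom, java_sdk, java_sdk))    # same-SDK producer and consumer
print(edge_is_valid(custom, java_sdk, python_sdk))  # cross-language consumer
```

Because validity depends on both ends, an environment on the coder itself adds no information, which is the argument the issue makes for dropping it.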
[jira] [Updated] (BEAM-4497) Add pages for master Javadocs / Pydocs and incorporate into post-commit job
[ https://issues.apache.org/jira/browse/BEAM-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-4497: -- Issue Type: New Feature (was: Sub-task) Parent: (was: BEAM-5671) > Add pages for master Javadocs / Pydocs and incorporate into post-commit job > --- > > Key: BEAM-4497 > URL: https://issues.apache.org/jira/browse/BEAM-4497 > Project: Beam > Issue Type: New Feature > Components: website >Reporter: Scott Wegner >Priority: Major > Labels: beam-site-automation-reliability, triage > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3743) Support for SDF splitting protocol in ULR
[ https://issues.apache.org/jira/browse/BEAM-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790138#comment-16790138 ] Kenneth Knowles commented on BEAM-3743: --- [~robertwb] is this supported in the Python ULR? > Support for SDF splitting protocol in ULR > - > > Key: BEAM-3743 > URL: https://issues.apache.org/jira/browse/BEAM-3743 > Project: Beam > Issue Type: Sub-task > Components: runner-core, runner-direct >Reporter: Eugene Kirpichov >Priority: Major > Labels: portability, triaged > Fix For: 2.6.0 > > > If I understand correctly what ULR does and where it currently stands - this > is the task for a reference implementation of the runner side of things from > https://s.apache.org/beam-breaking-fusion -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-3223) PTransform spec should not reuse FunctionSpec
[ https://issues.apache.org/jira/browse/BEAM-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-3223: -- Labels: portability triaged (was: portability) > PTransform spec should not reuse FunctionSpec > - > > Key: BEAM-3223 > URL: https://issues.apache.org/jira/browse/BEAM-3223 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Henning Rohde >Priority: Major > Labels: portability, triaged > > We should add a new type instead, TransformSpec, say, or just inline a URN > and payload. It's confusing otherwise. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3279) Deprecate and remove Coder consistentWithEquals in favor of overriding structuredValue
[ https://issues.apache.org/jira/browse/BEAM-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790142#comment-16790142 ] Kenneth Knowles commented on BEAM-3279: --- [~AlexKbit] great! I've added you to the Contributors permission so you can be assigned issues. > Deprecate and remove Coder consistentWithEquals in favor of overriding > structuredValue > -- > > Key: BEAM-3279 > URL: https://issues.apache.org/jira/browse/BEAM-3279 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Kenneth Knowles >Priority: Minor > Labels: starter > > Summary of discussion linked below: > consistentWithEquals() is redundant w.r.t. structuralValue(), and should be > deprecated. I think our mutation detectors are already using > structuralValue(), so the work here would be to simply mark the method > deprecated, remove all remaining overrides in the SDK, and document that > overriding the method is a no-op. > https://lists.apache.org/thread.html/8b2dcf09ba8e46b3c008293d99e4028d10463148b68326687dc29a4d@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
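Why a structural value makes a separate "consistent with equals" flag redundant can be shown with a Python analogue. This is not the Java SDK API; `Blob`, `BlobCoder` and `structural_value` are hypothetical stand-ins for a value type with identity-based equality (such as a Java byte[]) and its coder.

```python
class Blob:
    """Value type that does not define __eq__, so equality is by identity."""

    def __init__(self, data):
        self.data = data


class BlobCoder:
    def encode(self, value):
        return value.data

    def structural_value(self, value):
        # Return an object whose equality matches equality of the encoded form.
        # Here the encoded bytes themselves serve that purpose.
        return self.encode(value)


coder = BlobCoder()
a, b = Blob(b"xy"), Blob(b"xy")

print(a == b)                                                # identity comparison
print(coder.structural_value(a) == coder.structural_value(b))  # structural comparison
```

A caller such as a mutation detector that always compares structural values never needs to ask whether the raw values' own equality can be trusted.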
[jira] [Updated] (BEAM-6794) [beam_PostCommit_Java_PortabilityApi][testBigQueryStorageRead1G] coder failure
[ https://issues.apache.org/jira/browse/BEAM-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-6794: -- Issue Type: Bug (was: New Feature) > [beam_PostCommit_Java_PortabilityApi][testBigQueryStorageRead1G] coder failure > -- > > Key: BEAM-6794 > URL: https://issues.apache.org/jira/browse/BEAM-6794 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Kenneth Jung >Priority: Major > Labels: currently-failing > Fix For: Not applicable > > Time Spent: 1h 10m > Remaining Estimate: 0h > > First failure > [https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/1178/] > > Culprit PR: > https://github.com/apache/beam/pull/7967 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4497) Add pages for master Javadocs / Pydocs and incorporate into post-commit job
[ https://issues.apache.org/jira/browse/BEAM-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-4497: -- Labels: beam-site-automation-reliability triaged (was: beam-site-automation-reliability triage) > Add pages for master Javadocs / Pydocs and incorporate into post-commit job > --- > > Key: BEAM-4497 > URL: https://issues.apache.org/jira/browse/BEAM-4497 > Project: Beam > Issue Type: New Feature > Components: website >Reporter: Scott Wegner >Priority: Major > Labels: beam-site-automation-reliability, triaged > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5525) Intuitive default behavior for sdk_location pipeline option
[ https://issues.apache.org/jira/browse/BEAM-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles updated BEAM-5525:
----------------------------------
Labels: portability triaged (was: portability)

> Intuitive default behavior for sdk_location pipeline option
> -----------------------------------------------------------
>
> Key: BEAM-5525
> URL: https://issues.apache.org/jira/browse/BEAM-5525
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-harness
> Affects Versions: 2.7.0
> Reporter: Thomas Weise
> Priority: Major
> Labels: portability, triaged
>
> The current default value of "default" implies a Dataflow-specific behavior
> of the artifact stager. The same stager is also used by the portable runner,
> which has to specify the value "container", which actually means not to stage
> the SDK. Not staging should be the default behavior, and the default value
> for sdk_location should be None. The Dataflow runner can then specify a value
> such as "pypi", which conveys the expected behavior more closely.
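The proposed default can be sketched as a single decision function. This is not the stager's actual code; `should_stage_sdk` is a hypothetical helper, and treating "container" as still accepted alongside None is an assumption.

```python
def should_stage_sdk(sdk_location=None):
    """Proposed behaviour (sketch): stage the SDK only when a location is given.

    None is the proposed default and means "do not stage". "container" is
    assumed to remain accepted with the same meaning.
    """
    return sdk_location not in (None, "container")


print(should_stage_sdk())             # portable runner relying on the default
print(should_stage_sdk("container"))  # explicit value used today
print(should_stage_sdk("pypi"))       # value the Dataflow runner could pass
```

With this default, only a runner that wants staging has to say so.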
[jira] [Updated] (BEAM-5397) Flink portable runner GRPC cleanup failure after user class loader was removed
[ https://issues.apache.org/jira/browse/BEAM-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-5397: -- Labels: portability triaged (was: portability) > Flink portable runner GRPC cleanup failure after user class loader was removed > -- > > Key: BEAM-5397 > URL: https://issues.apache.org/jira/browse/BEAM-5397 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Affects Versions: 2.8.0 >Reporter: Thomas Weise >Priority: Major > Labels: portability, triaged > > Looks like another attempt to perform cleanup after close. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4497) Add pages for master Javadocs / Pydocs and incorporate into post-commit job
[ https://issues.apache.org/jira/browse/BEAM-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-4497: -- Labels: beam-site-automation-reliability triage (was: beam-site-automation-reliability) > Add pages for master Javadocs / Pydocs and incorporate into post-commit job > --- > > Key: BEAM-4497 > URL: https://issues.apache.org/jira/browse/BEAM-4497 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Scott Wegner >Priority: Major > Labels: beam-site-automation-reliability, triage > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5397) Flink portable runner GRPC cleanup failure after user class loader was removed
[ https://issues.apache.org/jira/browse/BEAM-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790140#comment-16790140 ] Kenneth Knowles commented on BEAM-5397: --- How about in 2.10.0 or 2.11.0? > Flink portable runner GRPC cleanup failure after user class loader was removed > -- > > Key: BEAM-5397 > URL: https://issues.apache.org/jira/browse/BEAM-5397 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Affects Versions: 2.8.0 >Reporter: Thomas Weise >Priority: Major > Labels: portability, triaged > > Looks like another attempt to perform cleanup after close. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-3743) Support for SDF splitting protocol in ULR
[ https://issues.apache.org/jira/browse/BEAM-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-3743: -- Labels: portability triaged (was: portability) > Support for SDF splitting protocol in ULR > - > > Key: BEAM-3743 > URL: https://issues.apache.org/jira/browse/BEAM-3743 > Project: Beam > Issue Type: Sub-task > Components: runner-core, runner-direct >Reporter: Eugene Kirpichov >Priority: Major > Labels: portability, triaged > Fix For: 2.6.0 > > > If I understand correctly what ULR does and where it currently stands - this > is the task for a reference implementation of the runner side of things from > https://s.apache.org/beam-breaking-fusion -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6794) [beam_PostCommit_Java_PortabilityApi][testBigQueryStorageRead1G] coder failure
[ https://issues.apache.org/jira/browse/BEAM-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-6794: -- Priority: Critical (was: Major) > [beam_PostCommit_Java_PortabilityApi][testBigQueryStorageRead1G] coder failure > -- > > Key: BEAM-6794 > URL: https://issues.apache.org/jira/browse/BEAM-6794 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Kenneth Jung >Priority: Critical > Labels: currently-failing > Fix For: Not applicable > > Time Spent: 1h 10m > Remaining Estimate: 0h > > First failure > [https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/1178/] > > Culprit PR: > https://github.com/apache/beam/pull/7967 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6794) [beam_PostCommit_Java_PortabilityApi][testBigQueryStorageRead1G] coder failure
[ https://issues.apache.org/jira/browse/BEAM-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-6794: -- Labels: currently-failing triaged (was: currently-failing) > [beam_PostCommit_Java_PortabilityApi][testBigQueryStorageRead1G] coder failure > -- > > Key: BEAM-6794 > URL: https://issues.apache.org/jira/browse/BEAM-6794 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Kenneth Jung >Priority: Critical > Labels: currently-failing, triaged > Fix For: Not applicable > > Time Spent: 1h 10m > Remaining Estimate: 0h > > First failure > [https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/1178/] > > Culprit PR: > https://github.com/apache/beam/pull/7967 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6794) [beam_PostCommit_Java_PortabilityApi][testBigQueryStorageRead1G] coder failure
[ https://issues.apache.org/jira/browse/BEAM-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790137#comment-16790137 ] Kenneth Knowles commented on BEAM-6794: --- Can this be closed? > [beam_PostCommit_Java_PortabilityApi][testBigQueryStorageRead1G] coder failure > -- > > Key: BEAM-6794 > URL: https://issues.apache.org/jira/browse/BEAM-6794 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Kenneth Jung >Priority: Critical > Labels: currently-failing, triaged > Fix For: Not applicable > > Time Spent: 1h 10m > Remaining Estimate: 0h > > First failure > [https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/1178/] > > Culprit PR: > https://github.com/apache/beam/pull/7967 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6147) Python process environment factory
[ https://issues.apache.org/jira/browse/BEAM-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles updated BEAM-6147:
----------------------------------
Labels: portability portability-flink triaged (was: portability portability-flink)

> Python process environment factory
> ----------------------------------
>
> Key: BEAM-6147
> URL: https://issues.apache.org/jira/browse/BEAM-6147
> Project: Beam
> Issue Type: Task
> Components: runner-flink, sdk-py-harness
> Affects Versions: 2.9.0
> Reporter: Thomas Weise
> Priority: Major
> Labels: portability, portability-flink, triaged
>
> Provide an easy-to-use process environment factory that allows Python worker
> execution as an alternative to Docker. Note that we have a base that the user
> can configure, and an attempt to utilize it for the Python Flink post-commit
> test. However, that setup is specific to the Jenkins environment.
[jira] [Work logged] (BEAM-6327) Don't attempt to fuse subtransforms of primitive/known transforms.
[ https://issues.apache.org/jira/browse/BEAM-6327?focusedWorklogId=211423=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211423 ] ASF GitHub Bot logged work on BEAM-6327: Author: ASF GitHub Bot Created on: 12/Mar/19 00:48 Start Date: 12/Mar/19 00:48 Worklog Time Spent: 10m Work Description: ibzib commented on issue #8011: [BEAM-6327] move pipeline trimming logic from Flink runner to core co… URL: https://github.com/apache/beam/pull/8011#issuecomment-471800050 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211423) Time Spent: 1h 50m (was: 1h 40m) > Don't attempt to fuse subtransforms of primitive/known transforms. > -- > > Key: BEAM-6327 > URL: https://issues.apache.org/jira/browse/BEAM-6327 > Project: Beam > Issue Type: New Feature > Components: runner-direct >Reporter: Robert Bradshaw >Assignee: Kyle Weaver >Priority: Major > Labels: triaged > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently we must remove all sub-components of any known transform that may > have an optional substructure, e.g. > [https://github.com/apache/beam/blob/release-2.9.0/sdks/python/apache_beam/runners/portability/portable_runner.py#L126] > (for GBK) and [https://github.com/apache/beam/pull/7360] (Reshuffle). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6527) Parallel tox (unit) tests run on Jenkins
[ https://issues.apache.org/jira/browse/BEAM-6527?focusedWorklogId=211415&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211415 ]

ASF GitHub Bot logged work on BEAM-6527:
----------------------------------------

Author: ASF GitHub Bot
Created on: 12/Mar/19 00:35
Start Date: 12/Mar/19 00:35
Worklog Time Spent: 10m
Work Description: markflyhigh commented on issue #7675: [BEAM-6527] Use Gradle to parallel Python tox tests
URL: https://github.com/apache/beam/pull/7675#issuecomment-471797048

PTAL @tvalentyn

Issue Time Tracking
-------------------

Worklog Id: (was: 211415)
Time Spent: 3h 20m (was: 3h 10m)

> Parallel tox (unit) tests run on Jenkins
> ----------------------------------------
>
> Key: BEAM-6527
> URL: https://issues.apache.org/jira/browse/BEAM-6527
> Project: Beam
> Issue Type: Sub-task
> Components: testing
> Reporter: Mark Liu
> Assignee: Mark Liu
> Priority: Major
> Labels: triaged
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> The existing tox unit test suites (basic, gcp and cython) will be enabled for
> multiple versions of Python 3, which will significantly increase the runtime
> of the Pre/PostCommit builds. Parallel execution is wanted in the tox or
> Gradle invocation to keep the runtime within a reasonable range (<30 mins for
> PreCommit is desired).
[jira] [Work logged] (BEAM-6527) Parallel tox (unit) tests run on Jenkins
[ https://issues.apache.org/jira/browse/BEAM-6527?focusedWorklogId=211386&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211386 ]

ASF GitHub Bot logged work on BEAM-6527:
----------------------------------------

Author: ASF GitHub Bot
Created on: 11/Mar/19 23:51
Start Date: 11/Mar/19 23:51
Worklog Time Spent: 10m
Work Description: markflyhigh commented on pull request #7675: [BEAM-6527] Use Gradle to parallel Python tox tests
URL: https://github.com/apache/beam/pull/7675#discussion_r264474846

## File path: sdks/python/test-suites/tox/py3/build.gradle ##
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Unit tests for Python 3
+ */
+
+apply plugin: org.apache.beam.gradle.BeamModulePlugin

Review comment: thanks! done

Issue Time Tracking
-------------------

Worklog Id: (was: 211386)
Time Spent: 3h 10m (was: 3h)

> Parallel tox (unit) tests run on Jenkins
> ----------------------------------------
>
> Key: BEAM-6527
> URL: https://issues.apache.org/jira/browse/BEAM-6527
> Project: Beam
> Issue Type: Sub-task
> Components: testing
> Reporter: Mark Liu
> Assignee: Mark Liu
> Priority: Major
> Labels: triaged
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> The existing tox unit test suites (basic, gcp and cython) will be enabled for
> multiple versions of Python 3, which will significantly increase the runtime
> of the Pre/PostCommit builds. Parallel execution is wanted in the tox or
> Gradle invocation to keep the runtime within a reasonable range (<30 mins for
> PreCommit is desired).
[jira] [Work logged] (BEAM-6771) Spark Runner Fails on Certain Versions of Spark 2.X
[ https://issues.apache.org/jira/browse/BEAM-6771?focusedWorklogId=211384&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211384 ]

ASF GitHub Bot logged work on BEAM-6771:
----------------------------------------

Author: ASF GitHub Bot
Created on: 11/Mar/19 23:45
Start Date: 11/Mar/19 23:45
Worklog Time Spent: 10m
Work Description: kyle-winkelman commented on issue #8032: [BEAM-6771] MetricsContainerStepMap#equals required for Spark.
URL: https://github.com/apache/beam/pull/8032#issuecomment-471785643

Run Java PreCommit

Issue Time Tracking
-------------------

Worklog Id: (was: 211384)
Time Spent: 40m (was: 0.5h)

> Spark Runner Fails on Certain Versions of Spark 2.X
> ---------------------------------------------------
>
> Key: BEAM-6771
> URL: https://issues.apache.org/jira/browse/BEAM-6771
> Project: Beam
> Issue Type: Bug
> Components: runner-spark
> Affects Versions: 2.11.0
> Reporter: Kyle Winkelman
> Priority: Blocker
> Time Spent: 40m
> Remaining Estimate: 0h
>
> When updating to Beam 2.11.0, I ran into the exception at the bottom of this
> issue while running a pipeline on the Spark Runner (which worked in 2.9.0).
> My cluster uses Spark 2.2.1.
> Related Issues:
> SPARK-23697 (Proof that equals must be implemented for items being
> accumulated.)
> BEAM-1920 (In PR#3808, equals was implemented on MetricsContainerStepMap to
> get Spark to run on 2.X.)
> My analysis has led me to believe that BEAM-6138 is the reason for this
> issue.
> Before this change, versions of Spark that are affected by SPARK-23697 would
> create a new MetricsContainerStepMap and make sure that the copied and reset
> instance (the one serialized for distribution) is equal to the initial empty
> MetricsContainerStepMap that is passed in. This would effectively check if
> two empty ConcurrentHashMaps were equal. This results in true.
> After this change, versions of Spark that are affected by SPARK-23697 would
> effectively be checking if two empty ConcurrentHashMaps were equal _*AND*_ if
> two different instances of the MetricsContainerImpl are equal. Because
> MetricsContainerImpl doesn't implement equals, this results in false.
> I believe BEAM-6546 will fix this issue, but I wanted to raise a red flag. I
> am also hoping someone can verify my analysis.
> {noformat}
> ERROR ApplicationMaster: User class threw exception: java.lang.RuntimeException: java.lang.AssertionError: assertion failed: copyAndReset must return a zero value copy
> java.lang.RuntimeException: java.lang.AssertionError: assertion failed: copyAndReset must return a zero value copy
>   at org.apache.beam.runners.spark.SparkPipelineResult.runtimeExceptionFrom(SparkPipelineResult.java:54)
>   at org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom(SparkPipelineResult.java:71)
>   at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:98)
>   at com.optum.analyticstore.execution.Exec.run(Exec.java:276)
>   at com.optum.analyticstore.execution.Exec.main(Exec.java:364)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
> Caused by: java.lang.AssertionError: assertion failed: copyAndReset must return a zero value copy
>   at scala.Predef$.assert(Predef.scala:170)
>   at org.apache.spark.util.AccumulatorV2.writeReplace(AccumulatorV2.scala:163)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1218)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1136)
>   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at
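The failure mode analysed in BEAM-6771 can be reproduced in miniature. The sketch below is a Python analogue, not the Beam or Spark code: `StepMap`, its `containers` and `unbound` fields, and the two container classes are hypothetical stand-ins for MetricsContainerStepMap and MetricsContainerImpl. It assumes, as the issue describes, that the check compares a freshly reset copy against a fresh zero value.

```python
class ContainerWithoutEq:
    """Stand-in for a container class that does not implement equals."""

    def __init__(self):
        self.counters = {}


class ContainerWithEq(ContainerWithoutEq):
    """Same container, but with value-based equality."""

    def __eq__(self, other):
        return isinstance(other, ContainerWithEq) and self.counters == other.counters


class StepMap:
    """Stand-in for the step map: a map of containers plus one extra container."""

    def __init__(self, container_cls):
        self.containers = {}
        self.unbound = container_cls()

    def __eq__(self, other):
        # Both the (empty) maps AND the two container instances must be equal.
        return self.containers == other.containers and self.unbound == other.unbound


def copy_and_reset_is_zero(container_cls):
    """Mimic the check: a reset copy must equal a fresh zero value."""
    zero = StepMap(container_cls)
    reset_copy = StepMap(container_cls)
    return zero == reset_copy


print(copy_and_reset_is_zero(ContainerWithoutEq))  # identity comparison of containers
print(copy_and_reset_is_zero(ContainerWithEq))     # value comparison of containers
```

Two empty maps always compare equal, so the outcome depends entirely on whether the contained class defines equality, which matches the issue's analysis.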
[jira] [Work logged] (BEAM-6703) Support Java 11 in Jenkins
[ https://issues.apache.org/jira/browse/BEAM-6703?focusedWorklogId=211375=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211375 ] ASF GitHub Bot logged work on BEAM-6703: Author: ASF GitHub Bot Created on: 11/Mar/19 23:11 Start Date: 11/Mar/19 23:11 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8010: [BEAM-6703] Added a phrase-triggered Jenkins job to test a Direct runner with Java 11 runtime URL: https://github.com/apache/beam/pull/8010#issuecomment-471776988 exciting : D This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211375) Time Spent: 6.5h (was: 6h 20m) > Support Java 11 in Jenkins > -- > > Key: BEAM-6703 > URL: https://issues.apache.org/jira/browse/BEAM-6703 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow, runner-direct >Reporter: Michal Walenia >Assignee: Michal Walenia >Priority: Minor > Time Spent: 6.5h > Remaining Estimate: 0h > > In this issue I'll create a Jenkins job that compiles Dataflow and Direct > runners with tests using Java 8 and runs Validates Runner suites with Java 11 > Runtime. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5985) Create jenkins jobs to run the load tests for Java SDK
[ https://issues.apache.org/jira/browse/BEAM-5985?focusedWorklogId=211374=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211374 ] ASF GitHub Bot logged work on BEAM-5985: Author: ASF GitHub Bot Created on: 11/Mar/19 23:09 Start Date: 11/Mar/19 23:09 Worklog Time Spent: 10m Work Description: pabloem commented on issue #7903: [BEAM-5985] Dataflow batch load test jobs URL: https://github.com/apache/beam/pull/7903#issuecomment-471776612 Ok this LGTM. Feel free to self-merge: ) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211374) Time Spent: 19h 50m (was: 19h 40m) > Create jenkins jobs to run the load tests for Java SDK > -- > > Key: BEAM-5985 > URL: https://issues.apache.org/jira/browse/BEAM-5985 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Lukasz Gajowy >Assignee: Kasia Kucharczyk >Priority: Major > Time Spent: 19h 50m > Remaining Estimate: 0h > > How/how often/in what cases we run those tests is yet to be decided (this is > part of the task) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5985) Create jenkins jobs to run the load tests for Java SDK
[ https://issues.apache.org/jira/browse/BEAM-5985?focusedWorklogId=211373=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211373 ] ASF GitHub Bot logged work on BEAM-5985: Author: ASF GitHub Bot Created on: 11/Mar/19 23:09 Start Date: 11/Mar/19 23:09 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #7903: [BEAM-5985] Dataflow batch load test jobs URL: https://github.com/apache/beam/pull/7903#discussion_r264466167 ## File path: .test-infra/jenkins/job_LoadTests_Java.groovy ## @@ -17,123 +17,215 @@ */ import CommonJobProperties as commonJobProperties +import CommonTestProperties import LoadTestsBuilder as loadTestsBuilder import PhraseTriggeringPostCommitBuilder +import CronJobBuilder def loadTestConfigurations = [ [ -jobName : 'beam_Java_LoadTests_GroupByKey_Dataflow_Small', -jobDescription: 'Runs GroupByKey load tests on Dataflow runner small records 10b', -itClass : 'org.apache.beam.sdk.loadtests.GroupByKeyLoadTest', -prCommitStatusName: 'Java GroupByKey Small Load Test Dataflow', -prTriggerPhrase : 'Run GroupByKey Small Java Load Test Dataflow', -runner: CommonTestProperties.Runner.DATAFLOW, -sdk : CommonTestProperties.SDK.JAVA, -jobProperties : [ +title: 'Load test: 2GB of 10B records', +itClass : 'org.apache.beam.sdk.loadtests.GroupByKeyLoadTest', +runner : CommonTestProperties.Runner.DATAFLOW, +jobProperties: [ project : 'apache-beam-testing', +appName : 'load_tests_Java_Dataflow_Batch_GBK_1', tempLocation: 'gs://temp-storage-for-perf-tests/loadtests', publishToBigQuery : true, -bigQueryDataset : 'load_test_PRs', -bigQueryTable : 'dataflow_gbk_small', -sourceOptions : '{"numRecords":10,"splitPointFrequencyRecords":1,"keySizeBytes":1,"valueSizeBytes":9,"numHotKeys":0,"hotKeyFraction":0,"seed":123456,"bundleSizeDistribution":{"type":"const","const":42},"forceNumInitialBundles":100,"progressShape":"LINEAR","initializeDelayDistribution":{"type":"const","const":42}}', -stepOptions : 
'{"outputRecordsPerInputRecord":1,"preservesInputKeyDistribution":true,"perBundleDelay":1,"perBundleDelayType":"MIXED","cpuUtilizationInMixedDelay":0.5}', -fanout : 10, +bigQueryDataset : 'load_test', +bigQueryTable : 'java_dataflow_batch_GBK_1', +sourceOptions : """ +{ + "numRecords": 2, + "keySizeBytes": 1, + "valueSizeBytes": 9 +} + """.trim().replaceAll("\\s", ""), +fanout : 1, iterations : 1, -maxNumWorkers : 32, +maxNumWorkers : 5, +numWorkers : 5, +autoscalingAlgorithm: "NONE" ] - ], -] +[ +title: 'Load test: 2GB of 100B records', +itClass : 'org.apache.beam.sdk.loadtests.GroupByKeyLoadTest', +runner : CommonTestProperties.Runner.DATAFLOW, +jobProperties: [ +project : 'apache-beam-testing', +appName : 'load_tests_Java_Dataflow_Batch_GBK_2', +tempLocation: 'gs://temp-storage-for-perf-tests/loadtests', +publishToBigQuery : true, +bigQueryDataset : 'load_test', +bigQueryTable : 'java_dataflow_batch_GBK_2', +sourceOptions : """ +{ + "numRecords": 2000, + "keySizeBytes": 10, + "valueSizeBytes": 90 +} + """.trim().replaceAll("\\s", ""), +fanout : 1, +iterations : 1, +
[jira] [Work logged] (BEAM-6703) Support Java 11 in Jenkins
[ https://issues.apache.org/jira/browse/BEAM-6703?focusedWorklogId=211372=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211372 ] ASF GitHub Bot logged work on BEAM-6703: Author: ASF GitHub Bot Created on: 11/Mar/19 23:07 Start Date: 11/Mar/19 23:07 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8010: [BEAM-6703] Added a phrase-triggered Jenkins job to test a Direct runner with Java 11 runtime URL: https://github.com/apache/beam/pull/8010#issuecomment-471776042 Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211372) Time Spent: 6h 20m (was: 6h 10m) > Support Java 11 in Jenkins > -- > > Key: BEAM-6703 > URL: https://issues.apache.org/jira/browse/BEAM-6703 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow, runner-direct >Reporter: Michal Walenia >Assignee: Michal Walenia >Priority: Minor > Time Spent: 6h 20m > Remaining Estimate: 0h > > In this issue I'll create a Jenkins job that compiles Dataflow and Direct > runners with tests using Java 8 and runs Validates Runner suites with Java 11 > Runtime. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6703) Support Java 11 in Jenkins
[ https://issues.apache.org/jira/browse/BEAM-6703?focusedWorklogId=211371=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211371 ] ASF GitHub Bot logged work on BEAM-6703: Author: ASF GitHub Bot Created on: 11/Mar/19 23:07 Start Date: 11/Mar/19 23:07 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #8010: [BEAM-6703] Added a phrase-triggered Jenkins job to test a Direct runner with Java 11 runtime URL: https://github.com/apache/beam/pull/8010 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211371) Time Spent: 6h 10m (was: 6h) > Support Java 11 in Jenkins > -- > > Key: BEAM-6703 > URL: https://issues.apache.org/jira/browse/BEAM-6703 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow, runner-direct >Reporter: Michal Walenia >Assignee: Michal Walenia >Priority: Minor > Time Spent: 6h 10m > Remaining Estimate: 0h > > In this issue I'll create a Jenkins job that compiles Dataflow and Direct > runners with tests using Java 8 and runs Validates Runner suites with Java 11 > Runtime. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6771) Spark Runner Fails on Certain Versions of Spark 2.X
[ https://issues.apache.org/jira/browse/BEAM-6771?focusedWorklogId=211370=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211370 ] ASF GitHub Bot logged work on BEAM-6771: Author: ASF GitHub Bot Created on: 11/Mar/19 23:04 Start Date: 11/Mar/19 23:04 Worklog Time Spent: 10m Work Description: iemejia commented on issue #8032: [BEAM-6771] MetricsContainerStepMap#equals required for Spark. URL: https://github.com/apache/beam/pull/8032#issuecomment-471775206 R: @ajamato This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211370) Time Spent: 0.5h (was: 20m) > Spark Runner Fails on Certain Versions of Spark 2.X > --- > > Key: BEAM-6771 > URL: https://issues.apache.org/jira/browse/BEAM-6771 > Project: Beam > Issue Type: Bug > Components: runner-spark >Affects Versions: 2.11.0 >Reporter: Kyle Winkelman >Priority: Blocker > Time Spent: 0.5h > Remaining Estimate: 0h > > When updating to Beam 2.11.0, I ran into the exception at the bottom of this > issue while running a pipeline on the Spark Runner (which worked in 2.9.0). > My cluster uses Spark 2.2.1. > Related Issues: > SPARK-23697 (Proof that equals must be implemented for items being > accumulated.) > BEAM-1920 (In PR#3808, equals was implemented on MetricsContainerStepMap to > get Spark to run on 2.X.) > My analysis has led me to believe that BEAM-6138 is the reason for this > issue. > Before this change, versions of Spark that are affected by SPARK-23697 would > create a new MetricsContainerStepMap and make sure that the copied and reset > instance (the one serialized for distribution) is equal to the initial empty > MetricsContainerStepMap that is passed in. This would effectively check if > two empty ConcurrentHashMaps were equal. This results in true. 
> After this change, versions of Spark that are affected by SPARK-23697 would > effectively be checking if two empty ConcurrentHashMaps were equal _*AND*_ if > two different instances of the MetricsContainerImpl are equal. Because > MetricsContainerImpl doesn't implement equals, this results in false. > I believe BEAM-6546 will fix this issue, but I wanted to raise a red flag. I > am also hoping someone can verify my analysis. > {noformat} > ERROR ApplicationMaster: User class threw exception: > java.lang.RuntimeException: java.lang.AssertionError: assertion failed: > copyAndReset must return a zero value copy > java.lang.RuntimeException: java.lang.AssertionError: assertion failed: > copyAndReset must return a zero value copy > at > org.apache.beam.runners.spark.SparkPipelineResult.runtimeExceptionFrom(SparkPipelineResult.java:54) > at > org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom(SparkPipelineResult.java:71) > at > org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:98) > at com.optum.analyticstore.execution.Exec.run(Exec.java:276) > at com.optum.analyticstore.execution.Exec.main(Exec.java:364) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637) > Caused by: java.lang.AssertionError: assertion failed: copyAndReset must > return a zero value copy > at scala.Predef$.assert(Predef.scala:170) > at > org.apache.spark.util.AccumulatorV2.writeReplace(AccumulatorV2.scala:163) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1218) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1136) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at >
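The equality argument in the BEAM-6771 description above can be illustrated with plain JDK classes. The following is a minimal sketch, not Beam or Spark code: `ContainerWithoutEquals` and `ContainerWithEquals` are hypothetical stand-ins for a metrics container, and the comparison only mimics the kind of "is the reset copy equal to an empty value" check the issue describes. It assumes nothing about how Spark's `AccumulatorV2` or Beam's `MetricsContainerStepMap` are actually implemented beyond what the issue text states.

```java
import java.util.concurrent.ConcurrentHashMap;

class ZeroCheckSketch {

    /** Hypothetical container that inherits Object.equals (identity comparison). */
    static final class ContainerWithoutEquals {
        final String stepName;

        ContainerWithoutEquals(String stepName) {
            this.stepName = stepName;
        }
    }

    /** Hypothetical container that defines value-based equality. */
    static final class ContainerWithEquals {
        final String stepName;

        ContainerWithEquals(String stepName) {
            this.stepName = stepName;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof ContainerWithEquals
                && stepName.equals(((ContainerWithEquals) other).stepName);
        }

        @Override
        public int hashCode() {
            return stepName.hashCode();
        }
    }

    /** Two distinct empty maps compare equal, so a check on maps alone passes. */
    static boolean emptyMapsEqual() {
        return new ConcurrentHashMap<String, Object>()
            .equals(new ConcurrentHashMap<String, Object>());
    }

    /** Two distinct instances without an equals override never compare equal. */
    static boolean equalWithoutOverride() {
        return new ContainerWithoutEquals("").equals(new ContainerWithoutEquals(""));
    }

    /** With a value-based equals, two "empty" instances compare equal. */
    static boolean equalWithOverride() {
        return new ContainerWithEquals("").equals(new ContainerWithEquals(""));
    }

    public static void main(String[] args) {
        System.out.println("empty maps equal:        " + emptyMapsEqual());
        System.out.println("without equals override: " + equalWithoutOverride());
        System.out.println("with equals override:    " + equalWithOverride());
    }
}
```

If the zero-value check also compares a field whose class does not override `equals`, the check fails even though both objects are logically empty, which matches the behaviour the reporter attributes to `MetricsContainerImpl`.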
[jira] [Work logged] (BEAM-6754) Support multi core machines for python pipeline on flink for loopback environment
[ https://issues.apache.org/jira/browse/BEAM-6754?focusedWorklogId=211363=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211363 ] ASF GitHub Bot logged work on BEAM-6754: Author: ASF GitHub Bot Created on: 11/Mar/19 22:55 Start Date: 11/Mar/19 22:55 Worklog Time Spent: 10m Work Description: angoenka commented on issue #7984: [BEAM-6754] Use subprocess instead of threads in loopback environment URL: https://github.com/apache/beam/pull/7984#issuecomment-471772836 Sounds good, updated the default to use thread. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211363) Time Spent: 3h (was: 2h 50m) > Support multi core machines for python pipeline on flink for loopback > environment > - > > Key: BEAM-6754 > URL: https://issues.apache.org/jira/browse/BEAM-6754 > Project: Beam > Issue Type: Task > Components: runner-core, runner-flink >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > The loopback worker is shared across multiple taskmanagers on a single machine. We > should support starting multiple processes in the loopback worker based on the number > of cores. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6777) SDK Harness Resilience
[ https://issues.apache.org/jira/browse/BEAM-6777?focusedWorklogId=211355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211355 ] ASF GitHub Bot logged work on BEAM-6777: Author: ASF GitHub Bot Created on: 11/Mar/19 22:29 Start Date: 11/Mar/19 22:29 Worklog Time Spent: 10m Work Description: aaltay commented on issue #8012: [BEAM-6777] Add HealthDaemon and tests URL: https://github.com/apache/beam/pull/8012#issuecomment-471765270 Is the expectation that this will ping an endpoint hosted by dataflow service or the runner harness? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211355) Time Spent: 1h 40m (was: 1.5h) > SDK Harness Resilience > -- > > Key: BEAM-6777 > URL: https://issues.apache.org/jira/browse/BEAM-6777 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > If the Python SDK Harness crashes in any way (user code exception, OOM, etc) > the job will hang and waste resources. The fix is to add a daemon in the SDK > Harness and Runner Harness to communicate with Dataflow to restart the VM > when stuckness is detected. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-4265) Add a dead letter queue to Python streaming BigQuery sink
[ https://issues.apache.org/jira/browse/BEAM-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790025#comment-16790025 ] Pablo Estrada commented on BEAM-4265: - I added this in PR https://github.com/apache/beam/pull/7677 > Add a dead letter queue to Python streaming BigQuery sink > - > > Key: BEAM-4265 > URL: https://issues.apache.org/jira/browse/BEAM-4265 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Jayalath >Priority: Major > > When writing to BigQuery using streaming writes, Java SDK supports writing > failed records to a dead letter queue: > [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1375] > > This is a very useful feature for long running pipelines so we should add > this to Python BQ sink: > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1279 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-4265) Add a dead letter queue to Python streaming BigQuery sink
[ https://issues.apache.org/jira/browse/BEAM-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada resolved BEAM-4265. - Resolution: Fixed Fix Version/s: 2.12.0 > Add a dead letter queue to Python streaming BigQuery sink > - > > Key: BEAM-4265 > URL: https://issues.apache.org/jira/browse/BEAM-4265 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Jayalath >Priority: Major > Fix For: 2.12.0 > > > When writing to BigQuery using streaming writes, Java SDK supports writing > failed records to a dead letter queue: > [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1375] > > This is a very useful feature for long running pipelines so we should add > this to Python BQ sink: > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1279 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6808) Use gav or something equivalent in announcement for dependency upgrades
Romain Manni-Bucau created BEAM-6808: Summary: Use gav or something equivalent in announcement for dependency upgrades Key: BEAM-6808 URL: https://issues.apache.org/jira/browse/BEAM-6808 Project: Beam Issue Type: Improvement Components: build-system Affects Versions: 2.11.0 Reporter: Romain Manni-Bucau The announcement/changelog uses Gradle variables, which is not very user-friendly since they are Beam internals. It would be great to move to actual GAV (groupId:artifactId:version) coordinates. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6443) decrease the number of threads for BigQuery streaming insertAll
[ https://issues.apache.org/jira/browse/BEAM-6443?focusedWorklogId=211352=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211352 ] ASF GitHub Bot logged work on BEAM-6443: Author: ASF GitHub Bot Created on: 11/Mar/19 22:24 Start Date: 11/Mar/19 22:24 Worklog Time Spent: 10m Work Description: ihji commented on issue #7547: [BEAM-6443] decrease the number of thread for BigQuery streaming inse… URL: https://github.com/apache/beam/pull/7547#issuecomment-471763651 > Can you describe how this PR has been tested at scale? I created UnboundedSource that generates very small (9 bytes) and maximum (1MB streaming insert row size limit) sized elements and ran a BigQuery inserting pipeline on DataflowRunner with multiple threadpool configurations (unlimited, single, 1 semaphored, 3 semaphored). Running time was about 20 minutes each. You can find exact numbers in a benchmark note: https://docs.google.com/document/d/1EhRNWLevm86GD_QtvlrTauHITVMwQBzuemyp-w4Z_ck/edit#heading=h.c0angyd9tn21 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211352) Time Spent: 4h 50m (was: 4h 40m) > decrease the number of threads for BigQuery streaming insertAll > --- > > Key: BEAM-6443 > URL: https://issues.apache.org/jira/browse/BEAM-6443 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Labels: triaged > Time Spent: 4h 50m > Remaining Estimate: 0h > > When inserting (a large number of ) very small elements into BigQuery via > streaming insertAll, BigQueryIO causes lots of quota exceeded errors. 
This > implies that 1) BigQueryIO puts unnecessary overhead on the BigQuery API layer > by sending requests too fast and 2) the log file becomes very big because the same > error message is repeated. Currently we use 50 shards for writing data into > BigQuery, and in each bundle 20-30 futures are executed simultaneously with an > unlimited thread pool. It would be worth investigating whether just a single > thread pool is sufficient for running concurrent insertAll calls. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6771) Spark Runner Fails on Certain Versions of Spark 2.X
[ https://issues.apache.org/jira/browse/BEAM-6771?focusedWorklogId=211347=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211347 ] ASF GitHub Bot logged work on BEAM-6771: Author: ASF GitHub Bot Created on: 11/Mar/19 22:17 Start Date: 11/Mar/19 22:17 Worklog Time Spent: 10m Work Description: kyle-winkelman commented on pull request #8032: [BEAM-6771] MetricsContainerStepMap#equals required for Spark. URL: https://github.com/apache/beam/pull/8032 Please see the [jira](https://issues.apache.org/jira/browse/BEAM-6771) for information. I have tested this with a local build of release-2.11.0 branch and my pipeline now succeeds on Spark 2.2.1. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
[jira] [Work logged] (BEAM-6443) decrease the number of threads for BigQuery streaming insertAll
[ https://issues.apache.org/jira/browse/BEAM-6443?focusedWorklogId=211339=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211339 ] ASF GitHub Bot logged work on BEAM-6443: Author: ASF GitHub Bot Created on: 11/Mar/19 22:02 Start Date: 11/Mar/19 22:02 Worklog Time Spent: 10m Work Description: ihji commented on pull request #7547: [BEAM-6443] decrease the number of thread for BigQuery streaming inse… URL: https://github.com/apache/beam/pull/7547#discussion_r264449198 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java ## @@ -1001,4 +1007,141 @@ public void close() { client.close(); } } + + private static class BoundedExecutorService implements ExecutorService { +private final ExecutorService executor; +private final Semaphore semaphore; +private final int parallelism; + +BoundedExecutorService(ExecutorService executor, int parallelism) { + this.executor = executor; + this.parallelism = parallelism; + this.semaphore = new Semaphore(parallelism); +} + +@Override +public void shutdown() { + executor.shutdown(); +} + +@Override +public List shutdownNow() { + List runnables = executor.shutdownNow(); + // try to release permits as many as possible before returning semaphored runnables. + synchronized (this) { +if (semaphore.availablePermits() <= parallelism) { + semaphore.release(Integer.MAX_VALUE - parallelism); Review comment: I think we don't have to pair acquire() and release(). Excerpted from release() API doc: > There is no requirement that a thread that releases a permit must have acquired that permit by calling acquire(). > https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Semaphore.html#release-- The possible edge case would be that if we put the total number of permits more than Integer.MAX_VALUE by calling release() then it throws an exception. By checking availablePermits() before release() in synchronized section we can avoid those cases. 
Another option here is to just return the semaphored callables as is and document it clearly in a comment. I believe that this `BoundedExecutorService` class will hardly be reused anyway. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211339) Time Spent: 4h 40m (was: 4.5h) > decrease the number of threads for BigQuery streaming insertAll > --- > > Key: BEAM-6443 > URL: https://issues.apache.org/jira/browse/BEAM-6443 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Labels: triaged > Time Spent: 4h 40m > Remaining Estimate: 0h > > When inserting a large number of very small elements into BigQuery via > streaming insertAll, BigQueryIO causes lots of quota exceeded errors. This > implies that 1) BigQueryIO puts unnecessary overhead on the BigQuery API layer > by sending requests too fast and 2) the log file becomes very big because the same > error message is repeated. Currently we use 50 shards for writing data into > BigQuery, and in each bundle 20-30 futures are executed simultaneously with an > unlimited thread pool. It would be worth investigating whether just a single > thread pool is sufficient for running concurrent insertAll calls. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
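The idea discussed in the review thread above, capping the number of in-flight tasks with a `Semaphore` in front of an otherwise unbounded pool, can be sketched as follows. This is an illustrative reduction, not the `BoundedExecutorService` from PR #7547: the class `BoundedSubmitSketch`, its `submit` wrapper, and the `maxObservedConcurrency` helper are hypothetical names, and the sketch deliberately does not implement the full `ExecutorService` interface or the `shutdownNow()` permit handling debated in the comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedSubmitSketch {
    private final ExecutorService executor;
    private final Semaphore semaphore;

    BoundedSubmitSketch(ExecutorService executor, int parallelism) {
        this.executor = executor;
        this.semaphore = new Semaphore(parallelism);
    }

    /** Blocks the caller until a permit is free, then hands the task to the pool. */
    Future<?> submit(Runnable task) {
        semaphore.acquireUninterruptibly();
        try {
            return executor.submit(() -> {
                try {
                    task.run();
                } finally {
                    // Release on the worker thread; Semaphore does not require
                    // the releasing thread to be the one that acquired.
                    semaphore.release();
                }
            });
        } catch (RuntimeException e) {
            // The pool rejected the task, so the permit must be returned here.
            semaphore.release();
            throw e;
        }
    }

    /** Runs short tasks and reports the highest number seen running at once. */
    static int maxObservedConcurrency(int tasks, int parallelism) {
        ExecutorService pool = Executors.newCachedThreadPool();
        BoundedSubmitSketch bounded = new BoundedSubmitSketch(pool, parallelism);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < tasks; i++) {
                futures.add(bounded.submit(() -> {
                    int now = running.incrementAndGet();
                    maxSeen.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(5);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    running.decrementAndGet();
                }));
            }
            for (Future<?> future : futures) {
                future.get();
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return maxSeen.get();
    }

    public static void main(String[] args) {
        System.out.println("max concurrent tasks: " + maxObservedConcurrency(20, 3));
    }
}
```

Because each permit is released only after the task body has finished, the observed concurrency cannot exceed the configured parallelism, even though the underlying cached thread pool itself is unbounded.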
[jira] [Work logged] (BEAM-6777) SDK Harness Resilience
[ https://issues.apache.org/jira/browse/BEAM-6777?focusedWorklogId=211335=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211335 ] ASF GitHub Bot logged work on BEAM-6777: Author: ASF GitHub Bot Created on: 11/Mar/19 21:59 Start Date: 11/Mar/19 21:59 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8012: [BEAM-6777] Add HealthDaemon and tests URL: https://github.com/apache/beam/pull/8012#issuecomment-471753341 This looks good. Can you please squash the commits into one? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211335) Time Spent: 1.5h (was: 1h 20m) > SDK Harness Resilience > -- > > Key: BEAM-6777 > URL: https://issues.apache.org/jira/browse/BEAM-6777 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > If the Python SDK Harness crashes in any way (user code exception, OOM, etc) > the job will hang and waste resources. The fix is to add a daemon in the SDK > Harness and Runner Harness to communicate with Dataflow to restart the VM > when stuckness is detected. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-6298) Can not insert into BigQuery table that is not empty
[ https://issues.apache.org/jira/browse/BEAM-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Mingmin reassigned BEAM-6298: Assignee: (was: Xu Mingmin) > Can not insert into BigQuery table that is not empty > > > Key: BEAM-6298 > URL: https://issues.apache.org/jira/browse/BEAM-6298 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.10.0 >Reporter: Luat Nguyen >Priority: Major > Labels: triaged > > An exception is thrown when I try to insert into a BigQuery table that is not > empty. > Example Beam SQL code: > {code:java} > BeamSqlRelUtils.toPCollection(pipeline, sqlEnv.parseQuery("INSERT INTO > D_CARD_LITE(DIM_ID) VALUES('')")){code} > The exception message is as follows: > {code:java} > java.lang.IllegalStateException: BigQuery table is not empty: > mydataset:samples.D_CARD_LITE. > at com.google.common.base.Preconditions.checkState(Preconditions.java:518) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.verifyTableNotExistOrEmpty(BigQueryHelpers.java:470) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$Write.validate(BigQueryIO.java:1564) > at > org.apache.beam.sdk.Pipeline$ValidateVisitor.enterCompositeTransform(Pipeline.java:641) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:645) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:649) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:649) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:311) > at > org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:245) > at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:458) > at org.apache.beam.sdk.Pipeline.validate(Pipeline.java:577) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:312) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-2478) Distinct Aggregates
[ https://issues.apache.org/jira/browse/BEAM-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Mingmin closed BEAM-2478. Resolution: Won't Do Fix Version/s: Not applicable It's supported by Calcite rules, per Julian's comment. > Distinct Aggregates > --- > > Key: BEAM-2478 > URL: https://issues.apache.org/jira/browse/BEAM-2478 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Jingsong Lee >Assignee: Xu Mingmin >Priority: Major > Labels: triaged > Fix For: Not applicable > > > e.g. COUNT(DISTINCT empno) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6185) Upgrade to Spark 2.4.0
[ https://issues.apache.org/jira/browse/BEAM-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789997#comment-16789997 ] Ismaël Mejía commented on BEAM-6185: Dataproc is still on 2.3.x, but I think the timing is better now that at least the majority is on 2.4.x. Can we just wait for the release of Spark 2.4.1 (vote ongoing) before making the move? At that point we will reopen JB's PR. WDYT [~aromanenko]? > Upgrade to Spark 2.4.0 > -- > > Key: BEAM-6185 > URL: https://issues.apache.org/jira/browse/BEAM-6185 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré >Priority: Major > Labels: triaged > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6807) Implement an Azure blobstore filesystem for Python SDK
Pablo Estrada created BEAM-6807: --- Summary: Implement an Azure blobstore filesystem for Python SDK Key: BEAM-6807 URL: https://issues.apache.org/jira/browse/BEAM-6807 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Pablo Estrada Assignee: Pablo Estrada This is similar to BEAM-2572, but for Azure's blobstore. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-5203) expose PaneInfo and BoundedWindow as UDF
[ https://issues.apache.org/jira/browse/BEAM-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Mingmin closed BEAM-5203. Resolution: Won't Do Fix Version/s: Not applicable > expose PaneInfo and BoundedWindow as UDF > > > Key: BEAM-5203 > URL: https://issues.apache.org/jira/browse/BEAM-5203 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Xu Mingmin >Assignee: Xu Mingmin >Priority: Major > Labels: triaged > Fix For: Not applicable > > Time Spent: 2h > Remaining Estimate: 0h > > besides adding new keywords in Calcite, there's an alternative way to expose > PaneInfo and BoundedWindow of Row by UDF. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6527) Parallel tox (unit) tests run on Jenkins
[ https://issues.apache.org/jira/browse/BEAM-6527?focusedWorklogId=211322=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211322 ] ASF GitHub Bot logged work on BEAM-6527: Author: ASF GitHub Bot Created on: 11/Mar/19 21:44 Start Date: 11/Mar/19 21:44 Worklog Time Spent: 10m Work Description: markflyhigh commented on pull request #7675: [BEAM-6527] Use Gradle to parallel Python tox tests URL: https://github.com/apache/beam/pull/7675#discussion_r264443764 ## File path: sdks/python/scripts/run_tox.sh ## @@ -24,9 +24,10 @@ ### # Usage check. -if [[ $# != 1 ]]; then - printf "Usage: \n$> ./scripts/run_tox.sh " +if [[ $# < 1 || $# > 2 ]]; then + printf "Usage: \n$> ./scripts/run_tox.sh " Review comment: sg. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211322) Time Spent: 2.5h (was: 2h 20m) > Parallel tox (unit) tests run on Jenkins > > > Key: BEAM-6527 > URL: https://issues.apache.org/jira/browse/BEAM-6527 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Labels: triaged > Time Spent: 2.5h > Remaining Estimate: 0h > > The existing tox unit test suites (basic, gcp and cython) will be enabled for > multiple versions of Python 3, which will significantly increase the runtime of the > Pre/PostCommit builds. Parallelism is wanted in the tox or Gradle invocation to > keep the time in a reasonable range (<30mins for PreCommit is desired). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-5976) use AbstractInstant as DATEITME type in functions
[ https://issues.apache.org/jira/browse/BEAM-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Mingmin closed BEAM-5976. Resolution: Fixed Fix Version/s: Not applicable > use AbstractInstant as DATEITME type in functions > - > > Key: BEAM-5976 > URL: https://issues.apache.org/jira/browse/BEAM-5976 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Xu Mingmin >Assignee: Xu Mingmin >Priority: Minor > Labels: triaged > Fix For: Not applicable > > Time Spent: 1h 10m > Remaining Estimate: 0h > > refer to discussion in > [https://github.com/apache/beam/pull/6913#discussion_r230148526] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-6105) Support "partition by XXX order by XXX" SQL
[ https://issues.apache.org/jira/browse/BEAM-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Mingmin reassigned BEAM-6105: Assignee: (was: Xu Mingmin) > Support "partition by XXX order by XXX" SQL > --- > > Key: BEAM-6105 > URL: https://issues.apache.org/jira/browse/BEAM-6105 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Brandon Jiang >Priority: Minor > Labels: triaged > > Based on our experience, it looks like Beam SQL is not able to support > statements like "partition by XXX order by XXX" on a bounded stream. It is > not able to partition data across different nodes and sort the data in each > partition/node in parallel. > We have to use the Java SDK and an extension to convert such SQL statements to > GroupByKey + SortValues to achieve this. > > Are we missing anything? If not, is this something that we can improve? We > took a quick look at Calcite, and it seems that it can explain the query plan for > "partition by... order by..." fine. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
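The GroupByKey + SortValues workaround described in BEAM-6105 can be illustrated without Beam at all. The following is a minimal plain-Python sketch of what "partition by ... order by ..." is expected to produce; the function name `partition_and_sort` and the sample rows are hypothetical, and nothing here uses the Beam API or runs in parallel.

```python
from collections import defaultdict


def partition_and_sort(rows, partition_key, order_key):
    """Group rows by partition_key, then sort each group by order_key."""
    groups = defaultdict(list)
    for row in rows:
        # "GroupByKey" step: collect rows that share the partition key.
        groups[row[partition_key]].append(row)
    # "SortValues" step: order the rows inside each partition.
    return {
        key: sorted(group, key=lambda r: r[order_key])
        for key, group in groups.items()
    }


# Hypothetical sample data, for illustration only.
rows = [
    {"user": "a", "ts": 3},
    {"user": "b", "ts": 1},
    {"user": "a", "ts": 1},
]
result = partition_and_sort(rows, "user", "ts")
```

In a real pipeline each partition would be sorted on a different worker; the sketch only shows the per-partition ordering the issue is asking for.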
[jira] [Assigned] (BEAM-6297) There is a NullPointerException when read null-value field in BigQuery table
[ https://issues.apache.org/jira/browse/BEAM-6297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Mingmin reassigned BEAM-6297: Assignee: (was: Xu Mingmin) > There is a NullPointerException when read null-value field in BigQuery table > > > Key: BEAM-6297 > URL: https://issues.apache.org/jira/browse/BEAM-6297 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.10.0 >Reporter: Luat Nguyen >Priority: Major > Labels: triaged > > I run query on a BigQuery table by Beam SQL. > Ex: > {code:java} > BeamSqlRelUtils.toPCollection(pipeline, sqlEnv.parseQuery("SELECT * FROM > X_bigquery_table")); > {code} > There is a NullPointerException when it reads null-value field in the > BigQuery table as below: > {code:java} > Dec 22, 2018 11:05:21 AM org.apache.beam.sdk.io.FileBasedSource createReader > INFO: Matched 1 files for pattern > gs://xxx/tmp/BigQueryExtractTemp/a84545971aa94cf6b6717984e9d71642/.avro > java.lang.NullPointerException > at > org.apache.beam.sdk.io.gcp.bigquery.AvroUtils.convertAvroString(AvroUtils.java:81) > at > org.apache.beam.sdk.io.gcp.bigquery.AvroUtils.convertAvroPrimitiveTypes(AvroUtils.java:104) > at > org.apache.beam.sdk.io.gcp.bigquery.AvroUtils.convertAvroFormat(AvroUtils.java:46) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.toBeamRow(BigQueryUtils.java:206) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils$ToBeamRow.apply(BigQueryUtils.java:198) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils$ToBeamRow.apply(BigQueryUtils.java:185) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase$1.apply(BigQuerySourceBase.java:221) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase$1.apply(BigQuerySourceBase.java:214) > at > org.apache.beam.sdk.io.AvroSource$AvroBlock.readNextRecord(AvroSource.java:567) > at > org.apache.beam.sdk.io.BlockBasedSource$BlockBasedReader.readNextRecord(BlockBasedSource.java:209) > at > 
org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:484) > at > org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:479) > at > org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:249) > at > org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.processElement(BoundedReadEvaluatorFactory.java:147) > at > org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:160) > at > org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:124) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6527) Parallel tox (unit) tests run on Jenkins
[ https://issues.apache.org/jira/browse/BEAM-6527?focusedWorklogId=211328=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211328 ] ASF GitHub Bot logged work on BEAM-6527: Author: ASF GitHub Bot Created on: 11/Mar/19 21:51 Start Date: 11/Mar/19 21:51 Worklog Time Spent: 10m Work Description: markflyhigh commented on pull request #7675: [BEAM-6527] Use Gradle to parallel Python tox tests URL: https://github.com/apache/beam/pull/7675#discussion_r264445998 ## File path: sdks/python/scripts/run_tox.sh ## @@ -24,9 +24,10 @@ ### # Usage check. -if [[ $# != 1 ]]; then - printf "Usage: \n$> ./scripts/run_tox.sh " +if [[ $# < 1 || $# > 2 ]]; then + printf "Usage: \n$> ./scripts/run_tox.sh " printf "\n\ttox_environment: [required] Tox environment to run the test in.\n" + printf "\n\tsdk_location: [optional] SDK artifact location to install dependencies.\n" Review comment: `--install` behaves the same as `--installpkg` but we should use the documented one. Thanks for pointing out. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211328) Time Spent: 3h (was: 2h 50m) > Parallel tox (unit) tests run on Jenkins > > > Key: BEAM-6527 > URL: https://issues.apache.org/jira/browse/BEAM-6527 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Labels: triaged > Time Spent: 3h > Remaining Estimate: 0h > > Existing tox unit test suite (basic, gcp and cython) will be enabled in > multiple version of Python 3, which will significantly increase runtime of > Pre/PostCommit build. A parallel is wanted in tox or Gradle invocation to > control the time in a reasonable range (<30mins for PreCommit is desired). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6527) Parallel tox (unit) tests run on Jenkins
[ https://issues.apache.org/jira/browse/BEAM-6527?focusedWorklogId=211327=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211327 ] ASF GitHub Bot logged work on BEAM-6527: Author: ASF GitHub Bot Created on: 11/Mar/19 21:51 Start Date: 11/Mar/19 21:51 Worklog Time Spent: 10m Work Description: markflyhigh commented on pull request #7675: [BEAM-6527] Use Gradle to parallel Python tox tests URL: https://github.com/apache/beam/pull/7675#discussion_r264445998 ## File path: sdks/python/scripts/run_tox.sh ## @@ -24,9 +24,10 @@ ### # Usage check. -if [[ $# != 1 ]]; then - printf "Usage: \n$> ./scripts/run_tox.sh " +if [[ $# < 1 || $# > 2 ]]; then + printf "Usage: \n$> ./scripts/run_tox.sh " printf "\n\ttox_environment: [required] Tox environment to run the test in.\n" + printf "\n\tsdk_location: [optional] SDK artifact location to install dependencies.\n" Review comment: `--install` behaves the same as `--installpkg` but we should use the documented one. Thanks for pointing that out. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211327) Time Spent: 2h 50m (was: 2h 40m) > Parallel tox (unit) tests run on Jenkins > > > Key: BEAM-6527 > URL: https://issues.apache.org/jira/browse/BEAM-6527 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Labels: triaged > Time Spent: 2h 50m > Remaining Estimate: 0h > > Existing tox unit test suite (basic, gcp and cython) will be enabled in > multiple version of Python 3, which will significantly increase runtime of > Pre/PostCommit build. A parallel is wanted in tox or Gradle invocation to > control the time in a reasonable range (<30mins for PreCommit is desired). 
[jira] [Work logged] (BEAM-6527) Parallel tox (unit) tests run on Jenkins
[ https://issues.apache.org/jira/browse/BEAM-6527?focusedWorklogId=211326=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211326 ] ASF GitHub Bot logged work on BEAM-6527: Author: ASF GitHub Bot Created on: 11/Mar/19 21:48 Start Date: 11/Mar/19 21:48 Worklog Time Spent: 10m Work Description: markflyhigh commented on pull request #7675: [BEAM-6527] Use Gradle to parallel Python tox tests URL: https://github.com/apache/beam/pull/7675#discussion_r26893 ## File path: sdks/python/test-suites/tox/py3/build.gradle ## @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Unit tests for Python 3 + */ + +apply plugin: org.apache.beam.gradle.BeamModulePlugin +applyPythonNature() + +// Required to setup a Python 3 virtualenv. +project.ext.python3 = true + +task lint {} +check.dependsOn lint + +toxTask "lintPy3", "py3-lint" +lint.dependsOn lintPy3 + +toxTask "testPython3", "py3" +test.dependsOn testPython3 + +toxTask "testPy3Gcp", "py3-gcp" +test.dependsOn testPy3Gcp + +toxTask "testPy3Cython", "py3-cython" +test.dependsOn testPy3Cython +// Ensure that testPy3Cython runs exclusively to other tests. 
+testPy3Cython.mustRunAfter testPython3, testPy3Gcp +testPy3Cython.mustRunAfter ':beam-sdks-python:testPy2Cython' Review comment: testPy3Cython and testPy2Cython run in parallel even though I use `finalizedBy` [here](https://github.com/apache/beam/pull/7675/files#diff-c197962302397baf3a4cc36463dce5eaR197). If they do not affect each other, I can remove this line. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211326) Time Spent: 2h 40m (was: 2.5h) > Parallel tox (unit) tests run on Jenkins > > > Key: BEAM-6527 > URL: https://issues.apache.org/jira/browse/BEAM-6527 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Labels: triaged > Time Spent: 2h 40m > Remaining Estimate: 0h > > Existing tox unit test suite (basic, gcp and cython) will be enabled in > multiple version of Python 3, which will significantly increase runtime of > Pre/PostCommit build. A parallel is wanted in tox or Gradle invocation to > control the time in a reasonable range (<30mins for PreCommit is desired). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6527) Parallel tox (unit) tests run on Jenkins
[ https://issues.apache.org/jira/browse/BEAM-6527?focusedWorklogId=211320=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211320 ] ASF GitHub Bot logged work on BEAM-6527: Author: ASF GitHub Bot Created on: 11/Mar/19 21:39 Start Date: 11/Mar/19 21:39 Worklog Time Spent: 10m Work Description: markflyhigh commented on pull request #7675: [BEAM-6527] Use Gradle to parallel Python tox tests URL: https://github.com/apache/beam/pull/7675#discussion_r264442150 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -1611,18 +1611,48 @@ class BeamModulePlugin implements Plugin { outputs.dirs(project.ext.envdir) } + def pythonSdkDeps = project.files( + project.fileTree(dir: "${project.rootDir}", include: [ +'model/**', +'sdks/python/apache_beam/**/*.py', +'sdks/python/apache_beam/**/*.pyx', +'sdks/python/apache_beam/**/*.pxd', +'sdks/python/apache_beam/testing/data/**', +'sdks/python/apache_beam/scripts/**', +'sdks/python/.pylintrc', +'sdks/python/MANIFEST.in', +'sdks/python/gen_protos.py', +'sdks/python/setup.cfg', +'sdks/python/setup.py', +'sdks/python/test_config.py', +'sdks/python/tox.ini', + ]) + ) + project.configurations { distConfig } project.task('sdist', dependsOn: 'setupVirtualenv') { doLast { + // Copy sdk sources to isolate directory + def copiedSrcDir = "${project.buildDir}/srcs" + project.copy { +from pythonSdkDeps +into copiedSrcDir + } + + // Build artifact project.exec { executable 'sh' -args '-c', ". ${project.ext.envdir}/bin/activate && cd ${pythonRootDir} && python setup.py sdist --keep-temp --formats zip,gztar --dist-dir ${project.buildDir}" +args '-c', ". 
${project.ext.envdir}/bin/activate && cd ${copiedSrcDir}/sdks/python && python setup.py sdist --formats zip,gztar --dist-dir ${project.buildDir}" Review comment: I added this flag to fix the parallel failure in integration tests, since the temp directory is shared between different build processes and by default it's deleted after a build finishes. However, it never works for tox tests. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211320) Time Spent: 2h 10m (was: 2h) > Parallel tox (unit) tests run on Jenkins > > > Key: BEAM-6527 > URL: https://issues.apache.org/jira/browse/BEAM-6527 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Labels: triaged > Time Spent: 2h 10m > Remaining Estimate: 0h > > Existing tox unit test suite (basic, gcp and cython) will be enabled in > multiple version of Python 3, which will significantly increase runtime of > Pre/PostCommit build. A parallel is wanted in tox or Gradle invocation to > control the time in a reasonable range (<30mins for PreCommit is desired). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6527) Parallel tox (unit) tests run on Jenkins
[ https://issues.apache.org/jira/browse/BEAM-6527?focusedWorklogId=211321=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211321 ] ASF GitHub Bot logged work on BEAM-6527: Author: ASF GitHub Bot Created on: 11/Mar/19 21:44 Start Date: 11/Mar/19 21:44 Worklog Time Spent: 10m Work Description: markflyhigh commented on pull request #7675: [BEAM-6527] Use Gradle to parallel Python tox tests URL: https://github.com/apache/beam/pull/7675#discussion_r264443718 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -1673,6 +1703,20 @@ class BeamModulePlugin implements Plugin { } return argList.join(' ') } + + project.ext.toxTask = { name, tox_env -> +project.tasks.create(name) { + dependsOn = ['sdist'] + doLast { +project.exec { + executable 'sh' + args '-c', ". ${project.ext.envdir}/bin/activate && ${pythonRootDir}/scripts/run_tox.sh $tox_env ${project.buildDir}/apache-beam.tar.gz" Review comment: I only pass tarball to the script. `$tox_env` is the name of environment that we want to run (like `py27-lint, py3-gcp`). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211321) Time Spent: 2h 20m (was: 2h 10m) > Parallel tox (unit) tests run on Jenkins > > > Key: BEAM-6527 > URL: https://issues.apache.org/jira/browse/BEAM-6527 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Labels: triaged > Time Spent: 2h 20m > Remaining Estimate: 0h > > Existing tox unit test suite (basic, gcp and cython) will be enabled in > multiple version of Python 3, which will significantly increase runtime of > Pre/PostCommit build. 
A parallel is wanted in tox or Gradle invocation to > control the time in a reasonable range (<30mins for PreCommit is desired). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6711) Bigquery Tornadoes IT is broken in Python3 PostCommit test suite.
[ https://issues.apache.org/jira/browse/BEAM-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789988#comment-16789988 ] Tanay Tummalapalli commented on BEAM-6711: -- [~tvalentyn] I'll find the answer to those questions. > Bigquery Tornadoes IT is broken in Python3 PostCommit test suite. > -- > > Key: BEAM-6711 > URL: https://issues.apache.org/jira/browse/BEAM-6711 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.12.0 > > Time Spent: 6h 50m > Remaining Estimate: 0h > > First failure was observed in > https://builds.apache.org/job/beam_PostCommit_Python3_Verify/54 , after > https://github.com/apache/beam/commit/cdea885872b3be7de9ba22f22700be89f7d53766 > was merged. > [~pabloem], could you please take a look? I suggest we do a rollback + > rollforward with a fix. > {noformat} > root: ERROR: Exception at bundle > , > due to an exception. > Traceback (most recent call last): > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/common.py", > line 727, in process > return self.do_fn_invoker.invoke_process(windowed_value) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/common.py", > line 556, in invoke_process > windowed_value, additional_args, additional_kwargs, output_processor) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/common.py", > line 622, in _invoke_per_window > self.process_method(*args_for_process, **kwargs_for_process)) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/common.py", > line 823, in process_outputs > for result in results: > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py", 
> line 191, in process > if destination in self._destination_to_file_writer: > TypeError: unhashable type: 'TableReference' > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
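The `TypeError: unhashable type` in the traceback above is the error Python 3 raises when an object without a usable `__hash__` is used as a dictionary key. One common way to end up there is a class that defines `__eq__` but not `__hash__`: Python 3 then sets `__hash__` to `None`. Whether that is what happens inside Beam's `TableReference` is an assumption; the `TableRef` class and its `as_key` helper below are hypothetical stand-ins used only to reproduce the symptom and show one possible workaround (keying the dictionary by a string).

```python
class TableRef(object):
    """Hypothetical stand-in; this is NOT Beam's TableReference class."""

    def __init__(self, project, dataset, table):
        self.project = project
        self.dataset = dataset
        self.table = table

    # Defining __eq__ without __hash__ makes instances unhashable in Python 3.
    def __eq__(self, other):
        return isinstance(other, TableRef) and self.as_key() == other.as_key()

    def as_key(self):
        """Return a plain string that is safe to use as a dict key."""
        return "%s:%s.%s" % (self.project, self.dataset, self.table)


ref = TableRef("proj", "ds", "tbl")
writers = {}

try:
    ref in writers  # Membership test hashes the key first, so this raises.
    membership_failed = False
except TypeError:
    membership_failed = True

# Possible workaround: key the dictionary by a hashable string instead.
writers[ref.as_key()] = "writer-0"
found = ref.as_key() in writers
```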
[jira] [Work logged] (BEAM-6777) SDK Harness Resilience
[ https://issues.apache.org/jira/browse/BEAM-6777?focusedWorklogId=211302=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211302 ] ASF GitHub Bot logged work on BEAM-6777: Author: ASF GitHub Bot Created on: 11/Mar/19 21:17 Start Date: 11/Mar/19 21:17 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #8012: [BEAM-6777] Add HealthDaemon and tests URL: https://github.com/apache/beam/pull/8012#discussion_r264434326 ## File path: sdks/python/apache_beam/runners/worker/health_daemon.py ## @@ -0,0 +1,121 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from __future__ import absolute_import + +import errno +import http.client +import logging +import socket +import time +from builtins import object + + +class HealthDaemon(object): + """Sends periodic HTTP PUT /sdk requests to the health server. + + The purpose of this class is to communicate to the health server that this + SDK Harness is alive. If this SDK Harness does not communicate to the health + server after a configured amount of time, the health server will restart the + container. + + Expected Usage: +# The HealthDaemon is expected to spin forever, start it on a separate +# thread. 
+health_thread = threading.Thread(target=HealthDaemon(8080).start) + +# Automatically kill the thread when the program exists. +health_thread.daemon = True +health_thread.setName('health-client-demon') + +# Start the HealthDaemon. +health_thread.start() + + """ + + def __init__(self, health_http_port): +self._health_http_port = health_http_port + + @staticmethod + def connect_to_server(health_http_port, timeout=5): +"""Connects to the health server on the given port. + +Args: + health_http_port(int): Binding port for the debug server. +Default is 0 which means any free unsecured port + timeout(int): Timeout in seconds for all operations. + +Returns: + The connection to the health server. +""" + +logging.info('Connecting to localhost:%s', health_http_port) +return http.client.HTTPConnection('localhost', health_http_port, + timeout=timeout) + + @staticmethod + def try_health_ping(health_server): +"""Attempts to ping the given health server. + +Args: + health_server(http.client.HTTPConnection): Connection to the health +server. + +Returns: + True if the health ping succeeded, false otherwise. +""" + +success = False +try: + health_server.request('PUT', '/sdk') Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211302) Time Spent: 1h 20m (was: 1h 10m) > SDK Harness Resilience > -- > > Key: BEAM-6777 > URL: https://issues.apache.org/jira/browse/BEAM-6777 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > If the Python SDK Harness crashes in any way (user code exception, OOM, etc) > the job will hang and waste resources. 
The fix is to add a daemon in the SDK > Harness and Runner Harness to communicate with Dataflow to restart the VM > when stuckness is detected. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
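The diff quoted in this worklog is cut off right after `health_server.request('PUT', '/sdk')`. As a rough illustration of how a health ping of this kind might report success or failure, here is a minimal sketch using only the standard library. The 200 status check, the set of exceptions caught, and the close-on-failure behaviour are assumptions made for the example; they are not taken from PR #8012.

```python
import http.client
import socket


def try_health_ping(conn):
    """Return True if PUT /sdk gets a 200 response, False on any failure.

    `conn` is expected to behave like http.client.HTTPConnection.
    """
    try:
        conn.request('PUT', '/sdk')
        response = conn.getresponse()
        response.read()  # Drain the body so the connection can be reused.
        return response.status == 200
    except (http.client.HTTPException, socket.error):
        # Assumed recovery: drop the connection. HTTPConnection reopens the
        # socket automatically on the next request().
        conn.close()
        return False
```

A caller would typically invoke this in a loop with a sleep between pings, on a daemon thread as the docstring in the diff describes.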
[jira] [Work logged] (BEAM-6777) SDK Harness Resilience
[ https://issues.apache.org/jira/browse/BEAM-6777?focusedWorklogId=211301=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211301 ] ASF GitHub Bot logged work on BEAM-6777: Author: ASF GitHub Bot Created on: 11/Mar/19 21:17 Start Date: 11/Mar/19 21:17 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #8012: [BEAM-6777] Add HealthDaemon and tests URL: https://github.com/apache/beam/pull/8012#discussion_r264434288 ## File path: sdks/python/apache_beam/runners/worker/health_daemon.py ## @@ -0,0 +1,121 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from __future__ import absolute_import + +import errno +import http.client +import logging +import socket +import time +from builtins import object + + +class HealthDaemon(object): + """Sends periodic HTTP PUT /sdk requests to the health server. + + The purpose of this class is to communicate to the health server that this + SDK Harness is alive. If this SDK Harness does not communicate to the health + server after a configured amount of time, the health server will restart the + container. + + Expected Usage: +# The HealthDaemon is expected to spin forever, start it on a separate +# thread. 
+health_thread = threading.Thread(target=HealthDaemon(8080).start) + +# Automatically kill the thread when the program exists. Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211301) Time Spent: 1h 10m (was: 1h) > SDK Harness Resilience > -- > > Key: BEAM-6777 > URL: https://issues.apache.org/jira/browse/BEAM-6777 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > If the Python SDK Harness crashes in any way (user code exception, OOM, etc) > the job will hang and waste resources. The fix is to add a daemon in the SDK > Harness and Runner Harness to communicate with Dataflow to restart the VM > when stuckness is detected. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-6726) Gradle Publish fails with Gradle 5
[ https://issues.apache.org/jira/browse/BEAM-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Luckey resolved BEAM-6726. -- Resolution: Fixed > Gradle Publish fails with Gradle 5 > -- > > Key: BEAM-6726 > URL: https://issues.apache.org/jira/browse/BEAM-6726 > Project: Beam > Issue Type: Bug > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ahmet Altay >Assignee: Michael Luckey >Priority: Blocker > Fix For: 2.12.0 > > Time Spent: 4h > Remaining Estimate: 0h > > cc: [~alanmyrvold] [~kenn] > :beam-sdks-java-bom:signMavenJavaPublication task fails with an obscure > error: > (https://scans.gradle.com/s/mcbb4axlx6agy/failure?openFailures=WzBd=WzFd#top=0): > Duplicate key pom-default.xml.asc:xml.asc:asc:null (attempted merging values > Signature pom-default.xml.asc:xml.asc:asc:null and Signature > pom-default.xml.asc:xml.asc:asc:null) > Downgrading to Gradle 4 by reverting > https://github.com/apache/beam/commit/cadb6f7fabc6faedc6037104338306688f17652f > works. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6726) Gradle Publish fails with Gradle 5
[ https://issues.apache.org/jira/browse/BEAM-6726?focusedWorklogId=211288=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211288 ] ASF GitHub Bot logged work on BEAM-6726: Author: ASF GitHub Bot Created on: 11/Mar/19 20:56 Start Date: 11/Mar/19 20:56 Worklog Time Spent: 10m Work Description: adude3141 commented on issue #8026: [BEAM-6726] explicitly specify signing key URL: https://github.com/apache/beam/pull/8026#issuecomment-471726275 Nice. Thx for merging. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211288) Time Spent: 4h (was: 3h 50m) > Gradle Publish fails with Gradle 5 > -- > > Key: BEAM-6726 > URL: https://issues.apache.org/jira/browse/BEAM-6726 > Project: Beam > Issue Type: Bug > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ahmet Altay >Assignee: Michael Luckey >Priority: Blocker > Fix For: 2.12.0 > > Time Spent: 4h > Remaining Estimate: 0h > > cc: [~alanmyrvold] [~kenn] > :beam-sdks-java-bom:signMavenJavaPublication task fails with an obscure > error: > (https://scans.gradle.com/s/mcbb4axlx6agy/failure?openFailures=WzBd=WzFd#top=0): > Duplicate key pom-default.xml.asc:xml.asc:asc:null (attempted merging values > Signature pom-default.xml.asc:xml.asc:asc:null and Signature > pom-default.xml.asc:xml.asc:asc:null) > Downgrading to Gradle 4 by reverting > https://github.com/apache/beam/commit/cadb6f7fabc6faedc6037104338306688f17652f > works. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6726) Gradle Publish fails with Gradle 5
[ https://issues.apache.org/jira/browse/BEAM-6726?focusedWorklogId=211284=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211284 ] ASF GitHub Bot logged work on BEAM-6726: Author: ASF GitHub Bot Created on: 11/Mar/19 20:53 Start Date: 11/Mar/19 20:53 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #8026: [BEAM-6726] explicitly specify signing key URL: https://github.com/apache/beam/pull/8026 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211284) Time Spent: 3h 50m (was: 3h 40m) > Gradle Publish fails with Gradle 5 > -- > > Key: BEAM-6726 > URL: https://issues.apache.org/jira/browse/BEAM-6726 > Project: Beam > Issue Type: Bug > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ahmet Altay >Assignee: Michael Luckey >Priority: Blocker > Fix For: 2.12.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > cc: [~alanmyrvold] [~kenn] > :beam-sdks-java-bom:signMavenJavaPublication task fails with an obscure > error: > (https://scans.gradle.com/s/mcbb4axlx6agy/failure?openFailures=WzBd=WzFd#top=0): > Duplicate key pom-default.xml.asc:xml.asc:asc:null (attempted merging values > Signature pom-default.xml.asc:xml.asc:asc:null and Signature > pom-default.xml.asc:xml.asc:asc:null) > Downgrading to Gradle 4 by reverting > https://github.com/apache/beam/commit/cadb6f7fabc6faedc6037104338306688f17652f > works. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6795) Improve Release Scripts
[ https://issues.apache.org/jira/browse/BEAM-6795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789950#comment-16789950 ] Michael Luckey commented on BEAM-6795: -- In the discussion of PR [#8026|https://github.com/apache/beam/pull/8026] it was suggested to add some consistency validations: check that user input matches across scripts, especially the signing key. The current script implementations do not support this. > Improve Release Scripts > --- > > Key: BEAM-6795 > URL: https://issues.apache.org/jira/browse/BEAM-6795 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ahmet Altay >Priority: Major > > - Scripts use sudo to install binaries. Could be improved by local > installations, or perhaps using a container to build the release. > - Scripts make changes to the bashrc file (e.g. alias hub to git); these could be > avoided. Even though the scripts attempt to make a backup file, it is easy to > overwrite it if the script is cancelled. > - There are too many yes/no questions and configuration questions for > validations. They are not set-and-forget and require attention. (Possible > solution: use command line arguments) > - Once the script fails at any step (e.g. invalid password at a step) it fails > without giving a second chance and requires re-running from the top. > (Possible idea: use breadcrumbs to continue the script from its last known > location.) > - Signing with GPG is not friendly when used from a remote terminal. It has > modal dialogs and does not interact well with gradle. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
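The breadcrumb idea from the issue above could look roughly like the following. This is only a minimal sketch, not code from the Beam release scripts: the function name `run_with_breadcrumbs`, the JSON breadcrumb file, and the step names in the usage note are all hypothetical.

```python
import json
import os


def run_with_breadcrumbs(steps, breadcrumb_path):
    """Runs (name, callable) steps in order, skipping steps already recorded.

    The names of completed steps are persisted after each step, so a rerun
    after a failure resumes at the first step that has not finished yet.
    """
    done = []
    if os.path.exists(breadcrumb_path):
        with open(breadcrumb_path) as f:
            done = json.load(f)
    for name, action in steps:
        if name in done:
            continue  # Finished in an earlier run, do not repeat it.
        action()  # An exception here leaves the earlier breadcrumbs in place.
        done.append(name)
        with open(breadcrumb_path, 'w') as f:
            json.dump(done, f)
    return done
```

With steps such as `[('build', build), ('sign', sign), ('publish', publish)]`, a failure in `sign` (for example a wrong passphrase) would leave `build` recorded, and the next invocation would start again at `sign` instead of from the top.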
[jira] [Work logged] (BEAM-6619) Add PostCommit suite for integration tests on DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-6619?focusedWorklogId=211278=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211278 ] ASF GitHub Bot logged work on BEAM-6619: Author: ASF GitHub Bot Created on: 11/Mar/19 20:31 Start Date: 11/Mar/19 20:31 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8016: [BEAM-6619] [BEAM-6593] Add pubsub integration tests to postcommit URL: https://github.com/apache/beam/pull/8016#issuecomment-471717285 r: @tvalentyn PTAL? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211278) Time Spent: 13h 40m (was: 13.5h) > Add PostCommit suite for integration tests on DataflowRunner > > > Key: BEAM-6619 > URL: https://issues.apache.org/jira/browse/BEAM-6619 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Labels: triaged > Fix For: Not applicable > > Time Spent: 13h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6619) Add PostCommit suite for integration tests on DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-6619?focusedWorklogId=211276=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211276 ] ASF GitHub Bot logged work on BEAM-6619: Author: ASF GitHub Bot Created on: 11/Mar/19 20:30 Start Date: 11/Mar/19 20:30 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8016: [BEAM-6619] [BEAM-6593] Add pubsub integration tests to postcommit URL: https://github.com/apache/beam/pull/8016#issuecomment-471717226 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211276) Time Spent: 13.5h (was: 13h 20m) > Add PostCommit suite for integration tests on DataflowRunner > > > Key: BEAM-6619 > URL: https://issues.apache.org/jira/browse/BEAM-6619 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Labels: triaged > Fix For: Not applicable > > Time Spent: 13.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6771) Spark Runner Fails on Certain Versions of Spark 2.X
[ https://issues.apache.org/jira/browse/BEAM-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Winkelman updated BEAM-6771: - Priority: Blocker (was: Critical) > Spark Runner Fails on Certain Versions of Spark 2.X > --- > > Key: BEAM-6771 > URL: https://issues.apache.org/jira/browse/BEAM-6771 > Project: Beam > Issue Type: Bug > Components: runner-spark >Affects Versions: 2.11.0 >Reporter: Kyle Winkelman >Priority: Blocker > > When updating to Beam 2.11.0, I ran into the exception at the bottom of this > issue while running a pipeline on the Spark Runner (which worked in 2.9.0). > My cluster uses Spark 2.2.1. > Related Issues: > SPARK-23697 (Proof that equals must be implemented for items being > accumulated.) > BEAM-1920 (In PR#3808, equals was implemented on MetricsContainerStepMap to > get Spark to run on 2.X.) > My analysis has led me to believe that BEAM-6138 is the reason for this > issue. > Before this change, versions of Spark that are affected by SPARK-23697 would > create a new MetricsContainerStepMap and make sure that the copied and reset > instance (the one serialized for distribution) is equal to the initial empty > MetricsContainerStepMap that is passed in. This would effectively check if > two empty ConcurrentHashMaps were equal. This results in true. > After this change, versions of Spark that are affected by SPARK-23697 would > effectively be checking if two empty ConcurrentHashMaps were equal _*AND*_ if > two different instances of the MetricsContainerImpl are equal. Because > MetricsContainerImpl doesn't implement equals, this results in false. > I believe BEAM-6546 will fix this issue, but I wanted to raise a red flag. I > am also hoping someone can verify my analysis. 
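The equality argument in the analysis above can be reproduced with a small model. This is a hedged sketch in Python rather than the Java classes involved: `StepMap`, `ContainerWithoutEq`, `ContainerWithEq` and `is_zero_copy` are hypothetical stand-ins, and the zero check is a simplification of what the issue describes, not Spark's or Beam's actual code.

```python
class ContainerWithoutEq(object):
    """Stand-in for a class with no equality method (compared by identity)."""

    def __init__(self):
        self.counters = {}


class ContainerWithEq(object):
    """Same data, but equality is defined on the contents."""

    def __init__(self):
        self.counters = {}

    def __eq__(self, other):
        return (isinstance(other, ContainerWithEq)
                and self.counters == other.counters)


class StepMap(object):
    """Toy accumulator value: a map of containers plus one extra container."""

    def __init__(self, container_factory):
        self.container_factory = container_factory
        self.containers = {}
        self.extra = container_factory()

    def __eq__(self, other):
        return (isinstance(other, StepMap)
                and self.containers == other.containers
                and self.extra == other.extra)

    def copy_and_reset(self):
        return StepMap(self.container_factory)


def is_zero_copy(step_map):
    """Models the check: the reset copy must equal a fresh empty value."""
    return step_map.copy_and_reset() == StepMap(step_map.container_factory)
```

In this model the two empty maps always compare equal, so the result depends only on the extra container: `is_zero_copy(StepMap(ContainerWithoutEq))` is false because two distinct instances are compared by identity, while `is_zero_copy(StepMap(ContainerWithEq))` is true.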
> {noformat} > ERROR ApplicationMaster: User class threw exception: > java.lang.RuntimeException: java.lang.AssertionError: assertion failed: > copyAndReset must return a zero value copy > java.lang.RuntimeException: java.lang.AssertionError: assertion failed: > copyAndReset must return a zero value copy > at > org.apache.beam.runners.spark.SparkPipelineResult.runtimeExceptionFrom(SparkPipelineResult.java:54) > at > org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom(SparkPipelineResult.java:71) > at > org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:98) > at com.optum.analyticstore.execution.Exec.run(Exec.java:276) > at com.optum.analyticstore.execution.Exec.main(Exec.java:364) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637) > Caused by: java.lang.AssertionError: assertion failed: copyAndReset must > return a zero value copy > at scala.Predef$.assert(Predef.scala:170) > at > org.apache.spark.util.AccumulatorV2.writeReplace(AccumulatorV2.scala:163) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1218) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1136) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at
[jira] [Updated] (BEAM-6806) org.apache.beam.runners not importing in 2.10 & 2.11
[ https://issues.apache.org/jira/browse/BEAM-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Jon Anderson updated BEAM-6806: -- Description: When trying to upgrade our 2.9.0 pipeline to 2.10 or 2.11, all the packages under org.apache.beam.runners disappear (do not load, do not exist), breaking our scripts. This is preventing us from upgrading from 2.9. The error: {code:java} The import org.apache.beam.runners cannot be resolved.{code} Classes we need: {code:java} org.apache.beam.runners.dataflow.options.DataflowPipelineOptions org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType{code} Relevant POM {code:java} <dependency> <groupId>org.apache.beam</groupId> <artifactId>beam-sdks-java-core</artifactId> <version>2.11.0</version> </dependency> <dependency> <groupId>org.apache.beam</groupId> <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId> <version>2.11.0</version> </dependency> <dependency> <groupId>org.apache.beam</groupId> <artifactId>beam-runners-google-cloud-dataflow-java</artifactId> <version>2.11.0</version> <scope>runtime</scope> </dependency> <dependency> <groupId>org.apache.beam</groupId> <artifactId>beam-runners-direct-java</artifactId> <version>2.11.0</version> <scope>runtime</scope> </dependency> {code} was: When trying to upgrade our 2.9.0 pipeline to 2.10 or 2.11, all the packages under org.apache.beam.runners disappear (do not load, do not exist), breaking our scripts. This is preventing us from upgrading from 2.9. 
The error: {code:java} The import org.apache.beam.runners cannot be resolved.{code} Classes we need: {code:java} org.apache.beam.runners.dataflow.options.DataflowPipelineOptions org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType{code} Relevant POM {code:java} <dependency> <groupId>org.apache.beam</groupId> <artifactId>beam-sdks-java-core</artifactId> <version>2.11.0</version> </dependency> <dependency> <groupId>org.apache.beam</groupId> <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId> <version>2.11.0</version> </dependency> <dependency> <groupId>org.apache.beam</groupId> <artifactId>beam-runners-google-cloud-dataflow-java</artifactId> <version>2.11.0</version> <scope>runtime</scope> </dependency> <dependency> <groupId>org.apache.beam</groupId> <artifactId>beam-runners-direct-java</artifactId> <version>2.11.0</version> <scope>runtime</scope> </dependency> {code} > org.apache.beam.runners not importing in 2.10 & 2.11 > > > Key: BEAM-6806 > URL: https://issues.apache.org/jira/browse/BEAM-6806 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Affects Versions: 2.10.0, 2.11.0 >Reporter: Steven Jon Anderson >Priority: Blocker > > When trying to upgrade our 2.9.0 pipeline to 2.10 or 2.11, all the packages > under org.apache.beam.runners disappear (do not load, do not exist), > breaking our scripts. This is preventing us from upgrading from 2.9. 
> The error: > {code:java} > The import org.apache.beam.runners cannot be resolved.{code} > Classes we need: > {code:java} > org.apache.beam.runners.dataflow.options.DataflowPipelineOptions > org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType{code} > Relevant POM > {code:java} > <dependency> > <groupId>org.apache.beam</groupId> > <artifactId>beam-sdks-java-core</artifactId> > <version>2.11.0</version> > </dependency> > <dependency> > <groupId>org.apache.beam</groupId> > <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId> > <version>2.11.0</version> > </dependency> > <dependency> > <groupId>org.apache.beam</groupId> > <artifactId>beam-runners-google-cloud-dataflow-java</artifactId> > <version>2.11.0</version> > <scope>runtime</scope> > </dependency> > <dependency> > <groupId>org.apache.beam</groupId> > <artifactId>beam-runners-direct-java</artifactId> > <version>2.11.0</version> > <scope>runtime</scope> > </dependency> > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6806) org.apache.beam.runners not importing in 2.10 & 2.11
Steven Jon Anderson created BEAM-6806: - Summary: org.apache.beam.runners not importing in 2.10 & 2.11 Key: BEAM-6806 URL: https://issues.apache.org/jira/browse/BEAM-6806 Project: Beam Issue Type: Bug Components: runner-dataflow Affects Versions: 2.11.0, 2.10.0 Reporter: Steven Jon Anderson When trying to upgrade our 2.9.0 pipeline to 2.10 or 2.11, all the packages under org.apache.beam.runners disappear (do not load, do not exist), breaking our scripts. This is preventing us from upgrading from 2.9. The error: {code:java} The import org.apache.beam.runners cannot be resolved.{code} Classes we need: {code:java} org.apache.beam.runners.dataflow.options.DataflowPipelineOptions org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType{code} Relevant POM {code:java} <dependency> <groupId>org.apache.beam</groupId> <artifactId>beam-sdks-java-core</artifactId> <version>2.11.0</version> </dependency> <dependency> <groupId>org.apache.beam</groupId> <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId> <version>2.11.0</version> </dependency> <dependency> <groupId>org.apache.beam</groupId> <artifactId>beam-runners-google-cloud-dataflow-java</artifactId> <version>2.11.0</version> <scope>runtime</scope> </dependency> <dependency> <groupId>org.apache.beam</groupId> <artifactId>beam-runners-direct-java</artifactId> <version>2.11.0</version> <scope>runtime</scope> </dependency> {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6735) WriteFiles with runner-determined sharding is forced to handle spilling
[ https://issues.apache.org/jira/browse/BEAM-6735?focusedWorklogId=211261=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211261 ] ASF GitHub Bot logged work on BEAM-6735: Author: ASF GitHub Bot Created on: 11/Mar/19 20:09 Start Date: 11/Mar/19 20:09 Worklog Time Spent: 10m Work Description: kyle-winkelman commented on issue #7929: [BEAM-6735] Add noSpilling option to WriteFiles. URL: https://github.com/apache/beam/pull/7929#issuecomment-471705486 Done! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211261) Time Spent: 1h 20m (was: 1h 10m) > WriteFiles with runner-determined sharding is forced to handle spilling > --- > > Key: BEAM-6735 > URL: https://issues.apache.org/jira/browse/BEAM-6735 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Kyle Winkelman >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > As a result of BEAM-2302, files in excess of WriteFiles > maxNumWritersPerBundle are shuffled to be written later. The downside to this > is that even if you can guarantee that maxNumWritersPerBundle is high enough > to handle all writes you still have to pay the overhead of this write now > being a MultiOutput ParDo. > e.g. In the Spark Runner when a ParDo has multiple outputs the returned data > is cached and if using the disableCache pipeline option it would cause > recalculation and all the temp files would be written again. > I'm sure that the Spark Runner is not the only runner that would benefit from > an optional setting for WriteFiles that would skip this spilling and simplify > the pipeline. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
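To make the spilling behaviour described in the issue above concrete, here is a simplified model. It is a sketch under assumptions, not the WriteFiles implementation: `assign_writers` is a hypothetical helper, and each record is reduced to its destination key, with one writer per distinct destination.

```python
def assign_writers(destinations, max_writers_per_bundle):
    """Splits one bundle's records into direct writes and spilled records.

    A record is written directly if its destination already has an open
    writer or a new writer may still be opened; otherwise it is spilled,
    which in the real pipeline means it is shuffled and written later.
    """
    open_writers = set()
    written, spilled = [], []
    for dest in destinations:
        if dest in open_writers or len(open_writers) < max_writers_per_bundle:
            open_writers.add(dest)
            written.append(dest)
        else:
            spilled.append(dest)
    return written, spilled
```

For the bundle `['a', 'b', 'a', 'c', 'b']`, a limit of 2 spills only the record for `'c'`, while a limit of 3 spills nothing. The second case is the situation the issue is about: the spill output is always empty, yet the pipeline still carries the extra output.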
[jira] [Work logged] (BEAM-6713) FileIO and TextIO unable to alter WriteFiles maxNumWritersPerBundle
[ https://issues.apache.org/jira/browse/BEAM-6713?focusedWorklogId=211243=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211243 ] ASF GitHub Bot logged work on BEAM-6713: Author: ASF GitHub Bot Created on: 11/Mar/19 19:46 Start Date: 11/Mar/19 19:46 Worklog Time Spent: 10m Work Description: kyle-winkelman commented on pull request #7893: [BEAM-6713] Add withMaxNumWritersPerBundle from WriteFiles to FileIO … URL: https://github.com/apache/beam/pull/7893#discussion_r264399407 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java ## @@ -431,7 +431,8 @@ .setNumShards(0) .setCodec(TypedWrite.DEFAULT_SERIALIZABLE_CODEC) .setMetadata(ImmutableMap.of()) -.setWindowedWrites(false); +.setWindowedWrites(false) + .setMaxNumWritersPerBundle(WriteFiles.DEFAULT_MAX_NUM_WRITERS_PER_BUNDLE); Review comment: I have also come up with another approach to my issue: #7929. So it may be unnecessary to expose this if that is the consensus. I just want to highlight that it was a huge pain to work around this limitation so I could set a higher max. I had to copy most of the FileIO class because its all private internal stuff so that I could call WriteFiles on my own with a higher max. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211243) Time Spent: 1h 20m (was: 1h 10m) > FileIO and TextIO unable to alter WriteFiles maxNumWritersPerBundle > --- > > Key: BEAM-6713 > URL: https://issues.apache.org/jira/browse/BEAM-6713 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Kyle Winkelman >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > When attempting to run a batch workflow with a FileIO.write() I was getting > job failures due to WriteFiles.DEFAULT_MAX_NUM_WRITERS_PER_BUNDLE causing a > significant amount of data to be shuffled. My issues would be solved by > increasing this and luckily WriteFiles already has withMaxNumWritersPerBundle > but unfortunately FileIO and TextIO do not. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-6748) Block size difference in avro library on Python3 causes some AvroIO tests to fail.
[ https://issues.apache.org/jira/browse/BEAM-6748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev closed BEAM-6748. - Resolution: Fixed Fix Version/s: Not applicable > Block size difference in avro library on Python3 causes some AvroIO tests to > fail. > -- > > Key: BEAM-6748 > URL: https://issues.apache.org/jira/browse/BEAM-6748 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: Not applicable > > Time Spent: 1h 20m > Remaining Estimate: 0h > > *apache_beam.io.avroio_test.TestAvro.test_split_points* > *apache_beam.io.avroio_test.TestFastAvro.test_split_points* > fail with: > > {code:java} > Traceback (most recent call last): > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", > line 308, in test_split_points > self.assertEquals(split_points_report[-10:], [(2, 1)] * 10) > AssertionError: Lists differ: [(10, 1), (10, 1), (10, 1), (10, 1), (10, 1[42 > chars], 1)] != [(2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2[32 chars], 1)] > First differing element 0: > (10, 1) > (2, 1) > + [(2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), > (2, 1)] > - [(10, 1), > - (10, 1), > - (10, 1), > - (10, 1), > - (10, 1), > - (10, 1), > - (10, 1), > - (10, 1), > - (10, 1), > - (10, 1)] {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6735) WriteFiles with runner-determined sharding is forced to handle spilling
[ https://issues.apache.org/jira/browse/BEAM-6735?focusedWorklogId=211210=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211210 ] ASF GitHub Bot logged work on BEAM-6735: Author: ASF GitHub Bot Created on: 11/Mar/19 18:37 Start Date: 11/Mar/19 18:37 Worklog Time Spent: 10m Work Description: pabloem commented on issue #7929: [BEAM-6735] Add noSpilling option to WriteFiles. URL: https://github.com/apache/beam/pull/7929#issuecomment-471667430 Ismael may also be able to review if Luke can't This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211210) Time Spent: 1h 10m (was: 1h) > WriteFiles with runner-determined sharding is forced to handle spilling > --- > > Key: BEAM-6735 > URL: https://issues.apache.org/jira/browse/BEAM-6735 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Kyle Winkelman >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > As a result of BEAM-2302, files in excess of WriteFiles > maxNumWritersPerBundle are shuffled to be written later. The downside to this > is that even if you can guarantee that maxNumWritersPerBundle is high enough > to handle all writes you still have to pay the overhead of this write now > being a MultiOutput ParDo. > e.g. In the Spark Runner when a ParDo has multiple outputs the returned data > is cached and if using the disableCache pipeline option it would cause > recalculation and all the temp files would be written again. > I'm sure that the Spark Runner is not the only runner that would benefit from > an optional setting for WriteFiles that would skip this spilling and simplify > the pipeline. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6493) examples in Kotlin
[ https://issues.apache.org/jira/browse/BEAM-6493?focusedWorklogId=211208=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211208 ] ASF GitHub Bot logged work on BEAM-6493: Author: ASF GitHub Bot Created on: 11/Mar/19 18:36 Start Date: 11/Mar/19 18:36 Worklog Time Spent: 10m Work Description: pabloem commented on issue #7807: [BEAM-6493] Add wordcount example in kotlin URL: https://github.com/apache/beam/pull/7807#issuecomment-471667011 @the-dagger ping : ) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211208) Time Spent: 2h 10m (was: 2h) Remaining Estimate: 502h 50m (was: 503h) > examples in Kotlin > -- > > Key: BEAM-6493 > URL: https://issues.apache.org/jira/browse/BEAM-6493 > Project: Beam > Issue Type: Task > Components: examples-java >Affects Versions: Not applicable >Reporter: Harshit Dwivedi >Assignee: Harshit Dwivedi >Priority: Minor > Labels: documentation > Fix For: Not applicable > > Original Estimate: 504h > Time Spent: 2h 10m > Remaining Estimate: 502h 50m > > I have been using Apache Beam for a few of my projects in production for the > past 6 months, and apart from Java, [Kotlin|https://kotlinlang.org/] also > seems to work just as well, with no issues whatsoever. > But currently, the GitHub repository of Apache Beam contains examples only in > Java, which might be an issue for other developers who want to use the Apache Beam > SDK with Kotlin, as there are no sample resources available. > That said, I would love to go ahead and add Kotlin examples alongside the > current Java examples in the [Beam > repository|https://github.com/apache/beam/tree/master/examples/java]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6735) WriteFiles with runner-determined sharding is forced to handle spilling
[ https://issues.apache.org/jira/browse/BEAM-6735?focusedWorklogId=211209=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211209 ] ASF GitHub Bot logged work on BEAM-6735: Author: ASF GitHub Bot Created on: 11/Mar/19 18:36 Start Date: 11/Mar/19 18:36 Worklog Time Spent: 10m Work Description: pabloem commented on issue #7929: [BEAM-6735] Add noSpilling option to WriteFiles. URL: https://github.com/apache/beam/pull/7929#issuecomment-471667330 Kyle could you rebase this? And would you mind adding the Javadoc? <3 Luke is out on leave, but he'll be back soon and he can review... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211209) Time Spent: 1h (was: 50m) > WriteFiles with runner-determined sharding is forced to handle spilling > --- > > Key: BEAM-6735 > URL: https://issues.apache.org/jira/browse/BEAM-6735 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Kyle Winkelman >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > As a result of BEAM-2302, files in excess of WriteFiles > maxNumWritersPerBundle are shuffled to be written later. The downside to this > is that even if you can guarantee that maxNumWritersPerBundle is high enough > to handle all writes you still have to pay the overhead of this write now > being a MultiOutput ParDo. > e.g. In the Spark Runner when a ParDo has multiple outputs the returned data > is cached and if using the disableCache pipeline option it would cause > recalculation and all the temp files would be written again. > I'm sure that the Spark Runner is not the only runner that would benefit from > an optional setting for WriteFiles that would skip this spilling and simplify > the pipeline. 
[jira] [Commented] (BEAM-1251) Python 3 Support
[ https://issues.apache.org/jira/browse/BEAM-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789854#comment-16789854 ] Valentyn Tymofieiev commented on BEAM-1251: --- The recently released Apache Beam 2.11.0 is the first release to offer partial support for Python 3.5+. Python 3 support remains an active work in progress, and the support offered in 2.11.0 has limitations and known issues. * The Beam 2.11.0 release has been tested only with Python 3.5 on the Direct and Dataflow runners. * IO availability is limited on Python 3 as of Beam 2.11.0: * BEAM-4543: Datastore IO connector is not available in Python 3. * BEAM-6522: Avro IO connector has issues in Python 3. * BEAM-5844: VCF IO connector is not available in Python 3. * BEAM-6769: BigQuery IO does not support raw bytes in Python 3. * Dataflow Runner supports Python 2.7 and 3.5 versions only and will not send jobs to the Dataflow service if the SDK is running using a different version of the interpreter. * Other known issues: ** Main sessions that contain invocations of superclass constructors fail to save: [https://github.com/uqfoundation/dill/issues/300] ** New syntactic constructs introduced in Python 3 may not be supported in Beam 2.11: *** BEAM-5878 - Support functions with keyword-only arguments. * Breaking changes in Beam 2.11.0: ** BEAM-5731 - Top.Of and Top.PerKey no longer accept a compare parameter, in line with Python's change to its sorting operations. We will likely uncover more Python 3-related issues in the future and we appreciate your feedback. Feel free to report reproducible Python 3-related issues as sub-tasks in BEAM-1251. Contributions are welcome, see: [https://beam.apache.org/roadmap/python-sdk/#python-3-support] for details. 
> Python 3 Support > > > Key: BEAM-1251 > URL: https://issues.apache.org/jira/browse/BEAM-1251 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Eyad Sibai >Assignee: Robbe >Priority: Major > Labels: triaged > Time Spent: 28h 50m > Remaining Estimate: 0h > > I have been trying to use google datalab with python3. As I see there are > several packages that do not support python3 yet which google datalab > depends on. This is one of them. > https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
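For context on the BEAM-5731 item in the comment above: Python 3 removed the `cmp` argument from `sorted` and `list.sort` in favour of `key`. The sketch below uses only the standard library and does not call any Beam API; the comparison function and the word list are made up for illustration.

```python
import functools


def compare_by_length(a, b):
    """Old-style comparison: negative, zero or positive, like Python 2 cmp."""
    return (len(a) > len(b)) - (len(a) < len(b))


words = ['pipeline', 'beam', 'runner']

# An existing comparison function can be wrapped into a key function ...
by_wrapped_cmp = sorted(words, key=functools.cmp_to_key(compare_by_length))

# ... but a plain key function is usually the simpler replacement.
by_key = sorted(words, key=len)
```

Both calls order the words from shortest to longest, so code that used to pass a comparison function can either wrap it with `functools.cmp_to_key` or be rewritten around a key function.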
[jira] [Work logged] (BEAM-6777) SDK Harness Resilience
[ https://issues.apache.org/jira/browse/BEAM-6777?focusedWorklogId=211204=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211204 ] ASF GitHub Bot logged work on BEAM-6777: Author: ASF GitHub Bot Created on: 11/Mar/19 18:25 Start Date: 11/Mar/19 18:25 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #8012: [BEAM-6777] Add HealthDaemon and tests URL: https://github.com/apache/beam/pull/8012#discussion_r264054020 ## File path: sdks/python/apache_beam/runners/worker/health_daemon.py ## @@ -0,0 +1,121 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from __future__ import absolute_import + +import errno +import http.client +import logging +import socket +import time +from builtins import object + + +class HealthDaemon(object): + """Sends periodic HTTP PUT /sdk requests to the health server. + + The purpose of this class is to communicate to the health server that this + SDK Harness is alive. If this SDK Harness does not communicate to the health + server after a configured amount of time, the health server will restart the + container. + + Expected Usage: +# The HealthDaemon is expected to spin forever, start it on a separate +# thread. 
+health_thread = threading.Thread(target=HealthDaemon(8080).start) + +# Automatically kill the thread when the program exists. +health_thread.daemon = True +health_thread.setName('health-client-demon') + +# Start the HealthDaemon. +health_thread.start() + + """ + + def __init__(self, health_http_port): +self._health_http_port = health_http_port + + @staticmethod + def connect_to_server(health_http_port, timeout=5): +"""Connects to the health server on the given port. + +Args: + health_http_port(int): Binding port for the debug server. +Default is 0 which means any free unsecured port + timeout(int): Timeout in seconds for all operations. + +Returns: + The connection to the health server. +""" + +logging.info('Connecting to localhost:%s', health_http_port) +return http.client.HTTPConnection('localhost', health_http_port, + timeout=timeout) + + @staticmethod + def try_health_ping(health_server): +"""Attempts to ping the given health server. + +Args: + health_server(http.client.HTTPConnection): Connection to the health +server. + +Returns: + True if the health ping succeeded, false otherwise. +""" + +success = False +try: + health_server.request('PUT', '/sdk') Review comment: nit: Maybe add `'/sdk'` to a class variable? `HEALTH_CHECK_ENDPOINT` or something like that? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211204) Time Spent: 1h (was: 50m) > SDK Harness Resilience > -- > > Key: BEAM-6777 > URL: https://issues.apache.org/jira/browse/BEAM-6777 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > If the Python SDK Harness crashes in any way (user code exception, OOM, etc) > the job will hang and waste resources. The fix is to add a daemon in the SDK > Harness and Runner Harness to communicate with Dataflow to restart the VM > when stuckness is detected. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
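The review nit above (moving `'/sdk'` into a class variable) might look like the following. This is a hedged sketch, not the code from PR #8012: the class name `HealthPinger` is hypothetical, `HEALTH_CHECK_ENDPOINT` is the name the reviewer floated, and everything after the `request` call (reading the response, treating HTTP 200 as success, the caught exception types) is an assumption because the quoted diff is cut off at that point.

```python
import http.client
import logging


class HealthPinger(object):
    """Hypothetical variant of the reviewed class with the path as a constant."""

    # Single place to change the endpoint, as suggested in the review.
    HEALTH_CHECK_ENDPOINT = '/sdk'

    @classmethod
    def try_health_ping(cls, health_server):
        """Returns True if the PUT to the health endpoint succeeded.

        Args:
          health_server: an http.client.HTTPConnection-like object.
        """
        try:
            health_server.request('PUT', cls.HEALTH_CHECK_ENDPOINT)
            response = health_server.getresponse()
            response.read()  # Drain the body so the connection can be reused.
            return response.status == 200  # Assumption: 200 means healthy.
        except (OSError, http.client.HTTPException) as e:
            logging.warning('Health ping failed: %s', e)
            return False
```

A caller would pass in a connection such as `http.client.HTTPConnection('localhost', port, timeout=5)`; tests can pass any object that offers `request` and `getresponse`.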
[jira] [Work logged] (BEAM-5638) Add exception handling to single message transforms in Java SDK
[ https://issues.apache.org/jira/browse/BEAM-5638?focusedWorklogId=211194=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211194 ] ASF GitHub Bot logged work on BEAM-5638: Author: ASF GitHub Bot Created on: 11/Mar/19 18:00 Start Date: 11/Mar/19 18:00 Worklog Time Spent: 10m Work Description: pabloem commented on issue #7736: [BEAM-5638] Exception handling for Java MapElements and FlatMapElements URL: https://github.com/apache/beam/pull/7736#issuecomment-471653308 @reuvenlax This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211194) Time Spent: 9h 10m (was: 9h) Remaining Estimate: 158h 50m (was: 159h) > Add exception handling to single message transforms in Java SDK > --- > > Key: BEAM-5638 > URL: https://issues.apache.org/jira/browse/BEAM-5638 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Jeff Klukas >Assignee: Jeff Klukas >Priority: Minor > Labels: triaged > Original Estimate: 168h > Time Spent: 9h 10m > Remaining Estimate: 158h 50m > > Add methods to MapElements, FlatMapElements, and Filter that allow users to > specify expected exceptions and tuple tags to associate with the > collections of the successfully and unsuccessfully processed elements. > See discussion on dev list: > https://lists.apache.org/thread.html/936ed2a5f2c01be066fd903abf70130625e0b8cf4028c11b89b8b23f@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6719) Allow multiple Joins in the same pipeline
[ https://issues.apache.org/jira/browse/BEAM-6719?focusedWorklogId=211191=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211191 ] ASF GitHub Bot logged work on BEAM-6719: Author: ASF GitHub Bot Created on: 11/Mar/19 17:56 Start Date: 11/Mar/19 17:56 Worklog Time Spent: 10m Work Description: pabloem commented on issue #7813: [BEAM-6719] Allow multiple Joins in the same pipeline URL: https://github.com/apache/beam/pull/7813#issuecomment-471651587 I've requested myself and ismael as reviewers. I'll take a look soon. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211191) Time Spent: 0.5h (was: 20m) > Allow multiple Joins in the same pipeline > - > > Key: BEAM-6719 > URL: https://issues.apache.org/jira/browse/BEAM-6719 > Project: Beam > Issue Type: Improvement > Components: sdk-java-join-library >Reporter: Daniel Mescheder >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently it is not possible to have multiple joins in the same pipeline > without wrapping them in individual PTransforms as this would generate name > clashes. 
> Consider the following test case:
> {code:java}
> @Test
> public void testMultipleJoinsInSamePipeline() {
>   leftListOfKv.add(KV.of("Key2", 4L));
>   PCollection<KV<String, Long>> leftCollection =
>       p.apply("CreateLeft", Create.of(leftListOfKv));
>   rightListOfKv.add(KV.of("Key2", "bar"));
>   PCollection<KV<String, String>> rightCollection =
>       p.apply("CreateRight", Create.of(rightListOfKv));
>   expectedResult.add(KV.of("Key2", KV.of(4L, "bar")));
>   PCollection<KV<String, KV<Long, String>>> output1 =
>       Join.innerJoin(leftCollection, rightCollection);
>   PCollection<KV<String, KV<Long, String>>> output2 =
>       Join.innerJoin(leftCollection, rightCollection);
>   PAssert.that(output1).containsInAnyOrder(expectedResult);
>   PAssert.that(output2).containsInAnyOrder(expectedResult);
>   p.run();
> }
> {code}
> This fails because of clashing names in the pipeline, and there is currently no way to use the join library to give the joins different names.
> Therefore I find myself routinely wrapping joins in new PTransforms, which leads me to believe that this should be part of the library itself.
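The name clash reported in BEAM-6719 can be illustrated without Beam. The following is a toy model, not Beam's implementation: `ToyPipeline`, its `apply` method and the step names are hypothetical stand-ins that only mimic the rule that every applied transform needs a unique name. It shows why two unnamed joins collide and how a caller-supplied name, the kind of option the issue asks the join library to offer, avoids the collision.

```python
class ToyPipeline:
    """Hypothetical stand-in for a pipeline that requires unique transform names."""

    def __init__(self):
        self.names = set()

    def apply(self, name):
        # Reject a name that was already registered, as a strict pipeline would.
        if name in self.names:
            raise ValueError(f"transform name already used: {name}")
        self.names.add(name)
        return name


def inner_join(pipeline, name=None):
    """Toy join: registers its internal steps, optionally under a caller-supplied name."""
    prefix = f"{name}/" if name else ""
    # The step names are illustrative only.
    return [pipeline.apply(prefix + step) for step in ("CoGroupByKey", "ParDo")]
```

Calling `inner_join(p)` twice on the same `ToyPipeline` raises `ValueError`, while `inner_join(p, "Join1")` followed by `inner_join(p, "Join2")` succeeds because the prefixes keep the internal step names apart.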
[jira] [Work logged] (BEAM-6719) Allow multiple Joins in the same pipeline
[ https://issues.apache.org/jira/browse/BEAM-6719?focusedWorklogId=211190=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211190 ] ASF GitHub Bot logged work on BEAM-6719: Author: ASF GitHub Bot Created on: 11/Mar/19 17:55 Start Date: 11/Mar/19 17:55 Worklog Time Spent: 10m Work Description: pabloem commented on issue #7813: [BEAM-6719] Allow multiple Joins in the same pipeline URL: https://github.com/apache/beam/pull/7813#issuecomment-471651186 Hello Daniel! I'm so sorry that we did not pick this up. Luke is away on leave, so we'd need to get you a new reviewer. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211190) Time Spent: 20m (was: 10m) > Allow multiple Joins in the same pipeline > - > > Key: BEAM-6719 > URL: https://issues.apache.org/jira/browse/BEAM-6719 > Project: Beam > Issue Type: Improvement > Components: sdk-java-join-library >Reporter: Daniel Mescheder >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Currently it is not possible to have multiple joins in the same pipeline > without wrapping them in individual PTransforms as this would generate name > clashes. 
> Consider the following test case:
> {code:java}
> @Test
> public void testMultipleJoinsInSamePipeline() {
>   leftListOfKv.add(KV.of("Key2", 4L));
>   PCollection<KV<String, Long>> leftCollection =
>       p.apply("CreateLeft", Create.of(leftListOfKv));
>   rightListOfKv.add(KV.of("Key2", "bar"));
>   PCollection<KV<String, String>> rightCollection =
>       p.apply("CreateRight", Create.of(rightListOfKv));
>   expectedResult.add(KV.of("Key2", KV.of(4L, "bar")));
>   PCollection<KV<String, KV<Long, String>>> output1 =
>       Join.innerJoin(leftCollection, rightCollection);
>   PCollection<KV<String, KV<Long, String>>> output2 =
>       Join.innerJoin(leftCollection, rightCollection);
>   PAssert.that(output1).containsInAnyOrder(expectedResult);
>   PAssert.that(output2).containsInAnyOrder(expectedResult);
>   p.run();
> }
> {code}
> This fails because of clashing names in the pipeline, and there is currently no way to use the join library to give the joins different names.
> Therefore I find myself routinely wrapping joins in new PTransforms, which leads me to believe that this should be part of the library itself.
[jira] [Work logged] (BEAM-3660) Port ReadSpannerSchemaTest off DoFnTester
[ https://issues.apache.org/jira/browse/BEAM-3660?focusedWorklogId=211189=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211189 ] ASF GitHub Bot logged work on BEAM-3660: Author: ASF GitHub Bot Created on: 11/Mar/19 17:54 Start Date: 11/Mar/19 17:54 Worklog Time Spent: 10m Work Description: pabloem commented on issue #7231: [BEAM-3660] Port ReadSpannerSchemaTest off DoFnTester URL: https://github.com/apache/beam/pull/7231#issuecomment-471650610 @Nisuuum : ( happy to merge, just looking for answers on the previous question This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211189) Time Spent: 1h 10m (was: 1h) > Port ReadSpannerSchemaTest off DoFnTester > - > > Key: BEAM-3660 > URL: https://issues.apache.org/jira/browse/BEAM-3660 > Project: Beam > Issue Type: Sub-task > Components: io-java-gcp >Reporter: Kenneth Knowles >Assignee: Evgeniy Musin >Priority: Major > Labels: beginner, newbie, starter, triaged > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4164) Make unit tests of CassandraIO use embeded server
[ https://issues.apache.org/jira/browse/BEAM-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmet Altay updated BEAM-4164:
Fix Version/s: 2.12.0 (was: 2.11.0)

> Make unit tests of CassandraIO use embeded server
>
> Key: BEAM-4164
> URL: https://issues.apache.org/jira/browse/BEAM-4164
> Project: Beam
> Issue Type: Test
> Components: io-java-cassandra
> Reporter: Etienne Chauchot
> Assignee: Etienne Chauchot
> Priority: Major
> Labels: triaged
> Fix For: 2.12.0
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> The unit tests currently use a mock of the Cassandra server. It would be better to run them against an embedded Cassandra instance, so that the tests are as close as possible to a real Cassandra server. Why not use the one from cassandra-unit ([https://mvnrepository.com/artifact/org.cassandraunit/cassandra-unit/3.3.0.2])?
[jira] [Work logged] (BEAM-6805) assertTrue used where assertEquals should be
[ https://issues.apache.org/jira/browse/BEAM-6805?focusedWorklogId=211175=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211175 ] ASF GitHub Bot logged work on BEAM-6805: Author: ASF GitHub Bot Created on: 11/Mar/19 17:45 Start Date: 11/Mar/19 17:45 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #7806: [BEAM-6805] Use assertEquals(x, y) instead of assertTrue(x.equals(y)) URL: https://github.com/apache/beam/pull/7806 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211175) Time Spent: 10m Remaining Estimate: 0h > assertTrue used where assertEquals should be > > > Key: BEAM-6805 > URL: https://issues.apache.org/jira/browse/BEAM-6805 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Pablo Estrada >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6292) PasswordDecrypter: Delay decryption / Avoid serialization
[ https://issues.apache.org/jira/browse/BEAM-6292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmet Altay updated BEAM-6292:
Fix Version/s: 2.12.0 (was: 2.11.0)

> PasswordDecrypter: Delay decryption / Avoid serialization
>
> Key: BEAM-6292
> URL: https://issues.apache.org/jira/browse/BEAM-6292
> Project: Beam
> Issue Type: Improvement
> Components: io-java-cassandra
> Reporter: Mathieu Blanchard
> Assignee: Mathieu Blanchard
> Priority: Minor
> Labels: triaged
> Fix For: 2.12.0
> Time Spent: 10h 40m
> Remaining Estimate: 0h
>
> Currently, the password is decrypted before the pipeline is serialized, so the raw password is visible to everyone with access to the staging location.
> To avoid this, we delay decryption until the connection to the cluster is established, which ensures that the raw password is never serialized with the pipeline.
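The idea in BEAM-6292, keeping only the encrypted password in the serialized state and decrypting at connection time, can be sketched as follows. This is an illustration under stated assumptions, not the CassandraIO code: `ConnectionConfig`, `serialize`, `deserialize` and `connect` are hypothetical names, JSON stands in for pipeline serialization, and `toy_decrypt` uses base64 purely as a placeholder (base64 is not encryption).

```python
import base64
import json


def toy_decrypt(ciphertext: str) -> str:
    """Stand-in decrypter for illustration only: base64 is NOT encryption."""
    return base64.b64decode(ciphertext).decode("utf-8")


class ConnectionConfig:
    """Hypothetical config object whose state holds only the encrypted password."""

    def __init__(self, username, encrypted_password):
        self.username = username
        self.encrypted_password = encrypted_password

    def serialize(self) -> str:
        # Only the encrypted form is ever written to the serialized payload.
        return json.dumps(
            {"username": self.username, "encrypted_password": self.encrypted_password}
        )

    @classmethod
    def deserialize(cls, payload: str) -> "ConnectionConfig":
        return cls(**json.loads(payload))

    def connect(self) -> str:
        # Decrypt as late as possible; the raw value lives only in this local variable.
        raw_password = toy_decrypt(self.encrypted_password)
        return f"connected as {self.username} (password length {len(raw_password)})"
```

Because the raw password never becomes an attribute of the object, any serializer that captures the object's state can only see the encrypted form.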
[jira] [Work logged] (BEAM-6805) assertTrue used where assertEquals should be
[ https://issues.apache.org/jira/browse/BEAM-6805?focusedWorklogId=211177=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211177 ] ASF GitHub Bot logged work on BEAM-6805: Author: ASF GitHub Bot Created on: 11/Mar/19 17:45 Start Date: 11/Mar/19 17:45 Worklog Time Spent: 10m Work Description: pabloem commented on issue #7806: [BEAM-6805] Use assertEquals(x, y) instead of assertTrue(x.equals(y)) URL: https://github.com/apache/beam/pull/7806#issuecomment-471647338 Sorry about the delay. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211177) Time Spent: 20m (was: 10m) > assertTrue used where assertEquals should be > > > Key: BEAM-6805 > URL: https://issues.apache.org/jira/browse/BEAM-6805 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Pablo Estrada >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6591) CassandraIO split does not work in some corner cases.
[ https://issues.apache.org/jira/browse/BEAM-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Altay updated BEAM-6591: -- Fix Version/s: (was: 2.11.0) 2.12.0 > CassandraIO split does not work in some corner cases. > - > > Key: BEAM-6591 > URL: https://issues.apache.org/jira/browse/BEAM-6591 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Labels: triaged > Fix For: 2.12.0 > > > CassandraIO split uses token ranges to split data in the Read part of the IO. > When one split ends up using the minimum token in the token ring, then the IO > reads all the data in one split, leading to duplication. This is due to > behavior of Cassandra: see > https://issues.apache.org/jira/browse/CASSANDRA-14684 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
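One way to picture the corner case in BEAM-6591 is to build the token-range splits so that the minimum token never appears as a query bound. The sketch below is only one possible approach and is not the fix applied in CassandraIO: it assumes a signed 64-bit token space (as used by the Murmur3 partitioner) and assumes, as the issue reports, that a range bounded by the minimum token can end up matching the whole ring. `split_ring`, `to_predicate` and `contains` are hypothetical helpers, and `pk` is a placeholder column name.

```python
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def split_ring(num_splits):
    """Return (lower, upper) pairs covering the ring; None means unbounded on that side."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    width = (MAX_TOKEN - MIN_TOKEN + 1) // num_splits
    # Interior boundaries only, so MIN_TOKEN itself is never used as a bound.
    bounds = [MIN_TOKEN + i * width for i in range(1, num_splits)]
    return list(zip([None] + bounds, bounds + [None]))


def to_predicate(lower, upper, column="pk"):
    """Render one range as a token restriction; an empty string means no restriction."""
    parts = []
    if lower is not None:
        parts.append(f"token({column}) > {lower}")
    if upper is not None:
        parts.append(f"token({column}) <= {upper}")
    return " AND ".join(parts)


def contains(token_range, token):
    """True if the token falls inside the half-open range (lower, upper]."""
    lower, upper = token_range
    return (lower is None or token > lower) and (upper is None or token <= upper)
```

Leaving the first range open below and the last range open above still covers every token exactly once, which is the property a split must preserve to avoid duplicated reads.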
[jira] [Created] (BEAM-6805) assertTrue used where assertEquals should be
Pablo Estrada created BEAM-6805:

Summary: assertTrue used where assertEquals should be
Key: BEAM-6805
URL: https://issues.apache.org/jira/browse/BEAM-6805
Project: Beam
Issue Type: Improvement
Components: sdk-java-core
Reporter: Pablo Estrada
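BEAM-6805 concerns Java tests, but the motivation carries over to other xUnit-style frameworks. As an analogy only, the sketch below uses Python's `unittest` to show what a failing boolean assertion reports compared with a failing equality assertion; `failure_messages` is a hypothetical helper and the compared strings are made up.

```python
import unittest


def failure_messages():
    """Run two deliberately failing tests and return their failure text by test name."""

    class Demo(unittest.TestCase):
        def test_with_assert_true(self):
            # Only reports that the expression was false.
            self.assertTrue("beam" == "bean")

        def test_with_assert_equal(self):
            # Reports both values, so the mismatch is visible in the log.
            self.assertEqual("beam", "bean")

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(Demo)
    result = unittest.TestResult()
    suite.run(result)
    return {test.id().rsplit(".", 1)[-1]: trace for test, trace in result.failures}
```

The boolean form fails with "False is not true", while the equality form includes both operands in the message, which is the practical reason to prefer it.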
[jira] [Work logged] (BEAM-6748) Block size difference in avro library on Python3 causes some AvroIO tests to fail.
[ https://issues.apache.org/jira/browse/BEAM-6748?focusedWorklogId=211172=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211172 ]

ASF GitHub Bot logged work on BEAM-6748:

Author: ASF GitHub Bot
Created on: 11/Mar/19 17:42
Start Date: 11/Mar/19 17:42
Worklog Time Spent: 10m
Work Description: tvalentyn commented on issue #8015: [BEAM-6748] Account for synchronization interval when estimating amount of blocks in generated Avro test file.
URL: https://github.com/apache/beam/pull/8015#issuecomment-471645741

Thanks for review & merge, @chamikaramj .

Issue Time Tracking
Worklog Id: (was: 211172)
Time Spent: 1h 20m (was: 1h 10m)

> Block size difference in avro library on Python3 causes some AvroIO tests to fail.
>
> Key: BEAM-6748
> URL: https://issues.apache.org/jira/browse/BEAM-6748
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Valentyn Tymofieiev
> Assignee: Valentyn Tymofieiev
> Priority: Major
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> *apache_beam.io.avroio_test.TestAvro.test_split_points*
> *apache_beam.io.avroio_test.TestFastAvro.test_split_points*
> fail with:
>
> {code:java}
> Traceback (most recent call last):
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", line 308, in test_split_points
>     self.assertEquals(split_points_report[-10:], [(2, 1)] * 10)
> AssertionError: Lists differ: [(10, 1), (10, 1), (10, 1), (10, 1), (10, 1[42 chars], 1)] != [(2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2[32 chars], 1)]
>
> First differing element 0:
> (10, 1)
> (2, 1)
>
> + [(2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1)]
> - [(10, 1),
> -  (10, 1),
> -  (10, 1),
> -  (10, 1),
> -  (10, 1),
> -  (10, 1),
> -  (10, 1),
> -  (10, 1),
> -  (10, 1),
> -  (10, 1)] {code}
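The pull request title for BEAM-6748 says the fix accounts for the synchronization interval when estimating the number of blocks in the generated Avro test file. A rough sketch of such an estimate is shown below. It assumes a writer that appends fixed-size records and flushes a block as soon as the buffered bytes reach the sync interval; `estimate_blocks` and its parameters are hypothetical, and the numbers in the usage note are illustrative rather than taken from either Avro library or from the Beam test.

```python
import math


def estimate_blocks(num_records, record_size, sync_interval):
    """Estimate the block count for a writer that flushes once the buffer reaches sync_interval."""
    # A block is flushed after the record that pushes the buffer to the threshold.
    records_per_block = max(1, math.ceil(sync_interval / record_size))
    return math.ceil(num_records / records_per_block)
```

With 100 records of 10 bytes each, a 100-byte interval yields 10 blocks and a 500-byte interval yields 2, which shows how an expected block count that was hard-coded for one interval stops matching under another.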
[jira] [Commented] (BEAM-6185) Upgrade to Spark 2.4.0
[ https://issues.apache.org/jira/browse/BEAM-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789795#comment-16789795 ] Alexey Romanenko commented on BEAM-6185: Cloudera CDH 6.1.0 is already based on Apache Spark 2.4 upstream version. [https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_610_new_features.html#spark_new_features] Does it make sense to move forward and upgrade Spark version in Beam? > Upgrade to Spark 2.4.0 > -- > > Key: BEAM-6185 > URL: https://issues.apache.org/jira/browse/BEAM-6185 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré >Priority: Major > Labels: triaged > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6804) [beam_PostCommit_Java] [PubsubReadIT.testReadPublicData] Timeout waiting on Sub
Mikhail Gryzykhin created BEAM-6804:

Summary: [beam_PostCommit_Java] [PubsubReadIT.testReadPublicData] Timeout waiting on Sub
Key: BEAM-6804
URL: https://issues.apache.org/jira/browse/BEAM-6804
Project: Beam
Issue Type: Bug
Components: test-failures
Reporter: Mikhail Gryzykhin
Assignee: Kenneth Knowles

_Use this form to file an issue for test failure:_
* [Jenkins Job|https://builds.apache.org/job/beam_PostCommit_Java/2796/testReport/junit/org.apache.beam.sdk.io.gcp.pubsub/PubsubReadIT/testReadPublicData/]
* [Gradle Build Scan|https://scans.gradle.com/s/3s4lnjovurqdi]
* [Test source code|https://github.com/apache/beam/blame/b953645ed6db837d24284d7fe1fe091e7309f821/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubReadIT.java]

Initial investigation:

java.lang.AssertionError: Did not receive signal on projects/apache-beam-testing/subscriptions/start-subscription-313044384168895769 in 300s
    at org.apache.beam.sdk.io.gcp.pubsub.TestPubsubSignal.pollForResultForDuration(TestPubsubSignal.java:269)
    at org.apache.beam.sdk.io.gcp.pubsub.TestPubsubSignal.lambda$waitForStart$0(TestPubsubSignal.java:218)
    at org.apache.beam.vendor.guava.v20_0.com.google.common.base.Suppliers$MemoizingSupplier.get(Suppliers.java:120)
    at org.apache.beam.sdk.io.gcp.pubsub.PubsubReadIT.testReadPublicData(PubsubReadIT.java:54)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)

_After you've filled out the above details, please [assign the issue to an individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. Assignee should [treat test failures as high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], helping to fix the issue or find a more appropriate owner. See [Apache Beam Post-Commit Policies|https://beam.apache.org/contribute/postcommits-policies]._
[jira] [Work logged] (BEAM-6748) Block size difference in avro library on Python3 causes some AvroIO tests to fail.
[ https://issues.apache.org/jira/browse/BEAM-6748?focusedWorklogId=211144=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211144 ]

ASF GitHub Bot logged work on BEAM-6748:

Author: ASF GitHub Bot
Created on: 11/Mar/19 17:15
Start Date: 11/Mar/19 17:15
Worklog Time Spent: 10m
Work Description: chamikaramj commented on pull request #8015: [BEAM-6748] Account for synchronization interval when estimating amount of blocks in generated Avro test file.
URL: https://github.com/apache/beam/pull/8015

Issue Time Tracking
Worklog Id: (was: 211144)
Time Spent: 1h 10m (was: 1h)

> Block size difference in avro library on Python3 causes some AvroIO tests to fail.
>
> Key: BEAM-6748
> URL: https://issues.apache.org/jira/browse/BEAM-6748
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Valentyn Tymofieiev
> Assignee: Valentyn Tymofieiev
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> *apache_beam.io.avroio_test.TestAvro.test_split_points*
> *apache_beam.io.avroio_test.TestFastAvro.test_split_points*
> fail with:
>
> {code:java}
> Traceback (most recent call last):
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", line 308, in test_split_points
>     self.assertEquals(split_points_report[-10:], [(2, 1)] * 10)
> AssertionError: Lists differ: [(10, 1), (10, 1), (10, 1), (10, 1), (10, 1[42 chars], 1)] != [(2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2[32 chars], 1)]
>
> First differing element 0:
> (10, 1)
> (2, 1)
>
> + [(2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1)]
> - [(10, 1),
> -  (10, 1),
> -  (10, 1),
> -  (10, 1),
> -  (10, 1),
> -  (10, 1),
> -  (10, 1),
> -  (10, 1),
> -  (10, 1),
> -  (10, 1)] {code}
[jira] [Work logged] (BEAM-6726) Gradle Publish fails with Gradle 5
[ https://issues.apache.org/jira/browse/BEAM-6726?focusedWorklogId=211156=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211156 ]

ASF GitHub Bot logged work on BEAM-6726:

Author: ASF GitHub Bot
Created on: 11/Mar/19 17:18
Start Date: 11/Mar/19 17:18
Worklog Time Spent: 10m
Work Description: aaltay commented on pull request #8026: [BEAM-6726] explicitly specify signing key
URL: https://github.com/apache/beam/pull/8026#discussion_r264339885

## File path: release/src/main/scripts/build_release_candidate.sh ##
@@ -56,12 +56,19 @@ read USER_GITHUB_ID
 USER_REMOTE_URL=g...@github.com:${USER_GITHUB_ID}/beam-site
+echo "Listing all GPG keys="
+gpg --list-keys --keyid-format LONG --fingerprint --fingerprint
+echo "Please copy the public key which is associated with your Apache account:"
+
+read SIGNING_KEY

Review comment:
Sounds good. Do you mind adding a JIRA TODO comment here to clean this up in all scripts?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211156) Time Spent: 3h 40m (was: 3.5h) > Gradle Publish fails with Gradle 5 > -- > > Key: BEAM-6726 > URL: https://issues.apache.org/jira/browse/BEAM-6726 > Project: Beam > Issue Type: Bug > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ahmet Altay >Assignee: Michael Luckey >Priority: Blocker > Fix For: 2.12.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > cc: [~alanmyrvold] [~kenn] > :beam-sdks-java-bom:signMavenJavaPublication task fails with an obscure > error: > (https://scans.gradle.com/s/mcbb4axlx6agy/failure?openFailures=WzBd=WzFd#top=0): > Duplicate key pom-default.xml.asc:xml.asc:asc:null (attempted merging values > Signature pom-default.xml.asc:xml.asc:asc:null and Signature > pom-default.xml.asc:xml.asc:asc:null) > Downgrading to Gradle 4 by reverting > https://github.com/apache/beam/commit/cadb6f7fabc6faedc6037104338306688f17652f > works. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6726) Gradle Publish fails with Gradle 5
[ https://issues.apache.org/jira/browse/BEAM-6726?focusedWorklogId=211155=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211155 ] ASF GitHub Bot logged work on BEAM-6726: Author: ASF GitHub Bot Created on: 11/Mar/19 17:18 Start Date: 11/Mar/19 17:18 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #8026: [BEAM-6726] explicitly specify signing key URL: https://github.com/apache/beam/pull/8026#discussion_r264339709 ## File path: release/src/main/scripts/build_release_candidate.sh ## @@ -98,7 +105,8 @@ if [[ $confirmation = "y" ]]; then echo "2. new rc tag has created in github." echo "-Staging Java Artifacts into Maven---" - ./gradlew publish -PisRelease --no-daemon + gpg --local-user ${SIGNING_KEY} --output /dev/null --sign ~/.bashrc Review comment: Fair enough. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211155) Time Spent: 3.5h (was: 3h 20m) > Gradle Publish fails with Gradle 5 > -- > > Key: BEAM-6726 > URL: https://issues.apache.org/jira/browse/BEAM-6726 > Project: Beam > Issue Type: Bug > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ahmet Altay >Assignee: Michael Luckey >Priority: Blocker > Fix For: 2.12.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > cc: [~alanmyrvold] [~kenn] > :beam-sdks-java-bom:signMavenJavaPublication task fails with an obscure > error: > (https://scans.gradle.com/s/mcbb4axlx6agy/failure?openFailures=WzBd=WzFd#top=0): > Duplicate key pom-default.xml.asc:xml.asc:asc:null (attempted merging values > Signature pom-default.xml.asc:xml.asc:asc:null and Signature > pom-default.xml.asc:xml.asc:asc:null) > Downgrading to Gradle 4 by reverting > https://github.com/apache/beam/commit/cadb6f7fabc6faedc6037104338306688f17652f > works. 
[jira] [Work logged] (BEAM-6726) Gradle Publish fails with Gradle 5
[ https://issues.apache.org/jira/browse/BEAM-6726?focusedWorklogId=211146=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211146 ] ASF GitHub Bot logged work on BEAM-6726: Author: ASF GitHub Bot Created on: 11/Mar/19 17:16 Start Date: 11/Mar/19 17:16 Worklog Time Spent: 10m Work Description: adude3141 commented on issue #8026: [BEAM-6726] explicitly specify signing key URL: https://github.com/apache/beam/pull/8026#issuecomment-471635008 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211146) Time Spent: 3h 10m (was: 3h) > Gradle Publish fails with Gradle 5 > -- > > Key: BEAM-6726 > URL: https://issues.apache.org/jira/browse/BEAM-6726 > Project: Beam > Issue Type: Bug > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ahmet Altay >Assignee: Michael Luckey >Priority: Blocker > Fix For: 2.12.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > cc: [~alanmyrvold] [~kenn] > :beam-sdks-java-bom:signMavenJavaPublication task fails with an obscure > error: > (https://scans.gradle.com/s/mcbb4axlx6agy/failure?openFailures=WzBd=WzFd#top=0): > Duplicate key pom-default.xml.asc:xml.asc:asc:null (attempted merging values > Signature pom-default.xml.asc:xml.asc:asc:null and Signature > pom-default.xml.asc:xml.asc:asc:null) > Downgrading to Gradle 4 by reverting > https://github.com/apache/beam/commit/cadb6f7fabc6faedc6037104338306688f17652f > works. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6726) Gradle Publish fails with Gradle 5
[ https://issues.apache.org/jira/browse/BEAM-6726?focusedWorklogId=211147=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211147 ] ASF GitHub Bot logged work on BEAM-6726: Author: ASF GitHub Bot Created on: 11/Mar/19 17:16 Start Date: 11/Mar/19 17:16 Worklog Time Spent: 10m Work Description: adude3141 commented on issue #8026: [BEAM-6726] explicitly specify signing key URL: https://github.com/apache/beam/pull/8026#issuecomment-471635196 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211147) Time Spent: 3h 20m (was: 3h 10m) > Gradle Publish fails with Gradle 5 > -- > > Key: BEAM-6726 > URL: https://issues.apache.org/jira/browse/BEAM-6726 > Project: Beam > Issue Type: Bug > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ahmet Altay >Assignee: Michael Luckey >Priority: Blocker > Fix For: 2.12.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > cc: [~alanmyrvold] [~kenn] > :beam-sdks-java-bom:signMavenJavaPublication task fails with an obscure > error: > (https://scans.gradle.com/s/mcbb4axlx6agy/failure?openFailures=WzBd=WzFd#top=0): > Duplicate key pom-default.xml.asc:xml.asc:asc:null (attempted merging values > Signature pom-default.xml.asc:xml.asc:asc:null and Signature > pom-default.xml.asc:xml.asc:asc:null) > Downgrading to Gradle 4 by reverting > https://github.com/apache/beam/commit/cadb6f7fabc6faedc6037104338306688f17652f > works. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6726) Gradle Publish fails with Gradle 5
[ https://issues.apache.org/jira/browse/BEAM-6726?focusedWorklogId=211143=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211143 ] ASF GitHub Bot logged work on BEAM-6726: Author: ASF GitHub Bot Created on: 11/Mar/19 17:14 Start Date: 11/Mar/19 17:14 Worklog Time Spent: 10m Work Description: adude3141 commented on pull request #8026: [BEAM-6726] explicitly specify signing key URL: https://github.com/apache/beam/pull/8026#discussion_r264338204 ## File path: release/src/main/scripts/build_release_candidate.sh ## @@ -56,12 +56,19 @@ read USER_GITHUB_ID USER_REMOTE_URL=g...@github.com:${USER_GITHUB_ID}/beam-site +echo "Listing all GPG keys=" +gpg --list-keys --keyid-format LONG --fingerprint --fingerprint +echo "Please copy the public key which is associated with your Apache account:" + +read SIGNING_KEY Review comment: Probably yes. But we did not check before [1], so I did not bother to implement this. As it probably would require to keep some state across scripts. Currently this is left to manual release verification. As I tend to assume that these script need some rework anyway, I restricted the scope of this PR to a minimal viable solution to get release enabled on gradle5. [1] There is no check on signing key set in git config against key put into KEYS file nor against default key used for signing artefacts. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211143) Time Spent: 3h (was: 2h 50m) > Gradle Publish fails with Gradle 5 > -- > > Key: BEAM-6726 > URL: https://issues.apache.org/jira/browse/BEAM-6726 > Project: Beam > Issue Type: Bug > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ahmet Altay >Assignee: Michael Luckey >Priority: Blocker > Fix For: 2.12.0 > > Time Spent: 3h > Remaining Estimate: 0h > > cc: [~alanmyrvold] [~kenn] > :beam-sdks-java-bom:signMavenJavaPublication task fails with an obscure > error: > (https://scans.gradle.com/s/mcbb4axlx6agy/failure?openFailures=WzBd=WzFd#top=0): > Duplicate key pom-default.xml.asc:xml.asc:asc:null (attempted merging values > Signature pom-default.xml.asc:xml.asc:asc:null and Signature > pom-default.xml.asc:xml.asc:asc:null) > Downgrading to Gradle 4 by reverting > https://github.com/apache/beam/commit/cadb6f7fabc6faedc6037104338306688f17652f > works. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6803) Do not use conscrypt SSL by default
[ https://issues.apache.org/jira/browse/BEAM-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-6803: -- Fix Version/s: (was: 2.9.0) 2.7.1 > Do not use conscrypt SSL by default > --- > > Key: BEAM-6803 > URL: https://issues.apache.org/jira/browse/BEAM-6803 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Ahmet Altay >Assignee: Ahmet Altay >Priority: Blocker > Labels: triaged > Fix For: 2.7.1 > > > An experimental flag is being added to disable it for now with an option to > enable it per-workflow. > Also related: > https://issues.apache.org/jira/browse/BEAM-5747 - Upgrade conscrypt to its > latest version. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6803) LTS backport: Do not use conscrypt SSL by default
[ https://issues.apache.org/jira/browse/BEAM-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-6803: -- Priority: Major (was: Blocker) > LTS backport: Do not use conscrypt SSL by default > - > > Key: BEAM-6803 > URL: https://issues.apache.org/jira/browse/BEAM-6803 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Ahmet Altay >Assignee: Ahmet Altay >Priority: Major > Labels: triaged > Fix For: 2.7.1 > > > An experimental flag is being added to disable it for now with an option to > enable it per-workflow. > Also related: > https://issues.apache.org/jira/browse/BEAM-5747 - Upgrade conscrypt to its > latest version. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-6182) Use of conscrypt SSL results in stuck workflows in Dataflow
[ https://issues.apache.org/jira/browse/BEAM-6182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles resolved BEAM-6182. --- Resolution: Fixed Assignee: Ahmet Altay (was: Tyler Akidau) > Use of conscrypt SSL results in stuck workflows in Dataflow > --- > > Key: BEAM-6182 > URL: https://issues.apache.org/jira/browse/BEAM-6182 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Ahmet Altay >Assignee: Ahmet Altay >Priority: Blocker > Labels: triaged > Fix For: 2.9.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > An experimental flag is being added to disable it for now with an option to > enable it per-workflow. > Also related: > https://issues.apache.org/jira/browse/BEAM-5747 - Upgrade conscrypt to its > latest version.
[jira] [Work logged] (BEAM-6726) Gradle Publish fails with Gradle 5
[ https://issues.apache.org/jira/browse/BEAM-6726?focusedWorklogId=211132=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211132 ]
ASF GitHub Bot logged work on BEAM-6726:
Author: ASF GitHub Bot
Created on: 11/Mar/19 17:05
Start Date: 11/Mar/19 17:05
Worklog Time Spent: 10m
Work Description: adude3141 commented on pull request #8026: [BEAM-6726] explicitly specify signing key
URL: https://github.com/apache/beam/pull/8026#discussion_r264334366
## File path: release/src/main/scripts/build_release_candidate.sh
## @@ -98,7 +105,8 @@ if [[ $confirmation = "y" ]]; then
  echo "2. new rc tag has created in github."
  echo "-Staging Java Artifacts into Maven---"
- ./gradlew publish -PisRelease --no-daemon
+ gpg --local-user ${SIGNING_KEY} --output /dev/null --sign ~/.bashrc
Review comment:
No. It is to ensure the key is unlocked so gpg-agent will just provide access to the key without requesting for user input within gradle call. As gradle is configured to shell out to gpg cli, streams get broken and no input is possible.
And yes, this will break, if .bashrc does not exist. But the same pattern was used before [1], so I just reused that. Of course, we might reconsider that.
[1] https://github.com/apache/beam/blob/master/release/src/main/scripts/verify_release_build.sh#L140
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 211132) Time Spent: 2h 50m (was: 2h 40m) > Gradle Publish fails with Gradle 5 > -- > > Key: BEAM-6726 > URL: https://issues.apache.org/jira/browse/BEAM-6726 > Project: Beam > Issue Type: Bug > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ahmet Altay >Assignee: Michael Luckey >Priority: Blocker > Fix For: 2.12.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > cc: [~alanmyrvold] [~kenn] > :beam-sdks-java-bom:signMavenJavaPublication task fails with an obscure > error: > (https://scans.gradle.com/s/mcbb4axlx6agy/failure?openFailures=WzBd=WzFd#top=0): > Duplicate key pom-default.xml.asc:xml.asc:asc:null (attempted merging values > Signature pom-default.xml.asc:xml.asc:asc:null and Signature > pom-default.xml.asc:xml.asc:asc:null) > Downgrading to Gradle 4 by reverting > https://github.com/apache/beam/commit/cadb6f7fabc6faedc6037104338306688f17652f > works.
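The review comment in the BEAM-6726 worklog above explains that the gpg call exists only to unlock the signing key in gpg-agent before Gradle shells out to the gpg CLI, and it notes that signing ~/.bashrc breaks if that file does not exist. A minimal sketch of one possible variant follows, which signs a throwaway temporary file instead. The function name unlock_signing_key and the GPG_BIN override are hypothetical and are not taken from the pull request; only the gpg options shown in the diff are reused.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the Beam release scripts): make gpg-agent
# cache the passphrase for a key so that a later non-interactive build step
# can sign without prompting.
unlock_signing_key() {
  local key="$1"
  # Assumption: GPG_BIN is an illustrative override so the helper can be
  # exercised without a real keyring; it defaults to the gpg CLI.
  local gpg_bin="${GPG_BIN:-gpg}"
  local probe status=0

  if [ -z "$key" ]; then
    echo "usage: unlock_signing_key <key-id>" >&2
    return 64
  fi

  # Sign a temporary file rather than ~/.bashrc, which may not exist.
  probe="$(mktemp)" || return 1

  # Same options as in the diff: the signature itself is discarded, the only
  # purpose is to trigger the passphrase prompt while a terminal is attached.
  "$gpg_bin" --local-user "$key" --output /dev/null --sign "$probe" || status=$?

  rm -f "$probe"
  return "$status"
}
```

In a release script this could be called as unlock_signing_key "${SIGNING_KEY}" immediately before the Gradle publish step, aborting the script if it returns a non-zero status.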
[jira] [Updated] (BEAM-6802) Re-enable conscrypt SSL as default when possible
[ https://issues.apache.org/jira/browse/BEAM-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-6802: -- Priority: Major (was: Blocker) > Re-enable conscrypt SSL as default when possible > > > Key: BEAM-6802 > URL: https://issues.apache.org/jira/browse/BEAM-6802 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Ahmet Altay >Assignee: Ahmet Altay >Priority: Major > Labels: triaged > > An experimental flag is being added to disable it for now with an option to > enable it per-workflow. > Also related: > https://issues.apache.org/jira/browse/BEAM-5747 - Upgrade conscrypt to its > latest version.
[jira] [Updated] (BEAM-6802) Re-enable conscrypt SSL as default when possible
[ https://issues.apache.org/jira/browse/BEAM-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-6802: -- Fix Version/s: (was: 2.9.0) > Re-enable conscrypt SSL as default when possible > > > Key: BEAM-6802 > URL: https://issues.apache.org/jira/browse/BEAM-6802 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Ahmet Altay >Assignee: Ahmet Altay >Priority: Blocker > Labels: triaged > > An experimental flag is being added to disable it for now with an option to > enable it per-workflow. > Also related: > https://issues.apache.org/jira/browse/BEAM-5747 - Upgrade conscrypt to its > latest version.
[jira] [Updated] (BEAM-6803) LTS backport: Do not use conscrypt SSL by default
[ https://issues.apache.org/jira/browse/BEAM-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-6803: -- Summary: LTS backport: Do not use conscrypt SSL by default (was: Do not use conscrypt SSL by default) > LTS backport: Do not use conscrypt SSL by default > - > > Key: BEAM-6803 > URL: https://issues.apache.org/jira/browse/BEAM-6803 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Ahmet Altay >Assignee: Ahmet Altay >Priority: Blocker > Labels: triaged > Fix For: 2.7.1 > > > An experimental flag is being added to disable it for now with an option to > enable it per-workflow. > Also related: > https://issues.apache.org/jira/browse/BEAM-5747 - Upgrade conscrypt to its > latest version.
[jira] [Created] (BEAM-6803) Do not use conscrypt SSL by default
Kenneth Knowles created BEAM-6803: - Summary: Do not use conscrypt SSL by default Key: BEAM-6803 URL: https://issues.apache.org/jira/browse/BEAM-6803 Project: Beam Issue Type: Bug Components: runner-dataflow Reporter: Ahmet Altay Assignee: Ahmet Altay Fix For: 2.9.0 An experimental flag is being added to disable it for now with an option to enable it per-workflow. Also related: https://issues.apache.org/jira/browse/BEAM-5747 - Upgrade conscrypt to its latest version.
[jira] [Created] (BEAM-6802) Re-enable conscrypt SSL as default when possible
Kenneth Knowles created BEAM-6802: - Summary: Re-enable conscrypt SSL as default when possible Key: BEAM-6802 URL: https://issues.apache.org/jira/browse/BEAM-6802 Project: Beam Issue Type: Bug Components: runner-dataflow Reporter: Ahmet Altay Assignee: Ahmet Altay Fix For: 2.9.0 An experimental flag is being added to disable it for now with an option to enable it per-workflow. Also related: https://issues.apache.org/jira/browse/BEAM-5747 - Upgrade conscrypt to its latest version.
[jira] [Commented] (BEAM-6182) Use of conscrypt SSL results in stuck workflows in Dataflow
[ https://issues.apache.org/jira/browse/BEAM-6182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789760#comment-16789760 ] Kenneth Knowles commented on BEAM-6182: --- The blog post mentions it but the auto-generated release notes are missing this one. Probably fine. I think I will resolve this to 2.9.0 anyhow and create clones for other actions. > Use of conscrypt SSL results in stuck workflows in Dataflow > --- > > Key: BEAM-6182 > URL: https://issues.apache.org/jira/browse/BEAM-6182 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Ahmet Altay >Assignee: Tyler Akidau >Priority: Blocker > Labels: triaged > Time Spent: 1.5h > Remaining Estimate: 0h > > An experimental flag is being added to disable it for now with an option to > enable it per-workflow. > Also related: > https://issues.apache.org/jira/browse/BEAM-5747 - Upgrade conscrypt to its > latest version.