[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185149=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185149 ] ASF GitHub Bot logged work on BEAM-6427: Author: ASF GitHub Bot Created on: 15/Jan/19 06:27 Start Date: 15/Jan/19 06:27 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #7504: [BEAM-6427] INTERSECT ALL is not compatible with SQL standard. URL: https://github.com/apache/beam/pull/7504 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185149) Time Spent: 2h 10m (was: 2h) > INTERSECT ALL is not compatible with SQL standard > - > > Key: BEAM-6427 > URL: https://issues.apache.org/jira/browse/BEAM-6427 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > say Row R appears m times on one side and n times on another side. > Beam's INTERSECT ALL is implemented to return MAX(m, n). > And SQL1999 standard says INTERSECT ALL should return MIN(m, n). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6430) EXCEPT is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6430?focusedWorklogId=185147=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185147 ] ASF GitHub Bot logged work on BEAM-6430: Author: ASF GitHub Bot Created on: 15/Jan/19 06:26 Start Date: 15/Jan/19 06:26 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #7506: [BEAM-6430] Fix EXCEPT URL: https://github.com/apache/beam/pull/7506 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185147) Time Spent: 1h 10m (was: 1h) > EXCEPT is not compatible with SQL standard > -- > > Key: BEAM-6430 > URL: https://issues.apache.org/jira/browse/BEAM-6430 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > EXCEPT ALL should return MAX(m - n, 0). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185146=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185146 ] ASF GitHub Bot logged work on BEAM-6427: Author: ASF GitHub Bot Created on: 15/Jan/19 06:25 Start Date: 15/Jan/19 06:25 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #7504: [BEAM-6427] INTERSECT ALL is not compatible with SQL standard. URL: https://github.com/apache/beam/pull/7504#discussion_r247775196 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java ## @@ -80,8 +80,23 @@ public void processElement(ProcessContext ctx) { case INTERSECT: if (leftRows.iterator().hasNext() && rightRows.iterator().hasNext()) { if (all) { - for (Row leftRow : leftRows) { -ctx.output(leftRow); + Iterator iter = leftRows.iterator(); + int leftCount = 0; + int rightCount = 0; + while (iter.hasNext()) { +iter.next(); +leftCount++; + } + iter = rightRows.iterator(); + while (iter.hasNext()) { +iter.next(); +rightCount++; + } + + // output MIN(m, n) + iter = (leftCount <= rightCount) ? leftRows.iterator() : rightRows.iterator(); + while (iter.hasNext()) { Review comment: I had the same thought. To do the `row -> KV` without a shuffle (and we definitely want to do some version of it before a shuffle) there are lots of performance considerations and some nondeterminism. The Dataflow implementation is now open source [here](https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/PartialGroupByKeyParDoFns.java) and we could do something like that in a `ParDo`. Data consistency requires that this be followed up by a full combine. The intermediate results do not work as any kind of `Rel` but we _can_ use this as intermediate PCollections just for doing set operations. It would have to actually be: `row -> KV -> CoGBK -> KV` but small iterables of the partial counts that are easy to sum inline with the emit logic. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185146) Time Spent: 2h (was: 1h 50m) > INTERSECT ALL is not compatible with SQL standard > - > > Key: BEAM-6427 > URL: https://issues.apache.org/jira/browse/BEAM-6427 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > say Row R appears m times on one side and n times on another side. > Beam's INTERSECT ALL is implemented to return MAX(m, n). > And SQL1999 standard says INTERSECT ALL should return MIN(m, n). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185135=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185135 ] ASF GitHub Bot logged work on BEAM-6427: Author: ASF GitHub Bot Created on: 15/Jan/19 05:52 Start Date: 15/Jan/19 05:52 Worklog Time Spent: 10m Work Description: akedin commented on pull request #7504: [BEAM-6427] INTERSECT ALL is not compatible with SQL standard. URL: https://github.com/apache/beam/pull/7504#discussion_r247771055 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java ## @@ -80,8 +80,23 @@ public void processElement(ProcessContext ctx) { case INTERSECT: if (leftRows.iterator().hasNext() && rightRows.iterator().hasNext()) { if (all) { - for (Row leftRow : leftRows) { -ctx.output(leftRow); + Iterator iter = leftRows.iterator(); + int leftCount = 0; + int rightCount = 0; + while (iter.hasNext()) { +iter.next(); +leftCount++; + } + iter = rightRows.iterator(); + while (iter.hasNext()) { +iter.next(); +rightCount++; + } + + // output MIN(m, n) + iter = (leftCount <= rightCount) ? leftRows.iterator() : rightRows.iterator(); + while (iter.hasNext()) { Review comment: Can we instead potentially implement it with a combiner on both inputs, that counts the rows? I.e. `row -> KV -> CoGBK -> emit min(count) number of times`? Would it be more optimal? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185135) Time Spent: 1h 50m (was: 1h 40m) > INTERSECT ALL is not compatible with SQL standard > - > > Key: BEAM-6427 > URL: https://issues.apache.org/jira/browse/BEAM-6427 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > say Row R appears m times on one side and n times on another side. > Beam's INTERSECT ALL is implemented to return MAX(m, n). > And SQL1999 standard says INTERSECT ALL should return MIN(m, n). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185133=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185133 ] ASF GitHub Bot logged work on BEAM-6427: Author: ASF GitHub Bot Created on: 15/Jan/19 05:49 Start Date: 15/Jan/19 05:49 Worklog Time Spent: 10m Work Description: akedin commented on pull request #7504: [BEAM-6427] INTERSECT ALL is not compatible with SQL standard. URL: https://github.com/apache/beam/pull/7504#discussion_r247770795 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java ## @@ -80,8 +80,23 @@ public void processElement(ProcessContext ctx) { case INTERSECT: if (leftRows.iterator().hasNext() && rightRows.iterator().hasNext()) { if (all) { - for (Row leftRow : leftRows) { -ctx.output(leftRow); + Iterator iter = leftRows.iterator(); + int leftCount = 0; + int rightCount = 0; + while (iter.hasNext()) { +iter.next(); +leftCount++; + } + iter = rightRows.iterator(); + while (iter.hasNext()) { +iter.next(); +rightCount++; + } + + // output MIN(m, n) + iter = (leftCount <= rightCount) ? leftRows.iterator() : rightRows.iterator(); + while (iter.hasNext()) { Review comment: I see. What I was missing is how this whole thing works, i.e. that it runs `CoGBK` with keys and values being the whole row, producing a `CoGBKResult` with a collection of duplicates for each distinct row, which we then emit here. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185133) Time Spent: 1h 40m (was: 1.5h) > INTERSECT ALL is not compatible with SQL standard > - > > Key: BEAM-6427 > URL: https://issues.apache.org/jira/browse/BEAM-6427 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > say Row R appears m times on one side and n times on another side. > Beam's INTERSECT ALL is implemented to return MAX(m, n). > And SQL1999 standard says INTERSECT ALL should return MIN(m, n). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6430) EXCEPT is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6430?focusedWorklogId=185125=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185125 ] ASF GitHub Bot logged work on BEAM-6430: Author: ASF GitHub Bot Created on: 15/Jan/19 05:20 Start Date: 15/Jan/19 05:20 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #7506: [BEAM-6430] Fix EXCEPT URL: https://github.com/apache/beam/pull/7506#issuecomment-454270404 run java precommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185125) Time Spent: 1h (was: 50m) > EXCEPT is not compatible with SQL standard > -- > > Key: BEAM-6430 > URL: https://issues.apache.org/jira/browse/BEAM-6430 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > EXCEPT ALL should return MAX(m - n, 0). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185113=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185113 ] ASF GitHub Bot logged work on BEAM-6427: Author: ASF GitHub Bot Created on: 15/Jan/19 04:34 Start Date: 15/Jan/19 04:34 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #7504: [BEAM-6427] INTERSECT ALL is not compatible with SQL standard. URL: https://github.com/apache/beam/pull/7504#discussion_r247761589 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java ## @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.impl.transform; +import com.google.common.collect.Iterators; Review comment: using vendored Guava now. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185113) Time Spent: 1.5h (was: 1h 20m) > INTERSECT ALL is not compatible with SQL standard > - > > Key: BEAM-6427 > URL: https://issues.apache.org/jira/browse/BEAM-6427 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > say Row R appears m times on one side and n times on another side. > Beam's INTERSECT ALL is implemented to return MAX(m, n). > And SQL1999 standard says INTERSECT ALL should return MIN(m, n). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6418) beam_PreCommit_Java_Cron failing
[ https://issues.apache.org/jira/browse/BEAM-6418?focusedWorklogId=185115=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185115 ] ASF GitHub Bot logged work on BEAM-6418: Author: ASF GitHub Bot Created on: 15/Jan/19 04:40 Start Date: 15/Jan/19 04:40 Worklog Time Spent: 10m Work Description: mxm commented on issue #7510: [BEAM-6418] Execute Flink tests serially to avoid memory issues URL: https://github.com/apache/beam/pull/7510#issuecomment-454265116 The serial solution makes sense, should this be a harder to fix problem. It is also inconvenient that at the moment the tests can't be executed at all for 1.6/1.7 without manual intervention. Attempting to fix this here: https://github.com/apache/beam/pull/7512 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185115) Time Spent: 1h 40m (was: 1.5h) > beam_PreCommit_Java_Cron failing > > > Key: BEAM-6418 > URL: https://issues.apache.org/jira/browse/BEAM-6418 > Project: Beam > Issue Type: Bug > Components: runner-flink, test-failures >Reporter: Ahmet Altay >Assignee: Maximilian Michels >Priority: Critical > Time Spent: 1h 40m > Remaining Estimate: 0h > > [https://builds.apache.org/job/beam_PreCommit_Java_Cron/814/console] > > *16:22:16* 1: Task failed with an exception.*16:22:16* ---*16:22:16* > * What went wrong:*16:22:16* Execution failed for task > ':beam-runners-flink-1.6:test'.*16:22:16* > Process 'Gradle Test Executor > 108' finished with non-zero exit value 137*16:22:16* This problem might be > caused by incorrect test process configuration.*16:22:16* Please refer to > the test execution section in the user guide at > [https://docs.gradle.org/4.10.3/userguide/java_plugin.html#sec:test_execution] > *16:22:16* *16:22:16* * Try:*16:22:16* Run with --stacktrace option to get > the stack trace. Run with --info or --debug option to get more log output. > Run with --scan to get full insights.*16:22:16* > ==*16:22:16* > *16:22:16* 2: Task failed with an exception.*16:22:16* ---*16:22:16* > * What went wrong:*16:22:16* Execution failed for task > ':beam-runners-flink_2.11:test'.*16:22:16* > Process 'Gradle Test Executor > 110' finished with non-zero exit value 1*16:22:16* This problem might be > caused by incorrect test process configuration.*16:22:16* Please refer to > the test execution section in the user guide at > [https://docs.gradle.org/4.10.3/userguide/java_plugin.html#sec:test_execution] > *16:22:16* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6418) beam_PreCommit_Java_Cron failing
[ https://issues.apache.org/jira/browse/BEAM-6418?focusedWorklogId=185114=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185114 ] ASF GitHub Bot logged work on BEAM-6418: Author: ASF GitHub Bot Created on: 15/Jan/19 04:38 Start Date: 15/Jan/19 04:38 Worklog Time Spent: 10m Work Description: mxm commented on pull request #7512: [BEAM-6418] Lower memory consumption of Flink integration tests URL: https://github.com/apache/beam/pull/7512 1) The portability integration tests bring up Flink clusters and the FnHarness. Ensure that not multiple instances of those exist in parallel. 2) If not specified, set 2 as the default parallelism for all tests Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185114) Time Spent: 1.5h (was: 1h 20m) > beam_PreCommit_Java_Cron failing > > > Key: BEAM-6418 > URL: https://issues.apache.org/jira/browse/BEAM-6418 > Project: Beam > Issue Type: Bug > Components: runner-flink, test-failures >Reporter: Ahmet Altay >Assignee: Maximilian Michels >Priority: Critical > Time Spent: 1.5h > Remaining Estimate: 0h > > [https://builds.apache.org/job/beam_PreCommit_Java_Cron/814/console] > >
[jira] [Work logged] (BEAM-6430) EXCEPT is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6430?focusedWorklogId=185112=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185112 ] ASF GitHub Bot logged work on BEAM-6430: Author: ASF GitHub Bot Created on: 15/Jan/19 04:34 Start Date: 15/Jan/19 04:34 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #7506: [BEAM-6430] Fix EXCEPT URL: https://github.com/apache/beam/pull/7506#discussion_r247761532 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java ## @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.impl.transform; +import com.google.common.collect.Iterators; Review comment: using vendored Guava now. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185112) Time Spent: 50m (was: 40m) > EXCEPT is not compatible with SQL standard > -- > > Key: BEAM-6430 > URL: https://issues.apache.org/jira/browse/BEAM-6430 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > EXCEPT ALL should return MAX(m - n, 0). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6432) Set dependent libraries' versions for the starter archetype automatically
[ https://issues.apache.org/jira/browse/BEAM-6432?focusedWorklogId=185111=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185111 ] ASF GitHub Bot logged work on BEAM-6432: Author: ASF GitHub Bot Created on: 15/Jan/19 04:26 Start Date: 15/Jan/19 04:26 Worklog Time Spent: 10m Work Description: sekikn commented on pull request #7511: [BEAM-6432] Set dependent libraries' versions for the starter archetype automatically URL: https://github.com/apache/beam/pull/7511 For now, users have to replace the placeholders in pom.xml generated by beam-sdks-java-maven-archetypes-starter themselves. This PR makes it to be replaced automatically, just like beam-sdks-java-maven-archetypes-examples does. Follow this checklist to help us incorporate your contribution quickly and easily: - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at:
[jira] [Created] (BEAM-6432) Set dependent libraries' versions for the starter archetype automatically
Kengo Seki created BEAM-6432: Summary: Set dependent libraries' versions for the starter archetype automatically Key: BEAM-6432 URL: https://issues.apache.org/jira/browse/BEAM-6432 Project: Beam Issue Type: Improvement Components: examples-java Reporter: Kengo Seki Assignee: Kengo Seki I generated an empty project from beam-sdks-java-maven-archetypes-starter and found that I had to replace the placeholders for dependency versions ({{@...version@}}) with concrete values myself. It'd be convenient for users if they were automatically replaced, just like beam-sdks-java-maven-archetypes-examples do. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6430) EXCEPT is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6430?focusedWorklogId=185100=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185100 ] ASF GitHub Bot logged work on BEAM-6430: Author: ASF GitHub Bot Created on: 15/Jan/19 03:41 Start Date: 15/Jan/19 03:41 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #7506: [BEAM-6430] Fix EXCEPT URL: https://github.com/apache/beam/pull/7506#discussion_r247755699 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java ## @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.impl.transform; +import com.google.common.collect.Iterators; Review comment: Use vendored Guava here too. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185100) Time Spent: 40m (was: 0.5h) > EXCEPT is not compatible with SQL standard > -- > > Key: BEAM-6430 > URL: https://issues.apache.org/jira/browse/BEAM-6430 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > EXCEPT ALL should return MAX(m - n, 0). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6404) FnAPI translation error
[ https://issues.apache.org/jira/browse/BEAM-6404?focusedWorklogId=185097=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185097 ] ASF GitHub Bot logged work on BEAM-6404: Author: ASF GitHub Bot Created on: 15/Jan/19 03:31 Start Date: 15/Jan/19 03:31 Worklog Time Spent: 10m Work Description: udim commented on pull request #7456: [BEAM-6404] Fix issue with side inputs and flatten encoding. URL: https://github.com/apache/beam/pull/7456#discussion_r247754156 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner_transforms.py ## @@ -591,7 +591,6 @@ def sink_flattens(stages, pipeline_context): if pcollections[pcoll_in].coder_id != output_coder_id: # Flatten inputs must all be written with the same coder as is Review comment: I think this comment would be better like this: `Flatten requires that all its inputs use the same coder as its output. Add stages to transcode Flatten inputs that do not use the same coder.` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185097) Time Spent: 0.5h (was: 20m) > FnAPI translation error > --- > > Key: BEAM-6404 > URL: https://issues.apache.org/jira/browse/BEAM-6404 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > {code:java} > def run(argv=None): > parser = argparse.ArgumentParser() > _, pipeline_args = parser.parse_known_args(argv) > options = pipeline_options.PipelineOptions(pipeline_args) > numbers = [1, 2] > with beam.Pipeline(options=options) as p: > sum_1 = (p > | 'ReadNumber1' >> transforms.Create(numbers) > | 'CalculateSum1' >> beam.CombineGlobally(fn_sum)) > sum_2 = (p > | 'ReadNumber2' >> transforms.Create(numbers) > | beam.ParDo(_copy_number, pvalue.AsSingleton(sum_1)) > | 'CalculateSum2' >> beam.CombineGlobally(fn_sum)) > _ = ((sum_1, sum_2) > | beam.Flatten() > | 'CalculateSum3' >> beam.CombineGlobally(fn_sum) > | beam.io.WriteToText('out.txt')) > run() > {code} > > fails with > KeyError: u'ref_Coder_FastPrimitivesCoder_4_windowed' -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185099=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185099 ] ASF GitHub Bot logged work on BEAM-6427: Author: ASF GitHub Bot Created on: 15/Jan/19 03:33 Start Date: 15/Jan/19 03:33 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #7504: [BEAM-6427] INTERSECT ALL is not compatible with SQL standard. URL: https://github.com/apache/beam/pull/7504#discussion_r247754564 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java ## @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.impl.transform; +import com.google.common.collect.Iterators; Review comment: Yea, we depend on non-Vendored Guava as well, because `ValuesRel` requires an `ImmutableList` to a protected constructor. Other place we should not use it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185099) Time Spent: 1h 20m (was: 1h 10m) > INTERSECT ALL is not compatible with SQL standard > - > > Key: BEAM-6427 > URL: https://issues.apache.org/jira/browse/BEAM-6427 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > say Row R appears m times on one side and n times on another side. > Beam's INTERSECT ALL is implemented to return MAX(m, n). > And SQL1999 standard says INTERSECT ALL should return MIN(m, n). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6352) Watch PTransform is broken
[ https://issues.apache.org/jira/browse/BEAM-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742715#comment-16742715 ] Kenneth Knowles commented on BEAM-6352: --- One approach to giving access to the delegate in a semi-disciplined way would be to add something like a self-type so that the type can be {{RestrictionTracker>}}. Then the user can get access to the more-specific RestrictionTracker type. But since the whole point of the wrapper is not having this special access, it doesn't seem like an appropriate solution. > Watch PTransform is broken > -- > > Key: BEAM-6352 > URL: https://issues.apache.org/jira/browse/BEAM-6352 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.9.0 >Reporter: Gleb Kanterov >Assignee: Boyuan Zhang >Priority: Blocker > Fix For: 2.10.0 > > > List of affected tests: > org.apache.beam.sdk.transforms.WatchTest > > testSinglePollMultipleInputsWithSideInput FAILED > org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithKeyExtractor > FAILED > org.apache.beam.sdk.transforms.WatchTest > testSinglePollMultipleInputs FAILED > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsWithTerminationDueToTerminationCondition FAILED > org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithManyResults > FAILED > org.apache.beam.sdk.transforms.WatchTest > testSinglePollWithManyResults > FAILED > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsStopAfterTimeSinceNewOutput > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsWithTerminationBecauseOutputIsFinal FAILED > org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > > testContinuouslyWriteAndReadMultipleFilepatterns[0: true] FAILED > org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > > testContinuouslyWriteAndReadMultipleFilepatterns[1: false] FAILED > org.apache.beam.sdk.io.FileIOTest > testMatchWatchForNewFiles FAILED > org.apache.beam.sdk.io.TextIOReadTest$BasicIOTest > testReadWatchForNewFiles > FAILED > {code} > java.lang.IllegalArgumentException: > org.apache.beam.sdk.transforms.Watch$WatchGrowthFn, @ProcessElement > process(ProcessContext, GrowthTracker): Has tracker type > Watch.GrowthTracker, but the DoFn's tracker > type must be of type RestrictionTracker. > {code} > Relevant pull requests: > - https://github.com/apache/beam/pull/6467 > - https://github.com/apache/beam/pull/7374 > Now tests are marked with @Ignore referencing this JIRA issue -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6418) beam_PreCommit_Java_Cron failing
[ https://issues.apache.org/jira/browse/BEAM-6418?focusedWorklogId=185095=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185095 ] ASF GitHub Bot logged work on BEAM-6418: Author: ASF GitHub Bot Created on: 15/Jan/19 03:29 Start Date: 15/Jan/19 03:29 Worklog Time Spent: 10m Work Description: tweise commented on issue #7510: [BEAM-6418] Execute Flink tests serially to avoid memory issues URL: https://github.com/apache/beam/pull/7510#issuecomment-454255003 IMO, there is no urgency to enable these tests again and we can work on a permanent solution instead. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185095) Time Spent: 1h 20m (was: 1h 10m) > beam_PreCommit_Java_Cron failing > > > Key: BEAM-6418 > URL: https://issues.apache.org/jira/browse/BEAM-6418 > Project: Beam > Issue Type: Bug > Components: runner-flink, test-failures >Reporter: Ahmet Altay >Assignee: Maximilian Michels >Priority: Critical > Time Spent: 1h 20m > Remaining Estimate: 0h > > [https://builds.apache.org/job/beam_PreCommit_Java_Cron/814/console] > > *16:22:16* 1: Task failed with an exception.*16:22:16* ---*16:22:16* > * What went wrong:*16:22:16* Execution failed for task > ':beam-runners-flink-1.6:test'.*16:22:16* > Process 'Gradle Test Executor > 108' finished with non-zero exit value 137*16:22:16* This problem might be > caused by incorrect test process configuration.*16:22:16* Please refer to > the test execution section in the user guide at > [https://docs.gradle.org/4.10.3/userguide/java_plugin.html#sec:test_execution] > *16:22:16* *16:22:16* * Try:*16:22:16* Run with --stacktrace option to get > the stack trace. Run with --info or --debug option to get more log output. > Run with --scan to get full insights.*16:22:16* > ==*16:22:16* > *16:22:16* 2: Task failed with an exception.*16:22:16* ---*16:22:16* > * What went wrong:*16:22:16* Execution failed for task > ':beam-runners-flink_2.11:test'.*16:22:16* > Process 'Gradle Test Executor > 110' finished with non-zero exit value 1*16:22:16* This problem might be > caused by incorrect test process configuration.*16:22:16* Please refer to > the test execution section in the user guide at > [https://docs.gradle.org/4.10.3/userguide/java_plugin.html#sec:test_execution] > *16:22:16* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6352) Watch PTransform is broken
[ https://issues.apache.org/jira/browse/BEAM-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742709#comment-16742709 ] Kenneth Knowles commented on BEAM-6352: --- I think SDF is too immature to start with hacks like exposing the delegate or special-casing Watch. If this doesn't fit in cleanly, there's a redesign needed. > Watch PTransform is broken > -- > > Key: BEAM-6352 > URL: https://issues.apache.org/jira/browse/BEAM-6352 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.9.0 >Reporter: Gleb Kanterov >Assignee: Boyuan Zhang >Priority: Blocker > Fix For: 2.10.0 > > > List of affected tests: > org.apache.beam.sdk.transforms.WatchTest > > testSinglePollMultipleInputsWithSideInput FAILED > org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithKeyExtractor > FAILED > org.apache.beam.sdk.transforms.WatchTest > testSinglePollMultipleInputs FAILED > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsWithTerminationDueToTerminationCondition FAILED > org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithManyResults > FAILED > org.apache.beam.sdk.transforms.WatchTest > testSinglePollWithManyResults > FAILED > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsStopAfterTimeSinceNewOutput > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsWithTerminationBecauseOutputIsFinal FAILED > org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > > testContinuouslyWriteAndReadMultipleFilepatterns[0: true] FAILED > org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > > testContinuouslyWriteAndReadMultipleFilepatterns[1: false] FAILED > org.apache.beam.sdk.io.FileIOTest > testMatchWatchForNewFiles FAILED > org.apache.beam.sdk.io.TextIOReadTest$BasicIOTest > testReadWatchForNewFiles > FAILED > {code} > java.lang.IllegalArgumentException: > org.apache.beam.sdk.transforms.Watch$WatchGrowthFn, @ProcessElement > process(ProcessContext, GrowthTracker): Has tracker type > Watch.GrowthTracker, but the DoFn's tracker > type must be of type RestrictionTracker. > {code} > Relevant pull requests: > - https://github.com/apache/beam/pull/6467 > - https://github.com/apache/beam/pull/7374 > Now tests are marked with @Ignore referencing this JIRA issue -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6352) Watch PTransform is broken
[ https://issues.apache.org/jira/browse/BEAM-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742705#comment-16742705 ] Kenneth Knowles commented on BEAM-6352: --- Summarizing some thoughts here. The idea of allowing a subclass of RestrictionTracker is twofold: 1. Something has to actually implement it (RestrictionTracker should be an interface) 2. Each kind of RestrictionTracker may generally have special methods. I think part of the idea of changing number 2 is that the user might actually mess up the implementation. Also that the runner wants to wrap the user's RestrictionTracker class in its own that adds synchronization as necessary. So I wonder if the GrowthTracker extra methods do seem like subverting the main methods or if they can be implemented another way. The only other place that user types can show up is RestrictionT and PositionT. So perhaps the extra methods can go here. I am really not expert in any of this. I am just going on what I can read in PRs and the mailing list. I haven't really followed the SDF details. > Watch PTransform is broken > -- > > Key: BEAM-6352 > URL: https://issues.apache.org/jira/browse/BEAM-6352 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.9.0 >Reporter: Gleb Kanterov >Assignee: Boyuan Zhang >Priority: Blocker > Fix For: 2.10.0 > > > List of affected tests: > org.apache.beam.sdk.transforms.WatchTest > > testSinglePollMultipleInputsWithSideInput FAILED > org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithKeyExtractor > FAILED > org.apache.beam.sdk.transforms.WatchTest > testSinglePollMultipleInputs FAILED > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsWithTerminationDueToTerminationCondition FAILED > org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithManyResults > FAILED > org.apache.beam.sdk.transforms.WatchTest > testSinglePollWithManyResults > FAILED > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsStopAfterTimeSinceNewOutput > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsWithTerminationBecauseOutputIsFinal FAILED > org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > > testContinuouslyWriteAndReadMultipleFilepatterns[0: true] FAILED > org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > > testContinuouslyWriteAndReadMultipleFilepatterns[1: false] FAILED > org.apache.beam.sdk.io.FileIOTest > testMatchWatchForNewFiles FAILED > org.apache.beam.sdk.io.TextIOReadTest$BasicIOTest > testReadWatchForNewFiles > FAILED > {code} > java.lang.IllegalArgumentException: > org.apache.beam.sdk.transforms.Watch$WatchGrowthFn, @ProcessElement > process(ProcessContext, GrowthTracker): Has tracker type > Watch.GrowthTracker, but the DoFn's tracker > type must be of type RestrictionTracker. > {code} > Relevant pull requests: > - https://github.com/apache/beam/pull/6467 > - https://github.com/apache/beam/pull/7374 > Now tests are marked with @Ignore referencing this JIRA issue -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6418) beam_PreCommit_Java_Cron failing
[ https://issues.apache.org/jira/browse/BEAM-6418?focusedWorklogId=185093=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185093 ] ASF GitHub Bot logged work on BEAM-6418: Author: ASF GitHub Bot Created on: 15/Jan/19 03:12 Start Date: 15/Jan/19 03:12 Worklog Time Spent: 10m Work Description: mxm commented on issue #7496: [BEAM-6418] Avoid Jenkins memory problems for Flink build targets URL: https://github.com/apache/beam/pull/7496#issuecomment-454252353 @aaltay Serial execution is possible, I have created a PR: #7510. Ideally, I don't want to increase the PreCommit build time too much due to serial execution, but since other tests can execute in the meantime it might be ok. I'm working on reducing the memory pressure of the tests so we can run them in parallel again. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185093) Time Spent: 1h 10m (was: 1h) > beam_PreCommit_Java_Cron failing > > > Key: BEAM-6418 > URL: https://issues.apache.org/jira/browse/BEAM-6418 > Project: Beam > Issue Type: Bug > Components: runner-flink, test-failures >Reporter: Ahmet Altay >Assignee: Maximilian Michels >Priority: Critical > Time Spent: 1h 10m > Remaining Estimate: 0h > > [https://builds.apache.org/job/beam_PreCommit_Java_Cron/814/console] > > *16:22:16* 1: Task failed with an exception.*16:22:16* ---*16:22:16* > * What went wrong:*16:22:16* Execution failed for task > ':beam-runners-flink-1.6:test'.*16:22:16* > Process 'Gradle Test Executor > 108' finished with non-zero exit value 137*16:22:16* This problem might be > caused by incorrect test process configuration.*16:22:16* Please refer to > the test execution section in the user guide at > [https://docs.gradle.org/4.10.3/userguide/java_plugin.html#sec:test_execution] > *16:22:16* *16:22:16* * Try:*16:22:16* Run with --stacktrace option to get > the stack trace. Run with --info or --debug option to get more log output. > Run with --scan to get full insights.*16:22:16* > ==*16:22:16* > *16:22:16* 2: Task failed with an exception.*16:22:16* ---*16:22:16* > * What went wrong:*16:22:16* Execution failed for task > ':beam-runners-flink_2.11:test'.*16:22:16* > Process 'Gradle Test Executor > 110' finished with non-zero exit value 1*16:22:16* This problem might be > caused by incorrect test process configuration.*16:22:16* Please refer to > the test execution section in the user guide at > [https://docs.gradle.org/4.10.3/userguide/java_plugin.html#sec:test_execution] > *16:22:16* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6418) beam_PreCommit_Java_Cron failing
[ https://issues.apache.org/jira/browse/BEAM-6418?focusedWorklogId=185092=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185092 ] ASF GitHub Bot logged work on BEAM-6418: Author: ASF GitHub Bot Created on: 15/Jan/19 03:08 Start Date: 15/Jan/19 03:08 Worklog Time Spent: 10m Work Description: mxm commented on pull request #7510: [BEAM-6418] Execute Flink tests serially to avoid memory issues URL: https://github.com/apache/beam/pull/7510 We had previously disabled Flink tests for 1.6 and 1.7 due to memory issues when they execute in parallel. This lets them execute one after another. Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185092) Time Spent: 1h (was: 50m) > beam_PreCommit_Java_Cron failing > > > Key: BEAM-6418 > URL: https://issues.apache.org/jira/browse/BEAM-6418 > Project: Beam > Issue Type: Bug > Components: runner-flink, test-failures >Reporter: Ahmet Altay >Assignee: Maximilian Michels >Priority: Critical > Time Spent: 1h > Remaining Estimate: 0h > > [https://builds.apache.org/job/beam_PreCommit_Java_Cron/814/console] > > *16:22:16* 1: Task failed with an exception.*16:22:16*
[jira] [Work logged] (BEAM-6290) Make the schema for BQ tables storing metric results more generic (JAVA)
[ https://issues.apache.org/jira/browse/BEAM-6290?focusedWorklogId=185087=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185087 ] ASF GitHub Bot logged work on BEAM-6290: Author: ASF GitHub Bot Created on: 15/Jan/19 02:47 Start Date: 15/Jan/19 02:47 Worklog Time Spent: 10m Work Description: udim commented on pull request #7458: [BEAM-6290] Generic schema for metrics URL: https://github.com/apache/beam/pull/7458#discussion_r247747716 ## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTest.java ## @@ -81,36 +87,46 @@ * Runs the load test, collects and publishes test results to various data store and/or console. */ public PipelineResult run() throws IOException { -long testStartTime = System.currentTimeMillis(); +Timestamp timestamp = Timestamp.now(); loadTest(); PipelineResult result = pipeline.run(); result.waitUntilFinish(); -LoadTestResult testResult = LoadTestResult.create(result, metricsNamespace, testStartTime); +String testId = UUID.randomUUID().toString(); +List metrics = readMetrics(timestamp, result, testId); -ConsoleResultPublisher.publish(testResult); +ConsoleResultPublisher.publish(metrics, testId, timestamp.toString()); if (options.getPublishToBigQuery()) { - publishResultToBigQuery(testResult); + publishResultsToBigQuery(metrics); } return result; } - private void publishResultToBigQuery(LoadTestResult testResult) { + private List readMetrics(Timestamp timestamp, PipelineResult result, String testId) { +MetricsReader reader = new MetricsReader(result, metricsNamespace); + +NamedTestResult runtime = NamedTestResult +.create(testId, timestamp.toString(), "runtime_sec", +(reader.getEndTimeMetric("runtime") - reader.getStartTimeMetric("runtime")) / 1000D); + +NamedTestResult totalBytes = NamedTestResult +.create(testId, timestamp.toString(), "total_bytes_count", +reader.getCounterMetric("totalBytes.count")); + +return Arrays.asList(runtime, totalBytes); + } + + private void publishResultsToBigQuery(List testResults) { Review comment: Type should be `List`, since you're using the schema from that class. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185087) Time Spent: 1h 10m (was: 1h) > Make the schema for BQ tables storing metric results more generic (JAVA) > > > Key: BEAM-6290 > URL: https://issues.apache.org/jira/browse/BEAM-6290 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Currently, we keep the metrics results in BQ in tables with a schema like > this: > timestamp | total_bytes | run_time | (possibly other BQ columns) > every time we want to add a new column the schema has to be extended. This is > not convenient given the fact that any load test can have different metrics > stored. This in turn would cause multiple BQ tables each queried differently. > We can provide a more generic schema, like so: > test_id | timestamp | metric | value > thanks to that, every metric, whatever it's name is, can be saved in the > table as a separate row. This gives more elasticity in storing metrics and is > still easy to query and plot. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5319) Finish Python 3 porting for runners module
[ https://issues.apache.org/jira/browse/BEAM-5319?focusedWorklogId=185077=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185077 ] ASF GitHub Bot logged work on BEAM-5319: Author: ASF GitHub Bot Created on: 15/Jan/19 01:51 Start Date: 15/Jan/19 01:51 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #7445: [BEAM-5319] Python 3 port runners module URL: https://github.com/apache/beam/pull/7445 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185077) Time Spent: 6h 20m (was: 6h 10m) > Finish Python 3 porting for runners module > -- > > Key: BEAM-5319 > URL: https://issues.apache.org/jira/browse/BEAM-5319 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Robbe >Priority: Major > Time Spent: 6h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185076=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185076 ] ASF GitHub Bot logged work on BEAM-6427: Author: ASF GitHub Bot Created on: 15/Jan/19 01:51 Start Date: 15/Jan/19 01:51 Worklog Time Spent: 10m Work Description: akedin commented on pull request #7504: [BEAM-6427] INTERSECT ALL is not compatible with SQL standard. URL: https://github.com/apache/beam/pull/7504#discussion_r247740393 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java ## @@ -17,6 +17,7 @@ */ package org.apache.beam.sdk.extensions.sql.impl.transform; +import com.google.common.collect.Iterators; Review comment: If I didn't miss anything we should use vendored guava: https://lists.apache.org/thread.html/6cdeb1f7cf6037c84a249268f641fa1caa52a721af017a1b643d60fa@%3Cdev.beam.apache.org%3E This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185076) Time Spent: 1h 10m (was: 1h) > INTERSECT ALL is not compatible with SQL standard > - > > Key: BEAM-6427 > URL: https://issues.apache.org/jira/browse/BEAM-6427 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > say Row R appears m times on one side and n times on another side. > Beam's INTERSECT ALL is implemented to return MAX(m, n). > And SQL1999 standard says INTERSECT ALL should return MIN(m, n). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-6431) Add ExecutionTime metrics to the Beam Java SDK
[ https://issues.apache.org/jira/browse/BEAM-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Amato reassigned BEAM-6431: Assignee: Alex Amato > Add ExecutionTime metrics to the Beam Java SDK > -- > > Key: BEAM-6431 > URL: https://issues.apache.org/jira/browse/BEAM-6431 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > > This will be done by using the Dataflow worker's StateSampler code. I have > put together a refactoring plan > [here|https://docs.google.com/document/d/1OlAJf4T_CTL9WRH8lP8uQOfLjWYfm8IpRXSe38g34k4/edit#] > This will include estimating the processing time for the start, process and > finish bundle. The python SDK already has an implementation of this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6431) Add ExecutionTime metrics to the Beam Java SDK
Alex Amato created BEAM-6431: Summary: Add ExecutionTime metrics to the Beam Java SDK Key: BEAM-6431 URL: https://issues.apache.org/jira/browse/BEAM-6431 Project: Beam Issue Type: New Feature Components: java-fn-execution Reporter: Alex Amato This will be done by using the Dataflow worker's StateSampler code. I have put together a refactoring plan [here|https://docs.google.com/document/d/1OlAJf4T_CTL9WRH8lP8uQOfLjWYfm8IpRXSe38g34k4/edit#] This will include estimating the processing time for the start, process and finish bundle. The python SDK already has an implementation of this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6430) EXCEPT is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6430?focusedWorklogId=185053=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185053 ] ASF GitHub Bot logged work on BEAM-6430: Author: ASF GitHub Bot Created on: 15/Jan/19 00:30 Start Date: 15/Jan/19 00:30 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #7506: [BEAM-6430] Fix EXCEPT URL: https://github.com/apache/beam/pull/7506#issuecomment-454218691 add comment to explain what EXCEPT ALL should behave. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185053) Time Spent: 0.5h (was: 20m) > EXCEPT is not compatible with SQL standard > -- > > Key: BEAM-6430 > URL: https://issues.apache.org/jira/browse/BEAM-6430 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > EXCEPT ALL should return MAX(m - n, 0). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185049=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185049 ] ASF GitHub Bot logged work on BEAM-6427: Author: ASF GitHub Bot Created on: 15/Jan/19 00:24 Start Date: 15/Jan/19 00:24 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #7504: [BEAM-6427] INTERSECT ALL is not compatible with SQL standard. URL: https://github.com/apache/beam/pull/7504#discussion_r247721252 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java ## @@ -80,8 +80,23 @@ public void processElement(ProcessContext ctx) { case INTERSECT: if (leftRows.iterator().hasNext() && rightRows.iterator().hasNext()) { if (all) { - for (Row leftRow : leftRows) { -ctx.output(leftRow); + Iterator iter = leftRows.iterator(); + int leftCount = 0; + int rightCount = 0; + while (iter.hasNext()) { +iter.next(); +leftCount++; + } Review comment: Thanks! Using Guava now. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185049) Time Spent: 50m (was: 40m) > INTERSECT ALL is not compatible with SQL standard > - > > Key: BEAM-6427 > URL: https://issues.apache.org/jira/browse/BEAM-6427 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > say Row R appears m times on one side and n times on another side. > Beam's INTERSECT ALL is implemented to return MAX(m, n). > And SQL1999 standard says INTERSECT ALL should return MIN(m, n). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-6391) small fixes in Python SDK (ArrayOutOfIndex, always true condition, missing raise)
[ https://issues.apache.org/jira/browse/BEAM-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee resolved BEAM-6391. --- Resolution: Fixed Fix Version/s: 2.10.0 > small fixes in Python SDK (ArrayOutOfIndex, always true condition, missing > raise) > - > > Key: BEAM-6391 > URL: https://issues.apache.org/jira/browse/BEAM-6391 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.10.0 >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Fix For: 2.10.0 > > Time Spent: 50m > Remaining Estimate: 0h > > ticket for addressing small and easy to fix bugs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6391) small fixes in Python SDK (ArrayOutOfIndex, always true condition, missing raise)
[ https://issues.apache.org/jira/browse/BEAM-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742608#comment-16742608 ] Heejong Lee commented on BEAM-6391: --- new title sounds scarier ... > small fixes in Python SDK (ArrayOutOfIndex, always true condition, missing > raise) > - > > Key: BEAM-6391 > URL: https://issues.apache.org/jira/browse/BEAM-6391 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.10.0 >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > ticket for addressing small and easy to fix bugs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6391) small fixes in Python SDK (ArrayOutOfIndex, always true condition,
[ https://issues.apache.org/jira/browse/BEAM-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee updated BEAM-6391: -- Summary: small fixes in Python SDK (ArrayOutOfIndex, always true condition, (was: fix miscellaneous bugs) > small fixes in Python SDK (ArrayOutOfIndex, always true condition, > --- > > Key: BEAM-6391 > URL: https://issues.apache.org/jira/browse/BEAM-6391 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.10.0 >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > ticket for addressing small and easy to fix bugs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6391) small fixes in Python SDK (ArrayOutOfIndex, always true condition, missing raise)
[ https://issues.apache.org/jira/browse/BEAM-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee updated BEAM-6391: -- Summary: small fixes in Python SDK (ArrayOutOfIndex, always true condition, missing raise) (was: small fixes in Python SDK (ArrayOutOfIndex, always true condition, ) > small fixes in Python SDK (ArrayOutOfIndex, always true condition, missing > raise) > - > > Key: BEAM-6391 > URL: https://issues.apache.org/jira/browse/BEAM-6391 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.10.0 >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > ticket for addressing small and easy to fix bugs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4076) Schema followups
[ https://issues.apache.org/jira/browse/BEAM-4076?focusedWorklogId=185039=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185039 ] ASF GitHub Bot logged work on BEAM-4076: Author: ASF GitHub Bot Created on: 15/Jan/19 00:08 Start Date: 15/Jan/19 00:08 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #7500: [BEAM-4076] Add antlr4 to beam URL: https://github.com/apache/beam/pull/7500#discussion_r247714307 ## File path: sdks/java/core/build.gradle ## @@ -51,9 +58,11 @@ test { } dependencies { + antlr "org.antlr:antlr4:4.7" Review comment: This should be `library.java.antlr` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185039) Time Spent: 16h 50m (was: 16h 40m) > Schema followups > > > Key: BEAM-4076 > URL: https://issues.apache.org/jira/browse/BEAM-4076 > Project: Beam > Issue Type: Improvement > Components: beam-model, dsl-sql, sdk-java-core >Reporter: Kenneth Knowles >Priority: Major > Time Spent: 16h 50m > Remaining Estimate: 0h > > This umbrella bug contains subtasks with followups for Beam schemas, which > were moved from SQL to the core Java SDK and made to be type-name-based > rather than coder based. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4076) Schema followups
[ https://issues.apache.org/jira/browse/BEAM-4076?focusedWorklogId=185040=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185040 ] ASF GitHub Bot logged work on BEAM-4076: Author: ASF GitHub Bot Created on: 15/Jan/19 00:08 Start Date: 15/Jan/19 00:08 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #7500: [BEAM-4076] Add antlr4 to beam URL: https://github.com/apache/beam/pull/7500#discussion_r247714253 ## File path: sdks/java/core/build.gradle ## @@ -51,9 +58,11 @@ test { } dependencies { + antlr "org.antlr:antlr4:4.7" // Required to load constants from the model, e.g. max timestamp for global window shadow project(path: ":beam-model-pipeline", configuration: "shadow") shadow library.java.vendored_guava_20_0 + compile library.java.antlr Review comment: This should be `antlr_runtime`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185040) Time Spent: 17h (was: 16h 50m) > Schema followups > > > Key: BEAM-4076 > URL: https://issues.apache.org/jira/browse/BEAM-4076 > Project: Beam > Issue Type: Improvement > Components: beam-model, dsl-sql, sdk-java-core >Reporter: Kenneth Knowles >Priority: Major > Time Spent: 17h > Remaining Estimate: 0h > > This umbrella bug contains subtasks with followups for Beam schemas, which > were moved from SQL to the core Java SDK and made to be type-name-based > rather than coder based. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185037=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185037 ] ASF GitHub Bot logged work on BEAM-6427: Author: ASF GitHub Bot Created on: 15/Jan/19 00:04 Start Date: 15/Jan/19 00:04 Worklog Time Spent: 10m Work Description: akedin commented on pull request #7504: [BEAM-6427] INTERSECT ALL is not compatible with SQL standard. URL: https://github.com/apache/beam/pull/7504#discussion_r247713398 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java ## @@ -80,8 +80,23 @@ public void processElement(ProcessContext ctx) { case INTERSECT: if (leftRows.iterator().hasNext() && rightRows.iterator().hasNext()) { if (all) { - for (Row leftRow : leftRows) { -ctx.output(leftRow); + Iterator iter = leftRows.iterator(); + int leftCount = 0; + int rightCount = 0; + while (iter.hasNext()) { +iter.next(); +leftCount++; + } Review comment: I think guava has `Iterators.size(iterator)` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185037) Time Spent: 0.5h (was: 20m) > INTERSECT ALL is not compatible with SQL standard > - > > Key: BEAM-6427 > URL: https://issues.apache.org/jira/browse/BEAM-6427 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > say Row R appears m times on one side and n times on another side. > Beam's INTERSECT ALL is implemented to return MAX(m, n). > And SQL1999 standard says INTERSECT ALL should return MIN(m, n). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185038=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185038 ] ASF GitHub Bot logged work on BEAM-6427: Author: ASF GitHub Bot Created on: 15/Jan/19 00:04 Start Date: 15/Jan/19 00:04 Worklog Time Spent: 10m Work Description: akedin commented on pull request #7504: [BEAM-6427] INTERSECT ALL is not compatible with SQL standard. URL: https://github.com/apache/beam/pull/7504#discussion_r247713579 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java ## @@ -80,8 +80,23 @@ public void processElement(ProcessContext ctx) { case INTERSECT: if (leftRows.iterator().hasNext() && rightRows.iterator().hasNext()) { if (all) { - for (Row leftRow : leftRows) { -ctx.output(leftRow); + Iterator iter = leftRows.iterator(); + int leftCount = 0; + int rightCount = 0; + while (iter.hasNext()) { +iter.next(); +leftCount++; + } + iter = rightRows.iterator(); + while (iter.hasNext()) { +iter.next(); +rightCount++; + } + + // output MIN(m, n) + iter = (leftCount <= rightCount) ? leftRows.iterator() : rightRows.iterator(); + while (iter.hasNext()) { Review comment: Looking at this I am not sure I understand how SQL `INTERSECT` is supposed to work, can you add a brief explanation comment? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185038) Time Spent: 40m (was: 0.5h) > INTERSECT ALL is not compatible with SQL standard > - > > Key: BEAM-6427 > URL: https://issues.apache.org/jira/browse/BEAM-6427 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > say Row R appears m times on one side and n times on another side. > Beam's INTERSECT ALL is implemented to return MAX(m, n). > And SQL1999 standard says INTERSECT ALL should return MIN(m, n). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6430) EXCEPT is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6430?focusedWorklogId=185035=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185035 ] ASF GitHub Bot logged work on BEAM-6430: Author: ASF GitHub Bot Created on: 14/Jan/19 23:57 Start Date: 14/Jan/19 23:57 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #7506: [BEAM-6430] Fix EXCEPT URL: https://github.com/apache/beam/pull/7506#issuecomment-454209673 @kennknowles @akedin @apilloud This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185035) Time Spent: 20m (was: 10m) > EXCEPT is not compatible with SQL standard > -- > > Key: BEAM-6430 > URL: https://issues.apache.org/jira/browse/BEAM-6430 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > EXCEPT ALL should return MAX(m - n, 0). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable Python connector
[ https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=185032=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185032 ] ASF GitHub Bot logged work on BEAM-3342: Author: ASF GitHub Bot Created on: 14/Jan/19 23:53 Start Date: 14/Jan/19 23:53 Worklog Time Spent: 10m Work Description: sduskis commented on pull request #7367: [BEAM-3342] Create a Cloud Bigtable Python connector Write URL: https://github.com/apache/beam/pull/7367#discussion_r247711422 ## File path: sdks/python/apache_beam/io/gcp/bigtable_io.py ## @@ -0,0 +1,121 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""BigTable connector + +This module implements writing to BigTable tables. +The default mode is to set row data to write to BigTable tables. +The syntax supported is described here: +https://cloud.google.com/bigtable/docs/quickstart-cbt + +BigTable connector can be used as main outputs. A main output +(common case) is expected to be massive and will be split into +manageable chunks and processed in parallel. In the example below +we created a list of rows then passed to the GeneratedDirectRows +DoFn to set the Cells and then we call the WriteToBigtable to insert +those generated rows in the table. + + main_table = (p + | 'Generate Row Values' >> beam.Create(row_values) + | 'Generate Direct Rows' >> beam.ParDo(GenerateDirectRows()) + | 'Write to BT' >> beam.ParDo(WriteToBigtable(beam_options))) +""" +from __future__ import absolute_import + +import apache_beam as beam +from apache_beam.metrics import Metrics +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.transforms.display import DisplayDataItem + +try: + from google.cloud.bigtable import Client + from google.cloud.bigtable.batcher import MutationsBatcher +except ImportError: + pass + + +class WriteToBigtable(beam.DoFn): + """ Creates the connector can call and add_row to the batcher using each + row in beam pipe line + + :type beam_options: class:`~bigtable_configuration.BigtableConfiguration` + :param beam_options: class `~bigtable_configuration.BigtableConfiguration` + """ + + def __init__(self, beam_options): +super(WriteToBigtable, self).__init__(beam_options) +self.beam_options = beam_options +self.table = None +self.batcher = None +self.written = Metrics.counter(self.__class__, 'Written Row') + + def start_bundle(self): +if self.table is None: + client = Client(project=self.beam_options.project_id) + instance = client.instance(self.beam_options.instance_id) + self.table = instance.table(self.beam_options.table_id) + self.batcher = MutationsBatcher(self.table) Review comment: please unindent `self.batch = ` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185032) Time Spent: 7.5h (was: 7h 20m) > Create a Cloud Bigtable Python connector > > > Key: BEAM-3342 > URL: https://issues.apache.org/jira/browse/BEAM-3342 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Solomon Duskis >Assignee: Solomon Duskis >Priority: Major > Time Spent: 7.5h > Remaining Estimate: 0h > > I would like to create a Cloud Bigtable python connector. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5592) Beam Dependency Update Request: com.diffplug.spotless:spotless-plugin-gradle
[ https://issues.apache.org/jira/browse/BEAM-5592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-5592: - Assignee: Kenneth Knowles > Beam Dependency Update Request: com.diffplug.spotless:spotless-plugin-gradle > > > Key: BEAM-5592 > URL: https://issues.apache.org/jira/browse/BEAM-5592 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Kenneth Knowles >Priority: Major > > - 2018-10-01 19:32:35.663453 > - > Please consider upgrading the dependency > com.diffplug.spotless:spotless-plugin-gradle. > The current version is 3.6.0. The latest version is 3.15.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-08 12:21:21.486670 > - > Please consider upgrading the dependency > com.diffplug.spotless:spotless-plugin-gradle. > The current version is 3.6.0. The latest version is 3.15.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-15 12:14:25.257832 > - > Please consider upgrading the dependency > com.diffplug.spotless:spotless-plugin-gradle. > The current version is 3.6.0. The latest version is 3.15.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-22 12:14:56.122481 > - > Please consider upgrading the dependency > com.diffplug.spotless:spotless-plugin-gradle. > The current version is 3.6.0. The latest version is 3.15.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-29 12:20:00.647717 > - > Please consider upgrading the dependency > com.diffplug.spotless:spotless-plugin-gradle. > The current version is 3.7.0. The latest version is 3.15.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-11-05 12:17:24.553692 > - > Please consider upgrading the dependency > com.diffplug.spotless:spotless-plugin-gradle. > The current version is 3.7.0. The latest version is 3.16.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-11-12 12:17:11.816394 > - > Please consider upgrading the dependency > com.diffplug.spotless:spotless-plugin-gradle. > The current version is 3.7.0. The latest version is 3.16.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-11-19 12:17:59.751820 > - > Please consider upgrading the dependency > com.diffplug.spotless:spotless-plugin-gradle. > The current version is 3.7.0. The latest version is 3.16.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-11-26 12:16:57.363678 > - > Please consider upgrading the dependency > com.diffplug.spotless:spotless-plugin-gradle. > The current version is 3.7.0. The latest version is 3.16.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-12-03 12:17:19.561449 > - > Please consider upgrading the dependency > com.diffplug.spotless:spotless-plugin-gradle. > The current version is 3.7.0. The latest version is 3.16.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. >
[jira] [Work logged] (BEAM-6430) EXCEPT is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6430?focusedWorklogId=185030=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185030 ] ASF GitHub Bot logged work on BEAM-6430: Author: ASF GitHub Bot Created on: 14/Jan/19 23:44 Start Date: 14/Jan/19 23:44 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #7506: [BEAM-6430] Fix EXCEPT URL: https://github.com/apache/beam/pull/7506 EXCEPT ALL should emit MAX(m - n, 0). and EXCEPT DISTINCT emit dedupped EXCEPT ALL result Follow this checklist to help us incorporate your contribution quickly and easily: - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185030) Time Spent: 10m Remaining Estimate: 0h > EXCEPT is not compatible with
[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185026=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185026 ] ASF GitHub Bot logged work on BEAM-6427: Author: ASF GitHub Bot Created on: 14/Jan/19 23:40 Start Date: 14/Jan/19 23:40 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #7504: [BEAM-6427] INTERSECT ALL is not compatible with SQL standard. URL: https://github.com/apache/beam/pull/7504#issuecomment-454206064 @akedin @kennknowles @apilloud This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185026) Time Spent: 20m (was: 10m) > INTERSECT ALL is not compatible with SQL standard > - > > Key: BEAM-6427 > URL: https://issues.apache.org/jira/browse/BEAM-6427 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > say Row R appears m times on one side and n times on another side. > Beam's INTERSECT ALL is implemented to return MAX(m, n). > And SQL1999 standard says INTERSECT ALL should return MIN(m, n). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5009) Beam Dependency Update Request: com.diffplug.spotless
[ https://issues.apache.org/jira/browse/BEAM-5009?focusedWorklogId=185025=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185025 ] ASF GitHub Bot logged work on BEAM-5009: Author: ASF GitHub Bot Created on: 14/Jan/19 23:38 Start Date: 14/Jan/19 23:38 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #7505: [BEAM-5009] Pin spotless and googleJavaFormat to latest; apply globally URL: https://github.com/apache/beam/pull/7505 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185025) Time Spent: 50m (was: 40m) > Beam Dependency Update Request: com.diffplug.spotless > - > > Key: BEAM-5009 > URL: https://issues.apache.org/jira/browse/BEAM-5009 > Project: Beam > Issue Type: Bug > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Kenneth Knowles >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > 2018-07-25 20:34:42.078359 > Please review and upgrade the com.diffplug.spotless to the latest > version None > > cc: -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6352) Watch PTransform is broken
[ https://issues.apache.org/jira/browse/BEAM-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742583#comment-16742583 ] Kenneth Knowles commented on BEAM-6352: --- OK the problem was my reading of the PR description. I see now. > Watch PTransform is broken > -- > > Key: BEAM-6352 > URL: https://issues.apache.org/jira/browse/BEAM-6352 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.9.0 >Reporter: Gleb Kanterov >Assignee: Boyuan Zhang >Priority: Blocker > Fix For: 2.10.0 > > > List of affected tests: > org.apache.beam.sdk.transforms.WatchTest > > testSinglePollMultipleInputsWithSideInput FAILED > org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithKeyExtractor > FAILED > org.apache.beam.sdk.transforms.WatchTest > testSinglePollMultipleInputs FAILED > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsWithTerminationDueToTerminationCondition FAILED > org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithManyResults > FAILED > org.apache.beam.sdk.transforms.WatchTest > testSinglePollWithManyResults > FAILED > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsStopAfterTimeSinceNewOutput > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsWithTerminationBecauseOutputIsFinal FAILED > org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > > testContinuouslyWriteAndReadMultipleFilepatterns[0: true] FAILED > org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > > testContinuouslyWriteAndReadMultipleFilepatterns[1: false] FAILED > org.apache.beam.sdk.io.FileIOTest > testMatchWatchForNewFiles FAILED > org.apache.beam.sdk.io.TextIOReadTest$BasicIOTest > testReadWatchForNewFiles > FAILED > {code} > java.lang.IllegalArgumentException: > org.apache.beam.sdk.transforms.Watch$WatchGrowthFn, @ProcessElement > process(ProcessContext, GrowthTracker): Has tracker type > Watch.GrowthTracker, but the DoFn's tracker > type must be of type RestrictionTracker. > {code} > Relevant pull requests: > - https://github.com/apache/beam/pull/6467 > - https://github.com/apache/beam/pull/7374 > Now tests are marked with @Ignore referencing this JIRA issue -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5319) Finish Python 3 porting for runners module
[ https://issues.apache.org/jira/browse/BEAM-5319?focusedWorklogId=185021=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185021 ] ASF GitHub Bot logged work on BEAM-5319: Author: ASF GitHub Bot Created on: 14/Jan/19 23:31 Start Date: 14/Jan/19 23:31 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #7445: [BEAM-5319] Python 3 port runners module URL: https://github.com/apache/beam/pull/7445#discussion_r247706878 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py ## @@ -215,9 +203,6 @@ def test_gbk_side_input(self): main | beam.Map(lambda a, b: (a, b), beam.pvalue.AsDict(side)), equal_to([(None, {'a': [1]})])) - @unittest.skipIf(sys.version_info[0] == 3 and Review comment: Ok, that's fine. I created a JIRA tracking this failure: https://issues.apache.org/jira/browse/BEAM-6429. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185021) Time Spent: 6h 10m (was: 6h) > Finish Python 3 porting for runners module > -- > > Key: BEAM-5319 > URL: https://issues.apache.org/jira/browse/BEAM-5319 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Robbe >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work started] (BEAM-6430) EXCEPT is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-6430 started by Rui Wang. -- > EXCEPT is not compatible with SQL standard > -- > > Key: BEAM-6430 > URL: https://issues.apache.org/jira/browse/BEAM-6430 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > > EXCEPT ALL should return MAX(m - n, 0). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5446) SplittableDoFn: Remove runner time execution information from public API surface
[ https://issues.apache.org/jira/browse/BEAM-5446?focusedWorklogId=185020=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185020 ] ASF GitHub Bot logged work on BEAM-5446: Author: ASF GitHub Bot Created on: 14/Jan/19 23:28 Start Date: 14/Jan/19 23:28 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #6467: [BEAM-5446] SplittableDoFn: Remove "internal" methods for public API surface URL: https://github.com/apache/beam/pull/6467#discussion_r247706235 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java ## @@ -610,11 +591,9 @@ private static void verifySplittableMethods(DoFnSignature signature, ErrorReport errors.forMethod(DoFn.GetInitialRestriction.class, getInitialRestriction.targetMethod()); TypeDescriptor restrictionT = getInitialRestriction.restrictionT(); processElementErrors.checkArgument( -processElement.trackerT().equals(trackerT), -"Has tracker type %s, but the DoFn's tracker type was inferred as %s from %s", -formatType(processElement.trackerT()), -trackerT, -originOfTrackerT); + processElement.trackerT().getRawType().equals(RestrictionTracker.class), Review comment: Ah, nevermind. My bad reading. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185020) Time Spent: 2h 10m (was: 2h) > SplittableDoFn: Remove runner time execution information from public API > surface > > > Key: BEAM-5446 > URL: https://issues.apache.org/jira/browse/BEAM-5446 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Minor > Fix For: 2.9.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Move the setting of "claim observers" within RestrictionTracker to another > location to clean up the RestrictionTracker interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5009) Beam Dependency Update Request: com.diffplug.spotless
[ https://issues.apache.org/jira/browse/BEAM-5009?focusedWorklogId=185019=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185019 ] ASF GitHub Bot logged work on BEAM-5009: Author: ASF GitHub Bot Created on: 14/Jan/19 23:27 Start Date: 14/Jan/19 23:27 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #7505: [BEAM-5009] Pin spotless and googleJavaFormat to latest; apply globally URL: https://github.com/apache/beam/pull/7505#issuecomment-454203222 lgtm This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185019) Time Spent: 0.5h (was: 20m) > Beam Dependency Update Request: com.diffplug.spotless > - > > Key: BEAM-5009 > URL: https://issues.apache.org/jira/browse/BEAM-5009 > Project: Beam > Issue Type: Bug > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Kenneth Knowles >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > 2018-07-25 20:34:42.078359 > Please review and upgrade the com.diffplug.spotless to the latest > version None > > cc: -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5446) SplittableDoFn: Remove runner time execution information from public API surface
[ https://issues.apache.org/jira/browse/BEAM-5446?focusedWorklogId=185018=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185018 ] ASF GitHub Bot logged work on BEAM-5446: Author: ASF GitHub Bot Created on: 14/Jan/19 23:27 Start Date: 14/Jan/19 23:27 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #6467: [BEAM-5446] SplittableDoFn: Remove "internal" methods for public API surface URL: https://github.com/apache/beam/pull/6467#discussion_r247706055 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java ## @@ -610,11 +591,9 @@ private static void verifySplittableMethods(DoFnSignature signature, ErrorReport errors.forMethod(DoFn.GetInitialRestriction.class, getInitialRestriction.targetMethod()); TypeDescriptor restrictionT = getInitialRestriction.restrictionT(); processElementErrors.checkArgument( -processElement.trackerT().equals(trackerT), -"Has tracker type %s, but the DoFn's tracker type was inferred as %s from %s", -formatType(processElement.trackerT()), -trackerT, -originOfTrackerT); + processElement.trackerT().getRawType().equals(RestrictionTracker.class), Review comment: (I know the PR description says that it makes this change, but I don't understand why) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185018) Time Spent: 2h (was: 1h 50m) > SplittableDoFn: Remove runner time execution information from public API > surface > > > Key: BEAM-5446 > URL: https://issues.apache.org/jira/browse/BEAM-5446 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Minor > Fix For: 2.9.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Move the setting of "claim observers" within RestrictionTracker to another > location to clean up the RestrictionTracker interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5446) SplittableDoFn: Remove runner time execution information from public API surface
[ https://issues.apache.org/jira/browse/BEAM-5446?focusedWorklogId=185017=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185017 ] ASF GitHub Bot logged work on BEAM-5446: Author: ASF GitHub Bot Created on: 14/Jan/19 23:26 Start Date: 14/Jan/19 23:26 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #6467: [BEAM-5446] SplittableDoFn: Remove "internal" methods for public API surface URL: https://github.com/apache/beam/pull/6467#discussion_r247705762 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java ## @@ -610,11 +591,9 @@ private static void verifySplittableMethods(DoFnSignature signature, ErrorReport errors.forMethod(DoFn.GetInitialRestriction.class, getInitialRestriction.targetMethod()); TypeDescriptor restrictionT = getInitialRestriction.restrictionT(); processElementErrors.checkArgument( -processElement.trackerT().equals(trackerT), -"Has tracker type %s, but the DoFn's tracker type was inferred as %s from %s", -formatType(processElement.trackerT()), -trackerT, -originOfTrackerT); + processElement.trackerT().getRawType().equals(RestrictionTracker.class), Review comment: This seems like it should be a subtype check. What was wrong with the prior code? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185017) Time Spent: 1h 50m (was: 1h 40m) > SplittableDoFn: Remove runner time execution information from public API > surface > > > Key: BEAM-5446 > URL: https://issues.apache.org/jira/browse/BEAM-5446 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Minor > Fix For: 2.9.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Move the setting of "claim observers" within RestrictionTracker to another > location to clean up the RestrictionTracker interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6352) Watch PTransform is broken
[ https://issues.apache.org/jira/browse/BEAM-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742581#comment-16742581 ] Kenneth Knowles commented on BEAM-6352: --- The change seems bad. I don't really know why it was made. > Watch PTransform is broken > -- > > Key: BEAM-6352 > URL: https://issues.apache.org/jira/browse/BEAM-6352 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.9.0 >Reporter: Gleb Kanterov >Assignee: Boyuan Zhang >Priority: Blocker > Fix For: 2.10.0 > > > List of affected tests: > org.apache.beam.sdk.transforms.WatchTest > > testSinglePollMultipleInputsWithSideInput FAILED > org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithKeyExtractor > FAILED > org.apache.beam.sdk.transforms.WatchTest > testSinglePollMultipleInputs FAILED > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsWithTerminationDueToTerminationCondition FAILED > org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithManyResults > FAILED > org.apache.beam.sdk.transforms.WatchTest > testSinglePollWithManyResults > FAILED > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsStopAfterTimeSinceNewOutput > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsWithTerminationBecauseOutputIsFinal FAILED > org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > > testContinuouslyWriteAndReadMultipleFilepatterns[0: true] FAILED > org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > > testContinuouslyWriteAndReadMultipleFilepatterns[1: false] FAILED > org.apache.beam.sdk.io.FileIOTest > testMatchWatchForNewFiles FAILED > org.apache.beam.sdk.io.TextIOReadTest$BasicIOTest > testReadWatchForNewFiles > FAILED > {code} > java.lang.IllegalArgumentException: > org.apache.beam.sdk.transforms.Watch$WatchGrowthFn, @ProcessElement > process(ProcessContext, GrowthTracker): Has tracker type > Watch.GrowthTracker, but the DoFn's tracker > type must be of type RestrictionTracker. > {code} > Relevant pull requests: > - https://github.com/apache/beam/pull/6467 > - https://github.com/apache/beam/pull/7374 > Now tests are marked with @Ignore referencing this JIRA issue -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6391) fix miscellaneous bugs
[ https://issues.apache.org/jira/browse/BEAM-6391?focusedWorklogId=185012=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185012 ] ASF GitHub Bot logged work on BEAM-6391: Author: ASF GitHub Bot Created on: 14/Jan/19 23:13 Start Date: 14/Jan/19 23:13 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #7430: [BEAM-6391] fix always true conditions URL: https://github.com/apache/beam/pull/7430#issuecomment-454199847 LGTM. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185012) Time Spent: 40m (was: 0.5h) > fix miscellaneous bugs > -- > > Key: BEAM-6391 > URL: https://issues.apache.org/jira/browse/BEAM-6391 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.10.0 >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > ticket for addressing small and easy to fix bugs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work started] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-6427 started by Rui Wang. -- > INTERSECT ALL is not compatible with SQL standard > - > > Key: BEAM-6427 > URL: https://issues.apache.org/jira/browse/BEAM-6427 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > say Row R appears m times on one side and n times on another side. > Beam's INTERSECT ALL is implemented to return MAX(m, n). > And SQL1999 standard says INTERSECT ALL should return MIN(m, n). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6430) EXCEPT is not compatible with SQL standard
Rui Wang created BEAM-6430: -- Summary: EXCEPT is not compatible with SQL standard Key: BEAM-6430 URL: https://issues.apache.org/jira/browse/BEAM-6430 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Rui Wang Assignee: Rui Wang EXCEPT ALL should return MAX(m - n, 0). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5009) Beam Dependency Update Request: com.diffplug.spotless
[ https://issues.apache.org/jira/browse/BEAM-5009?focusedWorklogId=185015=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185015 ] ASF GitHub Bot logged work on BEAM-5009: Author: ASF GitHub Bot Created on: 14/Jan/19 23:22 Start Date: 14/Jan/19 23:22 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #7505: [BEAM-5009] Pin spotless and googleJavaFormat to latest; apply globally URL: https://github.com/apache/beam/pull/7505 Our spotless plugin was just using the latest googleJavaFormat or whatever was on disk. So new devs would get a new version and our codebase would break, actually. This pins the version and also updates everything. Follow this checklist to help us incorporate your contribution quickly and easily: - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (BEAM-5009) Beam Dependency Update Request: com.diffplug.spotless
[ https://issues.apache.org/jira/browse/BEAM-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-5009: - Assignee: Kenneth Knowles > Beam Dependency Update Request: com.diffplug.spotless > - > > Key: BEAM-5009 > URL: https://issues.apache.org/jira/browse/BEAM-5009 > Project: Beam > Issue Type: Bug > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Kenneth Knowles >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > 2018-07-25 20:34:42.078359 > Please review and upgrade the com.diffplug.spotless to the latest > version None > > cc: -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6391) fix miscellaneous bugs
[ https://issues.apache.org/jira/browse/BEAM-6391?focusedWorklogId=185010=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185010 ] ASF GitHub Bot logged work on BEAM-6391: Author: ASF GitHub Bot Created on: 14/Jan/19 23:08 Start Date: 14/Jan/19 23:08 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #7428: [BEAM-6391] add missing raise keywords URL: https://github.com/apache/beam/pull/7428 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185010) Time Spent: 0.5h (was: 20m) > fix miscellaneous bugs > -- > > Key: BEAM-6391 > URL: https://issues.apache.org/jira/browse/BEAM-6391 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.10.0 >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > ticket for addressing small and easy to fix bugs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6429) apache_beam.runners.portability.fn_api_runner_test.FnApiRunnerTest.test_multimap_side_input fails in Python 3.6
Valentyn Tymofieiev created BEAM-6429: - Summary: apache_beam.runners.portability.fn_api_runner_test.FnApiRunnerTest.test_multimap_side_input fails in Python 3.6 Key: BEAM-6429 URL: https://issues.apache.org/jira/browse/BEAM-6429 Project: Beam Issue Type: Sub-task Components: sdk-py-core Reporter: Valentyn Tymofieiev {noformat} ERROR: test_multimap_side_input (apache_beam.runners.portability.fn_api_runner_test.FnApiRunnerTest) -- Traceback (most recent call last): File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py", line 230, in test_multimap_side_input equal_to([('a', [1, 3]), ('b', [2])])) File "/beam/sdks/python/apache_beam/pipeline.py", line 425, in __exit__ self.run().wait_until_finish() File "/beam/sdks/python/apache_beam/pipeline.py", line 405, in run self._options).run(False) File "/beam/sdks/python/apache_beam/pipeline.py", line 418, in run return self.runner.run_pipeline(self, self._options) File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 265, in run_pipeline default_environment=self._default_environment)) File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 268, in run_via_runner_api return self.run_stages(*self.create_stages(pipeline_proto)) File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 355, in run_stages safe_coders) File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 449, in run_stage elements_by_window = _WindowGroupingBuffer(si, value_coder) File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 191, in __init__ self._key_coder = coder.wrapped_value_coder.key_coder() File "/beam/sdks/python/apache_beam/coders/coders.py", line 177, in key_coder raise ValueError('Not a KV coder: %s.' % self) ValueError: Not a KV coder: BytesCoder.{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard
[ https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185011=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185011 ] ASF GitHub Bot logged work on BEAM-6427: Author: ASF GitHub Bot Created on: 14/Jan/19 23:13 Start Date: 14/Jan/19 23:13 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #7504: [BEAM-6427]INTERSECT ALL is not compatible with SQL standard. URL: https://github.com/apache/beam/pull/7504 Per SQL standard 1999 `7.12 3) iii) 3)`, For ALL case, ``` If INTERSECT is specified, then the number of duplicates of R that T contains is the minimum of m and n. ``` Follow this checklist to help us incorporate your contribution quickly and easily: - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking
[jira] [Work logged] (BEAM-6391) fix miscellaneous bugs
[ https://issues.apache.org/jira/browse/BEAM-6391?focusedWorklogId=185013=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185013 ] ASF GitHub Bot logged work on BEAM-6391: Author: ASF GitHub Bot Created on: 14/Jan/19 23:13 Start Date: 14/Jan/19 23:13 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #7430: [BEAM-6391] fix always true conditions URL: https://github.com/apache/beam/pull/7430 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185013) Time Spent: 50m (was: 40m) > fix miscellaneous bugs > -- > > Key: BEAM-6391 > URL: https://issues.apache.org/jira/browse/BEAM-6391 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.10.0 >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > ticket for addressing small and easy to fix bugs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6352) Watch PTransform is broken
[ https://issues.apache.org/jira/browse/BEAM-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742565#comment-16742565 ] Boyuan Zhang commented on BEAM-6352: Researching a little bit. The root cause is in Luke's PR([https://github.com/apache/beam/pull/6467]), there is a checkArguement changes: {code} processElementErrors.checkArgument(processElement.trackerT().getRawType().equals(RestrictionTracker.class), "Has tracker type %s, but the DoFn's tracker type must be of type RestrictionTracker.", formatType(processElement.trackerT())); {code} which enforces a DoFn use RestrictionTracker in processElement rather than a subtype of RestrictionTracker. But in the WatchGrowthFn, the usage of tracker is not only limited to RestrictionTracker, [example|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L727]. There are 2 possible solutions from my knowlagde: 1. Expose the exact tracker from [RestrictionTrackerObserver|https://github.com/apache/beam/blob/master/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/splittabledofn/RestrictionTrackers.java#L42] 2. In the checkArguement section, make WatchGrowthTracker as a special case. I'm not super familiar with Watch.java so I cannot tell which is the right way to go. > Watch PTransform is broken > -- > > Key: BEAM-6352 > URL: https://issues.apache.org/jira/browse/BEAM-6352 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.9.0 >Reporter: Gleb Kanterov >Assignee: Boyuan Zhang >Priority: Blocker > Fix For: 2.10.0 > > > List of affected tests: > org.apache.beam.sdk.transforms.WatchTest > > testSinglePollMultipleInputsWithSideInput FAILED > org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithKeyExtractor > FAILED > org.apache.beam.sdk.transforms.WatchTest > testSinglePollMultipleInputs FAILED > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsWithTerminationDueToTerminationCondition FAILED > org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithManyResults > FAILED > org.apache.beam.sdk.transforms.WatchTest > testSinglePollWithManyResults > FAILED > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsStopAfterTimeSinceNewOutput > org.apache.beam.sdk.transforms.WatchTest > > testMultiplePollsWithTerminationBecauseOutputIsFinal FAILED > org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > > testContinuouslyWriteAndReadMultipleFilepatterns[0: true] FAILED > org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > > testContinuouslyWriteAndReadMultipleFilepatterns[1: false] FAILED > org.apache.beam.sdk.io.FileIOTest > testMatchWatchForNewFiles FAILED > org.apache.beam.sdk.io.TextIOReadTest$BasicIOTest > testReadWatchForNewFiles > FAILED > {code} > java.lang.IllegalArgumentException: > org.apache.beam.sdk.transforms.Watch$WatchGrowthFn, @ProcessElement > process(ProcessContext, GrowthTracker): Has tracker type > Watch.GrowthTracker, but the DoFn's tracker > type must be of type RestrictionTracker. > {code} > Relevant pull requests: > - https://github.com/apache/beam/pull/6467 > - https://github.com/apache/beam/pull/7374 > Now tests are marked with @Ignore referencing this JIRA issue -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6391) fix miscellaneous bugs
[ https://issues.apache.org/jira/browse/BEAM-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742571#comment-16742571 ] Chamikara Jayalath commented on BEAM-6391: -- Please give a more descriptive name (sounds like a bug we can never close with the current title :D ) > fix miscellaneous bugs > -- > > Key: BEAM-6391 > URL: https://issues.apache.org/jira/browse/BEAM-6391 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.10.0 >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > ticket for addressing small and easy to fix bugs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6391) fix miscellaneous bugs
[ https://issues.apache.org/jira/browse/BEAM-6391?focusedWorklogId=185008=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185008 ] ASF GitHub Bot logged work on BEAM-6391: Author: ASF GitHub Bot Created on: 14/Jan/19 23:07 Start Date: 14/Jan/19 23:07 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #7429: [BEAM-6391] fix possible out of bound exception URL: https://github.com/apache/beam/pull/7429#issuecomment-454198505 LGTM. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185008) Time Spent: 10m Remaining Estimate: 0h > fix miscellaneous bugs > -- > > Key: BEAM-6391 > URL: https://issues.apache.org/jira/browse/BEAM-6391 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.10.0 >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > ticket for addressing small and easy to fix bugs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6391) fix miscellaneous bugs
[ https://issues.apache.org/jira/browse/BEAM-6391?focusedWorklogId=185009=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185009 ] ASF GitHub Bot logged work on BEAM-6391: Author: ASF GitHub Bot Created on: 14/Jan/19 23:07 Start Date: 14/Jan/19 23:07 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #7429: [BEAM-6391] fix possible out of bound exception URL: https://github.com/apache/beam/pull/7429 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185009) Time Spent: 20m (was: 10m) > fix miscellaneous bugs > -- > > Key: BEAM-6391 > URL: https://issues.apache.org/jira/browse/BEAM-6391 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.10.0 >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > ticket for addressing small and easy to fix bugs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable Python connector
[ https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=185006=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185006 ] ASF GitHub Bot logged work on BEAM-3342: Author: ASF GitHub Bot Created on: 14/Jan/19 22:58 Start Date: 14/Jan/19 22:58 Worklog Time Spent: 10m Work Description: juan-rael commented on pull request #7367: [BEAM-3342] Create a Cloud Bigtable Python connector Write URL: https://github.com/apache/beam/pull/7367#discussion_r247695621 ## File path: sdks/python/apache_beam/io/gcp/bigtable_io_test.py ## @@ -0,0 +1,174 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Unittest for GCP Bigtable testing.""" +from __future__ import absolute_import + +import datetime +import logging +import random +import string +import unittest +import uuid + +import apache_beam as beam +from apache_beam.io.gcp.bigtable_io_write import BigtableWriteConfiguration +from apache_beam.io.gcp.bigtable_io_write import WriteToBigtable +from apache_beam.metrics.metric import MetricsFilter +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.runners.runner import PipelineState +from apache_beam.testing.test_pipeline import TestPipeline + +# Protect against environments where bigtable library is not available. +# pylint: disable=wrong-import-order, wrong-import-position +try: + from google.cloud.bigtable import row, column_family, Client +except ImportError: + Client = None + + +class GenerateDirectRows(beam.DoFn): + """ Generates an iterator of DirectRow object to process on beam pipeline. + + """ + def process(self, row_values): +""" Process beam pipeline using an element. + +:type row_value: dict +:param row_value: dict: dict values with row_key and row_content having +family, column_id and value of row. +""" +direct_row = row.DirectRow(row_key=row_values["row_key"]) + +for row_value in row_values["row_content"]: + direct_row.set_cell(row_value["column_family_id"], + row_value["column_id"], + row_value["value"], + datetime.datetime.now()) + + +@unittest.skipIf(Client is None, 'GCP Bigtable dependencies are not installed') +class BigtableIOWriteIT(unittest.TestCase): + """ Bigtable Write Connector Test + + """ + DEFAULT_TABLE_PREFIX = "python-test" + instance_id = DEFAULT_TABLE_PREFIX + "-" + str(uuid.uuid4())[:8] + cluster_id = DEFAULT_TABLE_PREFIX + "-" + str(uuid.uuid4())[:8] + table_id = DEFAULT_TABLE_PREFIX + "-" + str(uuid.uuid4())[:8] + number = 500 + LOCATION_ID = "us-east1-b" + + def setUp(self): +try: + from google.cloud.bigtable import enums + self.STORAGE_TYPE = enums.StorageType.HDD +except ImportError: + self.STORAGE_TYPE = 2 + +self.test_pipeline = TestPipeline(is_integration_test=True) +self.runner_name = type(self.test_pipeline.runner).__name__ +self.project = self.test_pipeline.get_option('project') +self.client = Client(project=self.project, admin=True) +self._create_instance_table() + + def tearDown(self): +instance = self.client.instance(self.instance_id) +if instance.exists(): + instance.delete() + + def test_bigtable_write_python(self): +number = self.number +config = BigtableWriteConfiguration(self.project, self.instance_id, +self.table_id) +pipeline_args = self.test_pipeline.options_list +pipeline_options = PipelineOptions(pipeline_args) + +row_values = self._generate_mutation_data(number) + +with beam.Pipeline(options=pipeline_options) as pipeline: + _ = ( + pipeline + | 'Generate Row Values' >> beam.Create(row_values) Review comment: I changed the class `Generate Direct Rows` to a PTransform. ```python class GenerateDirectRows(beam.PTransform): """ Generates an iterator of DirectRow object to process on beam pipeline. """ def __init__(self, number, **kwargs): super(GenerateDirectRows, self).__init__(**kwargs) self.number = number self.rand =
[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module
[ https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=185000=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185000 ] ASF GitHub Bot logged work on BEAM-5315: Author: ASF GitHub Bot Created on: 14/Jan/19 22:50 Start Date: 14/Jan/19 22:50 Worklog Time Spent: 10m Work Description: RobbeSneyders commented on pull request #7503: [BEAM-5315] Python 3 port io.tfrecordio module URL: https://github.com/apache/beam/pull/7503 This is is part of a series of PRs with goal to make Apache Beam PY3 compatible. The proposal with the outlined approach has been documented here: https://s.apache.org/beam-python-3. This PR ports the `io.tfrecordio` module. Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 185000) Time Spent: 12.5h (was: 12h 20m) > Finish Python 3 porting for io module > - > > Key: BEAM-5315 > URL: https://issues.apache.org/jira/browse/BEAM-5315 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Robbe >Priority: Major > Time Spent: 12.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4076) Schema followups
[ https://issues.apache.org/jira/browse/BEAM-4076?focusedWorklogId=184969=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184969 ] ASF GitHub Bot logged work on BEAM-4076: Author: ASF GitHub Bot Created on: 14/Jan/19 22:32 Start Date: 14/Jan/19 22:32 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #7500: [BEAM-4076] Add antlr4 to beam URL: https://github.com/apache/beam/pull/7500 Just adds the dependency and the shading. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184969) Time Spent: 16.5h (was: 16h 20m) > Schema followups > > > Key: BEAM-4076 > URL: https://issues.apache.org/jira/browse/BEAM-4076 > Project: Beam > Issue Type: Improvement > Components: beam-model, dsl-sql, sdk-java-core >Reporter: Kenneth Knowles >Priority: Major > Time Spent: 16.5h > Remaining Estimate: 0h > > This umbrella bug contains subtasks with followups for Beam schemas, which > were moved from SQL to the core Java SDK and made to be type-name-based > rather than coder based. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6428) Allow textual selection syntax for schema fields
Reuven Lax created BEAM-6428: Summary: Allow textual selection syntax for schema fields Key: BEAM-6428 URL: https://issues.apache.org/jira/browse/BEAM-6428 Project: Beam Issue Type: Sub-task Components: beam-model Affects Versions: 2.9.0 Reporter: Reuven Lax Assignee: Kenneth Knowles -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard
Rui Wang created BEAM-6427: -- Summary: INTERSECT ALL is not compatible with SQL standard Key: BEAM-6427 URL: https://issues.apache.org/jira/browse/BEAM-6427 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Rui Wang Assignee: Rui Wang say Row R appears m times on one side and n times on another side. Beam's INTERSECT ALL is implemented to return MAX(m, n). And SQL1999 standard says INTERSECT ALL should return MIN(m, n). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5319) Finish Python 3 porting for runners module
[ https://issues.apache.org/jira/browse/BEAM-5319?focusedWorklogId=184960=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184960 ] ASF GitHub Bot logged work on BEAM-5319: Author: ASF GitHub Bot Created on: 14/Jan/19 21:46 Start Date: 14/Jan/19 21:46 Worklog Time Spent: 10m Work Description: RobbeSneyders commented on pull request #7445: [BEAM-5319] Python 3 port runners module URL: https://github.com/apache/beam/pull/7445#discussion_r247671233 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py ## @@ -215,9 +203,6 @@ def test_gbk_side_input(self): main | beam.Map(lambda a, b: (a, b), beam.pvalue.AsDict(side)), equal_to([(None, {'a': [1]})])) - @unittest.skipIf(sys.version_info[0] == 3 and Review comment: Only unskipping tests for passing versions would mean we would need to check the tests for multiple versions every time, which would take some time. Leaving all tests unskipped for higher versions also doesn't seem like the best approach, since most will be fixed for all versions anyway. I also think some other already unskipped tests will fail on Python 3.6+. For instance some typehints tests related to type inference, which uses byte code inspection, which was changed in Python 3.6. Looking at the fastest way forward, I would like to focus solely on 3.5 for now, and add skip statements for 3.6 when a 3.6 test environment is added. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184960) Time Spent: 6h (was: 5h 50m) > Finish Python 3 porting for runners module > -- > > Key: BEAM-5319 > URL: https://issues.apache.org/jira/browse/BEAM-5319 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Robbe >Priority: Major > Time Spent: 6h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5319) Finish Python 3 porting for runners module
[ https://issues.apache.org/jira/browse/BEAM-5319?focusedWorklogId=184959=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184959 ] ASF GitHub Bot logged work on BEAM-5319: Author: ASF GitHub Bot Created on: 14/Jan/19 21:41 Start Date: 14/Jan/19 21:41 Worklog Time Spent: 10m Work Description: RobbeSneyders commented on issue #7445: [BEAM-5319] Python 3 port runners module URL: https://github.com/apache/beam/pull/7445#issuecomment-454172116 I've updated the random seed in the `test_should_sample (apache_beam.runners.worker.opcounters_test.OperationCountersTest)` after discussion with Robert on [BEAM-6359](https://issues.apache.org/jira/browse/BEAM-6395?jql=text%20~%20%22opcounters%22), which fixes the last failing runners test for 3.5. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184959) Time Spent: 5h 50m (was: 5h 40m) > Finish Python 3 porting for runners module > -- > > Key: BEAM-5319 > URL: https://issues.apache.org/jira/browse/BEAM-5319 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Robbe >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3608) Vendor Guava
[ https://issues.apache.org/jira/browse/BEAM-3608?focusedWorklogId=184943=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184943 ] ASF GitHub Bot logged work on BEAM-3608: Author: ASF GitHub Bot Created on: 14/Jan/19 20:27 Start Date: 14/Jan/19 20:27 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #7494: [BEAM-3608] Port shaded Guava to vendored Guava URL: https://github.com/apache/beam/pull/7494#issuecomment-454149478 I only filed https://issues.apache.org/jira/browse/BEAM-5821 for removing shadow plugin. It would be great to start on the easy IO cases now. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184943) Time Spent: 9h 50m (was: 9h 40m) > Vendor Guava > > > Key: BEAM-3608 > URL: https://issues.apache.org/jira/browse/BEAM-3608 > Project: Beam > Issue Type: Sub-task > Components: runner-core, sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > Time Spent: 9h 50m > Remaining Estimate: 0h > > Instead of shading as part of our build, we can shade before build so that it > is apparent when reading code, and in IDEs, that a particular class resides > in a hidden namespace. > {{import com.google.common.reflect.TypeToken}} > becomes something like > {{import org.apache.beam.private.guava21.com.google.common.reflect.TypeToken}} > So we can very trivially ban `org.apache.beam.private` from public APIs > unless they are annotated {{@Internal}}, and it makes sharing between our own > modules never get broken by shading again. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3608) Vendor Guava
[ https://issues.apache.org/jira/browse/BEAM-3608?focusedWorklogId=184942=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184942 ] ASF GitHub Bot logged work on BEAM-3608: Author: ASF GitHub Bot Created on: 14/Jan/19 20:26 Start Date: 14/Jan/19 20:26 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #7494: [BEAM-3608] Port shaded Guava to vendored Guava URL: https://github.com/apache/beam/pull/7494#issuecomment-454149304 1. We could use checkstyle as mentioned on list. I think maybe `@SuppressWarnings` could work for exceptions. But actually if we enforce it in the Gradle dependency level then you can't even import non-vendored Guava anyhow. Great idea. I filed https://issues.apache.org/jira/browse/BEAM-6426 but left it unassigned since I am not jumping on the task this minute. 2. I would like to make the shadow plugin inactive for almost all modules. Totally agree. And once we move vendoring further, to stop using it entirely. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184942) Time Spent: 9h 40m (was: 9.5h) > Vendor Guava > > > Key: BEAM-3608 > URL: https://issues.apache.org/jira/browse/BEAM-3608 > Project: Beam > Issue Type: Sub-task > Components: runner-core, sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > Time Spent: 9h 40m > Remaining Estimate: 0h > > Instead of shading as part of our build, we can shade before build so that it > is apparent when reading code, and in IDEs, that a particular class resides > in a hidden namespace. > {{import com.google.common.reflect.TypeToken}} > becomes something like > {{import org.apache.beam.private.guava21.com.google.common.reflect.TypeToken}} > So we can very trivially ban `org.apache.beam.private` from public APIs > unless they are annotated {{@Internal}}, and it makes sharing between our own > modules never get broken by shading again. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6426) Enforce ban of non-vendored Guava, with exceptions
Kenneth Knowles created BEAM-6426: - Summary: Enforce ban of non-vendored Guava, with exceptions Key: BEAM-6426 URL: https://issues.apache.org/jira/browse/BEAM-6426 Project: Beam Issue Type: Sub-task Components: build-system Reporter: Kenneth Knowles -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-6234) [Flink Runner] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed BEAM-6234. Resolution: Fixed Fix Version/s: (was: 2.8.0) 2.10.0 > [Flink Runner] Make failOnCheckpointingErrors setting available in > FlinkPipelineOptions > --- > > Key: BEAM-6234 > URL: https://issues.apache.org/jira/browse/BEAM-6234 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Daniel Harper >Assignee: Daniel Harper >Priority: Trivial > Fix For: 2.10.0 > > Time Spent: 50m > Remaining Estimate: 0h > > The configuration setting {{failOnCheckpointingErrors}} [1] is available in > Flink to allow engineers to specify whether the job should fail in the event > of checkpoint failure. > This should be exposed in {{FlinkPipelineOptions}} and > {{FlinkExecutionEnvironments}} to allow users to configure this. > The default for this value in Flink is true [2] > [1] > https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L249 > [2] > https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L73 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6234) [Flink Runner] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-6234?focusedWorklogId=184941=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184941 ] ASF GitHub Bot logged work on BEAM-6234: Author: ASF GitHub Bot Created on: 14/Jan/19 20:13 Start Date: 14/Jan/19 20:13 Worklog Time Spent: 10m Work Description: mxm commented on issue #7284: [BEAM-6234] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions URL: https://github.com/apache/beam/pull/7284#issuecomment-454145416 Added the test upon merge. Thank you! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184941) Time Spent: 50m (was: 40m) > [Flink Runner] Make failOnCheckpointingErrors setting available in > FlinkPipelineOptions > --- > > Key: BEAM-6234 > URL: https://issues.apache.org/jira/browse/BEAM-6234 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Daniel Harper >Assignee: Daniel Harper >Priority: Trivial > Fix For: 2.8.0 > > Time Spent: 50m > Remaining Estimate: 0h > > The configuration setting {{failOnCheckpointingErrors}} [1] is available in > Flink to allow engineers to specify whether the job should fail in the event > of checkpoint failure. > This should be exposed in {{FlinkPipelineOptions}} and > {{FlinkExecutionEnvironments}} to allow users to configure this. > The default for this value in Flink is true [2] > [1] > https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L249 > [2] > https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L73 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6234) [Flink Runner] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-6234?focusedWorklogId=184940=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184940 ] ASF GitHub Bot logged work on BEAM-6234: Author: ASF GitHub Bot Created on: 14/Jan/19 20:13 Start Date: 14/Jan/19 20:13 Worklog Time Spent: 10m Work Description: asfgit commented on pull request #7284: [BEAM-6234] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions URL: https://github.com/apache/beam/pull/7284 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184940) Time Spent: 40m (was: 0.5h) > [Flink Runner] Make failOnCheckpointingErrors setting available in > FlinkPipelineOptions > --- > > Key: BEAM-6234 > URL: https://issues.apache.org/jira/browse/BEAM-6234 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Daniel Harper >Assignee: Daniel Harper >Priority: Trivial > Fix For: 2.8.0 > > Time Spent: 40m > Remaining Estimate: 0h > > The configuration setting {{failOnCheckpointingErrors}} [1] is available in > Flink to allow engineers to specify whether the job should fail in the event > of checkpoint failure. > This should be exposed in {{FlinkPipelineOptions}} and > {{FlinkExecutionEnvironments}} to allow users to configure this. > The default for this value in Flink is true [2] > [1] > https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L249 > [2] > https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L73 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4783) Add bundleSize parameter to control splitting of Spark sources (useful for Dynamic Allocation)
[ https://issues.apache.org/jira/browse/BEAM-4783?focusedWorklogId=184937=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184937 ] ASF GitHub Bot logged work on BEAM-4783: Author: ASF GitHub Bot Created on: 14/Jan/19 19:44 Start Date: 14/Jan/19 19:44 Worklog Time Spent: 10m Work Description: kyle-winkelman commented on issue #6884: [BEAM-4783] Fix invalid parameter to set the partitioner in Spark GbK URL: https://github.com/apache/beam/pull/6884#issuecomment-454134845 @iemejia Were you able to confirm if this change fixed the performance issues seen in the Spark Runner? My code was poorly written and believe that your way is much more clear but if you take a look at the state before my [PR](https://github.com/apache/beam/pull/6181) you will what I am talking about. Specifically [this comment](https://github.com/apache/beam/pull/6181#discussion_r215912675). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184937) Time Spent: 6h 50m (was: 6h 40m) > Add bundleSize parameter to control splitting of Spark sources (useful for > Dynamic Allocation) > -- > > Key: BEAM-4783 > URL: https://issues.apache.org/jira/browse/BEAM-4783 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Affects Versions: 2.8.0 >Reporter: Kyle Winkelman >Assignee: Kyle Winkelman >Priority: Major > Fix For: 2.8.0, 2.9.0 > > Time Spent: 6h 50m > Remaining Estimate: 0h > > When the spark-runner is used along with the configuration > spark.dynamicAllocation.enabled=true the SourceRDD does not detect this. It > then falls back to the value calculated in this description: > // when running on YARN/SparkDeploy it's the result of max(totalCores, > 2). > // when running on Mesos it's 8. > // when running local it's the total number of cores (local = 1, > local[N] = N, > // local[*] = estimation of the machine's cores). > // ** the configuration "spark.default.parallelism" takes precedence > over all of the above ** > So in most cases this default is quite small. This is an issue when using a > very large input file as it will only get split in half. > I believe that when Dynamic Allocation is enable the SourceRDD should use the > DEFAULT_BUNDLE_SIZE and possibly expose a SparkPipelineOptions that allows > you to change this DEFAULT_BUNDLE_SIZE. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4783) Add bundleSize parameter to control splitting of Spark sources (useful for Dynamic Allocation)
[ https://issues.apache.org/jira/browse/BEAM-4783?focusedWorklogId=184936=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184936 ] ASF GitHub Bot logged work on BEAM-4783: Author: ASF GitHub Bot Created on: 14/Jan/19 19:40 Start Date: 14/Jan/19 19:40 Worklog Time Spent: 10m Work Description: kyle-winkelman commented on issue #6884: [BEAM-4783] Fix invalid parameter to set the partitioner in Spark GbK URL: https://github.com/apache/beam/pull/6884#issuecomment-454134845 @iemejia Were you able to confirm if this change fixed the performance issues seen in the Spark Runner? My code was poorly written and believe that your way is much more clear but if you take a look at the state before my [PR](https://github.com/apache/beam/pull/6181) you will what I am talking about. Specifically [this comment by iemejia](https://github.com/apache/beam/pull/6181#discussion_r215912675). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184936) Time Spent: 6h 40m (was: 6.5h) > Add bundleSize parameter to control splitting of Spark sources (useful for > Dynamic Allocation) > -- > > Key: BEAM-4783 > URL: https://issues.apache.org/jira/browse/BEAM-4783 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Affects Versions: 2.8.0 >Reporter: Kyle Winkelman >Assignee: Kyle Winkelman >Priority: Major > Fix For: 2.8.0, 2.9.0 > > Time Spent: 6h 40m > Remaining Estimate: 0h > > When the spark-runner is used along with the configuration > spark.dynamicAllocation.enabled=true the SourceRDD does not detect this. It > then falls back to the value calculated in this description: > // when running on YARN/SparkDeploy it's the result of max(totalCores, > 2). > // when running on Mesos it's 8. > // when running local it's the total number of cores (local = 1, > local[N] = N, > // local[*] = estimation of the machine's cores). > // ** the configuration "spark.default.parallelism" takes precedence > over all of the above ** > So in most cases this default is quite small. This is an issue when using a > very large input file as it will only get split in half. > I believe that when Dynamic Allocation is enable the SourceRDD should use the > DEFAULT_BUNDLE_SIZE and possibly expose a SparkPipelineOptions that allows > you to change this DEFAULT_BUNDLE_SIZE. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6418) beam_PreCommit_Java_Cron failing
[ https://issues.apache.org/jira/browse/BEAM-6418?focusedWorklogId=184915=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184915 ] ASF GitHub Bot logged work on BEAM-6418: Author: ASF GitHub Bot Created on: 14/Jan/19 18:28 Start Date: 14/Jan/19 18:28 Worklog Time Spent: 10m Work Description: aaltay commented on issue #7496: [BEAM-6418] Avoid Jenkins memory problems for Flink build targets URL: https://github.com/apache/beam/pull/7496#issuecomment-454109778 Thank you @mxm. Is it possible run them one after another? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184915) Time Spent: 50m (was: 40m) > beam_PreCommit_Java_Cron failing > > > Key: BEAM-6418 > URL: https://issues.apache.org/jira/browse/BEAM-6418 > Project: Beam > Issue Type: Bug > Components: runner-flink, test-failures >Reporter: Ahmet Altay >Assignee: Maximilian Michels >Priority: Critical > Time Spent: 50m > Remaining Estimate: 0h > > [https://builds.apache.org/job/beam_PreCommit_Java_Cron/814/console] > > *16:22:16* 1: Task failed with an exception.*16:22:16* ---*16:22:16* > * What went wrong:*16:22:16* Execution failed for task > ':beam-runners-flink-1.6:test'.*16:22:16* > Process 'Gradle Test Executor > 108' finished with non-zero exit value 137*16:22:16* This problem might be > caused by incorrect test process configuration.*16:22:16* Please refer to > the test execution section in the user guide at > [https://docs.gradle.org/4.10.3/userguide/java_plugin.html#sec:test_execution] > *16:22:16* *16:22:16* * Try:*16:22:16* Run with --stacktrace option to get > the stack trace. Run with --info or --debug option to get more log output. > Run with --scan to get full insights.*16:22:16* > ==*16:22:16* > *16:22:16* 2: Task failed with an exception.*16:22:16* ---*16:22:16* > * What went wrong:*16:22:16* Execution failed for task > ':beam-runners-flink_2.11:test'.*16:22:16* > Process 'Gradle Test Executor > 110' finished with non-zero exit value 1*16:22:16* This problem might be > caused by incorrect test process configuration.*16:22:16* Please refer to > the test execution section in the user guide at > [https://docs.gradle.org/4.10.3/userguide/java_plugin.html#sec:test_execution] > *16:22:16* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6425) Replace SSLContext.getInstance("SSL")
[ https://issues.apache.org/jira/browse/BEAM-6425?focusedWorklogId=184909=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184909 ] ASF GitHub Bot logged work on BEAM-6425: Author: ASF GitHub Bot Created on: 14/Jan/19 17:52 Start Date: 14/Jan/19 17:52 Worklog Time Spent: 10m Work Description: coheigea commented on pull request #7499: BEAM-6425 - Replace SSLContext.getInstance("SSL") URL: https://github.com/apache/beam/pull/7499 Switch to SSLContext.getInstance("TLS") instead of the older SSL algorithm. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184909) Time Spent: 10m Remaining Estimate: 0h > Replace SSLContext.getInstance("SSL") > - > > Key: BEAM-6425 > URL: https://issues.apache.org/jira/browse/BEAM-6425 > Project: Beam > Issue Type: Improvement > Components: io-java-mongodb >Reporter: Colm O hEigeartaigh >Assignee: Colm O hEigeartaigh >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > SSLUtils has an instance of: SSLContext.getInstance("SSL") > Instead we should use "TLS" here. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6234) [Flink Runner] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-6234?focusedWorklogId=184911=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184911 ] ASF GitHub Bot logged work on BEAM-6234: Author: ASF GitHub Bot Created on: 14/Jan/19 17:58 Start Date: 14/Jan/19 17:58 Worklog Time Spent: 10m Work Description: mxm commented on issue #7284: [BEAM-6234] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions URL: https://github.com/apache/beam/pull/7284#issuecomment-454100397 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184911) Time Spent: 0.5h (was: 20m) > [Flink Runner] Make failOnCheckpointingErrors setting available in > FlinkPipelineOptions > --- > > Key: BEAM-6234 > URL: https://issues.apache.org/jira/browse/BEAM-6234 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Daniel Harper >Assignee: Daniel Harper >Priority: Trivial > Fix For: 2.8.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > The configuration setting {{failOnCheckpointingErrors}} [1] is available in > Flink to allow engineers to specify whether the job should fail in the event > of checkpoint failure. > This should be exposed in {{FlinkPipelineOptions}} and > {{FlinkExecutionEnvironments}} to allow users to configure this. > The default for this value in Flink is true [2] > [1] > https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L249 > [2] > https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L73 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6234) [Flink Runner] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-6234?focusedWorklogId=184910=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184910 ] ASF GitHub Bot logged work on BEAM-6234: Author: ASF GitHub Bot Created on: 14/Jan/19 17:58 Start Date: 14/Jan/19 17:58 Worklog Time Spent: 10m Work Description: mxm commented on issue #7284: [BEAM-6234] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions URL: https://github.com/apache/beam/pull/7284#issuecomment-454100353 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184910) Time Spent: 20m (was: 10m) > [Flink Runner] Make failOnCheckpointingErrors setting available in > FlinkPipelineOptions > --- > > Key: BEAM-6234 > URL: https://issues.apache.org/jira/browse/BEAM-6234 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Daniel Harper >Assignee: Daniel Harper >Priority: Trivial > Fix For: 2.8.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The configuration setting {{failOnCheckpointingErrors}} [1] is available in > Flink to allow engineers to specify whether the job should fail in the event > of checkpoint failure. > This should be exposed in {{FlinkPipelineOptions}} and > {{FlinkExecutionEnvironments}} to allow users to configure this. > The default for this value in Flink is true [2] > [1] > https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L249 > [2] > https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L73 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6354) Hanging BoundedReadFromUnboundedSourceTest#testTimeBound and SplittableDoFnTest#testLateData
[ https://issues.apache.org/jira/browse/BEAM-6354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742339#comment-16742339 ] Gleb Kanterov commented on BEAM-6354: - [~kenn] yes, technically, I discovered first, and then ignored them > Hanging BoundedReadFromUnboundedSourceTest#testTimeBound and > SplittableDoFnTest#testLateData > > > Key: BEAM-6354 > URL: https://issues.apache.org/jira/browse/BEAM-6354 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Gleb Kanterov >Priority: Major > Fix For: 2.10.0 > > > It seems that they have a similar root cause because both of them use > unbounded streams. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-6321) flinkCompatibilityMatrixBatchDOCKER out of memory
[ https://issues.apache.org/jira/browse/BEAM-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels resolved BEAM-6321. -- Resolution: Fixed Fix Version/s: Not applicable Should be fixed by https://github.com/apache/beam/pull/7483 CC [~robertwb] > flinkCompatibilityMatrixBatchDOCKER out of memory > - > > Key: BEAM-6321 > URL: https://issues.apache.org/jira/browse/BEAM-6321 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Andrew Pilloud >Priority: Major > Fix For: Not applicable > > > https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/1156/ > https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/1158/ > {code} > 02:48:35 # > 02:48:35 # There is insufficient memory for the Java Runtime Environment to > continue. > 02:48:35 # Native memory allocation (mmap) failed to map 5019009024 bytes for > committing reserved memory. > 02:48:35 # An error report file with more information is saved as: > 02:48:35 # > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_VR_Flink/src/sdks/python/hs_err_pid21225.log > 02:48:37 Exception in thread wait_until_finish_read: > 02:48:37 Traceback (most recent call last): > 02:48:37 File "/usr/lib/python2.7/threading.py", line 801, in > __bootstrap_inner > 02:48:37 self.run() > 02:48:37 File "/usr/lib/python2.7/threading.py", line 754, in run > 02:48:37 self.__target(*self.__args, **self.__kwargs) > 02:48:37 File "apache_beam/runners/portability/portable_runner.py", line > 285, in read_messages > 02:48:37 beam_job_api_pb2.JobMessagesRequest(job_id=self._job_id)): > 02:48:37 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_VR_Flink/src/build/gradleenv/1327086738/local/lib/python2.7/site-packages/grpc/_channel.py", > line 366, in next > 02:48:37 return self._next() > 02:48:37 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_VR_Flink/src/build/gradleenv/1327086738/local/lib/python2.7/site-packages/grpc/_channel.py", > line 357, in _next > 02:48:37 raise self > 02:48:37 _Rendezvous: <_Rendezvous of RPC that terminated with: > 02:48:37 status = StatusCode.UNAVAILABLE > 02:48:37 details = "Socket closed" > 02:48:37 debug_error_string = > "{"created":"@1545821317.301500173","description":"Error received from > peer","file":"src/core/lib/surface/call.cc","file_line":1036,"grpc_message":"Socket > closed","grpc_status":14}" > 02:48:37 > > 02:48:37 E > 02:48:37 INFO:root:Using latest locally built Python SDK docker image. > 04:18:29 Build timed out (after 100 minutes). Marking the build as aborted. > 04:18:30 Build was aborted > 04:18:30 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6334) Change status of LoadTests after failure on Dataflow
[ https://issues.apache.org/jira/browse/BEAM-6334?focusedWorklogId=184902=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184902 ] ASF GitHub Bot logged work on BEAM-6334: Author: ASF GitHub Bot Created on: 14/Jan/19 17:19 Start Date: 14/Jan/19 17:19 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on pull request #7450: [BEAM-6334] Add throwing exception in case of invalid state or timeout URL: https://github.com/apache/beam/pull/7450#discussion_r247574008 ## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTest.java ## @@ -85,17 +88,22 @@ public PipelineResult run() throws IOException { loadTest(); -PipelineResult result = pipeline.run(); -result.waitUntilFinish(); - -LoadTestResult testResult = LoadTestResult.create(result, metricsNamespace, testStartTime); +PipelineResult pipelineResult = pipeline.run(); + pipelineResult.waitUntilFinish(Duration.standardMinutes(options.getLoadTestTimeout())); +LoadTestResult testResult = LoadTestResult.create(pipelineResult, metricsNamespace, testStartTime); ConsoleResultPublisher.publish(testResult); -if (options.getPublishToBigQuery()) { - publishResultToBigQuery(testResult); +Optional failure = assertFailure(pipelineResult, testResult); + +if(failure.isPresent()) { Review comment: nit: missing space between "if" and "(" This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184902) Time Spent: 1.5h (was: 1h 20m) > Change status of LoadTests after failure on Dataflow > > > Key: BEAM-6334 > URL: https://issues.apache.org/jira/browse/BEAM-6334 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Kasia Kucharczyk >Assignee: Lukasz Gajowy >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > When GroupByKey load test was tested via Jenkins and it was failing on > Dataflow, Jenkins was showing Success status. > Example: > [Jenkins build > link|https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_Java_LoadTests_GroupByKey_Dataflow_Small_PR/3/]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6334) Change status of LoadTests after failure on Dataflow
[ https://issues.apache.org/jira/browse/BEAM-6334?focusedWorklogId=184901=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184901 ] ASF GitHub Bot logged work on BEAM-6334: Author: ASF GitHub Bot Created on: 14/Jan/19 17:19 Start Date: 14/Jan/19 17:19 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on pull request #7450: [BEAM-6334] Add throwing exception in case of invalid state or timeout URL: https://github.com/apache/beam/pull/7450#discussion_r247579532 ## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/JobFailure.java ## @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.loadtests; + +import static java.lang.String.format; +import static java.util.Optional.empty; +import static java.util.Optional.of; + +import java.io.IOException; +import java.util.Optional; +import org.apache.beam.sdk.PipelineResult; +import org.joda.time.Duration; + +/** Class for detecting failures after {@link PipelineResult#waitUntilFinish(Duration)} unblocks. */ +class JobFailure { + + private String cause; + + private boolean requiresCancelling; + + JobFailure(String cause, boolean requiresCancelling) { +this.cause = cause; +this.requiresCancelling = requiresCancelling; + } + + static Optional assertFailure(PipelineResult pipelineResult, + LoadTestResult testResult) { +PipelineResult.State state = pipelineResult.getState(); + +Optional stateRelatedFailure = lookForInvalidState(state); + +if (stateRelatedFailure.isPresent()) { + return stateRelatedFailure; +} else { + return lookForMetricResultFailure(testResult); +} + } + + private static Optional lookForMetricResultFailure(LoadTestResult testResult) { +if (testResult.getRuntime() == -1 || testResult.getTotalBytesCount() == -1) { + return of(new JobFailure("Invalid test results", false)); +} else { + return empty(); +} + } + + static Optional lookForInvalidState(PipelineResult.State state) { +switch (state) { + case RUNNING: + case UNKNOWN: +return of(new JobFailure("Job timeout.", true)); + + case CANCELLED: + case FAILED: + case STOPPED: + case UPDATED: +return of(new JobFailure(format("Invalid job state: %s.", state.toString()), false)); + + default: +return empty(); +} + } + + static PipelineResult handleFailure(PipelineResult pipelineResult, JobFailure failure) + throws IOException { + +if(failure.requiresCancelling) { Review comment: nit: missing space between "if" and "(" This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184901) > Change status of LoadTests after failure on Dataflow > > > Key: BEAM-6334 > URL: https://issues.apache.org/jira/browse/BEAM-6334 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Kasia Kucharczyk >Assignee: Lukasz Gajowy >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > When GroupByKey load test was tested via Jenkins and it was failing on > Dataflow, Jenkins was showing Success status. > Example: > [Jenkins build > link|https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_Java_LoadTests_GroupByKey_Dataflow_Small_PR/3/]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6424) RabbitMqIO: NullPointerException raised on getWatermark() first call
[ https://issues.apache.org/jira/browse/BEAM-6424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742296#comment-16742296 ] Jean-Baptiste Onofré commented on BEAM-6424: Thanks. I will fix that as discussed on github. > RabbitMqIO: NullPointerException raised on getWatermark() first call > > > Key: BEAM-6424 > URL: https://issues.apache.org/jira/browse/BEAM-6424 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Augustin Lafanechere >Assignee: Jean-Baptiste Onofré >Priority: Minor > > I tried to use the RabbitMqIO with the direct runner to generate an unbounded > PCollection from a queue. I encounter a NPE : > {quote}{{java.lang.NullPointerException}} > {{ at > org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.processElement > (UnboundedReadEvaluatorFactory.java:169)}} > {{}} > {quote} > After investigation it looks like it's caused by the fact that no default is > given to > checkpointMark.oldestTimestamp. getWatermark() is called before the mutation > of the currentTimestamp variable, raising a NPE. I fixed the problem on my > side, reimplementing the class and overriding getWatermark to return > Instant.now() if checkpointMark.oldestTimestamp is null : > {quote}@Override > publicInstantgetWatermark() { > if (checkpointMark.oldestTimestamp == null) { > returnInstant.now(); > } > return checkpointMark.oldestTimestamp; > }{quote} > It looks likes this bug as [already been raised > here|https://jira.apache.org/jira/browse/BEAM-1240?focusedCommentId=16566869=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16566869] > on the PR for RabbitMqIO. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-6424) RabbitMqIO: NullPointerException raised on getWatermark() first call
[ https://issues.apache.org/jira/browse/BEAM-6424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Augustin Lafanechere reassigned BEAM-6424: -- Assignee: Jean-Baptiste Onofré > RabbitMqIO: NullPointerException raised on getWatermark() first call > > > Key: BEAM-6424 > URL: https://issues.apache.org/jira/browse/BEAM-6424 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Augustin Lafanechere >Assignee: Jean-Baptiste Onofré >Priority: Minor > > I tried to use the RabbitMqIO with the direct runner to generate an unbounded > PCollection from a queue. I encounter a NPE : > {quote}{{java.lang.NullPointerException}} > {{ at > org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.processElement > (UnboundedReadEvaluatorFactory.java:169)}} > {{}} > {quote} > After investigation it looks like it's caused by the fact that no default is > given to > checkpointMark.oldestTimestamp. getWatermark() is called before the mutation > of the currentTimestamp variable, raising a NPE. I fixed the problem on my > side, reimplementing the class and overriding getWatermark to return > Instant.now() if checkpointMark.oldestTimestamp is null : > {quote}@Override > publicInstantgetWatermark() { > if (checkpointMark.oldestTimestamp == null) { > returnInstant.now(); > } > return checkpointMark.oldestTimestamp; > }{quote} > It looks likes this bug as [already been raised > here|https://jira.apache.org/jira/browse/BEAM-1240?focusedCommentId=16566869=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16566869] > on the PR for RabbitMqIO. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-6424) RabbitMqIO: NullPointerException raised on getWatermark() first call
[ https://issues.apache.org/jira/browse/BEAM-6424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía reassigned BEAM-6424: -- Assignee: (was: Eugene Kirpichov) > RabbitMqIO: NullPointerException raised on getWatermark() first call > > > Key: BEAM-6424 > URL: https://issues.apache.org/jira/browse/BEAM-6424 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Augustin Lafanechere >Priority: Minor > > I tried to use the RabbitMqIO with the direct runner to generate an unbounded > PCollection from a queue. I encounter a NPE : > {quote}{{java.lang.NullPointerException}} > {{ at > org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.processElement > (UnboundedReadEvaluatorFactory.java:169)}} > {{}} > {quote} > After investigation it looks like it's caused by the fact that no default is > given to > checkpointMark.oldestTimestamp. getWatermark() is called before the mutation > of the currentTimestamp variable, raising a NPE. I fixed the problem on my > side, reimplementing the class and overriding getWatermark to return > Instant.now() if checkpointMark.oldestTimestamp is null : > {quote}@Override > publicInstantgetWatermark() { > if (checkpointMark.oldestTimestamp == null) { > returnInstant.now(); > } > return checkpointMark.oldestTimestamp; > }{quote} > It looks likes this bug as [already been raised > here|https://jira.apache.org/jira/browse/BEAM-1240?focusedCommentId=16566869=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16566869] > on the PR for RabbitMqIO. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6424) RabbitMqIO: NullPointerException raised on getWatermark() first call
[ https://issues.apache.org/jira/browse/BEAM-6424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742294#comment-16742294 ] Augustin Lafanechere commented on BEAM-6424: I assigned to [~jbonofre]. He offered his help after [my comment|https://github.com/apache/beam/pull/1729#issuecomment-454046116] on his original PR. > RabbitMqIO: NullPointerException raised on getWatermark() first call > > > Key: BEAM-6424 > URL: https://issues.apache.org/jira/browse/BEAM-6424 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Augustin Lafanechere >Assignee: Jean-Baptiste Onofré >Priority: Minor > > I tried to use the RabbitMqIO with the direct runner to generate an unbounded > PCollection from a queue. I encounter a NPE : > {quote}{{java.lang.NullPointerException}} > {{ at > org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.processElement > (UnboundedReadEvaluatorFactory.java:169)}} > {{}} > {quote} > After investigation it looks like it's caused by the fact that no default is > given to > checkpointMark.oldestTimestamp. getWatermark() is called before the mutation > of the currentTimestamp variable, raising a NPE. I fixed the problem on my > side, reimplementing the class and overriding getWatermark to return > Instant.now() if checkpointMark.oldestTimestamp is null : > {quote}@Override > publicInstantgetWatermark() { > if (checkpointMark.oldestTimestamp == null) { > returnInstant.now(); > } > return checkpointMark.oldestTimestamp; > }{quote} > It looks likes this bug as [already been raised > here|https://jira.apache.org/jira/browse/BEAM-1240?focusedCommentId=16566869=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16566869] > on the PR for RabbitMqIO. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6425) Replace SSLContext.getInstance("SSL")
Colm O hEigeartaigh created BEAM-6425: - Summary: Replace SSLContext.getInstance("SSL") Key: BEAM-6425 URL: https://issues.apache.org/jira/browse/BEAM-6425 Project: Beam Issue Type: Improvement Components: io-java-mongodb Reporter: Colm O hEigeartaigh Assignee: Colm O hEigeartaigh SSLUtils has an instance of: SSLContext.getInstance("SSL") Instead we should use "TLS" here. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6351) OutOfMemoryError on DirectRunner while running load tests
[ https://issues.apache.org/jira/browse/BEAM-6351?focusedWorklogId=184888=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184888 ] ASF GitHub Bot logged work on BEAM-6351: Author: ASF GitHub Bot Created on: 14/Jan/19 16:35 Start Date: 14/Jan/19 16:35 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #7497: [BEAM-6351] Divide separate job for "smoke" load test suites URL: https://github.com/apache/beam/pull/7497#discussion_r247564649 ## File path: .test-infra/jenkins/job_LoadTests_Java.groovy ## @@ -69,6 +53,53 @@ for (testConfiguration in testsConfigurations) { ) { description(testConfiguration.jobDescription) commonJobProperties.setTopLevelMainJobProperties(delegate) -loadTestsBuilder.buildTest(delegate, testConfiguration.jobDescription, testConfiguration.runner, testConfiguration.jobProperties, testConfiguration.itClass) +loadTestsBuilder.loadTest(delegate, testConfiguration.jobDescription, testConfiguration.runner, testConfiguration.jobProperties, testConfiguration.itClass) +} +} + +def smokeTestConfigurations = [ +[ +title: 'GroupByKey load test Direct', +itClass : 'org.apache.beam.sdk.loadtests.GroupByKeyLoadTest', +runner : CommonTestProperties.Runner.DIRECT, +jobProperties: [ +publishToBigQuery: true, +bigQueryDataset : 'load_test_SMOKE', +bigQueryTable: 'direct_gbk', +sourceOptions: '{"numRecords":10,"splitPointFrequencyRecords":1}', +stepOptions : '{"outputRecordsPerInputRecord":1,"preservesInputKeyDistribution":true}', +fanout : 10, +iterations : 1, +] +], +[ +title: 'GroupByKey load test Dataflow', Review comment: I think adding smoke tests for other runners is probably useful. Most have local runners so the test should be fairly quick. Dataflow doesn't have this option and takes a few minutes to start up, so tests less than a few minutes will be mostly startup time. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184888) Time Spent: 10m Remaining Estimate: 0h > OutOfMemoryError on DirectRunner while running load tests > -- > > Key: BEAM-6351 > URL: https://issues.apache.org/jira/browse/BEAM-6351 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Kasia Kucharczyk >Assignee: Lukasz Gajowy >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The GroupByKey Java load test with 10 number of records is failing on > DirectRunner with OutOfMemory Error. Then the test is aborted after timeout. > This is [example failing run of the > job|https://builds.apache.org/job/beam_Java_LoadTests_GroupByKey_Direct_Small_PR/2/]. > The stacktrace is following: > {code:java} > Exception in thread "main" java.lang.RuntimeException: > java.lang.OutOfMemoryError: GC overhead limit exceeded > 18:02:15 at > org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:204) > 18:02:15 at > org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64) > 18:02:15 at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313) > 18:02:15 at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299) > 18:02:15 at org.apache.beam.sdk.loadtests.LoadTest.run(LoadTest.java:75) > 18:02:15 at > org.apache.beam.sdk.loadtests.GroupByKeyLoadTest.run(GroupByKeyLoadTest.java:58) > 18:02:15 at > org.apache.beam.sdk.loadtests.GroupByKeyLoadTest.main(GroupByKeyLoadTest.java:130) > 18:02:15 Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded > 18:02:15 at > org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:97) > 18:02:15 at > org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:92) > 18:02:15 at > org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.(MutationDetectors.java:117) > 18:02:15 at > org.apache.beam.sdk.util.MutationDetectors.forValueWithCoder(MutationDetectors.java:44) > 18:02:15 at >
[jira] [Resolved] (BEAM-6397) Add KeyValue attribute to SyntheticDataPubSubPublisher
[ https://issues.apache.org/jira/browse/BEAM-6397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kasia Kucharczyk resolved BEAM-6397. Resolution: Fixed Fix Version/s: 2.10.0 > Add KeyValue attribute to SyntheticDataPubSubPublisher > -- > > Key: BEAM-6397 > URL: https://issues.apache.org/jira/browse/BEAM-6397 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Kasia Kucharczyk >Assignee: Kasia Kucharczyk >Priority: Minor > Fix For: 2.10.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Currently SyntheticDataPubSubPublisher sends only synthetic elements > converted to ByteArray to PubSub. To perform KeyValue tranforms (such us > GroupByKey) on PubSub source I propose to send generated synthetic data as > Map via attributes parameter in PubsubMessage object. > Also it is required to fix NoClassDefFoundError (details in comment). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6405) Improve PortableValidatesRunner test reliability on Jenkins
[ https://issues.apache.org/jira/browse/BEAM-6405?focusedWorklogId=184878=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184878 ] ASF GitHub Bot logged work on BEAM-6405: Author: ASF GitHub Bot Created on: 14/Jan/19 16:03 Start Date: 14/Jan/19 16:03 Worklog Time Spent: 10m Work Description: tweise commented on pull request #7461: [BEAM-6405] Let PortableValidatesRunner tests run in EMBEDDED environment URL: https://github.com/apache/beam/pull/7461 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184878) Time Spent: 5h (was: 4h 50m) > Improve PortableValidatesRunner test reliability on Jenkins > --- > > Key: BEAM-6405 > URL: https://issues.apache.org/jira/browse/BEAM-6405 > Project: Beam > Issue Type: Test > Components: runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > The PVR tests seem to be passing fine and then failing consecutively for no > reason: https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/ > It looks like the outrageous parallelism, i.e. number of available cores, is > responsible for the flakiness if there is additional load on the build > slaves. We should lower the parallelism. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-1240) Create RabbitMqIO
[ https://issues.apache.org/jira/browse/BEAM-1240?focusedWorklogId=184846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184846 ] ASF GitHub Bot logged work on BEAM-1240: Author: ASF GitHub Bot Created on: 14/Jan/19 15:44 Start Date: 14/Jan/19 15:44 Worklog Time Spent: 10m Work Description: alafanechere commented on issue #1729: [BEAM-1240] Create RabbitMqIO URL: https://github.com/apache/beam/pull/1729#issuecomment-454050841 I juste created the JIRA : https://jira.apache.org/jira/browse/BEAM-6424 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184846) Time Spent: 14.5h (was: 14h 20m) > Create RabbitMqIO > - > > Key: BEAM-1240 > URL: https://issues.apache.org/jira/browse/BEAM-1240 > Project: Beam > Issue Type: New Feature > Components: io-ideas >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.9.0 > > Time Spent: 14.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-1240) Create RabbitMqIO
[ https://issues.apache.org/jira/browse/BEAM-1240?focusedWorklogId=184847=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184847 ] ASF GitHub Bot logged work on BEAM-1240: Author: ASF GitHub Bot Created on: 14/Jan/19 15:44 Start Date: 14/Jan/19 15:44 Worklog Time Spent: 10m Work Description: alafanechere commented on issue #1729: [BEAM-1240] Create RabbitMqIO URL: https://github.com/apache/beam/pull/1729#issuecomment-454050841 Thanks ! I juste created the JIRA : https://jira.apache.org/jira/browse/BEAM-6424 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184847) Time Spent: 14h 40m (was: 14.5h) > Create RabbitMqIO > - > > Key: BEAM-1240 > URL: https://issues.apache.org/jira/browse/BEAM-1240 > Project: Beam > Issue Type: New Feature > Components: io-ideas >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.9.0 > > Time Spent: 14h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS
[ https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742214#comment-16742214 ] Marco Veluscek commented on BEAM-3772: -- Ok, thank you, I'll try again with 2.9 to confirm this. Thank you for your help. > BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded > PCollection and FILE_LOADS > > > Key: BEAM-3772 > URL: https://issues.apache.org/jira/browse/BEAM-3772 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.2.0, 2.3.0 > Environment: Dataflow streaming pipeline >Reporter: Benjamin BENOIST >Assignee: Chamikara Jayalath >Priority: Major > > My workflow : KAFKA -> Dataflow streaming -> BigQuery > Given that having low-latency isn't important in my case, I use FILE_LOADS to > reduce the costs. I'm using _BigQueryIO.Write_ with a _DynamicDestination_, > which is a table with the current hour as a suffix. > This _BigQueryIO.Write_ is configured like this : > {code:java} > .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED) > .withMethod(Method.FILE_LOADS) > .withTriggeringFrequency(triggeringFrequency) > .withNumFileShards(100) > {code} > The first table is successfully created and is written to. But then the > following tables are never created and I get these exceptions: > {code:java} > (99e5cd8c66414e7a): java.lang.RuntimeException: Failed to create load job > with id prefix > 5047f71312a94bf3a42ee5d67feede75_5295fbf25e1a7534f85e25dcaa9f4986_1_00023, > reached max retries: 3, last failed load job: { > "configuration" : { > "load" : { > "createDisposition" : "CREATE_NEVER", > "destinationTable" : { > "datasetId" : "dev_mydataset", > "projectId" : "myproject-id", > "tableId" : "mytable_20180302_16" > }, > {code} > The _CreateDisposition_ used is _CREATE_NEVER_, contrary as > _CREATE_IF_NEEDED_ as specified. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6408) beam_Java_LoadTests_GroupByKey_Dataflow_Small timeouts
[ https://issues.apache.org/jira/browse/BEAM-6408?focusedWorklogId=184845=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184845 ] ASF GitHub Bot logged work on BEAM-6408: Author: ASF GitHub Bot Created on: 14/Jan/19 15:34 Start Date: 14/Jan/19 15:34 Worklog Time Spent: 10m Work Description: lgajowy commented on issue #7485: [BEAM-6408] Fix load test job URL: https://github.com/apache/beam/pull/7485#issuecomment-454047129 Run GroupByKey Small Java Load Test Dataflow This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184845) Time Spent: 10m Remaining Estimate: 0h > beam_Java_LoadTests_GroupByKey_Dataflow_Small timeouts > -- > > Key: BEAM-6408 > URL: https://issues.apache.org/jira/browse/BEAM-6408 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This job starts a load test that lasts 4 hours on Dataflow (on 10 workers). > It fails due to jenkins timeout. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-1240) Create RabbitMqIO
[ https://issues.apache.org/jira/browse/BEAM-1240?focusedWorklogId=184844=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184844 ] ASF GitHub Bot logged work on BEAM-1240: Author: ASF GitHub Bot Created on: 14/Jan/19 15:34 Start Date: 14/Jan/19 15:34 Worklog Time Spent: 10m Work Description: jbonofre commented on issue #1729: [BEAM-1240] Create RabbitMqIO URL: https://github.com/apache/beam/pull/1729#issuecomment-454047121 @alafanechere by the way, I can create the Jira for you (and I will assign the Jira to me). Please let me know. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 184844) Time Spent: 14h 20m (was: 14h 10m) > Create RabbitMqIO > - > > Key: BEAM-1240 > URL: https://issues.apache.org/jira/browse/BEAM-1240 > Project: Beam > Issue Type: New Feature > Components: io-ideas >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.9.0 > > Time Spent: 14h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6424) RabbitMqIO: NullPointerException raised on getWatermark() first call
Augustin Lafanechere created BEAM-6424: -- Summary: RabbitMqIO: NullPointerException raised on getWatermark() first call Key: BEAM-6424 URL: https://issues.apache.org/jira/browse/BEAM-6424 Project: Beam Issue Type: Bug Components: io-ideas Reporter: Augustin Lafanechere Assignee: Eugene Kirpichov I tried to use the RabbitMqIO with the direct runner to generate an unbounded PCollection from a queue. I encounter a NPE : {quote}{{java.lang.NullPointerException}} {{ at org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.processElement (UnboundedReadEvaluatorFactory.java:169)}} {{}} {quote} After investigation it looks like it's caused by the fact that no default is given to checkpointMark.oldestTimestamp. getWatermark() is called before the mutation of the currentTimestamp variable, raising a NPE. I fixed the problem on my side, reimplementing the class and overriding getWatermark to return Instant.now() if checkpointMark.oldestTimestamp is null : {quote}@Override publicInstantgetWatermark() { if (checkpointMark.oldestTimestamp == null) { returnInstant.now(); } return checkpointMark.oldestTimestamp; }{quote} It looks likes this bug as [already been raised here|https://jira.apache.org/jira/browse/BEAM-1240?focusedCommentId=16566869=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16566869] on the PR for RabbitMqIO. -- This message was sent by Atlassian JIRA (v7.6.3#76005)