[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality
[ https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=175678&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175678 ] ASF GitHub Bot logged work on BEAM-6240:

Author: ASF GitHub Bot
Created on: 15/Dec/18 06:39
Start Date: 15/Dec/18 06:39
Worklog Time Spent: 10m

Work Description: reuvenlax opened a new pull request #7289: [BEAM-6240] Add a library of schema annotations for POJO and JavaBeans
URL: https://github.com/apache/beam/pull/7289

Annotations added:
@SchemaIgnore - Skip a field
@FieldName - Specify the name of a field (instead of just using the Java field name)
@SchemaCreate - Specify a constructor or static creator for a user type.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 175678)
Time Spent: 10m
Remaining Estimate: 0h

> Allow users to annotate POJOs and JavaBeans for richer functionality
> ---
> Key: BEAM-6240
> URL: https://issues.apache.org/jira/browse/BEAM-6240
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-java-core
> Reporter: Reuven Lax
> Assignee: Reuven Lax
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Desired annotations:
> * SchemaIgnore - ignore this field
> * FieldName - allow the user to explicitly specify a field name
> * SchemaCreate - register a function to be used to create an object (so fields can be final, and no default constructor need be assumed).

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality
[ https://issues.apache.org/jira/browse/BEAM-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reuven Lax reassigned BEAM-6240: Assignee: Reuven Lax (was: Kenneth Knowles)
[jira] [Created] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality
Reuven Lax created BEAM-6240:

Summary: Allow users to annotate POJOs and JavaBeans for richer functionality
Key: BEAM-6240
URL: https://issues.apache.org/jira/browse/BEAM-6240
Project: Beam
Issue Type: Sub-task
Components: sdk-java-core
Reporter: Reuven Lax
Assignee: Kenneth Knowles

Desired annotations:
* SchemaIgnore - ignore this field
* FieldName - allow the user to explicitly specify a field name
* SchemaCreate - register a function to be used to create an object (so fields can be final, and no default constructor need be assumed).
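The three desired annotations cover dropping a field from the inferred schema, renaming a field, and constructing instances without assuming a default constructor. As a language-neutral illustration only (a hypothetical Python sketch, not Beam's Java API; the `_schema_ignore`, `_field_names`, and `_schema_create` markers are invented stand-ins for the annotations):

```python
def schema_fields(cls):
    """Derive a schema (list of field names) from a class, honoring the
    hypothetical markers below. A conceptual analogue of the proposed
    Java annotations, not Beam's implementation."""
    ignored = set(getattr(cls, '_schema_ignore', ()))
    renames = getattr(cls, '_field_names', {})
    fields = []
    for name in getattr(cls, '__annotations__', {}):
        if name in ignored:
            continue                             # SchemaIgnore: skip the field
        fields.append(renames.get(name, name))   # FieldName: rename the field
    return fields


def schema_create(cls, **field_values):
    """Build an instance via a registered creator (SchemaCreate), so
    fields can be final/immutable and no default constructor is needed."""
    creator = getattr(cls, '_schema_create', cls)
    return creator(**field_values)


class User:
    user_id: int
    name: str
    cached_hash: int

    _schema_ignore = ('cached_hash',)     # stand-in for @SchemaIgnore
    _field_names = {'user_id': 'id'}      # stand-in for @FieldName

    def __init__(self, user_id, name):
        self.user_id = user_id
        self.name = name
        self.cached_hash = hash((user_id, name))

    # Registered creator standing in for @SchemaCreate.
    _schema_create = staticmethod(lambda user_id, name: User(user_id, name))
```

Here `schema_fields(User)` yields `['id', 'name']`: `cached_hash` is dropped, `user_id` is renamed, and `schema_create` routes construction through the registered creator instead of a default constructor.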
[jira] [Work logged] (BEAM-6197) Log time for Dataflow GCS upload of staged files
[ https://issues.apache.org/jira/browse/BEAM-6197?focusedWorklogId=175675&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175675 ] ASF GitHub Bot logged work on BEAM-6197:

Author: ASF GitHub Bot
Created on: 15/Dec/18 05:30
Start Date: 15/Dec/18 05:30
Worklog Time Spent: 10m

Work Description: alanmyrvold removed a comment on issue #7235: [BEAM-6197] Log time for Dataflow GCS upload of staged files + add a test program
URL: https://github.com/apache/beam/pull/7235#issuecomment-447515127
Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175675)
Time Spent: 2h (was: 1h 50m)

> Log time for Dataflow GCS upload of staged files
> ---
> Key: BEAM-6197
> URL: https://issues.apache.org/jira/browse/BEAM-6197
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: Alan Myrvold
> Assignee: Alan Myrvold
> Priority: Minor
> Time Spent: 2h
> Remaining Estimate: 0h
>
> Would be nice to collect timing in the logs of Dataflow GCS upload of staged files
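The improvement itself is simple: wrap the staging step with a timer and log the elapsed time. A minimal sketch, assuming a generic upload callable (`timed_upload` and the log wording are invented for illustration; the actual PR instruments Beam's Dataflow GCS stager):

```python
import logging
import time

logging.basicConfig(level=logging.INFO)


def timed_upload(upload_fn, files):
    """Run an upload callable and log how long staging took.

    `upload_fn` is any callable that uploads the given files; the log
    message is illustrative, not the wording the Beam PR uses.
    """
    start = time.monotonic()           # monotonic clock: immune to wall-clock jumps
    upload_fn(files)
    elapsed = time.monotonic() - start
    logging.info('Staged %d files in %.1f seconds', len(files), elapsed)
    return elapsed
```

Using `time.monotonic()` rather than `time.time()` keeps the measurement correct even if the system clock is adjusted mid-upload.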
[jira] [Work logged] (BEAM-6197) Log time for Dataflow GCS upload of staged files
[ https://issues.apache.org/jira/browse/BEAM-6197?focusedWorklogId=175674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175674 ] ASF GitHub Bot logged work on BEAM-6197:

Author: ASF GitHub Bot
Created on: 15/Dec/18 05:30
Start Date: 15/Dec/18 05:30
Worklog Time Spent: 10m

Work Description: alanmyrvold commented on issue #7235: [BEAM-6197] Log time for Dataflow GCS upload of staged files + add a test program
URL: https://github.com/apache/beam/pull/7235#issuecomment-447539110
Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175674)
Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (BEAM-6239) Nexmark benchmark for raw sessionization then stream enrichment
[ https://issues.apache.org/jira/browse/BEAM-6239?focusedWorklogId=175672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175672 ] ASF GitHub Bot logged work on BEAM-6239:

Author: ASF GitHub Bot
Created on: 15/Dec/18 04:42
Start Date: 15/Dec/18 04:42
Worklog Time Spent: 10m

Work Description: kennknowles commented on issue #7287: [BEAM-6239] Add session side input join to Nexmark
URL: https://github.com/apache/beam/pull/7287#issuecomment-447537032
run java precommit

Issue Time Tracking
---
Worklog Id: (was: 175672)
Time Spent: 40m (was: 0.5h)

> Nexmark benchmark for raw sessionization then stream enrichment
> ---
> Key: BEAM-6239
> URL: https://issues.apache.org/jira/browse/BEAM-6239
> Project: Beam
> Issue Type: New Feature
> Components: examples-nexmark
> Reporter: Kenneth Knowles
> Assignee: Kenneth Knowles
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> We have BOUNDED_SIDE_INPUT_JOIN that just enriches a stream. Another use case is to sessionize first. I am curious about the difference in perf, and how this plays out in SQL.
[jira] [Work logged] (BEAM-6239) Nexmark benchmark for raw sessionization then stream enrichment
[ https://issues.apache.org/jira/browse/BEAM-6239?focusedWorklogId=175659&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175659 ] ASF GitHub Bot logged work on BEAM-6239:

Author: ASF GitHub Bot
Created on: 15/Dec/18 02:14
Start Date: 15/Dec/18 02:14
Worklog Time Spent: 10m

Work Description: kennknowles commented on issue #7287: [BEAM-6239] Add session side input join to Nexmark
URL: https://github.com/apache/beam/pull/7287#issuecomment-447529180
R: @akedin @echauchot I know you are a bit busy but tagging you so you see this.

Issue Time Tracking
---
Worklog Id: (was: 175659)
Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (BEAM-6239) Nexmark benchmark for raw sessionization then stream enrichment
[ https://issues.apache.org/jira/browse/BEAM-6239?focusedWorklogId=175657&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175657 ] ASF GitHub Bot logged work on BEAM-6239:

Author: ASF GitHub Bot
Created on: 15/Dec/18 02:12
Start Date: 15/Dec/18 02:12
Worklog Time Spent: 10m

Work Description: kennknowles opened a new pull request #7288: [BEAM-6239] Add SQL sessionize then side input join as a benchmark
URL: https://github.com/apache/beam/pull/7288
This is my attempt at #7287 in SQL. I believe these features may not be supported by Calcite, and have reached out to their mailing list.

Follow this checklist to help us incorporate your contribution quickly and easily:
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
- [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it.
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | ---

Issue Time Tracking
---
Worklog Id: (was: 175657)
Time Spent: 20m (was: 10m)
[jira] [Work logged] (BEAM-6239) Nexmark benchmark for raw sessionization then stream enrichment
[ https://issues.apache.org/jira/browse/BEAM-6239?focusedWorklogId=175656&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175656 ] ASF GitHub Bot logged work on BEAM-6239:

Author: ASF GitHub Bot
Created on: 15/Dec/18 02:11
Start Date: 15/Dec/18 02:11
Worklog Time Spent: 10m

Work Description: kennknowles opened a new pull request #7287: [BEAM-6239] Add session side input join to Nexmark
URL: https://github.com/apache/beam/pull/7287
This is a new benchmark of sessionization then a join with a side input.

Follow this checklist to help us incorporate your contribution quickly and easily:
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
- [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it.
Issue Time Tracking
---
Worklog Id: (was: 175656)
Time Spent: 10m
Remaining Estimate: 0h
[jira] [Created] (BEAM-6239) Nexmark benchmark for raw sessionization then stream enrichment
Kenneth Knowles created BEAM-6239:

Summary: Nexmark benchmark for raw sessionization then stream enrichment
Key: BEAM-6239
URL: https://issues.apache.org/jira/browse/BEAM-6239
Project: Beam
Issue Type: New Feature
Components: examples-nexmark
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles

We have BOUNDED_SIDE_INPUT_JOIN that just enriches a stream. Another use case is to sessionize first. I am curious about the difference in perf, and how this plays out in SQL.
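The shape of the proposed benchmark — assign gap-based sessions per key, then enrich each session from a side input — can be modeled without Beam. A simplified sketch with an invented 10-unit gap (`sessionize` and `enrich` are illustrative stand-ins; real session windows in Beam/Nexmark involve watermarks and triggers this ignores):

```python
from collections import defaultdict

GAP = 10  # hypothetical gap duration: events farther apart start a new session


def sessionize(events, gap=GAP):
    """Assign gap-based sessions per key, in the spirit of Beam's Sessions
    windowing. `events` is a list of (key, timestamp) pairs; returns
    {key: [(session_start, session_end), ...]}."""
    by_key = defaultdict(list)
    for key, ts in events:
        by_key[key].append(ts)
    sessions = {}
    for key, stamps in by_key.items():
        stamps.sort()
        out = []
        start = end = stamps[0]
        for ts in stamps[1:]:
            if ts - end > gap:        # gap exceeded: close the current session
                out.append((start, end))
                start = ts
            end = ts
        out.append((start, end))
        sessions[key] = out
    return sessions


def enrich(sessions, side_input):
    """Join each session with a small in-memory side input, keyed the same
    way -- the 'stream enrichment' half of the benchmark."""
    return [
        (key, window, side_input.get(key))
        for key, windows in sessions.items()
        for window in windows
    ]
```

With events `[('u', 1), ('u', 3), ('u', 20)]` the 17-unit gap between 3 and 20 splits user `u` into two sessions, `(1, 3)` and `(20, 20)`, each of which is then paired with that user's side-input record.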
[jira] [Work logged] (BEAM-4594) Implement Beam Python User State and Timer API
[ https://issues.apache.org/jira/browse/BEAM-4594?focusedWorklogId=175650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175650 ] ASF GitHub Bot logged work on BEAM-4594:

Author: ASF GitHub Bot
Created on: 15/Dec/18 00:57
Start Date: 15/Dec/18 00:57
Worklog Time Spent: 10m

Work Description: youngoli commented on a change in pull request #7252: [BEAM-4594] Support state continuation tokens over the Fn API.
URL: https://github.com/apache/beam/pull/7252#discussion_r241929690

File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py

@@ -1122,9 +1124,25 @@
   def process_instruction_id(self, unused_instruction_id):
     yield

-  def blocking_get(self, state_key):
+  def blocking_get(self, state_key, continuation_token=None):
     with self._lock:
-      return b''.join(self._state[self._to_key(state_key)])
+      full_state = self._state[self._to_key(state_key)]
+      if self._use_continuation_tokens:
+        new_token = 'token_%d' % len(self._continuations)
+        if not continuation_token:
+          # Store (index, blobs).
+          self._continuations[new_token] = 0, tuple(full_state)
+          return b'', new_token
+        else:
+          ix, full_state = self._continuations[continuation_token]
+          if ix == len(full_state):
+            return b'', None
+          else:
+            self._continuations[new_token] = ix + 1, full_state
+            return full_state[ix], new_token

Review comment: So every time there's a valid continuation token and the iteration is advanced, that's stored in a new entry in the list of continuations. Is there any chance that this list may end up holding identical continuations under different tokens? Since this list seems to be only added to, I instinctively worry about it leaking memory like that. Or is it a non-issue based on the broader architecture?

Issue Time Tracking
---
Worklog Id: (was: 175650)
Time Spent: 6h 50m (was: 6h 40m)

> Implement Beam Python User State and Timer API
> ---
> Key: BEAM-4594
> URL: https://issues.apache.org/jira/browse/BEAM-4594
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Charles Chen
> Assignee: Charles Chen
> Priority: Major
> Labels: portability
> Time Spent: 6h 50m
> Remaining Estimate: 0h
>
> This issue tracks the implementation of the Beam Python User State and Timer API, described here: [https://s.apache.org/beam-python-user-state-and-timers].
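The paging scheme discussed in this review can be reproduced as a standalone sketch (`FakeStateStore` and `read_all` are invented names; the real code lives in `fn_api_runner.py`). It also makes the reviewer's memory question concrete: every advanced page mints a fresh token entry, so the continuation map only grows:

```python
class FakeStateStore:
    """Simplified sketch of paged ("continuation token") state reads.

    Mirrors the scheme in the diff above: the first request returns an
    empty blob plus a token; each follow-up request returns one stored
    blob and a fresh token, until the index runs past the end.
    """

    def __init__(self):
        self._state = {}           # key -> list of byte blobs
        self._continuations = {}   # token -> (next index, blobs snapshot)

    def put(self, key, blobs):
        self._state[key] = list(blobs)

    def get(self, key, continuation_token=None):
        new_token = 'token_%d' % len(self._continuations)
        if continuation_token is None:
            # First call: snapshot the blobs and hand back a token.
            self._continuations[new_token] = (0, tuple(self._state[key]))
            return b'', new_token
        ix, blobs = self._continuations[continuation_token]
        if ix == len(blobs):
            return b'', None       # Exhausted: no further token.
        # Every advanced page stores a new entry, so self._continuations
        # only grows -- the leak the review comment asks about.
        self._continuations[new_token] = (ix + 1, blobs)
        return blobs[ix], new_token


def read_all(store, key):
    """Drive the paging loop to reassemble the full value."""
    chunks = []
    data, token = store.get(key)
    chunks.append(data)
    while token is not None:
        data, token = store.get(key, token)
        chunks.append(data)
    return b''.join(chunks)
```

For a key holding `[b'abc', b'def']`, `read_all` issues four `get` calls (initial, two pages, and a final empty page with no token) and reassembles `b'abcdef'`.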
[jira] [Work logged] (BEAM-6197) Log time for Dataflow GCS upload of staged files
[ https://issues.apache.org/jira/browse/BEAM-6197?focusedWorklogId=175638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175638 ] ASF GitHub Bot logged work on BEAM-6197:

Author: ASF GitHub Bot
Created on: 15/Dec/18 00:08
Start Date: 15/Dec/18 00:08
Worklog Time Spent: 10m

Work Description: aaltay commented on issue #7235: [BEAM-6197] Log time for Dataflow GCS upload of staged files + add a test program
URL: https://github.com/apache/beam/pull/7235#issuecomment-447515309
Thank you. I can merge it after the last test passes.

Issue Time Tracking
---
Worklog Id: (was: 175638)
Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (BEAM-6197) Log time for Dataflow GCS upload of staged files
[ https://issues.apache.org/jira/browse/BEAM-6197?focusedWorklogId=175635&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175635 ] ASF GitHub Bot logged work on BEAM-6197:

Author: ASF GitHub Bot
Created on: 15/Dec/18 00:07
Start Date: 15/Dec/18 00:07
Worklog Time Spent: 10m

Work Description: alanmyrvold removed a comment on issue #7235: [BEAM-6197] Log time for Dataflow GCS upload of staged files + add a test program
URL: https://github.com/apache/beam/pull/7235#issuecomment-447511793
Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175635)
Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (BEAM-6197) Log time for Dataflow GCS upload of staged files
[ https://issues.apache.org/jira/browse/BEAM-6197?focusedWorklogId=175636&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175636 ] ASF GitHub Bot logged work on BEAM-6197:

Author: ASF GitHub Bot
Created on: 15/Dec/18 00:07
Start Date: 15/Dec/18 00:07
Worklog Time Spent: 10m

Work Description: alanmyrvold commented on issue #7235: [BEAM-6197] Log time for Dataflow GCS upload of staged files + add a test program
URL: https://github.com/apache/beam/pull/7235#issuecomment-447515127
Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175636)
Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (BEAM-6197) Log time for Dataflow GCS upload of staged files
[ https://issues.apache.org/jira/browse/BEAM-6197?focusedWorklogId=175622&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175622 ] ASF GitHub Bot logged work on BEAM-6197:

Author: ASF GitHub Bot
Created on: 14/Dec/18 23:45
Start Date: 14/Dec/18 23:45
Worklog Time Spent: 10m

Work Description: alanmyrvold commented on issue #7235: [BEAM-6197] Log time for Dataflow GCS upload of staged files + add a test program
URL: https://github.com/apache/beam/pull/7235#issuecomment-447511793
Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175622)
Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-6056) Migrate gRPC to use vendoring library and vendoring format
[ https://issues.apache.org/jira/browse/BEAM-6056?focusedWorklogId=175619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175619 ] ASF GitHub Bot logged work on BEAM-6056:

Author: ASF GitHub Bot
Created on: 14/Dec/18 23:35
Start Date: 14/Dec/18 23:35
Worklog Time Spent: 10m

Work Description: aaltay commented on issue #7255: WIP [BEAM-6056] Rename v1_13_1 to v1p13p1.
URL: https://github.com/apache/beam/pull/7255#issuecomment-447510252
Run Java PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175619)
Time Spent: 2h 20m (was: 2h 10m)

> Migrate gRPC to use vendoring library and vendoring format
> ---
> Key: BEAM-6056
> URL: https://issues.apache.org/jira/browse/BEAM-6056
> Project: Beam
> Issue Type: Sub-task
> Components: build-system
> Reporter: Luke Cwik
> Assignee: Luke Cwik
> Priority: Minor
> Labels: portability
> Fix For: 2.10.0
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> This thread discusses the work: https://lists.apache.org/thread.html/4c12db35b40a6d56e170cd6fc8bb0ac4c43a99aa3cb7dbae54176815@%3Cdev.beam.apache.org%3E
[jira] [Work logged] (BEAM-6188) Create unbounded synthetic source
[ https://issues.apache.org/jira/browse/BEAM-6188?focusedWorklogId=175618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175618 ] ASF GitHub Bot logged work on BEAM-6188:

Author: ASF GitHub Bot
Created on: 14/Dec/18 23:32
Start Date: 14/Dec/18 23:32
Worklog Time Spent: 10m

Work Description: pabloem edited a comment on issue #7226: [BEAM-6188] Unbouded synthetic source
URL: https://github.com/apache/beam/pull/7226#issuecomment-447509801
It is all looking good so far. The only classes I have left to check are SyntheticUnboundedSource and SyntheticWatermark. One question: what do you think about using `SyntheticIO.unbounded(options)`/`SyntheticIO.bounded(options)` vs `new UnboundedSyntheticSource(options)`/`new BoundedSyntheticSource(options)`?

Issue Time Tracking
---
Worklog Id: (was: 175618)
Time Spent: 2h 20m (was: 2h 10m)

> Create unbounded synthetic source
> ---
> Key: BEAM-6188
> URL: https://issues.apache.org/jira/browse/BEAM-6188
> Project: Beam
> Issue Type: Sub-task
> Components: testing
> Reporter: Lukasz Gajowy
> Assignee: Lukasz Gajowy
> Priority: Major
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> It is needed for streaming scenarios. It should provide ways to reason about time and recovering from checkpoints.
[jira] [Work logged] (BEAM-6188) Create unbounded synthetic source
[ https://issues.apache.org/jira/browse/BEAM-6188?focusedWorklogId=175617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175617 ] ASF GitHub Bot logged work on BEAM-6188:

Author: ASF GitHub Bot
Created on: 14/Dec/18 23:32
Start Date: 14/Dec/18 23:32
Worklog Time Spent: 10m

Work Description: pabloem commented on issue #7226: [BEAM-6188] Unbouded synthetic source
URL: https://github.com/apache/beam/pull/7226#issuecomment-447509801
It is all looking good so far. The only classes I have left to check are SyntheticUnboundedSource and SyntheticWatermark. One question: what do you think about using `SyntheticIO.unbounded(options)` vs `new UnboundedSyntheticSource(options)`/`new BoundedSyntheticSource(options)`?

Issue Time Tracking
---
Worklog Id: (was: 175617)
Time Spent: 2h 10m (was: 2h)
[jira] [Created] (BEAM-6238) ULR shows error when successfully finishing a job: "StatusRuntimeException: CANCELLED: cancelled before receiving half close"
Daniel Oliveira created BEAM-6238: - Summary: ULR shows error when successfully finishing a job: "StatusRuntimeException: CANCELLED: cancelled before receiving half close" Key: BEAM-6238 URL: https://issues.apache.org/jira/browse/BEAM-6238 Project: Beam Issue Type: Bug Components: runner-direct Reporter: Daniel Oliveira Assignee: Daniel Oliveira When the ULR finishes running a job, it seems to always end with the following error: {noformat} [grpc-default-executor-0] ERROR org.apache.beam.repackaged.beam_runners_direct_java.sdk.fn.data.BeamFnDataGrpcMultiplexer - Failed to handle for unknown endpoint org.apache.beam.vendor.grpc.v1_13_1.io.grpc.StatusRuntimeException: CANCELLED: cancelled before receiving half close at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.Status.asRuntimeException(Status.java:517) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onCancel(ServerCalls.java:272) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.PartialForwardingServerCallListener.onCancel(PartialForwardingServerCallListener.java:40) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.ForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:23) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:40) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.Contexts$ContextualizedServerCallListener.onCancel(Contexts.java:96) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.closed(ServerCallImpl.java:293) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1Closed.runInContext(ServerImpl.java:738) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) org.apache.beam.vendor.grpc.v1_13_1.io.grpc.StatusRuntimeException: CANCELLED: cancelled before receiving half close at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.Status.asRuntimeException(Status.java:517) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onCancel(ServerCalls.java:272) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.PartialForwardingServerCallListener.onCancel(PartialForwardingServerCallListener.java:40) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.ForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:23) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:40) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.Contexts$ContextualizedServerCallListener.onCancel(Contexts.java:96) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.closed(ServerCallImpl.java:293) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1Closed.runInContext(ServerImpl.java:738) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) [grpc-default-executor-38] WARN org.apache.beam.repackaged.beam_runners_direct_java.runners.fnexecution.logging.GrpcLoggingService - Beam Fn Logging client failed to be complete. 
org.apache.beam.vendor.grpc.v1_13_1.io.grpc.StatusRuntimeException: CANCELLED: call already cancelled at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.Status.asRuntimeException(Status.java:517) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onCompleted(ServerCalls.java:356) at org.apache.beam.repackaged.beam_runners_direct_java.runners.fnexecution.logging.GrpcLoggingService.completeIfNotNull(GrpcLoggingService.java:78) at org.apache.beam.repackaged.beam_runners_direct_java.runners.fnexecution.logging.GrpcLoggingService.access$400(GrpcLoggingService.java:33) at org.apache.beam.repackaged.beam_runners_direct_java.runners.fnexecution.logging.GrpcLoggingService$InboundObserver.onError(GrpcLoggingService.java:105) at
[jira] [Resolved] (BEAM-6225) Setup Jenkins VR job for new bundle processing code
[ https://issues.apache.org/jira/browse/BEAM-6225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyuan Zhang resolved BEAM-6225. Resolution: Fixed Fix Version/s: 2.10.0 > Setup Jenkins VR job for new bundle processing code > --- > > Key: BEAM-6225 > URL: https://issues.apache.org/jira/browse/BEAM-6225 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.10.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-5953) Support DataflowRunner on Python 3
[ https://issues.apache.org/jira/browse/BEAM-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702207#comment-16702207 ] Mark Liu edited comment on BEAM-5953 at 12/14/18 10:34 PM: --- Thanks Valentyn. If I use google-cloud-storage instead of apitools to stage files, the error goes away. https://github.com/apache/beam/pull/7051 is the code change, but unfortunately, the transitive dependency google-cloud-core has version conflicts if we use apitools and google-cloud-storage at the same time. The PR has been rolled back. was (Author: markflyhigh): Thanks Valentyn. apitools is replaced by google-cloud-storage for staging files and the error is gone. > Support DataflowRunner on Python 3 > -- > > Key: BEAM-5953 > URL: https://issues.apache.org/jira/browse/BEAM-5953 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6225) Setup Jenkins VR job for new bundle processing code
[ https://issues.apache.org/jira/browse/BEAM-6225?focusedWorklogId=175593=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175593 ] ASF GitHub Bot logged work on BEAM-6225: Author: ASF GitHub Bot Created on: 14/Dec/18 21:47 Start Date: 14/Dec/18 21:47 Worklog Time Spent: 10m Work Description: swegner commented on issue #7271: [BEAM-6225] Setup Jenkins Job to Run VR with ExecutableStage URL: https://github.com/apache/beam/pull/7271#issuecomment-447488924 Seed Job passes; LGTM, merging This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175593) Time Spent: 3.5h (was: 3h 20m) > Setup Jenkins VR job for new bundle processing code > --- > > Key: BEAM-6225 > URL: https://issues.apache.org/jira/browse/BEAM-6225 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6225) Setup Jenkins VR job for new bundle processing code
[ https://issues.apache.org/jira/browse/BEAM-6225?focusedWorklogId=175594=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175594 ] ASF GitHub Bot logged work on BEAM-6225: Author: ASF GitHub Bot Created on: 14/Dec/18 21:47 Start Date: 14/Dec/18 21:47 Worklog Time Spent: 10m Work Description: swegner closed pull request #7271: [BEAM-6225] Setup Jenkins Job to Run VR with ExecutableStage URL: https://github.com/apache/beam/pull/7271 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_DataflowPortabilityExecutableStage.groovy b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_DataflowPortabilityExecutableStage.groovy new file mode 100644 index ..62e73617b74e --- /dev/null +++ b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_DataflowPortabilityExecutableStage.groovy @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import CommonJobProperties as commonJobProperties +import PostcommitJobBuilder + + +// This job runs the suite of ValidatesRunner tests against the Dataflow +// runner. +PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_ValidatesRunner_DataflowPortabilityExecutableStage', + 'Run Dataflow Portability ExecutableStage ValidatesRunner', 'Google Cloud Dataflow Runner PortabilityApi ExecutableStage ValidatesRunner Tests', this) { + + description('Runs the ValidatesRunner suite on the Dataflow PortabilityApi runner with ExecutableStage code path enabled.') + + // Set common parameters. Sets a 3 hour timeout. + commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 400) + + // Publish all test results to Jenkins + publishers { +archiveJunit('**/build/test-results/**/*.xml') + } + + // Gradle goals for this job. + steps { +gradle { + rootBuildScriptDir(commonJobProperties.checkoutDir) + tasks(':beam-runners-google-cloud-dataflow-java:validatesRunnerFnApiWorkerExecutableStageTest') + // Increase parallel worker threads above processor limit since most time is + // spent waiting on Dataflow jobs. ValidatesRunner tests on Dataflow are slow + // because each one launches a Dataflow job with about 3 mins of overhead. + // 3 x num_cores strikes a good balance between maxing out parallelism without + // overloading the machines. + commonJobProperties.setGradleSwitches(delegate, 3 * Runtime.runtime.availableProcessors()) +} + } + + // [BEAM-6236] "use_executable_stage_bundle_execution" hasn't been rolled out. 
+ disabled() +} diff --git a/runners/google-cloud-dataflow-java/build.gradle b/runners/google-cloud-dataflow-java/build.gradle index c0b831c6e547..9c6aaf4f773a 100644 --- a/runners/google-cloud-dataflow-java/build.gradle +++ b/runners/google-cloud-dataflow-java/build.gradle @@ -241,17 +241,55 @@ task validatesRunnerFnApiWorkerTest(type: Test) { } } +task validatesRunnerFnApiWorkerExecutableStageTest(type: Test) { +group = "Verification" +description "Validates Dataflow PortabilityApi runner" +dependsOn ":beam-runners-google-cloud-dataflow-java-fn-api-worker:shadowJar" +dependsOn buildAndPushDockerContainer + +systemProperty "beamTestPipelineOptions", JsonOutput.toJson([ +"--runner=TestDataflowRunner", +"--project=${dataflowProject}", +"--tempRoot=${dataflowPostCommitTempRoot}", +"--dataflowWorkerJar=${dataflowFnApiWorkerJar}", + "--workerHarnessContainerImage=${dockerImageContainer}:${dockerTag}", +"--experiments=beam_fn_api,use_executable_stage_bundle_execution"] +) + +// Increase test parallelism up to the number of Gradle workers. By default this is equal +// to the number of CPU cores, but can be increased by setting --max-workers=N. +maxParallelForks Integer.MAX_VALUE +
[jira] [Comment Edited] (BEAM-5419) Build multiple versions of the Flink Runner against different Flink versions
[ https://issues.apache.org/jira/browse/BEAM-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721719#comment-16721719 ] Giorgos Stamatakis edited comment on BEAM-5419 at 12/14/18 8:56 PM: Thank you for your time, but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar, an error still pops up: java.lang.NoClassDefFoundError: org/apache/flink/streaming/api/CheckpointingMode

mvn package exec:java -Dexec.mainClass=myPipeline -Dexec.args="--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner

Trying to run on a 1.6.2 cluster. The pom.xml flink profile is the following:

<profile>
  <id>flink-runner</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-runners-flink-1.6</artifactId>
      <version>2.10.0</version>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
</profile>

If there is a sample pom.xml, I'm interested.

was (Author: gstamatakis): Thank you for your time, but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar, an error still pops up: java.lang.NoClassDefFoundError: org/apache/flink/streaming/api/CheckpointingMode

mvn package exec:java -Dexec.mainClass=myPipeline -Dexec.args="--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner

Trying to run on a 1.6.2 cluster. The pom.xml flink profile is the following:

<profile>
  <id>flink-runner</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-runners-flink-1.6</artifactId>
      <version>2.10.0</version>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
</profile>

> Build multiple versions of the Flink Runner against different Flink versions > > > Key: BEAM-5419 > URL: https://issues.apache.org/jira/browse/BEAM-5419 > Project: Beam > Issue Type: New Feature > Components: build-system, runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.10.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Following up on a discussion on the mailing list. 
> We want to keep the Flink version stable across different versions to avoid > upgrade pain for long-term users. At the same time, there are users out there > with newer Flink clusters and developers also want to utilize new Flink > features. > It would be great to build multiple versions of the Flink Runner against > different Flink versions. > When the upgrade is as simple as changing the version property in the build > script, this should be pretty straight-forward. If not, having a "base > version" and applying a patch during the build could be an option. We should > avoid duplicating any Runner code. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6150) Provide alternatives to SerializableFunction and SimpleFunction that may declare exceptions
[ https://issues.apache.org/jira/browse/BEAM-6150?focusedWorklogId=175569=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175569 ] ASF GitHub Bot logged work on BEAM-6150: Author: ASF GitHub Bot Created on: 14/Dec/18 20:13 Start Date: 14/Dec/18 20:13 Worklog Time Spent: 10m Work Description: kennknowles closed pull request #7160: [BEAM-6150] Superinterface for SerializableFunction allowing declared exceptions URL: https://github.com/apache/beam/pull/7160 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Contextful.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Contextful.java index 7e788cf05866..97a994f3727e 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Contextful.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Contextful.java @@ -104,11 +104,11 @@ public String toString() { } /** - * Wraps a {@link SerializableFunction} as a {@link Contextful} of {@link Fn} with empty {@link + * Wraps a {@link ProcessFunction} as a {@link Contextful} of {@link Fn} with empty {@link * Requirements}. 
 */
  public static <T, OutputT> Contextful<Fn<T, OutputT>> fn(
-      final SerializableFunction<T, OutputT> fn) {
+      final ProcessFunction<T, OutputT> fn) {
    return new Contextful<>((element, c) -> fn.apply(element), Requirements.empty());
  }
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Filter.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Filter.java
index 4bffeb6be3d0..aa9d2cd38100 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Filter.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Filter.java
@@ -32,7 +32,7 @@
  /**
   * Returns a {@code PTransform} that takes an input {@code PCollection<T>} and returns a {@code
   * PCollection<T>} with elements that satisfy the given predicate. The predicate must be a {@code
-  * SerializableFunction<T, Boolean>}.
+  * ProcessFunction<T, Boolean>}.
   *
   * <p>Example of use:
   *
@@ -46,7 +46,7 @@
   * #greaterThanEq}, which return elements satisfying various inequalities with the specified value
   * based on the elements' natural ordering.
   */
-  public static <T, PredicateT extends SerializableFunction<T, Boolean>> Filter<T> by(
+  public static <T, PredicateT extends ProcessFunction<T, Boolean>> Filter<T> by(
      PredicateT predicate) {
    return new Filter<>(predicate);
  }
@@ -71,7 +71,7 @@
   * <p>See also {@link #by}, which returns elements that satisfy the given predicate.
   */
  public static <T extends Comparable<T>> Filter<T> lessThan(final T value) {
-    return by((SerializableFunction<T, Boolean>) input -> input.compareTo(value) < 0)
+    return by((ProcessFunction<T, Boolean>) input -> input.compareTo(value) < 0)
        .described(String.format("x < %s", value));
  }
@@ -95,7 +95,7 @@
   * <p>See also {@link #by}, which returns elements that satisfy the given predicate.
   */
  public static <T extends Comparable<T>> Filter<T> greaterThan(final T value) {
-    return by((SerializableFunction<T, Boolean>) input -> input.compareTo(value) > 0)
+    return by((ProcessFunction<T, Boolean>) input -> input.compareTo(value) > 0)
        .described(String.format("x > %s", value));
  }
@@ -119,7 +119,7 @@
   * <p>See also {@link #by}, which returns elements that satisfy the given predicate.
   */
  public static <T extends Comparable<T>> Filter<T> lessThanEq(final T value) {
-    return by((SerializableFunction<T, Boolean>) input -> input.compareTo(value) <= 0)
+    return by((ProcessFunction<T, Boolean>) input -> input.compareTo(value) <= 0)
        .described(String.format("x ≤ %s", value));
  }
@@ -143,7 +143,7 @@
   * <p>See also {@link #by}, which returns elements that satisfy the given predicate.
   */
  public static <T extends Comparable<T>> Filter<T> greaterThanEq(final T value) {
-    return by((SerializableFunction<T, Boolean>) input -> input.compareTo(value) >= 0)
+    return by((ProcessFunction<T, Boolean>) input -> input.compareTo(value) >= 0)
        .described(String.format("x ≥ %s", value));
  }
@@ -166,20 +166,20 @@
   * <p>See also {@link #by}, which returns elements that satisfy the given predicate.
   */
  public static <T extends Comparable<T>> Filter<T> equal(final T value) {
-    return by((SerializableFunction<T, Boolean>) input -> input.compareTo(value) == 0)
+    return by((ProcessFunction<T, Boolean>) input -> input.compareTo(value) == 0)
        .described(String.format("x == %s", value));
  }

  ///

-  private SerializableFunction<T, Boolean> predicate;
+  private ProcessFunction<T, Boolean> predicate;
  private String predicateDescription;

-  private Filter(SerializableFunction<T, Boolean> predicate) {
+  private Filter(ProcessFunction<T, Boolean> predicate) {
    this(predicate, "Filter.predicate");
  }

-  private Filter(SerializableFunction<T, Boolean> predicate, String
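The Contextful.fn change in the diff above adapts a plain one-argument function into the two-argument (element, context) shape that the rest of the machinery expects. A minimal sketch of that adapter pattern in Python (illustrative names, not the Beam Python API):

```python
def contextful_fn(fn):
    """Adapt a one-argument function into an (element, context) function,
    discarding the context -- analogous to Contextful.fn wrapping a
    ProcessFunction into a Contextful<Fn<...>> with empty Requirements."""
    def wrapped(element, context):
        return fn(element)
    return wrapped
```

Callers that only need the element can then be passed wherever a context-aware function is required, which is exactly why widening `fn` to accept the `ProcessFunction` superinterface is safe: the wrapper never relies on anything beyond `apply`.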
[jira] [Work logged] (BEAM-6150) Provide alternatives to SerializableFunction and SimpleFunction that may declare exceptions
[ https://issues.apache.org/jira/browse/BEAM-6150?focusedWorklogId=175570=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175570 ] ASF GitHub Bot logged work on BEAM-6150: Author: ASF GitHub Bot Created on: 14/Dec/18 20:13 Start Date: 14/Dec/18 20:13 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #7160: [BEAM-6150] Superinterface for SerializableFunction allowing declared exceptions URL: https://github.com/apache/beam/pull/7160#issuecomment-447441634 Done This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175570) Time Spent: 1h 40m (was: 1.5h) > Provide alternatives to SerializableFunction and SimpleFunction that may > declare exceptions > --- > > Key: BEAM-6150 > URL: https://issues.apache.org/jira/browse/BEAM-6150 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Jeff Klukas >Assignee: Jeff Klukas >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > Contextful.Fn allows subclasses to declare checked exceptions, but neither > SerializableFunction nor SimpleFunction do. We want to add a new entry in > each of those hierarchies where checked exceptions are allowed. We can then > change existing method signatures to accept the new superinterfaces in > contexts where allowing user code to throw checked exceptions is acceptable, > such as in ProcessElement methods. > Discussed on the dev mailing list: > https://lists.apache.org/thread.html/eecd8dea8b47710098ec67d73b87cf9b4e2926c444c3fee1a6b9a743@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6150) Provide alternatives to SerializableFunction and SimpleFunction that may declare exceptions
[ https://issues.apache.org/jira/browse/BEAM-6150?focusedWorklogId=175568=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175568 ] ASF GitHub Bot logged work on BEAM-6150: Author: ASF GitHub Bot Created on: 14/Dec/18 20:12 Start Date: 14/Dec/18 20:12 Worklog Time Spent: 10m Work Description: jklukas commented on issue #7160: [BEAM-6150] Superinterface for SerializableFunction allowing declared exceptions URL: https://github.com/apache/beam/pull/7160#issuecomment-447441228 > It looks like the commits should be squashed Yes, that would be lovely. Do you need me to do that, or are you happy to use GitHub's squash and merge? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175568) Time Spent: 1h 20m (was: 1h 10m) > Provide alternatives to SerializableFunction and SimpleFunction that may > declare exceptions > --- > > Key: BEAM-6150 > URL: https://issues.apache.org/jira/browse/BEAM-6150 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Jeff Klukas >Assignee: Jeff Klukas >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > Contextful.Fn allows subclasses to declare checked exceptions, but neither > SerializableFunction nor SimpleFunction do. We want to add a new entry in > each of those hierarchies where checked exceptions are allowed. We can then > change existing method signatures to accept the new superinterfaces in > contexts where allowing user code to throw checked exceptions is acceptable, > such as in ProcessElement methods. > Discussed on the dev mailing list: > https://lists.apache.org/thread.html/eecd8dea8b47710098ec67d73b87cf9b4e2926c444c3fee1a6b9a743@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6150) Provide alternatives to SerializableFunction and SimpleFunction that may declare exceptions
[ https://issues.apache.org/jira/browse/BEAM-6150?focusedWorklogId=175567=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175567 ] ASF GitHub Bot logged work on BEAM-6150: Author: ASF GitHub Bot Created on: 14/Dec/18 20:10 Start Date: 14/Dec/18 20:10 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #7160: [BEAM-6150] Superinterface for SerializableFunction allowing declared exceptions URL: https://github.com/apache/beam/pull/7160#issuecomment-447440631 Nice. It looks like the commits should be squashed, but I don't want to assume - is that what you intended? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175567) Time Spent: 1h 10m (was: 1h) > Provide alternatives to SerializableFunction and SimpleFunction that may > declare exceptions > --- > > Key: BEAM-6150 > URL: https://issues.apache.org/jira/browse/BEAM-6150 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Jeff Klukas >Assignee: Jeff Klukas >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > Contextful.Fn allows subclasses to declare checked exceptions, but neither > SerializableFunction nor SimpleFunction do. We want to add a new entry in > each of those hierarchies where checked exceptions are allowed. We can then > change existing method signatures to accept the new superinterfaces in > contexts where allowing user code to throw checked exceptions is acceptable, > such as in ProcessElement methods. > Discussed on the dev mailing list: > https://lists.apache.org/thread.html/eecd8dea8b47710098ec67d73b87cf9b4e2926c444c3fee1a6b9a743@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6179) Batch size estimation failing
[ https://issues.apache.org/jira/browse/BEAM-6179?focusedWorklogId=175562=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175562 ] ASF GitHub Bot logged work on BEAM-6179: Author: ASF GitHub Bot Created on: 14/Dec/18 19:36 Start Date: 14/Dec/18 19:36 Worklog Time Spent: 10m Work Description: angoenka closed pull request #7280: [BEAM-6179] Fixing bundle estimation when all xs are same URL: https://github.com/apache/beam/pull/7280 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/python/apache_beam/transforms/util.py b/sdks/python/apache_beam/transforms/util.py index 38839f57842d..046be0b417c2 100644 --- a/sdks/python/apache_beam/transforms/util.py +++ b/sdks/python/apache_beam/transforms/util.py @@ -281,6 +281,9 @@ def linear_regression_no_numpy(xs, ys): n = float(len(xs)) xbar = sum(xs) / n ybar = sum(ys) / n +if [xs[0]] * len(xs) == xs: + # Simply use the mean if all values in xs are same. + return 0, ybar/xbar b = (sum([(x - xbar) * (y - ybar) for x, y in zip(xs, ys)]) / sum([(x - xbar)**2 for x in xs])) a = ybar - b * xbar @@ -291,13 +294,16 @@ def linear_regression_numpy(xs, ys): # pylint: disable=wrong-import-order, wrong-import-position import numpy as np from numpy import sum +n = len(xs) +if [xs[0]] * n == xs: + # If all values of xs are same then fallback to linear_regression_no_numpy + return _BatchSizeEstimator.linear_regression_no_numpy(xs, ys) xs = np.asarray(xs, dtype=float) ys = np.asarray(ys, dtype=float) # First do a simple least squares fit for y = a + bx over all points. 
b, a = np.polyfit(xs, ys, 1) -n = len(xs) if n < 10: return a, b else: diff --git a/sdks/python/apache_beam/transforms/util_test.py b/sdks/python/apache_beam/transforms/util_test.py index e592f938e175..f0296c0a0f12 100644 --- a/sdks/python/apache_beam/transforms/util_test.py +++ b/sdks/python/apache_beam/transforms/util_test.py @@ -18,6 +18,7 @@ """Unit tests for the transform.util classes.""" from __future__ import absolute_import +from __future__ import division import logging import random @@ -160,6 +161,13 @@ def _run_regression_test(self, linear_regression_fn, test_outliers): self.assertAlmostEqual(a, 5, delta=0.01) self.assertAlmostEqual(b, 7, delta=0.01) +# Test repeated xs +xs = [1 + random.random()] * 100 +ys = [7 * x + 5 + 0.01 * random.random() for x in xs] +a, b = linear_regression_fn(xs, ys) +self.assertAlmostEqual(a, 0, delta=0.01) +self.assertAlmostEqual(b, sum(ys)/(len(ys) * xs[0]), delta=0.01) + if test_outliers: xs = [1 + random.random() for _ in range(100)] ys = [2*x + 1 for x in xs] This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175562) Time Spent: 2h (was: 1h 50m) > Batch size estimation failing > - > > Key: BEAM-6179 > URL: https://issues.apache.org/jira/browse/BEAM-6179 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-py-harness >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Batch size estimation is failing on flink when running 13MB input pipeline > with error > ValueError: On entry to DLASCL parameter number 4 had an illegal value > java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error > received from SDK harness for instruction 48: Traceback (most recent call > last): > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 135, in _execute > response = task() > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 170, in > self._execute(lambda: worker.do_instruction(work), work) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 221, in do_instruction > request.instruction_id) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 237, in process_bundle > bundle_processor.process_bundle(instruction_id) > File >
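The fix in the diff above guards ordinary least squares against the degenerate case where every x is identical: the variance term in the slope's denominator becomes zero, which is what produced the DLASCL error. A standalone sketch of the same logic:

```python
def linear_regression_no_numpy(xs, ys):
    """Least-squares fit of y = a + b*x, with a fallback when all xs are
    equal (zero variance would divide by zero). Simplified standalone
    version of the patched Beam helper above."""
    n = float(len(xs))
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    if all(x == xs[0] for x in xs):
        # Degenerate case: fit y = b*x through the mean point instead.
        return 0, ybar / xbar
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum(
        (x - xbar) ** 2 for x in xs)
    a = ybar - b * xbar
    return a, b
```

With distinct xs this recovers the usual intercept/slope; with repeated xs it returns the ratio of means, matching the unit test added in the PR.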
[jira] [Created] (BEAM-6237) ULR ValidatesRunner tests not deleting artifacts.
Daniel Oliveira created BEAM-6237: - Summary: ULR ValidatesRunner tests not deleting artifacts. Key: BEAM-6237 URL: https://issues.apache.org/jira/browse/BEAM-6237 Project: Beam Issue Type: Bug Components: runner-direct Reporter: Daniel Oliveira Assignee: Daniel Oliveira When running ValidatesRunner tests with the ULR, artifacts are never deleted. Since a new job is run per test, this uses up massive amounts of disk storage quickly (over 20 Gigabytes per execution). This often causes the machine running these tests to run out of disk space which means tests start failing. The ULR should be modified to delete these artifacts after they have been staged to avoid this issue. Flink already does this, so the infrastructure exists. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
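The cleanup the ticket asks for — delete staged artifacts once the job is done with them — can be sketched as a stage/run/cleanup wrapper (a hypothetical helper to illustrate the pattern, not the actual runner code):

```python
import shutil
import tempfile


def run_job_with_artifacts(stage_and_run):
    """Stage artifacts into a fresh temporary directory, run the job, and
    always delete the directory afterwards -- even if the job fails.
    This is the behavior BEAM-6237 asks the ULR to adopt."""
    staging_dir = tempfile.mkdtemp(prefix='beam-artifacts-')
    try:
        return stage_and_run(staging_dir)
    finally:
        # Unconditional cleanup keeps per-test disk usage bounded instead
        # of accumulating ~20 GB of staged artifacts per suite execution.
        shutil.rmtree(staging_dir, ignore_errors=True)
```

Tying deletion to a `finally` block (rather than a separate cleanup pass) is what makes the per-job disk footprint bounded regardless of how the job exits.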
[jira] [Work logged] (BEAM-6179) Batch size estimation failing
[ https://issues.apache.org/jira/browse/BEAM-6179?focusedWorklogId=175561=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175561 ] ASF GitHub Bot logged work on BEAM-6179: Author: ASF GitHub Bot Created on: 14/Dec/18 19:35 Start Date: 14/Dec/18 19:35 Worklog Time Spent: 10m Work Description: angoenka commented on issue #7280: [BEAM-6179] Fixing bundle estimation when all xs are same URL: https://github.com/apache/beam/pull/7280#issuecomment-447431662 Thanks robertwb! Merging it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175561) Time Spent: 1h 50m (was: 1h 40m) > Batch size estimation failing > - > > Key: BEAM-6179 > URL: https://issues.apache.org/jira/browse/BEAM-6179 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-py-harness >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > Batch size estimation is failing on flink when running 13MB input pipeline > with error > ValueError: On entry to DLASCL parameter number 4 had an illegal value > java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error > received from SDK harness for instruction 48: Traceback (most recent call > last): > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 135, in _execute > response = task() > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 170, in > self._execute(lambda: worker.do_instruction(work), work) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 221, in do_instruction > request.instruction_id) > File > 
"/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 237, in process_bundle > bundle_processor.process_bundle(instruction_id) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 480, in process_bundle > ].process_encoded(data.data) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 125, in process_encoded > self.output(decoded_value) > File "apache_beam/runners/worker/operations.py", line 182, in > apache_beam.runners.worker.operations.Operation.output > def output(self, windowed_value, output_index=0): > File "apache_beam/runners/worker/operations.py", line 183, in > apache_beam.runners.worker.operations.Operation.output > cython.cast(Receiver, > self.receivers[output_index]).receive(windowed_value) > File "apache_beam/runners/worker/operations.py", line 89, in > apache_beam.runners.worker.operations.ConsumerSet.receive > cython.cast(Operation, consumer).process(windowed_value) > File "apache_beam/runners/worker/operations.py", line 497, in > apache_beam.runners.worker.operations.DoOperation.process > with self.scoped_process_state: > File "apache_beam/runners/worker/operations.py", line 498, in > apache_beam.runners.worker.operations.DoOperation.process > self.dofn_receiver.receive(o) > File "apache_beam/runners/common.py", line 680, in > apache_beam.runners.common.DoFnRunner.receive > self.process(windowed_value) > File "apache_beam/runners/common.py", line 686, in > apache_beam.runners.common.DoFnRunner.process > self._reraise_augmented(exn) > File "apache_beam/runners/common.py", line 709, in > apache_beam.runners.common.DoFnRunner._reraise_augmented > raise > File "apache_beam/runners/common.py", line 684, in > apache_beam.runners.common.DoFnRunner.process > self.do_fn_invoker.invoke_process(windowed_value) > File "apache_beam/runners/common.py", line 420, in > apache_beam.runners.common.SimpleInvoker.invoke_process > 
output_processor.process_outputs( > File "apache_beam/runners/common.py", line 794, in > apache_beam.runners.common._OutputProcessor.process_outputs > self.main_receivers.receive(windowed_value) > File "apache_beam/runners/worker/operations.py", line 89, in > apache_beam.runners.worker.operations.ConsumerSet.receive > cython.cast(Operation, consumer).process(windowed_value) > File "apache_beam/runners/worker/operations.py", line 497, in > apache_beam.runners.worker.operations.DoOperation.process > with self.scoped_process_state: > File
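The linked PR fixes bundle estimation "when all xs are same". As an illustration of the failure mode: when every observed x is identical, the least-squares system is rank-deficient and LAPACK scaling routines such as DLASCL can be handed an illegal parameter, so the degenerate case has to be short-circuited. This is a minimal sketch assuming a plain numpy fit; the function name and the constant-model fallback are hypothetical, not Beam's actual estimator code:

```python
import numpy as np

def estimate_linear(xs, ys):
    """Fit ys ~ intercept + slope * xs, guarding the degenerate case.

    When every x is identical the slope is undefined, so fall back to a
    constant model (the mean of the ys) instead of calling the fitter.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    if np.all(xs == xs[0]):           # all xs the same: rank-deficient fit
        return 0.0, float(ys.mean())  # constant model
    slope, intercept = np.polyfit(xs, ys, 1)
    return float(slope), float(intercept)
```

With the guard in place, `estimate_linear([2, 2, 2], [4.0, 6.0, 8.0])` returns a constant estimate rather than triggering a LAPACK error.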
[jira] [Comment Edited] (BEAM-5419) Build multiple versions of the Flink Runner against different Flink versions
[ https://issues.apache.org/jira/browse/BEAM-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721719#comment-16721719 ] Giorgos Stamatakis edited comment on BEAM-5419 at 12/14/18 7:25 PM: Thank you for your time but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar an error still pops up: Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionMode mvn package exec:java Dexec.mainClass=myPipeline "Dexec.args-=--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner Trying to run on a 1.6.2 cluster.The pom.xml flink profile is the following: flink-runner org.apache.beam beam-runners-flink-1.6 2.10.0 runtime was (Author: gstamatakis): Thank you for your time but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar an error still pops up: Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionMode mvn package exec:java Dexec.mainClass=myPipeline "-Dexec.args=--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner Trying to run on a 1.6.2 cluster.The pom.xml flink profile is the following: flink-runner org.apache.beam beam-runners-flink-1.6 2.10.0 runtime > Build multiple versions of the Flink Runner against different Flink versions > > > Key: BEAM-5419 > URL: https://issues.apache.org/jira/browse/BEAM-5419 > Project: Beam > Issue Type: New Feature > Components: build-system, runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.10.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Following up on a discussion on the mailing list. > We want to keep the Flink version stable across different versions to avoid > upgrade pain for long-term users. 
At the same time, there are users out there > with newer Flink clusters and developers also want to utilize new Flink > features. > It would be great to build multiple versions of the Flink Runner against > different Flink versions. > When the upgrade is as simple as changing the version property in the build > script, this should be pretty straight-forward. If not, having a "base > version" and applying a patch during the build could be an option. We should > avoid duplicating any Runner code. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5419) Build multiple versions of the Flink Runner against different Flink versions
[ https://issues.apache.org/jira/browse/BEAM-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721719#comment-16721719 ] Giorgos Stamatakis commented on BEAM-5419: -- Thank you for your time but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar an error still pops up: Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionMode mvn package exec:java -Dexec.mainClass=myPipeline "-Dexec.args=--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner Trying to run on a 1.6.2 cluster.The pom.xml flink profile is the following: flink-runner org.apache.beam beam-runners-flink-1.6 2.10.0 runtime > Build multiple versions of the Flink Runner against different Flink versions > > > Key: BEAM-5419 > URL: https://issues.apache.org/jira/browse/BEAM-5419 > Project: Beam > Issue Type: New Feature > Components: build-system, runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.10.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Following up on a discussion on the mailing list. > We want to keep the Flink version stable across different versions to avoid > upgrade pain for long-term users. At the same time, there are users out there > with newer Flink clusters and developers also want to utilize new Flink > features. > It would be great to build multiple versions of the Flink Runner against > different Flink versions. > When the upgrade is as simple as changing the version property in the build > script, this should be pretty straight-forward. If not, having a "base > version" and applying a patch during the build could be an option. We should > avoid duplicating any Runner code. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6197) Log time for Dataflow GCS upload of staged files
[ https://issues.apache.org/jira/browse/BEAM-6197?focusedWorklogId=175557=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175557 ] ASF GitHub Bot logged work on BEAM-6197: Author: ASF GitHub Bot Created on: 14/Dec/18 19:20 Start Date: 14/Dec/18 19:20 Worklog Time Spent: 10m Work Description: alanmyrvold commented on issue #7235: [BEAM-6197] Log time for Dataflow GCS upload of staged files + add a test program URL: https://github.com/apache/beam/pull/7235#issuecomment-447427556 > @alanmyrvold do you know why tests are failing? Yes. checkstyle was failing due to the format of the class comment. Fixed the comment and re-trying This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175557) Time Spent: 1h (was: 50m) > Log time for Dataflow GCS upload of staged files > > > Key: BEAM-6197 > URL: https://issues.apache.org/jira/browse/BEAM-6197 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Alan Myrvold >Assignee: Alan Myrvold >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > Would be nice to collect timing in the logs of Dataflow GCS upload of staged > files -- This message was sent by Atlassian JIRA (v7.6.3#76005)
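The BEAM-6197 change is pure instrumentation: measure and log how long staging each file takes. A generic sketch of that pattern (the helper and its `upload_fn` parameter are stand-ins, not the Dataflow runner's real staging API):

```python
import logging
import time

def stage_files(upload_fn, file_paths):
    """Upload each file, logging per-file and total elapsed time.

    upload_fn performs the actual upload; monotonic clocks are used so
    the measurements are immune to wall-clock adjustments.
    """
    batch_start = time.monotonic()
    for path in file_paths:
        start = time.monotonic()
        upload_fn(path)
        logging.info("Staged %s in %.2f seconds", path, time.monotonic() - start)
    logging.info("Staged %d files in %.2f seconds",
                 len(file_paths), time.monotonic() - batch_start)
```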
[jira] [Comment Edited] (BEAM-5419) Build multiple versions of the Flink Runner against different Flink versions
[ https://issues.apache.org/jira/browse/BEAM-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721719#comment-16721719 ] Giorgos Stamatakis edited comment on BEAM-5419 at 12/14/18 7:27 PM: Thank you for your time but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar an error still pops up: java.lang.NoClassDefFoundError: org/apache/flink/streaming/api/CheckpointingMode mvn package exec:java Dexec.mainClass=myPipeline "Dexec.args-=--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner Trying to run on a 1.6.2 cluster.The pom.xml flink profile is the following: flink-runner org.apache.beam beam-runners-flink-1.6 2.10.0 runtime was (Author: gstamatakis): Thank you for your time but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar an error still pops up: Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionMode mvn package exec:java Dexec.mainClass=myPipeline "Dexec.args-=--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner Trying to run on a 1.6.2 cluster.The pom.xml flink profile is the following: flink-runner org.apache.beam beam-runners-flink-1.6 2.10.0 runtime > Build multiple versions of the Flink Runner against different Flink versions > > > Key: BEAM-5419 > URL: https://issues.apache.org/jira/browse/BEAM-5419 > Project: Beam > Issue Type: New Feature > Components: build-system, runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.10.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Following up on a discussion on the mailing list. > We want to keep the Flink version stable across different versions to avoid > upgrade pain for long-term users. 
[remainder of the quoted issue description trimmed; identical to the copy quoted above] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-3211) Add an integration test for TextIO ReadAll transform and dynamic writes
[ https://issues.apache.org/jira/browse/BEAM-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chamikara Jayalath resolved BEAM-3211. -- Resolution: Fixed Fix Version/s: Not applicable > Add an integration test for TextIO ReadAll transform and dynamic writes > --- > > Key: BEAM-3211 > URL: https://issues.apache.org/jira/browse/BEAM-3211 > Project: Beam > Issue Type: Test > Components: sdk-java-core >Reporter: Chamikara Jayalath >Assignee: Lukasz Gajowy >Priority: Major > Fix For: Not applicable > > > We should add a small scale version of performance test available in > following file run as a part of 'beam_PostCommit_Java_MavenInstall' and > 'beam_PostCommit_Java_ValidatesRunner*' Jenkins test suites. > https://github.com/apache/beam/blob/master/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6179) Batch size estimation failing
[ https://issues.apache.org/jira/browse/BEAM-6179?focusedWorklogId=175558=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175558 ] ASF GitHub Bot logged work on BEAM-6179: Author: ASF GitHub Bot Created on: 14/Dec/18 19:25 Start Date: 14/Dec/18 19:25 Worklog Time Spent: 10m Work Description: angoenka commented on issue #7280: [BEAM-6179] Fixing bundle estimation when all xs are same URL: https://github.com/apache/beam/pull/7280#issuecomment-447428867 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175558) Time Spent: 1h 40m (was: 1.5h) > Batch size estimation failing > - > > Key: BEAM-6179 > URL: https://issues.apache.org/jira/browse/BEAM-6179 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-py-harness >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > Batch size estimation is failing on flink when running 13MB input pipeline > with error > ValueError: On entry to DLASCL parameter number 4 had an illegal value > java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error > received from SDK harness for instruction 48: Traceback (most recent call > last): > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 135, in _execute > response = task() > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 170, in > self._execute(lambda: worker.do_instruction(work), work) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 221, in do_instruction > request.instruction_id) > File > 
[remainder of the quoted traceback trimmed; identical to the traceback quoted above]
[jira] [Commented] (BEAM-3211) Add an integration test for TextIO ReadAll transform and dynamic writes
[ https://issues.apache.org/jira/browse/BEAM-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721721#comment-16721721 ] Chamikara Jayalath commented on BEAM-3211: -- Yeah, we can close this. > Add an integration test for TextIO ReadAll transform and dynamic writes > --- > > Key: BEAM-3211 > URL: https://issues.apache.org/jira/browse/BEAM-3211 > Project: Beam > Issue Type: Test > Components: sdk-java-core >Reporter: Chamikara Jayalath >Assignee: Lukasz Gajowy >Priority: Major > > We should add a small scale version of performance test available in > following file run as a part of 'beam_PostCommit_Java_MavenInstall' and > 'beam_PostCommit_Java_ValidatesRunner*' Jenkins test suites. > https://github.com/apache/beam/blob/master/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-5419) Build multiple versions of the Flink Runner against different Flink versions
[ https://issues.apache.org/jira/browse/BEAM-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721719#comment-16721719 ] Giorgos Stamatakis edited comment on BEAM-5419 at 12/14/18 7:24 PM: Thank you for your time but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar an error still pops up: Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionMode mvn package exec:java Dexec.mainClass=myPipeline "-Dexec.args=--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner Trying to run on a 1.6.2 cluster.The pom.xml flink profile is the following: flink-runner org.apache.beam beam-runners-flink-1.6 2.10.0 runtime was (Author: gstamatakis): Thank you for your time but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar an error still pops up: Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionMode mvn package exec:java -Dexec.mainClass=myPipeline "-Dexec.args-=--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner Trying to run on a 1.6.2 cluster.The pom.xml flink profile is the following: flink-runner org.apache.beam beam-runners-flink-1.6 2.10.0 runtime > Build multiple versions of the Flink Runner against different Flink versions > > > Key: BEAM-5419 > URL: https://issues.apache.org/jira/browse/BEAM-5419 > Project: Beam > Issue Type: New Feature > Components: build-system, runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.10.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Following up on a discussion on the mailing list. > We want to keep the Flink version stable across different versions to avoid > upgrade pain for long-term users. 
[remainder of the quoted issue description trimmed; identical to the copy quoted above] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-5419) Build multiple versions of the Flink Runner against different Flink versions
[ https://issues.apache.org/jira/browse/BEAM-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721719#comment-16721719 ] Giorgos Stamatakis edited comment on BEAM-5419 at 12/14/18 7:24 PM: Thank you for your time but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar an error still pops up: Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionMode mvn package exec:java -Dexec.mainClass=myPipeline "-Dexec.args-=--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner Trying to run on a 1.6.2 cluster.The pom.xml flink profile is the following: flink-runner org.apache.beam beam-runners-flink-1.6 2.10.0 runtime was (Author: gstamatakis): Thank you for your time but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar an error still pops up: Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionMode mvn package exec:java -Dexec.mainClass=myPipeline "-Dexec.args=--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner Trying to run on a 1.6.2 cluster.The pom.xml flink profile is the following: flink-runner org.apache.beam beam-runners-flink-1.6 2.10.0 runtime > Build multiple versions of the Flink Runner against different Flink versions > > > Key: BEAM-5419 > URL: https://issues.apache.org/jira/browse/BEAM-5419 > Project: Beam > Issue Type: New Feature > Components: build-system, runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.10.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Following up on a discussion on the mailing list. > We want to keep the Flink version stable across different versions to avoid > upgrade pain for long-term users. 
[remainder of the quoted issue description trimmed; identical to the copy quoted above] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6150) Provide alternatives to SerializableFunction and SimpleFunction that may declare exceptions
[ https://issues.apache.org/jira/browse/BEAM-6150?focusedWorklogId=175556=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175556 ] ASF GitHub Bot logged work on BEAM-6150: Author: ASF GitHub Bot Created on: 14/Dec/18 19:15 Start Date: 14/Dec/18 19:15 Worklog Time Spent: 10m Work Description: jklukas commented on issue #7160: [BEAM-6150] Superinterface for SerializableFunction allowing declared exceptions URL: https://github.com/apache/beam/pull/7160#issuecomment-447426035 Thanks for the review, @kennknowles. > But instead of porting tests to InferableFunction can you duplicate them, or some of them. Done. This should be ready for another review. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175556) Time Spent: 1h (was: 50m) > Provide alternatives to SerializableFunction and SimpleFunction that may > declare exceptions > --- > > Key: BEAM-6150 > URL: https://issues.apache.org/jira/browse/BEAM-6150 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Jeff Klukas >Assignee: Jeff Klukas >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > Contextful.Fn allows subclasses to declare checked exceptions, but neither > SerializableFunction nor SimpleFunction do. We want to add a new entry in > each of those hierarchies where checked exceptions are allowed. We can then > change existing method signatures to accept the new superinterfaces in > contexts where allowing user code to throw checked exceptions is acceptable, > such as in ProcessElement methods. > Discussed on the dev mailing list: > https://lists.apache.org/thread.html/eecd8dea8b47710098ec67d73b87cf9b4e2926c444c3fee1a6b9a743@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6150) Provide alternatives to SerializableFunction and SimpleFunction that may declare exceptions
[ https://issues.apache.org/jira/browse/BEAM-6150?focusedWorklogId=17=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-17 ] ASF GitHub Bot logged work on BEAM-6150: Author: ASF GitHub Bot Created on: 14/Dec/18 19:14 Start Date: 14/Dec/18 19:14 Worklog Time Spent: 10m Work Description: jklukas commented on a change in pull request #7160: [BEAM-6150] Superinterface for SerializableFunction allowing declared exceptions URL: https://github.com/apache/beam/pull/7160#discussion_r241860714 ## File path: website/src/contribute/ptransform-style-guide.md ## @@ -395,8 +395,8 @@ If the transform has an aspect of behavior to be customized by a user's code, ma Do: -* If possible, just use PTransform composition as an extensibility device - i.e. if the same effect can be achieved by the user applying the transform in their pipeline and composing it with another `PTransform`, then the transform itself should not be extensible. E.g., a transform that writes JSON objects to a third-party system should take a `PCollection` (assuming it is possible to provide a `Coder` for `JsonObject`), rather than taking a generic `PCollection` and a `SerializableFunction` (anti-example that should be fixed: `TextIO`). -* If extensibility by user code is necessary inside the transform, pass the user code as a `SerializableFunction` or define your own serializable function-like type (ideally single-method, for interoperability with Java 8 lambdas). Because Java erases the types of lambdas, you should be sure to have adequate type information even if a raw-type `SerializableFunction` is provided by the user. See `MapElements` and `FlatMapElements` for examples of how to use `SimpleFunction` and `SerializableFunction` in tandem to support Java 7 and Java 8 well. +* If possible, just use PTransform composition as an extensibility device - i.e. 
if the same effect can be achieved by the user applying the transform in their pipeline and composing it with another `PTransform`, then the transform itself should not be extensible. E.g., a transform that writes JSON objects to a third-party system should take a `PCollection` (assuming it is possible to provide a `Coder` for `JsonObject`), rather than taking a generic `PCollection` and a `ProcessFunction` (anti-example that should be fixed: `TextIO`). +* If extensibility by user code is necessary inside the transform, pass the user code as a `ProcessFunction` or define your own serializable function-like type (ideally single-method, for interoperability with Java 8 lambdas). Because Java erases the types of lambdas, you should be sure to have adequate type information even if a raw-type `ProcessFunction` is provided by the user. See `MapElements` and `FlatMapElements` for examples of how to use `ProcessFunction` and `InferableFunction` in tandem to provide good support for both lambdas and concrete subclasses with type information. Review comment: Changed wording of the last sentence here to remove reference to Java 7 and instead discuss type inferability. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 17) Time Spent: 50m (was: 40m) > Provide alternatives to SerializableFunction and SimpleFunction that may > declare exceptions > --- > > Key: BEAM-6150 > URL: https://issues.apache.org/jira/browse/BEAM-6150 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Jeff Klukas >Assignee: Jeff Klukas >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > Contextful.Fn allows subclasses to declare checked exceptions, but neither > SerializableFunction nor SimpleFunction do. 
[remainder of the quoted issue description trimmed; identical to the copy quoted above] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6150) Provide alternatives to SerializableFunction and SimpleFunction that may declare exceptions
[ https://issues.apache.org/jira/browse/BEAM-6150?focusedWorklogId=175554=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175554 ] ASF GitHub Bot logged work on BEAM-6150: Author: ASF GitHub Bot Created on: 14/Dec/18 19:13 Start Date: 14/Dec/18 19:13 Worklog Time Spent: 10m Work Description: jklukas commented on a change in pull request #7160: [BEAM-6150] Superinterface for SerializableFunction allowing declared exceptions URL: https://github.com/apache/beam/pull/7160#discussion_r241860482 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/MapElementsTest.java ## @@ -166,12 +231,31 @@ public Integer apply(KV input) { } /** - * Basic test of {@link MapElements} with a {@link SerializableFunction}. This style is generally - * discouraged in Java 7, in favor of {@link SimpleFunction}. + * Test of {@link MapElements} coder propagation with a parametric {@link InferableFunction} where + * the type variable occurs nested within other concrete type constructors. Review comment: This looks out of place in the diff. The removed lines are the docstring for `testMapBasicSerializableFunction` which has been renamed to `testMapBasicProcessFunction` and appears as the next function below. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175554) Time Spent: 40m (was: 0.5h) > Provide alternatives to SerializableFunction and SimpleFunction that may > declare exceptions > --- > > Key: BEAM-6150 > URL: https://issues.apache.org/jira/browse/BEAM-6150 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Jeff Klukas >Assignee: Jeff Klukas >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > Contextful.Fn allows subclasses to declare checked exceptions, but neither > SerializableFunction nor SimpleFunction do. We want to add a new entry in > each of those hierarchies where checked exceptions are allowed. We can then > change existing method signatures to accept the new superinterfaces in > contexts where allowing user code to throw checked exceptions is acceptable, > such as in ProcessElement methods. > Discussed on the dev mailing list: > https://lists.apache.org/thread.html/eecd8dea8b47710098ec67d73b87cf9b4e2926c444c3fee1a6b9a743@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6190) "Processing stuck" messages should be visible on Pantheon
[ https://issues.apache.org/jira/browse/BEAM-6190?focusedWorklogId=175549&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175549 ]

ASF GitHub Bot logged work on BEAM-6190:
Author: ASF GitHub Bot
Created on: 14/Dec/18 19:07
Start Date: 14/Dec/18 19:07
Worklog Time Spent: 10m

Work Description: pabloem commented on issue #7240: [BEAM-6190] Add processing stuck message to Pantheon.
URL: https://github.com/apache/beam/pull/7240#issuecomment-447423853
I don't know what your LDAP is. Mine is pabloem@ - talk to me in hangouts?

Issue Time Tracking
---
Worklog Id: (was: 175549)
Time Spent: 1h 10m (was: 1h)
Remaining Estimate: 22h 50m (was: 23h)

> "Processing stuck" messages should be visible on Pantheon
> ---------------------------------------------------------
>
> Key: BEAM-6190
> URL: https://issues.apache.org/jira/browse/BEAM-6190
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Affects Versions: 2.8.0
> Environment: Running on Google Cloud Dataflow
> Reporter: Dustin Rhodes
> Assignee: Dustin Rhodes
> Priority: Minor
> Fix For: Not applicable
>
> Original Estimate: 24h
> Time Spent: 1h 10m
> Remaining Estimate: 22h 50m
>
> When user processing results in an exception, it is clearly visible on the Pantheon landing page for a streaming Dataflow job. But when user processing becomes stuck, there is no indication, even though the worker logs it. Most users don't check worker logs, and doing so is inconvenient. Ideally a stuck worker would result in a visible error on the Pantheon landing page.
[jira] [Work logged] (BEAM-6225) Setup Jenkins VR job for new bundle processing code
[ https://issues.apache.org/jira/browse/BEAM-6225?focusedWorklogId=175541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175541 ]

ASF GitHub Bot logged work on BEAM-6225:
Author: ASF GitHub Bot
Created on: 14/Dec/18 18:44
Start Date: 14/Dec/18 18:44
Worklog Time Spent: 10m

Work Description: swegner commented on issue #7271: [BEAM-6225] Setup Jenkins Job to Run VR with ExecutableStage
URL: https://github.com/apache/beam/pull/7271#issuecomment-447417017
Run Seed Job

Issue Time Tracking
---
Worklog Id: (was: 175541)
Time Spent: 3h 20m (was: 3h 10m)

> Setup Jenkins VR job for new bundle processing code
> ---------------------------------------------------
>
> Key: BEAM-6225
> URL: https://issues.apache.org/jira/browse/BEAM-6225
> Project: Beam
> Issue Type: Task
> Components: runner-dataflow
> Reporter: Boyuan Zhang
> Assignee: Boyuan Zhang
> Priority: Major
> Time Spent: 3h 20m
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-6225) Setup Jenkins VR job for new bundle processing code
[ https://issues.apache.org/jira/browse/BEAM-6225?focusedWorklogId=175523&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175523 ]

ASF GitHub Bot logged work on BEAM-6225:
Author: ASF GitHub Bot
Created on: 14/Dec/18 18:32
Start Date: 14/Dec/18 18:32
Worklog Time Spent: 10m

Work Description: boyuanzz commented on issue #7271: [BEAM-6225] Setup Jenkins Job to Run VR with ExecutableStage
URL: https://github.com/apache/beam/pull/7271#issuecomment-447413580

> You mentioned that Dataflow won't use the new `use_executable_stage_bundle_execution` until there is a service release. If that's the case, I don't think we should begin running these tests:
>
> 1. It's misleading because it looks like we're testing something we're not.
> 2. It's not transparent when the service will be released, and it might cause test failures that will be hard to diagnose.
> 3. Until the experiment is enabled, the tests are redundant with existing suites and waste resources.
>
> How about checking this in but disabling the job until the Dataflow service is ready (track with a JIRA ticket)? The [Jenkins job DSL](https://jenkinsci.github.io/job-dsl-plugin/#path/job-disabled) includes a `disabled()` method.

Filed JIRA: https://issues.apache.org/jira/browse/BEAM-6236

Issue Time Tracking
---
Worklog Id: (was: 175523)
Time Spent: 3h 10m (was: 3h)
[jira] [Work logged] (BEAM-6179) Batch size estimation failing
[ https://issues.apache.org/jira/browse/BEAM-6179?focusedWorklogId=175527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175527 ]

ASF GitHub Bot logged work on BEAM-6179:
Author: ASF GitHub Bot
Created on: 14/Dec/18 18:36
Start Date: 14/Dec/18 18:36
Worklog Time Spent: 10m

Work Description: angoenka commented on issue #7280: [BEAM-6179] Fixing bundle estimation when all xs are same
URL: https://github.com/apache/beam/pull/7280#issuecomment-447414722
Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175527)
Time Spent: 0.5h (was: 20m)

> Batch size estimation failing
> -----------------------------
>
> Key: BEAM-6179
> URL: https://issues.apache.org/jira/browse/BEAM-6179
> Project: Beam
> Issue Type: Bug
> Components: runner-flink, sdk-py-harness
> Reporter: Ankur Goenka
> Assignee: Ankur Goenka
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Batch size estimation is failing on Flink when running a 13MB input pipeline with the error:
>
> ValueError: On entry to DLASCL parameter number 4 had an illegal value
>
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction 48: Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 135, in _execute
>     response = task()
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 170, in <lambda>
>     self._execute(lambda: worker.do_instruction(work), work)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 221, in do_instruction
>     request.instruction_id)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 237, in process_bundle
>     bundle_processor.process_bundle(instruction_id)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 480, in process_bundle
>     ].process_encoded(data.data)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 125, in process_encoded
>     self.output(decoded_value)
>   File "apache_beam/runners/worker/operations.py", line 182, in apache_beam.runners.worker.operations.Operation.output
>     def output(self, windowed_value, output_index=0):
>   File "apache_beam/runners/worker/operations.py", line 183, in apache_beam.runners.worker.operations.Operation.output
>     cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
>   File "apache_beam/runners/worker/operations.py", line 89, in apache_beam.runners.worker.operations.ConsumerSet.receive
>     cython.cast(Operation, consumer).process(windowed_value)
>   File "apache_beam/runners/worker/operations.py", line 497, in apache_beam.runners.worker.operations.DoOperation.process
>     with self.scoped_process_state:
>   File "apache_beam/runners/worker/operations.py", line 498, in apache_beam.runners.worker.operations.DoOperation.process
>     self.dofn_receiver.receive(o)
>   File "apache_beam/runners/common.py", line 680, in apache_beam.runners.common.DoFnRunner.receive
>     self.process(windowed_value)
>   File "apache_beam/runners/common.py", line 686, in apache_beam.runners.common.DoFnRunner.process
>     self._reraise_augmented(exn)
>   File "apache_beam/runners/common.py", line 709, in apache_beam.runners.common.DoFnRunner._reraise_augmented
>     raise
>   File "apache_beam/runners/common.py", line 684, in apache_beam.runners.common.DoFnRunner.process
>     self.do_fn_invoker.invoke_process(windowed_value)
>   File "apache_beam/runners/common.py", line 420, in apache_beam.runners.common.SimpleInvoker.invoke_process
>     output_processor.process_outputs(
>   File "apache_beam/runners/common.py", line 794, in apache_beam.runners.common._OutputProcessor.process_outputs
>     self.main_receivers.receive(windowed_value)
>   File "apache_beam/runners/worker/operations.py", line 89, in apache_beam.runners.worker.operations.ConsumerSet.receive
>     cython.cast(Operation, consumer).process(windowed_value)
>   File "apache_beam/runners/worker/operations.py", line 497, in apache_beam.runners.worker.operations.DoOperation.process
>     with self.scoped_process_state:
>   File "apache_beam/runners/worker/operations.py",
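The PR title above ("Fixing bundle estimation when all xs are same") points at a degenerate-input problem: an ordinary least-squares fit has an undefined slope when every x value is identical, which is one way a LAPACK scaling routine like DLASCL can end up rejecting its input. The sketch below illustrates the failure mode and a guard in plain stdlib Python; `fit_slope` is a hypothetical stand-in, not Beam's actual batch-size estimator.

```python
# Sketch of fitting processing time against batch size, with a guard for
# the degenerate case where all observed batch sizes (xs) are identical.
from statistics import mean

def fit_slope(xs, ys):
    """Return (slope, intercept) of the least-squares line through
    (xs, ys), falling back to a flat line through mean(ys) when the
    regression is undefined because all xs are the same."""
    mx, my = mean(xs), mean(ys)
    var_x = sum((x - mx) ** 2 for x in xs)
    if var_x == 0:  # all xs identical: slope denominator is zero
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var_x
    return slope, my - slope * mx

# Degenerate case: every bundle had the same size, so only the mean
# processing time is recoverable.
print(fit_slope([5, 5, 5], [1.0, 2.0, 3.0]))  # (0.0, 2.0)
# Well-conditioned case: slope and intercept are recovered normally.
print(fit_slope([1, 2, 3], [2.0, 4.0, 6.0]))  # (2.0, 0.0)
```

Without such a guard, handing the zero-variance design matrix to a numerical solver is exactly the kind of input that surfaces as an "illegal value" error deep inside LAPACK rather than as a readable Python exception.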
[jira] [Work logged] (BEAM-6179) Batch size estimation failing
[ https://issues.apache.org/jira/browse/BEAM-6179?focusedWorklogId=175535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175535 ]

ASF GitHub Bot logged work on BEAM-6179:
Author: ASF GitHub Bot
Created on: 14/Dec/18 18:38
Start Date: 14/Dec/18 18:38
Worklog Time Spent: 10m

Work Description: angoenka commented on issue #7280: [BEAM-6179] Fixing bundle estimation when all xs are same
URL: https://github.com/apache/beam/pull/7280#issuecomment-447415359
Run Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175535)
Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (BEAM-6186) Cleanup FnApiRunner optimization phases.
[ https://issues.apache.org/jira/browse/BEAM-6186?focusedWorklogId=175532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175532 ]

ASF GitHub Bot logged work on BEAM-6186:
Author: ASF GitHub Bot
Created on: 14/Dec/18 18:38
Start Date: 14/Dec/18 18:38
Worklog Time Spent: 10m

Work Description: angoenka commented on issue #7281: [BEAM-6186] Finish moving optimization phases.
URL: https://github.com/apache/beam/pull/7281#issuecomment-447415196
Run RAT PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175532)
Time Spent: 1h 10m (was: 1h)

> Cleanup FnApiRunner optimization phases.
> ----------------------------------------
>
> Key: BEAM-6186
> URL: https://issues.apache.org/jira/browse/BEAM-6186
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Robert Bradshaw
> Assignee: Ahmet Altay
> Priority: Minor
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> They are currently expressed as functions with closures. It would be good to pull them out with explicit dependencies, both to make the code easier to follow and to be able to test and reuse the phases.
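The refactoring BEAM-6186 asks for — pulling optimization phases out of closures into functions with explicit dependencies — can be sketched as below. `Stage`, `make_optimizer`, and `fuse_stages` are hypothetical stand-ins for illustration, not the FnApiRunner's actual code.

```python
# Contrast a closure-style optimization phase (captures its pipeline from
# the enclosing scope) with a phase that takes explicit inputs.
from dataclasses import dataclass, replace
from typing import Callable, List

@dataclass(frozen=True)
class Stage:
    name: str
    fused: bool = False

# Closure style: the phase captures `stages` from the enclosing scope, so it
# cannot be unit-tested or reused without constructing the whole optimizer.
def make_optimizer(stages: List[Stage]) -> Callable[[], List[Stage]]:
    def fuse():
        return [replace(s, fused=True) for s in stages]
    return fuse

# Explicit-dependency style: a plain function over its inputs, directly
# testable and reusable in any phase pipeline.
def fuse_stages(stages: List[Stage]) -> List[Stage]:
    return [replace(s, fused=True) for s in stages]

pipeline = [Stage("read"), Stage("map")]
# Both styles produce the same result; only testability differs.
assert make_optimizer(pipeline)() == fuse_stages(pipeline)
```

The explicit version also makes the data flow between phases visible at the call site, which is the "better able to follow the code" part of the ticket.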
[jira] [Work logged] (BEAM-6179) Batch size estimation failing
[ https://issues.apache.org/jira/browse/BEAM-6179?focusedWorklogId=175534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175534 ]

ASF GitHub Bot logged work on BEAM-6179:
Author: ASF GitHub Bot
Created on: 14/Dec/18 18:38
Start Date: 14/Dec/18 18:38
Worklog Time Spent: 10m

Work Description: angoenka commented on issue #7280: [BEAM-6179] Fixing bundle estimation when all xs are same
URL: https://github.com/apache/beam/pull/7280#issuecomment-447415322
Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175534)
Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-6179) Batch size estimation failing
[ https://issues.apache.org/jira/browse/BEAM-6179?focusedWorklogId=175537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175537 ]

ASF GitHub Bot logged work on BEAM-6179:
Author: ASF GitHub Bot
Created on: 14/Dec/18 18:38
Start Date: 14/Dec/18 18:38
Worklog Time Spent: 10m

Work Description: angoenka commented on issue #7280: [BEAM-6179] Fixing bundle estimation when all xs are same
URL: https://github.com/apache/beam/pull/7280#issuecomment-447415445
Run Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175537)
Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (BEAM-6179) Batch size estimation failing
[ https://issues.apache.org/jira/browse/BEAM-6179?focusedWorklogId=175533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175533 ]

ASF GitHub Bot logged work on BEAM-6179:
Author: ASF GitHub Bot
Created on: 14/Dec/18 18:38
Start Date: 14/Dec/18 18:38
Worklog Time Spent: 10m

Work Description: angoenka commented on issue #7280: [BEAM-6179] Fixing bundle estimation when all xs are same
URL: https://github.com/apache/beam/pull/7280#issuecomment-447415256
Run RAT PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175533)
Time Spent: 1h (was: 50m)
[jira] [Work logged] (BEAM-6179) Batch size estimation failing
[ https://issues.apache.org/jira/browse/BEAM-6179?focusedWorklogId=175529&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175529 ]

ASF GitHub Bot logged work on BEAM-6179:
Author: ASF GitHub Bot
Created on: 14/Dec/18 18:36
Start Date: 14/Dec/18 18:36
Worklog Time Spent: 10m

Work Description: angoenka commented on issue #7280: [BEAM-6179] Fixing bundle estimation when all xs are same
URL: https://github.com/apache/beam/pull/7280#issuecomment-447414758
Run Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175529)
Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (BEAM-6179) Batch size estimation failing
[ https://issues.apache.org/jira/browse/BEAM-6179?focusedWorklogId=175530=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175530 ] ASF GitHub Bot logged work on BEAM-6179: Author: ASF GitHub Bot Created on: 14/Dec/18 18:36 Start Date: 14/Dec/18 18:36 Worklog Time Spent: 10m Work Description: angoenka commented on issue #7280: [BEAM-6179] Fixing bundle estimation when all xs are same URL: https://github.com/apache/beam/pull/7280#issuecomment-447414805 Run RAT PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175530) Time Spent: 50m (was: 40m) > Batch size estimation failing > - > > Key: BEAM-6179 > URL: https://issues.apache.org/jira/browse/BEAM-6179 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-py-harness >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Batch size estimation is failing on flink when running 13MB input pipeline > with error > ValueError: On entry to DLASCL parameter number 4 had an illegal value > java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error > received from SDK harness for instruction 48: Traceback (most recent call > last): > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 135, in _execute > response = task() > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 170, in > self._execute(lambda: worker.do_instruction(work), work) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 221, in do_instruction > request.instruction_id) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 237, 
in process_bundle > bundle_processor.process_bundle(instruction_id) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 480, in process_bundle > ].process_encoded(data.data) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 125, in process_encoded > self.output(decoded_value) > File "apache_beam/runners/worker/operations.py", line 182, in > apache_beam.runners.worker.operations.Operation.output > def output(self, windowed_value, output_index=0): > File "apache_beam/runners/worker/operations.py", line 183, in > apache_beam.runners.worker.operations.Operation.output > cython.cast(Receiver, > self.receivers[output_index]).receive(windowed_value) > File "apache_beam/runners/worker/operations.py", line 89, in > apache_beam.runners.worker.operations.ConsumerSet.receive > cython.cast(Operation, consumer).process(windowed_value) > File "apache_beam/runners/worker/operations.py", line 497, in > apache_beam.runners.worker.operations.DoOperation.process > with self.scoped_process_state: > File "apache_beam/runners/worker/operations.py", line 498, in > apache_beam.runners.worker.operations.DoOperation.process > self.dofn_receiver.receive(o) > File "apache_beam/runners/common.py", line 680, in > apache_beam.runners.common.DoFnRunner.receive > self.process(windowed_value) > File "apache_beam/runners/common.py", line 686, in > apache_beam.runners.common.DoFnRunner.process > self._reraise_augmented(exn) > File "apache_beam/runners/common.py", line 709, in > apache_beam.runners.common.DoFnRunner._reraise_augmented > raise > File "apache_beam/runners/common.py", line 684, in > apache_beam.runners.common.DoFnRunner.process > self.do_fn_invoker.invoke_process(windowed_value) > File "apache_beam/runners/common.py", line 420, in > apache_beam.runners.common.SimpleInvoker.invoke_process > output_processor.process_outputs( > File "apache_beam/runners/common.py", line 794, in > 
apache_beam.runners.common._OutputProcessor.process_outputs > self.main_receivers.receive(windowed_value) > File "apache_beam/runners/worker/operations.py", line 89, in > apache_beam.runners.worker.operations.ConsumerSet.receive > cython.cast(Operation, consumer).process(windowed_value) > File "apache_beam/runners/worker/operations.py", line 497, in > apache_beam.runners.worker.operations.DoOperation.process > with self.scoped_process_state: > File "apache_beam/runners/worker/operations.py", line 498, in
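The "DLASCL parameter number 4 had an illegal value" failure above is the symptom of a degenerate least-squares fit: when every sampled x value is identical, the variance of x is zero and the LAPACK solver underneath NumPy is handed an illegal scale factor. A minimal pure-Python sketch of the kind of guard the PR title ("Fixing bundle estimation when all xs are same") describes — the function name and the flat-line fallback here are illustrative assumptions, not Beam's actual estimator code:

```python
# Hypothetical sketch: slope of an ordinary least-squares line is
# cov(x, y) / var(x). When all xs are equal, var(x) == 0 and the fit is
# undefined, so we fall back instead of calling into the solver.

def fit_line(xs, ys):
    """Return (slope, intercept); fall back to a flat line when var(x) == 0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:  # all xs identical: regression is undefined
        return 0.0, mean_y  # flat line through the mean of the ys
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x

print(fit_line([5, 5, 5], [1.0, 2.0, 3.0]))  # (0.0, 2.0)
print(fit_line([1, 2, 3], [2.0, 4.0, 6.0]))  # (2.0, 0.0)
```

Checking for zero variance before invoking the solver avoids the DLASCL error entirely, and the fallback keeps the estimator producing finite batch sizes.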
[jira] [Work logged] (BEAM-6225) Setup Jenkins VR job for new bundle processing code
[ https://issues.apache.org/jira/browse/BEAM-6225?focusedWorklogId=175522=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175522 ] ASF GitHub Bot logged work on BEAM-6225: Author: ASF GitHub Bot Created on: 14/Dec/18 18:31 Start Date: 14/Dec/18 18:31 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #7271: [BEAM-6225] Setup Jenkins Job to Run VR with ExecutableStage URL: https://github.com/apache/beam/pull/7271#issuecomment-447413447 > Please validate that the seed job passes before merging. > > Otherwise LGTM once above comments are addressed. All comments above are addressed in commit https://github.com/apache/beam/pull/7271/commits/bcba9ead665cab0ff7d7e8036b47b3321a2771d2 . This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175522) Time Spent: 3h (was: 2h 50m) > Setup Jenkins VR job for new bundle processing code > --- > > Key: BEAM-6225 > URL: https://issues.apache.org/jira/browse/BEAM-6225 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6236) Enable VR test job for ExecutableStage after Dataflow service roll out
Boyuan Zhang created BEAM-6236: -- Summary: Enable VR test job for ExecutableStage after Dataflow service roll out Key: BEAM-6236 URL: https://issues.apache.org/jira/browse/BEAM-6236 Project: Beam Issue Type: Task Components: runner-dataflow Reporter: Boyuan Zhang Assignee: Boyuan Zhang -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5723) CassandraIO is broken because of use of bad relocation of guava
[ https://issues.apache.org/jira/browse/BEAM-5723?focusedWorklogId=175520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175520 ] ASF GitHub Bot logged work on BEAM-5723: Author: ASF GitHub Bot Created on: 14/Dec/18 18:28 Start Date: 14/Dec/18 18:28 Worklog Time Spent: 10m Work Description: swegner commented on issue #7237: [BEAM-5723] Changed shadow plugin configuration to avoid relocating g… URL: https://github.com/apache/beam/pull/7237#issuecomment-447412413 Thanks for jumping in @kennknowles. If you don't mind, I'm going to remove myself from this review and let you carry it forward. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175520) Time Spent: 2h 50m (was: 2h 40m) > CassandraIO is broken because of use of bad relocation of guava > --- > > Key: BEAM-5723 > URL: https://issues.apache.org/jira/browse/BEAM-5723 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Arun sethia >Assignee: João Cabrita >Priority: Major > Fix For: 2.10.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > While using Apache Beam to run a Dataflow job to read data from BigQuery and > Store/Write to Cassandra with the following libraries: > # beam-sdks-java-io-cassandra - 2.6.0 > # beam-sdks-java-io-jdbc - 2.6.0 > # beam-sdks-java-io-google-cloud-platform - 2.6.0 > # beam-sdks-java-core - 2.6.0 > # google-cloud-dataflow-java-sdk-all - 2.5.0 > # google-api-client -1.25.0 > > I am getting the following error when inserting/saving data to Cassandra. 
> {code:java} > [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332) > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6225) Setup Jenkins VR job for new bundle processing code
[ https://issues.apache.org/jira/browse/BEAM-6225?focusedWorklogId=175509=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175509 ] ASF GitHub Bot logged work on BEAM-6225: Author: ASF GitHub Bot Created on: 14/Dec/18 18:09 Start Date: 14/Dec/18 18:09 Worklog Time Spent: 10m Work Description: swegner commented on issue #7271: [BEAM-6225] Setup Jenkins Job to Run VR with ExecutableStage URL: https://github.com/apache/beam/pull/7271#issuecomment-447406616 Please validate that the seed job passes before merging. Otherwise LGTM once above comments are addressed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175509) Time Spent: 2h 50m (was: 2h 40m) > Setup Jenkins VR job for new bundle processing code > --- > > Key: BEAM-6225 > URL: https://issues.apache.org/jira/browse/BEAM-6225 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6225) Setup Jenkins VR job for new bundle processing code
[ https://issues.apache.org/jira/browse/BEAM-6225?focusedWorklogId=175508=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175508 ] ASF GitHub Bot logged work on BEAM-6225: Author: ASF GitHub Bot Created on: 14/Dec/18 18:09 Start Date: 14/Dec/18 18:09 Worklog Time Spent: 10m Work Description: swegner commented on issue #7271: [BEAM-6225] Setup Jenkins Job to Run VR with ExecutableStage URL: https://github.com/apache/beam/pull/7271#issuecomment-447406437 You mentioned that Dataflow won't use the new `use_executable_stage_bundle_execution` until there is a service release. If that's the case, I don't think we should begin running these tests: 1. It's misleading because it looks like we're testing something we're not. 2. It's not transparent when the service will be released, and it might cause test failures that will be hard to diagnose. 3. Until the experiment is enabled, the tests are redundant with existing suites and waste resources. How about checking this in but disabling the job until the Dataflow service is ready (track with a JIRA ticket)? The [Jenkins job DSL](https://jenkinsci.github.io/job-dsl-plugin/#path/job-disabled) includes a `disabled()` method. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175508) Time Spent: 2h 40m (was: 2.5h) > Setup Jenkins VR job for new bundle processing code > --- > > Key: BEAM-6225 > URL: https://issues.apache.org/jira/browse/BEAM-6225 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6225) Setup Jenkins VR job for new bundle processing code
[ https://issues.apache.org/jira/browse/BEAM-6225?focusedWorklogId=175504=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175504 ] ASF GitHub Bot logged work on BEAM-6225: Author: ASF GitHub Bot Created on: 14/Dec/18 17:50 Start Date: 14/Dec/18 17:50 Worklog Time Spent: 10m Work Description: swegner commented on a change in pull request #7271: [BEAM-6225] Setup Jenkins Job to Run VR with ExecutableStage URL: https://github.com/apache/beam/pull/7271#discussion_r241837408 ## File path: .test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_PortabilityApi_ExecutableStage_Dataflow.groovy ## @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import CommonJobProperties as commonJobProperties +import PostcommitJobBuilder + + +// This job runs the suite of ValidatesRunner tests against the Dataflow +// runner. +PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_ValidatesRunner_PortabilityApi_ExecutableStage Dataflow_Gradle', Review comment: We should remove the _Gradle suffix from the job names as well. This is an artifact from when we were migrating from Maven -> Gradle and had both job types. 
I'll go through and remove _Gradle from existing jobs; please remove from this job to be consistent. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175504) Time Spent: 2h 20m (was: 2h 10m) > Setup Jenkins VR job for new bundle processing code > --- > > Key: BEAM-6225 > URL: https://issues.apache.org/jira/browse/BEAM-6225 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6191) Redundant error messages for failures in Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-6191?focusedWorklogId=175502=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175502 ] ASF GitHub Bot logged work on BEAM-6191: Author: ASF GitHub Bot Created on: 14/Dec/18 17:44 Start Date: 14/Dec/18 17:44 Worklog Time Spent: 10m Work Description: swegner closed pull request #7220: [BEAM-6191] Remove redundant error logging for Dataflow exception handling URL: https://github.com/apache/beam/pull/7220 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClient.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClient.java index 2d840e3f4356..4473c04f3da8 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClient.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClient.java @@ -117,17 +117,20 @@ public synchronized WorkItemServiceState reportError(Throwable e) throws IOExcep Status error = new Status(); error.setCode(2); // Code.UNKNOWN. TODO: Replace with a generated definition. // TODO: Attach the stack trace as exception details, not to the message. +String logPrefix = String.format("Failure processing work item %s", uniqueWorkId()); if (isOutOfMemoryError(t)) { String message = "An OutOfMemoryException occurred. 
Consider specifying higher memory " + "instances in PipelineOptions.\n"; - LOG.error(message); + LOG.error("{}: {}", logPrefix, message); error.setMessage(message + DataflowWorkerLoggingHandler.formatException(t)); } else { - LOG.error("Uncaught exception occurred during work unit execution. This will be retried.", t); + LOG.error( + "{}: Uncaught exception occurred during work unit execution. This will be retried.", + logPrefix, + t); error.setMessage(DataflowWorkerLoggingHandler.formatException(t)); } -LOG.warn("Failure processing work item {}", uniqueWorkId()); status.setErrors(ImmutableList.of(error)); return execute(status); diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/common/worker/MapTaskExecutor.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/common/worker/MapTaskExecutor.java index d11e72fe95dd..5690814f956d 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/common/worker/MapTaskExecutor.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/common/worker/MapTaskExecutor.java @@ -84,7 +84,7 @@ public void execute() throws Exception { op.finish(); } } catch (Exception | Error exn) { -LOG.warn("Aborting operations", exn); +LOG.debug("Aborting operations", exn); for (Operation op : operations) { try { op.abort(); This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175502) Time Spent: 1h (was: 50m) > Redundant error messages for failures in Dataflow runner > > > Key: BEAM-6191 > URL: https://issues.apache.org/jira/browse/BEAM-6191 > Project: Beam > Issue Type: New Feature > Components: runner-dataflow >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Minor > Fix For: 2.10.0 > > Time Spent: 1h > Remaining Estimate: 0h > > The Dataflow runner harness has redundant error logging from a couple > different components, which creates log spam and confusion when failures do > occur. We should dedupe redundant logs. > From a typical user-code exception, we see at least 3 error logs from the > worker: > http://screen/QZxsJOVnvt6 > "Aborting operations" > "Uncaught exception occurred during work unit execution. This will be > retried." > "Failure processing work item" -- This message
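The diff in this change folds the three overlapping worker logs ("Aborting operations", "Uncaught exception occurred...", "Failure processing work item") into a single error log that carries the work-item id as a prefix. A hedged Python sketch of the same deduplication pattern — the names and message wording here are illustrative, not Beam's:

```python
# Illustrative sketch: build one message carrying all the context, instead
# of emitting several overlapping log lines for the same failure.
import logging

logging.basicConfig(level=logging.ERROR)
log = logging.getLogger("worker")

def format_failure(work_id, exc):
    # Single message combining the old "Failure processing work item" and
    # "Uncaught exception ... will be retried" logs.
    return ("Failure processing work item %s: uncaught exception during "
            "work unit execution. This will be retried. Cause: %s"
            % (work_id, exc))

def report_error(work_id, exc):
    log.error(format_failure(work_id, exc))  # one log line, full context

report_error("work-123", RuntimeError("boom"))
```

One consolidated line keeps the work-item id attached to the stack-trace log, so operators no longer have to correlate three separate entries for a single failure.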
[jira] [Work logged] (BEAM-5959) Add Cloud KMS support to GCS copies
[ https://issues.apache.org/jira/browse/BEAM-5959?focusedWorklogId=175506=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175506 ] ASF GitHub Bot logged work on BEAM-5959: Author: ASF GitHub Bot Created on: 14/Dec/18 17:51 Start Date: 14/Dec/18 17:51 Worklog Time Spent: 10m Work Description: udim commented on a change in pull request #7266: [BEAM-5959] Add performance testing for writing many files URL: https://github.com/apache/beam/pull/7266#discussion_r241837665 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java ## @@ -758,7 +758,10 @@ final void moveToOutputFiles( } // During a failure case, files may have been deleted in an earlier step. Thus // we ignore missing files here. + long startTime = System.nanoTime(); FileSystems.rename(srcFiles, dstFiles, StandardMoveOptions.IGNORE_MISSING_FILES); + long endTime = System.nanoTime(); + LOG.info("Renamed {} files in {} seconds.", srcFiles.size(), (endTime - startTime) / 1e9); Review comment: I'm going to change the code that does batch copies and I needed a way to verify that there are no performance regressions. This PR should do that, but if there is a regression this log line could tell me if the batch operation is slower. This log line does appear in the terminal if I run the test directly using `gradlew`. Is there a way to export this metric to the dashboard? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175506) Time Spent: 7h (was: 6h 50m) > Add Cloud KMS support to GCS copies > --- > > Key: BEAM-5959 > URL: https://issues.apache.org/jira/browse/BEAM-5959 > Project: Beam > Issue Type: Bug > Components: io-java-gcp, sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 7h > Remaining Estimate: 0h > > Beam SDK currently uses the CopyTo GCS API call, which doesn't support > copying objects that use Customer Managed Encryption Keys (CMEK). > CMEKs are managed in Cloud KMS. > Items (for Java and Python SDKs): > - Update clients to versions that support KMS keys. > - Change copyTo API calls to use rewriteTo (Python - directly, Java - > possibly convert copyTo API call to use client library) > - Add unit tests. > - Add basic tests (DirectRunner and GCS buckets with CMEK). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
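The review snippet above brackets `FileSystems.rename` with `System.nanoTime()` calls to log how long the bulk rename took. A rough Python analogue of that timing pattern, using a monotonic clock (the helper and its label are hypothetical, not Beam code):

```python
# Hypothetical sketch: wrap an operation with a monotonic clock and log
# the elapsed wall time, mirroring the nanoTime() bracketing in the diff.
import time

def timed(label, fn, *args, **kwargs):
    start = time.perf_counter()          # monotonic, suitable for intervals
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print("%s took %.3f seconds" % (label, elapsed))
    return result

# Stand-in workload for the bulk rename being measured.
total = timed("Renamed 3 files", sum, [1, 2, 3])
```

A monotonic clock (`perf_counter`, like `nanoTime` in the Java diff) is the right choice here because wall-clock time can jump backwards under NTP adjustments, which would corrupt the measured interval.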
[jira] [Work logged] (BEAM-6225) Setup Jenkins VR job for new bundle processing code
[ https://issues.apache.org/jira/browse/BEAM-6225?focusedWorklogId=175505=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175505 ] ASF GitHub Bot logged work on BEAM-6225: Author: ASF GitHub Bot Created on: 14/Dec/18 17:50 Start Date: 14/Dec/18 17:50 Worklog Time Spent: 10m Work Description: swegner commented on a change in pull request #7271: [BEAM-6225] Setup Jenkins Job to Run VR with ExecutableStage URL: https://github.com/apache/beam/pull/7271#discussion_r241837527 ## File path: runners/google-cloud-dataflow-java/build.gradle ## @@ -241,6 +241,38 @@ task validatesRunnerFnApiWorkerTest(type: Test) { } } +task validatesRunnerFnApiWorkerExecutableStageTest(type: Test) { +group = "Verification" +dependsOn ":beam-runners-google-cloud-dataflow-java-fn-api-worker:shadowJar" +dependsOn buildAndPushDockerContainer +fnApiPipelineOptions.remove(2) Review comment: That works, thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175505) Time Spent: 2.5h (was: 2h 20m) > Setup Jenkins VR job for new bundle processing code > --- > > Key: BEAM-6225 > URL: https://issues.apache.org/jira/browse/BEAM-6225 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-6191) Redundant error messages for failures in Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner resolved BEAM-6191. Resolution: Fixed Fix Version/s: 2.10.0 > Redundant error messages for failures in Dataflow runner > > > Key: BEAM-6191 > URL: https://issues.apache.org/jira/browse/BEAM-6191 > Project: Beam > Issue Type: New Feature > Components: runner-dataflow >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Minor > Fix For: 2.10.0 > > Time Spent: 1h > Remaining Estimate: 0h > > The Dataflow runner harness has redundant error logging from a couple > different components, which creates log spam and confusion when failures do > occur. We should dedupe redundant logs. > From a typical user-code exception, we see at least 3 error logs from the > worker: > http://screen/QZxsJOVnvt6 > "Aborting operations" > "Uncaught exception occurred during work unit execution. This will be > retried." > "Failure processing work item" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6191) Redundant error messages for failures in Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-6191?focusedWorklogId=175501=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175501 ] ASF GitHub Bot logged work on BEAM-6191: Author: ASF GitHub Bot Created on: 14/Dec/18 17:43 Start Date: 14/Dec/18 17:43 Worklog Time Spent: 10m Work Description: swegner commented on issue #7220: [BEAM-6191] Remove redundant error logging for Dataflow exception handling URL: https://github.com/apache/beam/pull/7220#issuecomment-447398897 > Sorry! What's the notification mechanism used here so I know to watch for it? I get GitHub notifications in my email. Note that it might be going to your personal email account, depending on how you have GitHub configured. [Here's the docs](https://help.github.com/articles/about-notifications/). > Does that OOM message actually come through? Don't think I've ever seen it, but it'd sure be handy! I don't know for sure-- I'm new to this code. I would imagine it should come through. Some reasons it wouldn't: (a) If OOM's typically manifest from some other place, or (b) if when we OOM we don't flush Dataflow logs to Stackdriver before the VM goes down. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175501) Time Spent: 50m (was: 40m) > Redundant error messages for failures in Dataflow runner > > > Key: BEAM-6191 > URL: https://issues.apache.org/jira/browse/BEAM-6191 > Project: Beam > Issue Type: New Feature > Components: runner-dataflow >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > The Dataflow runner harness has redundant error logging from a couple > different components, which creates log spam and confusion when failures do > occur. We should dedupe redundant logs. 
> From a typical user-code exception, we see at least 3 error logs from the > worker: > http://screen/QZxsJOVnvt6 > "Aborting operations" > "Uncaught exception occurred during work unit execution. This will be > retried." > "Failure processing work item" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6235) Upgrade AutoValue to version 1.6.3
[ https://issues.apache.org/jira/browse/BEAM-6235?focusedWorklogId=175492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175492 ] ASF GitHub Bot logged work on BEAM-6235: Author: ASF GitHub Bot Created on: 14/Dec/18 17:35 Start Date: 14/Dec/18 17:35 Worklog Time Spent: 10m Work Description: swegner commented on issue #7285: [BEAM-6235] Upgrade AutoValue to version 1.6.3 URL: https://github.com/apache/beam/pull/7285#issuecomment-447396536 I see [FindBugs errors](https://scans.gradle.com/s/n53asllrml3be/failure?openFailures=WzBd=WzFd#top=0) in the Java pre-commit. LGTM after the failures are addressed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175492) Time Spent: 20m (was: 10m) > Upgrade AutoValue to version 1.6.3 > -- > > Key: BEAM-6235 > URL: https://issues.apache.org/jira/browse/BEAM-6235 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > AutoValue 1.6 now ships the annotations in a separate artifact: > auto-value-annotations. This allows users to specify the annotations in > compile scope and the processor in an annotation-processing scope, without > leaking the processor into a release binary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6206) Dataflow template which reads from BigQuery fails if used more than once
[ https://issues.apache.org/jira/browse/BEAM-6206?focusedWorklogId=175489=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175489 ] ASF GitHub Bot logged work on BEAM-6206: Author: ASF GitHub Bot Created on: 14/Dec/18 17:30 Start Date: 14/Dec/18 17:30 Worklog Time Spent: 10m Work Description: swegner closed pull request #7270: [BEAM-6206] Add CustomHttpErrors a tool to allow adding custom errors… URL: https://github.com/apache/beam/pull/7270 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/CustomHttpErrors.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/CustomHttpErrors.java new file mode 100644 index ..db46d981400f --- /dev/null +++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/CustomHttpErrors.java @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.util; + +import com.google.auto.value.AutoValue; +import java.io.Serializable; +import java.util.ArrayList; +import java.util.List; + +/** + * An optional component to use with the {@code RetryHttpRequestInitializer} in order to provide + * custom errors for failing http calls. This class allows you to specify custom error messages + * which match specific error codes and containing strings in the URL. The first matcher to match + * the request and response will be used to provide the custom error. + * + * The intended use case here is to examine one of the logs emitted by a failing call made by the + * RetryHttpRequestInitializer, and then adding a custom error message which matches the URL and + * code for it. + * + * Usage: See more in CustomHttpErrorsTest. + * + * {@code + * CustomHttpErrors.Builder builder = new CustomHttpErrors.Builder(); + * builder.addErrorForCodeAndUrlContains(403,"/tables?", "Custom Error Msg"); + * CustomHttpErrors customErrors = builder.build(); + * + * + * RetryHttpRequestInitializer initializer = ... + * initializer.setCustomErrors(customErrors); + * } + * + * Suggestions for future enhancements to anyone upgrading this file: + * + * + * This class is left open for extension, to allow different functions for HttpCallMatcher and + * HttpCallCustomError to match and log errors. For example, new functionality may include + * matching an error based on the HttpResponse body. Additionally, extracting and logging + * strings from the HttpResponse body may make useful functionality. + * Add a methods to add custom errors based on inspecting the contents of the HttpRequest and + * HttpResponse + * Be sure to update the HttpRequestWrapper and HttpResponseWrapper with any new getters that + * you may use. The wrappers were introduced to add a layer of indirection which could be + * mocked mocked out in tests. 
This was unfortunately needed because mockito cannot mock final + * classes and its non trivial to just construct HttpRequest and HttpResponse objects. + * Making matchers composable with an AND operator may simplify enhancing this code, if + * several different matchers are used. + * + * + * + */ +public class CustomHttpErrors { + + /** + * A simple Tuple class for creating a list of HttpResponseMatcher and HttpResponseCustomError to + * print for the responses. + */ + @AutoValue + public abstract static class MatcherAndError implements Serializable { +static MatcherAndError create(HttpCallMatcher matcher, HttpCallCustomError customError) { + return new AutoValue_CustomHttpErrors_MatcherAndError(matcher, customError); +} + +public abstract HttpCallMatcher getMatcher(); + +public abstract HttpCallCustomError getCustomError(); + } + + /** A Builder which allows building
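The first-match-wins design described in the javadoc above can be sketched in plain Java. This is an illustrative sketch only, not Beam's actual `CustomHttpErrors` API: the class and method names here are hypothetical stand-ins for the matcher-plus-custom-error tuple idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Illustrative sketch (NOT Beam's actual API) of the design described above:
// a list of (matcher, custom error) pairs, where the first matcher to match
// a failed call's status code and URL supplies the custom error message.
class CustomErrorSketch {
    // Pairs a matcher over (status code, URL) with the message to emit.
    private static class MatcherAndError {
        final BiPredicate<Integer, String> matcher;
        final String customError;

        MatcherAndError(BiPredicate<Integer, String> matcher, String customError) {
            this.matcher = matcher;
            this.customError = customError;
        }
    }

    private final List<MatcherAndError> entries = new ArrayList<>();

    void addErrorForCodeAndUrlContains(int code, String urlFragment, String message) {
        entries.add(new MatcherAndError(
                (c, url) -> c == code && url.contains(urlFragment), message));
    }

    // First match wins; null means "no custom error registered for this call".
    String customErrorFor(int code, String url) {
        for (MatcherAndError e : entries) {
            if (e.matcher.test(code, url)) {
                return e.customError;
            }
        }
        return null;
    }
}
```

Registering a 403-on-`/tables?` message and then probing with a non-matching code or URL shows the first-match/no-match behavior the javadoc describes.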
[jira] [Work logged] (BEAM-4454) Provide automatic schema registration for AVROs
[ https://issues.apache.org/jira/browse/BEAM-4454?focusedWorklogId=175470=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175470 ] ASF GitHub Bot logged work on BEAM-4454: Author: ASF GitHub Bot Created on: 14/Dec/18 17:16 Start Date: 14/Dec/18 17:16 Worklog Time Spent: 10m Work Description: reuvenlax closed pull request #7267: [BEAM-4454] Support Avro POJO objects URL: https://github.com/apache/beam/pull/7267 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroSpecificRecordSchema.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java similarity index 67% rename from sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroSpecificRecordSchema.java rename to sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java index d8e4bda342f8..29bf51a06a77 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroSpecificRecordSchema.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java @@ -17,35 +17,34 @@ */ package org.apache.beam.sdk.schemas; -import org.apache.avro.specific.SpecificRecord; -import org.apache.beam.sdk.schemas.utils.AvroSpecificRecordTypeInformationFactory; import org.apache.beam.sdk.schemas.utils.AvroUtils; import org.apache.beam.sdk.values.TypeDescriptor; /** - * A {@link SchemaProvider} for AVRO generated SpecificRecords. + * A {@link SchemaProvider} for AVRO generated SpecificRecords and POJOs. * - * This provider infers a schema from generates SpecificRecord objects, and creates schemas and - * rows that bind to the appropriate fields. 
+ * This provider infers a schema from generated SpecificRecord objects, and creates schemas and + * rows that bind to the appropriate fields. This provider also infers schemas from Java POJO + * objects, creating a schema that matches that inferred by the AVRO libraries. */ -public class AvroSpecificRecordSchema extends GetterBasedSchemaProvider { +public class AvroRecordSchema extends GetterBasedSchemaProvider { @Override public Schema schemaFor(TypeDescriptor typeDescriptor) { -return AvroUtils.getSchema((Class) typeDescriptor.getRawType()); +return AvroUtils.getSchema(typeDescriptor.getRawType()); } @Override public FieldValueGetterFactory fieldValueGetterFactory() { -return new AvroSpecificRecordGetterFactory(); +return AvroUtils::getGetters; } @Override public UserTypeCreatorFactory schemaTypeCreatorFactory() { -return new AvroSpecificRecordUserTypeCreatorFactory(); +return AvroUtils::getCreator; } @Override public FieldValueTypeInformationFactory fieldValueTypeInformationFactory() { -return new AvroSpecificRecordTypeInformationFactory(); +return AvroUtils::getFieldTypes; } } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroSpecificRecordGetterFactory.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroSpecificRecordGetterFactory.java deleted file mode 100644 index fcb85f4dd664.. --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroSpecificRecordGetterFactory.java +++ /dev/null @@ -1,30 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.schemas; - -import java.util.List; -import org.apache.avro.specific.SpecificRecord; -import org.apache.beam.sdk.schemas.utils.AvroUtils; - -/** A {@link FieldValueGetterFactory} for AVRO-generated specific records. */ -public class AvroSpecificRecordGetterFactory implements FieldValueGetterFactory { - @Override - public List create(Class targetClass, Schema schema) { -return AvroUtils.getGetters((Class) targetClass, schema); - } -} diff --git
[jira] [Work logged] (BEAM-4454) Provide automatic schema registration for AVROs
[ https://issues.apache.org/jira/browse/BEAM-4454?focusedWorklogId=175457=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175457 ] ASF GitHub Bot logged work on BEAM-4454: Author: ASF GitHub Bot Created on: 14/Dec/18 16:59 Start Date: 14/Dec/18 16:59 Worklog Time Spent: 10m Work Description: kanterov commented on issue #7267: [BEAM-4454] Support Avro POJO objects URL: https://github.com/apache/beam/pull/7267#issuecomment-447386484 LGTM Great, the code became much cleaner after refactoring! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175457) Time Spent: 11h 10m (was: 11h) > Provide automatic schema registration for AVROs > --- > > Key: BEAM-4454 > URL: https://issues.apache.org/jira/browse/BEAM-4454 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 11h 10m > Remaining Estimate: 0h > > Need to make sure this is a compatible change -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-6227) FlinkRunner errors if GroupByKey contains null values (streaming mode only)
[ https://issues.apache.org/jira/browse/BEAM-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Weise resolved BEAM-6227. Resolution: Fixed > FlinkRunner errors if GroupByKey contains null values (streaming mode only) > --- > > Key: BEAM-6227 > URL: https://issues.apache.org/jira/browse/BEAM-6227 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.9.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.10.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Apparently this passed ValidatesRunner in streaming mode although this is a > quite common operation: > {noformat} > FlinkPipelineOptions options = > PipelineOptionsFactory.as(FlinkPipelineOptions.class); > options.setRunner(FlinkRunner.class); > // force streaming mode > options.setStreaming(true); > Pipeline pipeline = Pipeline.create(options); > pipeline > .apply(GenerateSequence.from(0).to(100)) > .apply(Window.into(FixedWindows.of(Duration.millis(10 > .apply(ParDo.of( > new DoFn>() { > @ProcessElement > public void processElement(ProcessContext pc) { > pc.output(KV.of("hello", null)); > } > } > )) > .apply(GroupByKey.create()); > pipeline.run(); > {noformat} > Throws: > {noformat} > Caused by: java.lang.RuntimeException: Error adding to bag state. > at > org.apache.beam.runners.flink.translation.wrappers.streaming.state.FlinkStateInternals$FlinkBagState.add(FlinkStateInternals.java:299) > at > org.apache.beam.runners.core.SystemReduceFn.processValue(SystemReduceFn.java:115) > at > org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:608) > at > org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349) > at > org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136) > Caused by: java.lang.NullPointerException: You cannot add null to a ListState. 
> at > org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:75) > at > org.apache.flink.runtime.state.heap.HeapListState.add(HeapListState.java:89) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.state.FlinkStateInternals$FlinkBagState.add(FlinkStateInternals.java:297) > at > org.apache.beam.runners.core.SystemReduceFn.processValue(SystemReduceFn.java:115) > at > org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:608) > at > org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349) > at > org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136) > at > org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn$DoFnInvoker.invokeProcessElement(Unknown > Source) > at > org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:275) > at > org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:240) > at > org.apache.beam.runners.core.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:80) > at > org.apache.beam.runners.flink.metrics.DoFnRunnerWithMetricsUpdate.processElement(DoFnRunnerWithMetricsUpdate.java:63) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.processElement(DoFnOperator.java:460) > at > org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:104) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Will do a follow-up for running ValidatesRunner in streaming mode. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
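One pipeline-level way to avoid the `NullPointerException` above is to never hand raw nulls to the `GroupByKey`, wrapping nullable values (for example in `java.util.Optional`) before grouping and unwrapping afterwards. A minimal sketch of the wrapping idea in plain Java; note that in an actual Beam pipeline a coder for the wrapper type would also be needed, which this sketch does not show:

```java
import java.util.Optional;

// Sketch of the workaround: carry nullable values as Optional so the values
// handed to state backends that reject nulls are themselves never null.
class NullableValues {
    static Optional<String> wrap(String maybeNull) {
        return Optional.ofNullable(maybeNull);
    }

    static String unwrap(Optional<String> wrapped) {
        return wrapped.orElse(null);
    }
}
```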
[jira] [Work logged] (BEAM-6227) FlinkRunner errors if GroupByKey contains null values (streaming mode only)
[ https://issues.apache.org/jira/browse/BEAM-6227?focusedWorklogId=175447=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175447 ] ASF GitHub Bot logged work on BEAM-6227: Author: ASF GitHub Bot Created on: 14/Dec/18 16:41 Start Date: 14/Dec/18 16:41 Worklog Time Spent: 10m Work Description: tweise closed pull request #7282: [BEAM-6227] Fix GroupByKey with null values in Flink Runner URL: https://github.com/apache/beam/pull/7282 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/state/FlinkStateInternals.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/state/FlinkStateInternals.java index 02a2ebee74d0..b2f5aede9dfd 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/state/FlinkStateInternals.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/state/FlinkStateInternals.java @@ -17,13 +17,16 @@ */ package org.apache.beam.runners.flink.translation.wrappers.streaming.state; +import com.google.common.base.Preconditions; import com.google.common.collect.ImmutableList; import com.google.common.collect.Iterables; import com.google.common.collect.Lists; import java.nio.ByteBuffer; import java.util.Collections; import java.util.HashMap; +import java.util.Iterator; import java.util.Map; +import javax.annotation.Nonnull; import org.apache.beam.runners.core.StateInternals; import org.apache.beam.runners.core.StateNamespace; import org.apache.beam.runners.core.StateTag; @@ -31,6 +34,7 @@ import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.CoderException; import 
org.apache.beam.sdk.coders.InstantCoder; +import org.apache.beam.sdk.coders.VoidCoder; import org.apache.beam.sdk.state.BagState; import org.apache.beam.sdk.state.CombiningState; import org.apache.beam.sdk.state.MapState; @@ -49,6 +53,7 @@ import org.apache.beam.sdk.transforms.windowing.TimestampCombiner; import org.apache.beam.sdk.util.CoderUtils; import org.apache.beam.sdk.util.CombineContextFactory; +import org.apache.flink.api.common.state.ListState; import org.apache.flink.api.common.state.ListStateDescriptor; import org.apache.flink.api.common.state.MapStateDescriptor; import org.apache.flink.api.common.state.ValueStateDescriptor; @@ -274,6 +279,7 @@ public int hashCode() { private final String stateId; private final ListStateDescriptor flinkStateDescriptor; private final KeyedStateBackend flinkStateBackend; +private final boolean storesVoidValues; FlinkBagState( KeyedStateBackend flinkStateBackend, @@ -284,17 +290,24 @@ public int hashCode() { this.namespace = namespace; this.stateId = stateId; this.flinkStateBackend = flinkStateBackend; - - flinkStateDescriptor = new ListStateDescriptor<>(stateId, new CoderTypeSerializer<>(coder)); + this.storesVoidValues = coder instanceof VoidCoder; + this.flinkStateDescriptor = + new ListStateDescriptor<>(stateId, new CoderTypeSerializer<>(coder)); } @Override public void add(T input) { try { -flinkStateBackend -.getPartitionedState( -namespace.stringKey(), StringSerializer.INSTANCE, flinkStateDescriptor) -.add(input); +ListState partitionedState = +flinkStateBackend.getPartitionedState( +namespace.stringKey(), StringSerializer.INSTANCE, flinkStateDescriptor); +if (storesVoidValues) { + Preconditions.checkState(input == null, "Expected to a null value but was: %s", input); + // Flink does not allow storing null values + // If we have null values, we use the structural null value + input = (T) VoidCoder.of().structuralValue((Void) input); +} +partitionedState.add(input); } catch (Exception e) { throw new 
RuntimeException("Error adding to bag state.", e); } @@ -306,14 +319,35 @@ public void add(T input) { } @Override +@Nonnull public Iterable read() { try { -Iterable result = -flinkStateBackend -.getPartitionedState( -namespace.stringKey(), StringSerializer.INSTANCE, flinkStateDescriptor) -.get(); - +ListState partitionedState = +flinkStateBackend.getPartitionedState( +namespace.stringKey(), StringSerializer.INSTANCE, flinkStateDescriptor); +Iterable result =
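The fix in the diff above substitutes a non-null stand-in (the `VoidCoder` structural value) for null before calling `ListState.add`, and maps it back to null on read. The substitution idea in isolation, with a hypothetical sentinel object in place of the coder's structural value:

```java
// Sketch of the sentinel substitution used by the fix above: a dedicated
// marker object stands in for null inside a store that rejects nulls,
// and reads map the marker back to null. Names here are illustrative.
class NullSentinel {
    private static final Object NULL_MARKER = new Object();

    static Object encode(Object value) {
        return value == null ? NULL_MARKER : value;
    }

    static Object decode(Object stored) {
        return stored == NULL_MARKER ? null : stored;
    }
}
```

The round trip is lossless for null while guaranteeing the stored value itself is never null, which is exactly the property Flink's `ListState` requires.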
[jira] [Work logged] (BEAM-6235) Upgrade AutoValue to version 1.6.3
[ https://issues.apache.org/jira/browse/BEAM-6235?focusedWorklogId=175438=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175438 ] ASF GitHub Bot logged work on BEAM-6235: Author: ASF GitHub Bot Created on: 14/Dec/18 16:31 Start Date: 14/Dec/18 16:31 Worklog Time Spent: 10m Work Description: iemejia opened a new pull request #7285: [BEAM-6235] Upgrade AutoValue to version 1.6.3 URL: https://github.com/apache/beam/pull/7285 AutoValue 1.6 has now the annotations in a separate artifact: auto-value-annotations. This allows users to specify the annotations in compile scope and the processor in an annotation processing scope, without leaking the processor to a release binary. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175438) Time Spent: 10m Remaining Estimate: 0h > Upgrade AutoValue to version 1.6.3 > -- > > Key: BEAM-6235 > URL: https://issues.apache.org/jira/browse/BEAM-6235 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > AutoValue 1.6 has now the annotations in a separate artifact: > auto-value-annotations. This allows users to specify the annotations in > compile scope and the processor in an annotation processing scope, without > leaking the processor to a release binary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6235) Upgrade AutoValue to version 1.6.3
Ismaël Mejía created BEAM-6235: -- Summary: Upgrade AutoValue to version 1.6.3 Key: BEAM-6235 URL: https://issues.apache.org/jira/browse/BEAM-6235 Project: Beam Issue Type: Improvement Components: build-system Reporter: Ismaël Mejía Assignee: Ismaël Mejía AutoValue 1.6 now ships its annotations in a separate artifact: auto-value-annotations. This allows users to specify the annotations in compile scope and the processor in an annotation processing scope, without leaking the processor to a release binary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
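In a Gradle build, the annotations/processor split described above typically looks like the following sketch (the coordinates are the usual Maven ones for AutoValue 1.6.3; scope names may differ per module):

```groovy
dependencies {
    // Annotations only: safe to expose at compile time without leaking
    // the processor into the published artifact.
    compileOnly "com.google.auto.value:auto-value-annotations:1.6.3"
    // The processor runs during annotation processing only.
    annotationProcessor "com.google.auto.value:auto-value:1.6.3"
}
```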
[jira] [Commented] (BEAM-4678) Support portable combiner lifting in Java Flink Runner
[ https://issues.apache.org/jira/browse/BEAM-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721570#comment-16721570 ] Maximilian Michels commented on BEAM-4678: -- Has been implemented for Python. AFAIK Go and Java do not support it yet. > Support portable combiner lifting in Java Flink Runner > -- > > Key: BEAM-4678 > URL: https://issues.apache.org/jira/browse/BEAM-4678 > Project: Beam > Issue Type: Sub-task > Components: runner-flink >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Adjust Flink Runner to support portable combiner lifting as described in the > following doc: > https://s.apache.org/beam-runner-api-combine-model -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5723) CassandraIO is broken because of use of bad relocation of guava
[ https://issues.apache.org/jira/browse/BEAM-5723?focusedWorklogId=175426=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175426 ] ASF GitHub Bot logged work on BEAM-5723: Author: ASF GitHub Bot Created on: 14/Dec/18 16:07 Start Date: 14/Dec/18 16:07 Worklog Time Spent: 10m Work Description: kennknowles commented on a change in pull request #7237: [BEAM-5723] Changed shadow plugin configuration to avoid relocating g… URL: https://github.com/apache/beam/pull/7237#discussion_r241805762 ## File path: sdks/java/io/cassandra/build.gradle ## @@ -17,7 +17,24 @@ */ apply plugin: org.apache.beam.gradle.BeamModulePlugin -applyJavaNature() +applyJavaNature(shadowClosure: { +dependencies { +include(dependency(project.library.java.guava)) +} +// guava uses the com.google.common and com.google.thirdparty package namespaces +relocate("com.google.common", project.getJavaRelocatedPath("com.google.common")) { +// com.google.common is too generic, need to exclude guava-testlib +exclude "com.google.common.collect.testing.**" +exclude "com.google.common.escape.testing.**" +exclude "com.google.common.testing.**" +exclude "com.google.common.util.concurrent.testing.**" +// don't relocate because the cassandra driver's public API uses it Review comment: Yes, exactly This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175426) Time Spent: 2h 40m (was: 2.5h) > CassandraIO is broken because of use of bad relocation of guava > --- > > Key: BEAM-5723 > URL: https://issues.apache.org/jira/browse/BEAM-5723 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Arun sethia >Assignee: João Cabrita >Priority: Major > Fix For: 2.10.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > While using apache beam to run dataflow job to read data from BigQuery and > Store/Write to Cassandra with following libaries: > # beam-sdks-java-io-cassandra - 2.6.0 > # beam-sdks-java-io-jdbc - 2.6.0 > # beam-sdks-java-io-google-cloud-platform - 2.6.0 > # beam-sdks-java-core - 2.6.0 > # google-cloud-dataflow-java-sdk-all - 2.5.0 > # google-api-client -1.25.0 > > I am getting following error at the time insert/save data to Cassandra. > {code:java} > [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332) > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313) > at 
org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3472) Create a callback triggered at the end of a batch in flink runner
[ https://issues.apache.org/jira/browse/BEAM-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721543#comment-16721543 ] Etienne Chauchot commented on BEAM-3472: Thanks [~mxm] for reviving this subject. I worked around the absence of a callback by regularly watching the pipeline state in a thread, doing a final push, and then stopping the thread. But IMHO I still think such a callback could be useful. > Create a callback triggered at the end of a batch in flink runner > - > > Key: BEAM-3472 > URL: https://issues.apache.org/jira/browse/BEAM-3472 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Etienne Chauchot >Priority: Major > > In the future we might add new features to the runners for which we might > need to do some processing at the end of a batch. Currently there is no > unique place (a callback) to add this processing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
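The watcher-thread workaround described in the comment above can be sketched in plain Java. The `State` enum here is a hypothetical stand-in for Beam's `PipelineResult.State`, and the polling interval is arbitrary:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch of the workaround: a thread polls the pipeline's state and runs a
// final action (e.g. a last metrics push) once the batch job finishes.
class BatchEndWatcher {
    enum State { RUNNING, DONE }

    static Thread watch(Supplier<State> stateSupplier, Runnable onFinish) {
        Thread t = new Thread(() -> {
            try {
                while (stateSupplier.get() != State.DONE) {
                    TimeUnit.MILLISECONDS.sleep(10); // poll interval
                }
                onFinish.run(); // final push, then the watcher stops itself
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

A proper end-of-batch callback in the runner would make this polling (and its inherent delay) unnecessary, which is the improvement the issue asks for.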
[jira] [Work logged] (BEAM-6234) [Flink Runner] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-6234?focusedWorklogId=175411=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175411 ] ASF GitHub Bot logged work on BEAM-6234: Author: ASF GitHub Bot Created on: 14/Dec/18 15:49 Start Date: 14/Dec/18 15:49 Worklog Time Spent: 10m Work Description: djhworld opened a new pull request #7284: [BEAM-6234] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions URL: https://github.com/apache/beam/pull/7284 The configuration setting failOnCheckpointingErrors [1] is available in Flink to allow engineers to specify whether the job should fail in the event of checkpoint failure. This should be exposed in FlinkPipelineOptions and FlinkExecutionEnvironments to allow users to configure this. Follow this checklist to help us incorporate your contribution quickly and easily: - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. 
[jira] [Updated] (BEAM-6234) [Flink Runner] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Harper updated BEAM-6234: Description: The configuration setting {{failOnCheckpointingErrors}} [1] is available in Flink to allow engineers to specify whether the job should fail in the event of checkpoint failure. This should be exposed in {{FlinkPipelineOptions}} and {{FlinkExecutionEnvironments}} to allow users to configure this. The default for this value in Flink is true [2] [1] https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L249 [2] https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L73 was: The configuration setting {{minPauseBetweenCheckpoints}} [1] is available in Flink to allow a grace period when checkpoints runtime is > checkpoint interval. This should be exposed in {{FlinkPipelineOptions}} and {{FlinkExecutionEnvironments}} to allow users to configure this. 
The default for this value in Flink is 0ms [2] [1] https://ci.apache.org/projects/flink/flink-docs-release-1.5/api/java/org/apache/flink/streaming/api/environment/CheckpointConfig.html#setMinPauseBetweenCheckpoints-long- [2] https://ci.apache.org/projects/flink/flink-docs-release-1.5/api/java/constant-values.html#org.apache.flink.streaming.api.environment.CheckpointConfig.DEFAULT_MIN_PAUSE_BETWEEN_CHECKPOINTS > [Flink Runner] Make failOnCheckpointingErrors setting available in > FlinkPipelineOptions > --- > > Key: BEAM-6234 > URL: https://issues.apache.org/jira/browse/BEAM-6234 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Daniel Harper >Assignee: Daniel Harper >Priority: Trivial > Fix For: 2.8.0 > > > The configuration setting {{failOnCheckpointingErrors}} [1] is available in > Flink to allow engineers to specify whether the job should fail in the event > of checkpoint failure. > This should be exposed in {{FlinkPipelineOptions}} and > {{FlinkExecutionEnvironments}} to allow users to configure this. > The default for this value in Flink is true [2] > [1] > https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L249 > [2] > https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L73 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6234) [Flink Runner] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions
Daniel Harper created BEAM-6234: --- Summary: [Flink Runner] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions Key: BEAM-6234 URL: https://issues.apache.org/jira/browse/BEAM-6234 Project: Beam Issue Type: Improvement Components: runner-flink Reporter: Daniel Harper Assignee: Daniel Harper Fix For: 2.8.0 The configuration setting {{minPauseBetweenCheckpoints}} [1] is available in Flink to allow a grace period when checkpoints runtime is > checkpoint interval. This should be exposed in {{FlinkPipelineOptions}} and {{FlinkExecutionEnvironments}} to allow users to configure this. The default for this value in Flink is 0ms [2] [1] https://ci.apache.org/projects/flink/flink-docs-release-1.5/api/java/org/apache/flink/streaming/api/environment/CheckpointConfig.html#setMinPauseBetweenCheckpoints-long- [2] https://ci.apache.org/projects/flink/flink-docs-release-1.5/api/java/constant-values.html#org.apache.flink.streaming.api.environment.CheckpointConfig.DEFAULT_MIN_PAUSE_BETWEEN_CHECKPOINTS -- This message was sent by Atlassian JIRA (v7.6.3#76005)
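Both BEAM-6234 messages above describe the same plumbing pattern: surface a knob from Flink's {{CheckpointConfig}} on {{FlinkPipelineOptions}} and forward it when the execution environment is built. The sketch below models that flow with self-contained stand-in classes; {{FakeCheckpointConfig}} and {{FakeFlinkPipelineOptions}} are hypothetical simplifications for illustration, not the real Flink or Beam APIs.

```java
// Hedged sketch of the option-forwarding pattern proposed in BEAM-6234.
// Both classes are invented stand-ins; the real types are Flink's
// CheckpointConfig and Beam's FlinkPipelineOptions interface.
public class CheckpointOptionsSketch {

    // Stand-in for Flink's CheckpointConfig: failOnCheckpointingErrors defaults to true.
    static class FakeCheckpointConfig {
        private boolean failOnCheckpointingErrors = true;

        void setFailOnCheckpointingErrors(boolean value) {
            failOnCheckpointingErrors = value;
        }

        boolean isFailOnCheckpointingErrors() {
            return failOnCheckpointingErrors;
        }
    }

    // Stand-in for the proposed pipeline option; null means "not set by the user".
    static class FakeFlinkPipelineOptions {
        private Boolean failOnCheckpointingErrors;

        void setFailOnCheckpointingErrors(Boolean value) {
            failOnCheckpointingErrors = value;
        }

        Boolean getFailOnCheckpointingErrors() {
            return failOnCheckpointingErrors;
        }
    }

    // What FlinkExecutionEnvironments would do: override Flink's default only
    // when the user explicitly configured the option.
    static FakeCheckpointConfig configure(FakeFlinkPipelineOptions options) {
        FakeCheckpointConfig config = new FakeCheckpointConfig();
        if (options.getFailOnCheckpointingErrors() != null) {
            config.setFailOnCheckpointingErrors(options.getFailOnCheckpointingErrors());
        }
        return config;
    }
}
```

In the real implementation the getter would live on the {{FlinkPipelineOptions}} interface (with Beam's {{@Description}}/{{@Default}} annotations) and the forwarding would happen inside {{FlinkExecutionEnvironments}}; the null-check keeps Flink's default of {{true}} when the flag is absent.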
[jira] [Work logged] (BEAM-6229) BigQuery returns value error while retrieving load test metrics
[ https://issues.apache.org/jira/browse/BEAM-6229?focusedWorklogId=175374&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175374 ] ASF GitHub Bot logged work on BEAM-6229: Author: ASF GitHub Bot Created on: 14/Dec/18 15:24 Start Date: 14/Dec/18 15:24 Worklog Time Spent: 10m Work Description: lgajowy closed pull request #7283: [BEAM-6229] Fix LoadTestResult to store proper timestamp and runtime URL: https://github.com/apache/beam/pull/7283 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/ConsoleResultPublisher.java b/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/ConsoleResultPublisher.java index c9eba52ca86a..789ae51d1617 100644 --- a/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/ConsoleResultPublisher.java +++ b/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/ConsoleResultPublisher.java @@ -22,6 +22,6 @@ static void publish(LoadTestResult result) { System.out.println(String.format("Total bytes: %s", result.getTotalBytesCount())); -System.out.println(String.format("Total time (millis): %s", result.getRuntime())); +System.out.println(String.format("Total time (sec): %s", result.getRuntime())); } } diff --git a/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTestResult.java b/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTestResult.java index b09ce882617f..705d14eccd41 100644 --- a/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTestResult.java +++ b/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTestResult.java @@ -39,12 +39,12 @@ private 
LoadTestResult(Long timestamp, Long runtime, Long totalBytesCount) { } /** Constructs {@link LoadTestResult} from {@link PipelineResult}. */ - static LoadTestResult create(PipelineResult result, String namespace, long now) { + static LoadTestResult create(PipelineResult result, String namespace, long nowInMillis) { MetricsReader reader = new MetricsReader(result, namespace); return new LoadTestResult( -now, -reader.getEndTimeMetric("runtime") - reader.getStartTimeMetric("runtime"), +nowInMillis / 1000, +(reader.getEndTimeMetric("runtime") - reader.getStartTimeMetric("runtime")) / 1000, reader.getCounterMetric("totalBytes.count")); } @@ -61,7 +61,7 @@ public Long getTotalBytesCount() { return ImmutableMap.builder() .put("timestamp", timestamp) .put("runtime", runtime) -.put("totalBytesCount", totalBytesCount) +.put("total_bytes_count", totalBytesCount) .build(); } } This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175374) Time Spent: 0.5h (was: 20m) > BigQuery returns value error while retrieving load test metrics > --- > > Key: BEAM-6229 > URL: https://issues.apache.org/jira/browse/BEAM-6229 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Kasia Kucharczyk >Assignee: Lukasz Gajowy >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > GroupByKeyLoadTest run on Dataflow saves the timestamp metric in the > wrong format: > {code} > Cannot return an invalid timestamp value of 154472066651564 microseconds > relative to the Unix epoch. The range of valid timestamp values is [0001-01-1 > 00:00:00, 9999-12-31 23:59:59.999999]; error in writing field timestamp > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
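The diff in the worklog above fixes the BigQuery error quoted at the bottom: the test published epoch milliseconds into a field that BigQuery interprets as epoch seconds. The two conversions can be isolated as pure helpers; the class and method names here are illustrative, not the actual {{LoadTestResult}} API.

```java
// Minimal sketch of the unit fix from the diff above: both the publish
// timestamp and the measured runtime are converted from milliseconds to
// seconds before being written to BigQuery.
public class LoadTestUnits {

    // nowInMillis / 1000, as in the patched LoadTestResult.create(...).
    static long toEpochSeconds(long nowInMillis) {
        return nowInMillis / 1000;
    }

    // (end - start) / 1000, matching the patched "runtime" metric.
    static long runtimeSeconds(long startMillis, long endMillis) {
        return (endMillis - startMillis) / 1000;
    }
}
```

A millisecond value such as 1544720666515 becomes 1544720666 seconds, which falls inside BigQuery's valid timestamp range, whereas the unconverted value is rejected as shown in the quoted error.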
[jira] [Work logged] (BEAM-5985) Create jenkins jobs to run the load tests for Java SDK
[ https://issues.apache.org/jira/browse/BEAM-5985?focusedWorklogId=175376&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175376 ] ASF GitHub Bot logged work on BEAM-5985: Author: ASF GitHub Bot Created on: 14/Dec/18 15:28 Start Date: 14/Dec/18 15:28 Worklog Time Spent: 10m Work Description: lgajowy commented on issue #7184: [BEAM-5985] Create jenkins jobs to run the load tests for Java SDK URL: https://github.com/apache/beam/pull/7184#issuecomment-447358869 @kkucharc please rebase before running the tests again - I provided fixes for BigQuery publishing code (https://github.com/apache/beam/pull/7283). Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175376) Time Spent: 7h 50m (was: 7h 40m) > Create jenkins jobs to run the load tests for Java SDK > -- > > Key: BEAM-5985 > URL: https://issues.apache.org/jira/browse/BEAM-5985 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Lukasz Gajowy >Assignee: Kasia Kucharczyk >Priority: Major > Time Spent: 7h 50m > Remaining Estimate: 0h > > How/how often/in what cases we run those tests is yet to be decided (this is > part of the task) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-3949) IOIT's setup() and teardown() db connection attempt sometimes fail resulting in test flakiness
[ https://issues.apache.org/jira/browse/BEAM-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukasz Gajowy closed BEAM-3949. --- Resolution: Fixed Fix Version/s: 2.6.0 > IOIT's setup() and teardown() db connection attempt sometimes fail resulting > in test flakiness > -- > > Key: BEAM-3949 > URL: https://issues.apache.org/jira/browse/BEAM-3949 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Lukasz Gajowy >Assignee: Kasia Kucharczyk >Priority: Major > Fix For: 2.6.0 > > Time Spent: 10.5h > Remaining Estimate: 0h > > setup() and teardown() methods sometimes have trouble connecting database in > Performance tests. It results in test flakiness. > Example logs: > [https://builds.apache.org/view/A-D/view/Beam/job/beam_PerformanceTests_HadoopInputFormat/65/console] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-4691) Rename (and reorganize?) jenkins jobs
[ https://issues.apache.org/jira/browse/BEAM-4691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukasz Gajowy resolved BEAM-4691. - Resolution: Fixed Fix Version/s: Not applicable Done only renaming. Reorganizing to different directories does not make much sense if we cannot import common .groovy files and have structure only 1 level deep. See this discussion for more context: [https://github.com/apache/beam/pull/5831] > Rename (and reorganize?) jenkins jobs > - > > Key: BEAM-4691 > URL: https://issues.apache.org/jira/browse/BEAM-4691 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Minor > Fix For: Not applicable > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Link to discussion: > [https://lists.apache.org/thread.html/ebe220ec1cebc73c8fb7190cf115fb9b23165fdbf950d58e05db544d@%3Cdev.beam.apache.org%3E] > Since jobs are Groovy files their names should be CamelCase. We could also > place them in subdirectories instead of prefixing job names. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6229) BigQuery returns value error while retrieving load test metrics
[ https://issues.apache.org/jira/browse/BEAM-6229?focusedWorklogId=175373&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175373 ] ASF GitHub Bot logged work on BEAM-6229: Author: ASF GitHub Bot Created on: 14/Dec/18 15:24 Start Date: 14/Dec/18 15:24 Worklog Time Spent: 10m Work Description: lgajowy commented on issue #7283: [BEAM-6229] Fix LoadTestResult to store proper timestamp and runtime URL: https://github.com/apache/beam/pull/7283#issuecomment-447357639 Thanks! Merging This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175373) Time Spent: 20m (was: 10m) > BigQuery returns value error while retrieving load test metrics > --- > > Key: BEAM-6229 > URL: https://issues.apache.org/jira/browse/BEAM-6229 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Kasia Kucharczyk >Assignee: Lukasz Gajowy >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > GroupByKeyLoadTest run on Dataflow saves the timestamp metric in the > wrong format: > {code} > Cannot return an invalid timestamp value of 154472066651564 microseconds > relative to the Unix epoch. The range of valid timestamp values is [0001-01-1 > 00:00:00, 9999-12-31 23:59:59.999999]; error in writing field timestamp > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3211) Add an integration test for TextIO ReadAll transform and dynamic writes
[ https://issues.apache.org/jira/browse/BEAM-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721516#comment-16721516 ] Lukasz Gajowy commented on BEAM-3211: - is this still needed or should we close this ticket? We have TextIO running for ~15Mb > Add an integration test for TextIO ReadAll transform and dynamic writes > --- > > Key: BEAM-3211 > URL: https://issues.apache.org/jira/browse/BEAM-3211 > Project: Beam > Issue Type: Test > Components: sdk-java-core >Reporter: Chamikara Jayalath >Assignee: Lukasz Gajowy >Priority: Major > > We should add a small scale version of performance test available in > following file run as a part of 'beam_PostCommit_Java_MavenInstall' and > 'beam_PostCommit_Java_ValidatesRunner*' Jenkins test suites. > https://github.com/apache/beam/blob/master/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-981) Not possible to directly submit a pipeline on spark cluster
[ https://issues.apache.org/jira/browse/BEAM-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukasz Gajowy reassigned BEAM-981: -- Assignee: Amit Sela (was: Lukasz Gajowy) > Not possible to directly submit a pipeline on spark cluster > --- > > Key: BEAM-981 > URL: https://issues.apache.org/jira/browse/BEAM-981 > Project: Beam > Issue Type: Bug > Components: runner-spark >Affects Versions: 0.6.0 >Reporter: Jean-Baptiste Onofré >Assignee: Amit Sela >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > It's not possible to directly run a pipeline on the spark runner (for > instance using {{mvn exec:java}}. It fails with: > {code} > [appclient-register-master-threadpool-0] INFO > org.apache.spark.deploy.client.AppClient$ClientEndpoint - Connecting to > master spark://10.200.118.197:7077... > [shuffle-client-0] ERROR org.apache.spark.network.client.TransportClient - > Failed to send RPC 6813731522650020739 to /10.200.118.197:7077: > java.lang.AbstractMethodError: > org.apache.spark.network.protocol.MessageWithHeader.touch(Ljava/lang/Object;)Lio/netty/util/ReferenceCounted; > java.lang.AbstractMethodError: > org.apache.spark.network.protocol.MessageWithHeader.touch(Ljava/lang/Object;)Lio/netty/util/ReferenceCounted; > at io.netty.util.ReferenceCountUtil.touch(ReferenceCountUtil.java:73) > at > io.netty.channel.DefaultChannelPipeline.touch(DefaultChannelPipeline.java:107) > at > io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:820) > at > io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:733) > at > io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:111) > at > io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:748) > at > io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:740) > at > 
io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:826) > at > io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:733) > at > io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:284) > at > io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:748) > at > io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:740) > at > io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:38) > at > io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1101) > at > io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1148) > at > io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1090) > at > io.netty.util.concurrent.SingleThreadEventExecutor.safeExecute(SingleThreadEventExecutor.java:451) > at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:418) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:401) > at > io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:877) > at java.lang.Thread.run(Thread.java:745) > [appclient-register-master-threadpool-0] WARN > org.apache.spark.deploy.client.AppClient$ClientEndpoint - Failed to connect > to master 10.200.118.197:7077 > java.io.IOException: Failed to send RPC 6813731522650020739 to > /10.200.118.197:7077: java.lang.AbstractMethodError: > org.apache.spark.network.protocol.MessageWithHeader.touch(Ljava/lang/Object;)Lio/netty/util/ReferenceCounted; > at > org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239) > at > org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226) > at > 
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:514) > at > io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:507) > at > io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:486) > at > io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:427) > at > io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:129) > at > io.netty.channel.AbstractChannelHandlerContext.notifyOutboundHandlerException(AbstractChannelHandlerContext.java:845) > at >
[jira] [Resolved] (BEAM-3747) Performance tests flaky due to database connection problems
[ https://issues.apache.org/jira/browse/BEAM-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukasz Gajowy resolved BEAM-3747. - Resolution: Fixed Fix Version/s: Not applicable Not flaky any more thanks to solutions provided in subtasks > Performance tests flaky due to database connection problems > --- > > Key: BEAM-3747 > URL: https://issues.apache.org/jira/browse/BEAM-3747 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Lukasz Gajowy >Priority: Major > Fix For: Not applicable > > > [https://builds.apache.org/view/A-D/view/Beam/job/beam_PerformanceTests_JDBC/] > Latest failure is > [https://builds.apache.org/view/A-D/view/Beam/job/beam_PerformanceTests_JDBC/262/] > [ERROR] org.apache.beam.sdk.io.jdbc.JdbcIOIT Time elapsed: 0 s <<< ERROR! > org.postgresql.util.PSQLException: The connection attempt failed. > Łukasz can you take a look ? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3216) Add an integration test for HadoopInputFormatIO Read transform
[ https://issues.apache.org/jira/browse/BEAM-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721518#comment-16721518 ] Lukasz Gajowy commented on BEAM-3216: - is this ticket still valid? > Add an integration test for HadoopInputFormatIO Read transform > -- > > Key: BEAM-3216 > URL: https://issues.apache.org/jira/browse/BEAM-3216 > Project: Beam > Issue Type: Test > Components: io-java-hadoop >Reporter: Chamikara Jayalath >Assignee: Lukasz Gajowy >Priority: Major > > We should add small scale integration tests for HadoopInputFormatIO that can > be run as a part of 'beam_PostCommit_Java_MavenInstall' and > 'beam_PostCommit_Java_ValidatesRunner*' Jenkins test suites. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-1603) Enable programmatic execution of spark pipelines.
[ https://issues.apache.org/jira/browse/BEAM-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukasz Gajowy closed BEAM-1603. --- Resolution: Fixed Fix Version/s: Not applicable This is possible since https://issues.apache.org/jira/browse/BEAM-3371 was done > Enable programmatic execution of spark pipelines. > - > > Key: BEAM-1603 > URL: https://issues.apache.org/jira/browse/BEAM-1603 > Project: Beam > Issue Type: Bug > Components: runner-spark, testing >Reporter: Jason Kuster >Assignee: Lukasz Gajowy >Priority: Major > Fix For: Not applicable > > > In order to enable execution of Spark Integration Tests against a cluster, it > is necessary to have the ability to execute Spark pipelines via maven, rather > than spark-submit. The minimum necessary is to enable this in the > TestSparkRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-5037) HashFunction is not intialized in SyntheticOptions
[ https://issues.apache.org/jira/browse/BEAM-5037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukasz Gajowy closed BEAM-5037. --- Resolution: Fixed Fix Version/s: 2.7.0 > HashFunction is not intialized in SyntheticOptions > -- > > Key: BEAM-5037 > URL: https://issues.apache.org/jira/browse/BEAM-5037 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Major > Fix For: 2.7.0 > > Time Spent: 1h > Remaining Estimate: 0h > > This is due to fact that the field is transient hence not getting serialized > and then initialized again after deserialization. We need some other way of > initializing it, immune to the field's transiency. > Stacktrace: > {code:java} > Class org.apache.beam.sdk.io.synthetic.GroupByKeyLoadIT > all > org.apache.beam.sdk.io.synthetic > GroupByKeyLoadIT > 1 > tests > 1 > failures > 0 > ignored > 0.050s > duration > 0% > successful > Failed tests > Tests > Standard error > groupByKeyLoadTest > java.lang.IllegalArgumentException: hashFunction hasn't been initialized. 
> at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:122) > at > org.apache.beam.sdk.io.synthetic.SyntheticOptions.validate(SyntheticOptions.java:301) > at > org.apache.beam.sdk.io.synthetic.SyntheticBoundedIO$SyntheticSourceOptions.validate(SyntheticBoundedIO.java:285) > at > org.apache.beam.sdk.io.synthetic.SyntheticBoundedIO$SyntheticBoundedSource.validate(SyntheticBoundedIO.java:119) > at org.apache.beam.sdk.io.Read$Bounded.expand(Read.java:95) > at org.apache.beam.sdk.io.Read$Bounded.expand(Read.java:85) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537) > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:471) > at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:44) > at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:167) > at > org.apache.beam.sdk.io.synthetic.GroupByKeyLoadIT.groupByKeyLoadTest(GroupByKeyLoadIT.java:81) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) > at > org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at >
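The root cause described in BEAM-5037 is a general Java serialization pitfall: transient fields are skipped during serialization and come back null after deserialization, so an eagerly initialized transient {{hashFunction}} disappears once the options object crosses a worker boundary. The self-contained demo below reproduces the effect and shows one way to be immune to the field's transiency, lazy re-initialization on first use; it is a sketch of the pitfall, not Beam's actual {{SyntheticOptions}} code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

// Demonstrates the BEAM-5037 failure mode: a transient field is not serialized,
// so after a serialization round-trip it is null unless re-initialized.
public class TransientFieldDemo implements Serializable {

    // Eagerly initialized, but transient: lost on deserialization.
    private transient Function<String, Integer> hashFunction = s -> s.length();

    // Immune accessor: re-derives the function lazily when it is missing,
    // instead of assuming the field initializer ran.
    int hash(String s) {
        if (hashFunction == null) {
            hashFunction = t -> t.length();
        }
        return hashFunction.apply(s);
    }

    // Serialize and deserialize the object, as shipping it to a worker would.
    static TransientFieldDemo roundTrip(TransientFieldDemo in) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new ObjectOutputStream(bos).writeObject(in);
            return (TransientFieldDemo) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Calling {{hashFunction.apply(...)}} directly on the deserialized object would throw, which is exactly the "hashFunction hasn't been initialized" precondition failure in the stack trace above; the lazy check sidesteps it.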
[jira] [Resolved] (BEAM-6076) NEXMark flakiness: NPE thrown from BigQuery client library once in a while
[ https://issues.apache.org/jira/browse/BEAM-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukasz Gajowy resolved BEAM-6076. - Resolution: Fixed Fix Version/s: 2.9.0 > NEXMark flakiness: NPE thrown from BigQuery client library once in a while > -- > > Key: BEAM-6076 > URL: https://issues.apache.org/jira/browse/BEAM-6076 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Major > Fix For: 2.9.0 > > Time Spent: 1h > Remaining Estimate: 0h > > It seems that once in a while the library that is used to connect to BigQuery > throws exceptions like this: > {code:java} > Exception in thread "main" > java.lang.NullPointerException > at > com.google.cloud.bigquery.StandardTableDefinition$StreamingBuffer.fromPb(StandardTableDefinition.java:116) > at > com.google.cloud.bigquery.StandardTableDefinition.fromPb(StandardTableDefinition.java:225) > at com.google.cloud.bigquery.TableDefinition.fromPb(TableDefinition.java:155) > at com.google.cloud.bigquery.TableInfo$BuilderImpl.(TableInfo.java:183) > at com.google.cloud.bigquery.Table.fromPb(Table.java:593) > at com.google.cloud.bigquery.BigQueryImpl.getTable(BigQueryImpl.java:410) > at > org.apache.beam.sdk.testutils.publishing.BigQueryClient.createTableIfNotExists(BigQueryClient.java:74) > at org.apache.beam.sdk.nexmark.Main.savePerfsToBigQuery(Main.java:184) > at org.apache.beam.sdk.nexmark.Main.runAll(Main.java:148) > at org.apache.beam.sdk.nexmark.Main.runAll(Main.java:98) > at org.apache.beam.sdk.nexmark.Main.main(Main.java:423){code} > Similar error found in the network: > [https://github.com/googleapis/google-cloud-java/issues/1689] > +Example logs:+ > > [https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_Nexmark_Spark/1085/console] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5723) CassandraIO is broken because of use of bad relocation of guava
[ https://issues.apache.org/jira/browse/BEAM-5723?focusedWorklogId=175359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175359 ] ASF GitHub Bot logged work on BEAM-5723: Author: ASF GitHub Bot Created on: 14/Dec/18 15:00 Start Date: 14/Dec/18 15:00 Worklog Time Spent: 10m Work Description: iemejia commented on a change in pull request #7237: [BEAM-5723] Changed shadow plugin configuration to avoid relocating g… URL: https://github.com/apache/beam/pull/7237#discussion_r241782155 ## File path: sdks/java/io/cassandra/build.gradle ## @@ -17,7 +17,24 @@ */ apply plugin: org.apache.beam.gradle.BeamModulePlugin -applyJavaNature() +applyJavaNature(shadowClosure: { Review comment: This seems like the most restricted (and I would say proper way), my doubt is if there are some other uses of Cassandra's guava that we could be missing and that can bite us later. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175359) Time Spent: 2.5h (was: 2h 20m) > CassandraIO is broken because of use of bad relocation of guava > --- > > Key: BEAM-5723 > URL: https://issues.apache.org/jira/browse/BEAM-5723 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Arun sethia >Assignee: João Cabrita >Priority: Major > Fix For: 2.10.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > While using apache beam to run dataflow job to read data from BigQuery and > Store/Write to Cassandra with following libaries: > # beam-sdks-java-io-cassandra - 2.6.0 > # beam-sdks-java-io-jdbc - 2.6.0 > # beam-sdks-java-io-google-cloud-platform - 2.6.0 > # beam-sdks-java-core - 2.6.0 > # google-cloud-dataflow-java-sdk-all - 2.5.0 > # google-api-client -1.25.0 > > I am getting following error at the time insert/save data to Cassandra. > {code:java} > [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332) > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313) > at 
org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work started] (BEAM-6229) BigQuery returns value error while retrieving load test metrics
[ https://issues.apache.org/jira/browse/BEAM-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-6229 started by Lukasz Gajowy. --- > BigQuery returns value error while retrieving load test metrics > --- > > Key: BEAM-6229 > URL: https://issues.apache.org/jira/browse/BEAM-6229 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Kasia Kucharczyk >Assignee: Lukasz Gajowy >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > GroupByKeyLoadTest run on Dataflow saves the timestamp metric in the > wrong format: > {code} > Cannot return an invalid timestamp value of 154472066651564 microseconds > relative to the Unix epoch. The range of valid timestamp values is [0001-01-1 > 00:00:00, 9999-12-31 23:59:59.999999]; error in writing field timestamp > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5723) CassandraIO is broken because of use of bad relocation of guava
[ https://issues.apache.org/jira/browse/BEAM-5723?focusedWorklogId=175357&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175357 ] ASF GitHub Bot logged work on BEAM-5723: Author: ASF GitHub Bot Created on: 14/Dec/18 14:55 Start Date: 14/Dec/18 14:55 Worklog Time Spent: 10m Work Description: kewne commented on a change in pull request #7237: [BEAM-5723] Changed shadow plugin configuration to avoid relocating g… URL: https://github.com/apache/beam/pull/7237#discussion_r241780606 ## File path: sdks/java/io/cassandra/build.gradle ## @@ -17,7 +17,24 @@ */ apply plugin: org.apache.beam.gradle.BeamModulePlugin -applyJavaNature() +applyJavaNature(shadowClosure: { Review comment: I've actually gone in the opposite direction and replaced this with a blanket "exclude everything", since guava is now provided by the beam vendored version (except where it needs to use `ListenableFuture` from the driver, which comes from regular guava). This seems acceptable to me because the default rule is "include guava" (and exclude everything else) but let me know if you disagree. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175357) Time Spent: 2h 20m (was: 2h 10m) > CassandraIO is broken because of use of bad relocation of guava > --- > > Key: BEAM-5723 > URL: https://issues.apache.org/jira/browse/BEAM-5723 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Arun sethia >Assignee: João Cabrita >Priority: Major > Fix For: 2.10.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > While using apache beam to run dataflow job to read data from BigQuery and > Store/Write to Cassandra with following libaries: > # beam-sdks-java-io-cassandra - 2.6.0 > # beam-sdks-java-io-jdbc - 2.6.0 > # beam-sdks-java-io-google-cloud-platform - 2.6.0 > # beam-sdks-java-core - 2.6.0 > # google-cloud-dataflow-java-sdk-all - 2.5.0 > # google-api-client -1.25.0 > > I am getting following error at the time insert/save data to Cassandra. > {code:java} > [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332) > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313) > at 
org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
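The `NoSuchMethodError` above is a descriptor mismatch: JVM linkage resolves a method by its full descriptor, return type included, so a call site compiled against the relocated `org/apache/beam/repackaged/.../ListenableFuture` cannot link against the driver's `saveAsync`, which still returns plain guava's `ListenableFuture`. Reflection makes the return type in the descriptor visible at runtime; the sketch below is illustrative only and uses a JDK class as a stand-in, since the Cassandra driver is not assumed to be on the classpath (with the driver you would inspect `Mapper#saveAsync` the same way):

```java
import java.lang.reflect.Method;

public class ReturnTypeCheck {
    public static void main(String[] args) {
        // Print the (erased) return type of every public overload named
        // "thenApply" -- the part of the method descriptor that shading
        // relocation changes and that NoSuchMethodError complains about.
        for (Method m : java.util.concurrent.CompletableFuture.class.getMethods()) {
            if (m.getName().equals("thenApply")) {
                System.out.println(m.getReturnType().getName());
            }
        }
    }
}
```

If the printed class name carries a `repackaged` prefix on one side of a call but not the other, the two artifacts were shaded inconsistently.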
[jira] [Work logged] (BEAM-6229) BigQuery returns value error while retrieving load test metrics
[ https://issues.apache.org/jira/browse/BEAM-6229?focusedWorklogId=175356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175356 ] ASF GitHub Bot logged work on BEAM-6229: Author: ASF GitHub Bot Created on: 14/Dec/18 14:47 Start Date: 14/Dec/18 14:47 Worklog Time Spent: 10m Work Description: lgajowy opened a new pull request #7283: [BEAM-6229] Fix LoadTestResult to store proper timestamp and runtime URL: https://github.com/apache/beam/pull/7283 Fixes the bug and changes total runtime to seconds to be consistent with the Python suites (we don't need greater resolution here). R: @kkucharc Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. 
Post-Commit Tests Status (on master branch): [build status badge table omitted] This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175356) Time Spent: 10m Remaining Estimate: 0h > BigQuery returns value
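The runtime change in the pull request above amounts to collapsing a millisecond interval down to whole seconds; a minimal sketch of that conversion (the helper name is hypothetical, not the actual `LoadTestResult` code):

```java
public class RuntimeSeconds {
    // Convert a start/end pair measured in epoch milliseconds into a
    // whole-second runtime, the coarser resolution used by the Python suites.
    static long runtimeSeconds(long startMillis, long endMillis) {
        return (endMillis - startMillis) / 1000;
    }

    public static void main(String[] args) {
        // 125 500 ms elapsed -> 125 whole seconds
        System.out.println(runtimeSeconds(1_544_796_000_000L, 1_544_796_125_500L));
    }
}
```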
[jira] [Closed] (BEAM-6187) Drop Scala suffix of FlinkRunner artifacts
[ https://issues.apache.org/jira/browse/BEAM-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed BEAM-6187. Resolution: Fixed The suffix has been dropped as part of upgrading to 1.6 with BEAM-5267. The suffix will be removed from the 1.5 version as soon as support for it is dropped. > Drop Scala suffix of FlinkRunner artifacts > -- > > Key: BEAM-6187 > URL: https://issues.apache.org/jira/browse/BEAM-6187 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.10.0 > > > With BEAM-5419 we will build multiple versions of the Flink Runner against > different Flink versions. The new artifacts will lead to confusing names like > {{beam-runners-flink1.5_2.11}}. I think it is time to drop the Scala suffix > and just build against the most stable Flink Scala version. > Projects like Scio have the option to cross-compile to different Scala > versions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6227) FlinkRunner errors if GroupByKey contains null values (streaming mode only)
[ https://issues.apache.org/jira/browse/BEAM-6227?focusedWorklogId=175352&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175352 ] ASF GitHub Bot logged work on BEAM-6227: Author: ASF GitHub Bot Created on: 14/Dec/18 14:32 Start Date: 14/Dec/18 14:32 Worklog Time Spent: 10m Work Description: mxm commented on issue #7282: [BEAM-6227] Fix GroupByKey with null values in Flink Runner URL: https://github.com/apache/beam/pull/7282#issuecomment-447342445 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175352) Time Spent: 20m (was: 10m) > FlinkRunner errors if GroupByKey contains null values (streaming mode only) > --- > > Key: BEAM-6227 > URL: https://issues.apache.org/jira/browse/BEAM-6227 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.9.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.10.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Apparently this passed ValidatesRunner in streaming mode although this is a > quite common operation: > {noformat} > FlinkPipelineOptions options = > PipelineOptionsFactory.as(FlinkPipelineOptions.class); > options.setRunner(FlinkRunner.class); > // force streaming mode > options.setStreaming(true); > Pipeline pipeline = Pipeline.create(options); > pipeline > .apply(GenerateSequence.from(0).to(100)) > .apply(Window.into(FixedWindows.of(Duration.millis(10)))) > .apply(ParDo.of( > new DoFn<Long, KV<String, String>>() { > @ProcessElement > public void processElement(ProcessContext pc) { > pc.output(KV.of("hello", null)); > } > } > )) > .apply(GroupByKey.create()); > pipeline.run(); > {noformat} > Throws: > {noformat} > Caused by: java.lang.RuntimeException: Error adding to bag state. 
> at > org.apache.beam.runners.flink.translation.wrappers.streaming.state.FlinkStateInternals$FlinkBagState.add(FlinkStateInternals.java:299) > at > org.apache.beam.runners.core.SystemReduceFn.processValue(SystemReduceFn.java:115) > at > org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:608) > at > org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349) > at > org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136) > Caused by: java.lang.NullPointerException: You cannot add null to a ListState. > at > org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:75) > at > org.apache.flink.runtime.state.heap.HeapListState.add(HeapListState.java:89) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.state.FlinkStateInternals$FlinkBagState.add(FlinkStateInternals.java:297) > at > org.apache.beam.runners.core.SystemReduceFn.processValue(SystemReduceFn.java:115) > at > org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:608) > at > org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349) > at > org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136) > at > org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn$DoFnInvoker.invokeProcessElement(Unknown > Source) > at > org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:275) > at > org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:240) > at > org.apache.beam.runners.core.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:80) > at > org.apache.beam.runners.flink.metrics.DoFnRunnerWithMetricsUpdate.processElement(DoFnRunnerWithMetricsUpdate.java:63) > at > 
org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.processElement(DoFnOperator.java:460) > at > org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:104) > at >
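The root cause is the innermost frame: Flink's `ListState` rejects nulls outright (`You cannot add null to a ListState`). One way a runner-side fix can work is to encode null as a sentinel on write and decode it on read. The sketch below illustrates only that idea using plain collections; `NullSafeBag` is a made-up name and a stand-in for the real `FlinkBagState` wrapping of `ListState`, whose actual implementation may differ:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative null-sentinel encoding: a backing store that rejects nulls
// (like Flink's ListState) can still round-trip them if the caller maps
// null to a private sentinel object on add() and back to null on read().
public class NullSafeBag<T> {
    private static final Object NULL_SENTINEL = new Object();
    private final List<Object> backing = new ArrayList<>(); // stand-in for ListState

    public void add(T value) {
        backing.add(value == null ? NULL_SENTINEL : value);
    }

    @SuppressWarnings("unchecked")
    public List<T> read() {
        List<T> out = new ArrayList<>();
        for (Object o : backing) {
            out.add(o == NULL_SENTINEL ? null : (T) o);
        }
        return out;
    }

    public static void main(String[] args) {
        NullSafeBag<String> bag = new NullSafeBag<>();
        bag.add("hello");
        bag.add(null); // would throw NullPointerException on a raw ListState
        System.out.println(bag.read()); // prints [hello, null]
    }
}
```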
[jira] [Commented] (BEAM-3472) Create a callback triggered at the end of a batch in flink runner
[ https://issues.apache.org/jira/browse/BEAM-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721464#comment-16721464 ] Maximilian Michels commented on BEAM-3472: -- Is this still relevant? > Create a callback triggered at the end of a batch in flink runner > - > > Key: BEAM-3472 > URL: https://issues.apache.org/jira/browse/BEAM-3472 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Etienne Chauchot >Priority: Major > > In the future we might add new features to the runners for which we might > need to do some processing at the end of a batch. Currently there is no > single place (a callback) to add this processing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6218) TensorFlow Model Analysis Fails when using the portable Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721459#comment-16721459 ] Maximilian Michels commented on BEAM-6218: -- This looks like an error from the Python SDK harness. > TensorFlow Model Analysis Fails when using the portable Flink runner > > > Key: BEAM-6218 > URL: https://issues.apache.org/jira/browse/BEAM-6218 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-py-harness >Reporter: Andrew Packer >Priority: Major > > Running a simple model analysis pipeline, trying to use the portable Flink > runner against a local cluster: > {code:python} > import apache_beam as beam > import tensorflow_model_analysis as tfma > from apache_beam.options.pipeline_options import PipelineOptions > from apache_beam.runners.portability import portable_runner > def pipeline(root): > data_location = './dataset/' > data = root | 'ReadData' >> beam.io.ReadFromTFRecord(data_location) > results = data | 'ExtractEvaluateAndWriteResults' >> > tfma.EvaluateAndWriteResults( > eval_saved_model_path='./model/15427633886/', > output_path='./output/', > display_only_data_location=data_location) > def run(argv=None): > runner = portable_runner.PortableRunner() > pipeline_options = > PipelineOptions(experiments=['beam_fn_api'],sdk_location='container',job_endpoint='localhost:8099',setup_file='./setup.py') > runner.run(pipeline, pipeline_options) > if __name__ == '__main__': > run() > {code} > Versions: > Apache Beam 2.8.0 > TensorFlow Model Analysis: 0.9.2 > Apache Flink: 1.5.3 > > Stack Trace: > {code} > [flink-runner-job-server] ERROR > org.apache.beam.runners.flink.FlinkJobInvocation - Error during job > invocation > BeamApp-apacker-1212082216-2dd571ba_359d85b7-4e08-49f3-bdc7-34cdb0e779bf. > org.apache.flink.client.program.ProgramInvocationException: Job > 22e7e9d229977f3f0518c37f507f5e07 failed. 
> at > org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:265) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:452) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427) > at > org.apache.flink.client.RemoteExecutor.executePlanWithJars(RemoteExecutor.java:216) > at > org.apache.flink.client.RemoteExecutor.executePlan(RemoteExecutor.java:193) > at > org.apache.flink.api.java.RemoteEnvironment.execute(RemoteEnvironment.java:173) > at > org.apache.beam.runners.flink.FlinkJobInvocation.runPipeline(FlinkJobInvocation.java:121) > at > org.apache.beam.repackaged.beam_runners_flink_2.11.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111) > at > org.apache.beam.repackaged.beam_runners_flink_2.11.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58) > at > org.apache.beam.repackaged.beam_runners_flink_2.11.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.flink.runtime.client.JobExecutionException: Job > completed with illegal application status: UNKNOWN. > at > org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:150) > at > org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:262) > ... 
13 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error received from SDK harness for instruction > 22: Traceback (most recent call last): > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 131, in _execute > response = task() > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 166, in > self._execute(lambda: worker.do_instruction(work), work) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 212, in do_instruction > request.instruction_id) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 231, in process_bundle > self.data_channel_factory) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 343, in __init__ >
[jira] [Updated] (BEAM-6218) TensorFlow Model Analysis Fails when using the portable Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels updated BEAM-6218: - Component/s: sdk-py-harness > TensorFlow Model Analysis Fails when using the portable Flink runner > (issue description and stack trace identical to the quote in the comment above)