[jira] [Work logged] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality
[ https://issues.apache.org/jira/browse/BEAM-6240?focusedWorklogId=175678&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175678 ] ASF GitHub Bot logged work on BEAM-6240:

Author: ASF GitHub Bot
Created on: 15/Dec/18 06:39
Start Date: 15/Dec/18 06:39
Worklog Time Spent: 10m

Work Description: reuvenlax opened a new pull request #7289: [BEAM-6240] Add a library of schema annotations for POJO and JavaBeans
URL: https://github.com/apache/beam/pull/7289

Annotations added:
@SchemaIgnore - Skip a field
@FieldName - Specify the name of a field (instead of just using the Java field name)
@SchemaCreate - Specify a constructor or static creator for a user type.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 175678)
Time Spent: 10m
Remaining Estimate: 0h

> Allow users to annotate POJOs and JavaBeans for richer functionality
> ---
> Key: BEAM-6240
> URL: https://issues.apache.org/jira/browse/BEAM-6240
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-java-core
> Reporter: Reuven Lax
> Assignee: Reuven Lax
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Desired annotations:
> * SchemaIgnore - ignore this field
> * FieldName - allow the user to explicitly specify a field name
> * SchemaCreate - register a function to be used to create an object (so fields can be final, and no default constructor need be assumed).

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality
[ https://issues.apache.org/jira/browse/BEAM-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reuven Lax reassigned BEAM-6240: Assignee: Reuven Lax (was: Kenneth Knowles)
[jira] [Created] (BEAM-6240) Allow users to annotate POJOs and JavaBeans for richer functionality
Reuven Lax created BEAM-6240:

Summary: Allow users to annotate POJOs and JavaBeans for richer functionality
Key: BEAM-6240
URL: https://issues.apache.org/jira/browse/BEAM-6240
Project: Beam
Issue Type: Sub-task
Components: sdk-java-core
Reporter: Reuven Lax
Assignee: Kenneth Knowles

Desired annotations:
* SchemaIgnore - ignore this field
* FieldName - allow the user to explicitly specify a field name
* SchemaCreate - register a function to be used to create an object (so fields can be final, and no default constructor need be assumed).
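The three desired annotations cover dropping a field from the inferred schema, renaming a field, and constructing instances without assuming a default constructor. As a language-neutral illustration only (a hypothetical Python sketch, not Beam's Java API; the `_schema_ignore`, `_field_names`, and `_schema_create` markers are invented stand-ins for the annotations):

```python
def schema_fields(cls):
    """Derive a schema (list of field names) from a class, honoring the
    hypothetical markers below. A conceptual analogue of the proposed
    Java annotations, not Beam's implementation."""
    ignored = set(getattr(cls, '_schema_ignore', ()))
    renames = getattr(cls, '_field_names', {})
    fields = []
    for name in getattr(cls, '__annotations__', {}):
        if name in ignored:
            continue                             # SchemaIgnore: skip the field
        fields.append(renames.get(name, name))   # FieldName: rename the field
    return fields


def schema_create(cls, **field_values):
    """Build an instance via a registered creator (SchemaCreate), so
    fields can be final/immutable and no default constructor is needed."""
    creator = getattr(cls, '_schema_create', cls)
    return creator(**field_values)


class User:
    user_id: int
    name: str
    cached_hash: int

    _schema_ignore = ('cached_hash',)     # stand-in for @SchemaIgnore
    _field_names = {'user_id': 'id'}      # stand-in for @FieldName

    def __init__(self, user_id, name):
        self.user_id = user_id
        self.name = name
        self.cached_hash = hash((user_id, name))

    # Registered creator standing in for @SchemaCreate.
    _schema_create = staticmethod(lambda user_id, name: User(user_id, name))
```

Here `schema_fields(User)` yields `['id', 'name']`: `cached_hash` is dropped, `user_id` is renamed, and `schema_create` routes construction through the registered creator instead of a default constructor.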
[jira] [Work logged] (BEAM-6197) Log time for Dataflow GCS upload of staged files
[ https://issues.apache.org/jira/browse/BEAM-6197?focusedWorklogId=175675&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175675 ] ASF GitHub Bot logged work on BEAM-6197:

Author: ASF GitHub Bot
Created on: 15/Dec/18 05:30
Start Date: 15/Dec/18 05:30
Worklog Time Spent: 10m

Work Description: alanmyrvold removed a comment on issue #7235: [BEAM-6197] Log time for Dataflow GCS upload of staged files + add a test program
URL: https://github.com/apache/beam/pull/7235#issuecomment-447515127
Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175675)
Time Spent: 2h (was: 1h 50m)

> Log time for Dataflow GCS upload of staged files
> ---
> Key: BEAM-6197
> URL: https://issues.apache.org/jira/browse/BEAM-6197
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: Alan Myrvold
> Assignee: Alan Myrvold
> Priority: Minor
> Time Spent: 2h
> Remaining Estimate: 0h
>
> Would be nice to collect timing in the logs of Dataflow GCS upload of staged files
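The improvement itself is simple: wrap the staging step with a timer and log the elapsed time. A minimal sketch, assuming a generic upload callable (`timed_upload` and the log wording are invented for illustration; the actual PR instruments Beam's Dataflow GCS stager):

```python
import logging
import time

logging.basicConfig(level=logging.INFO)


def timed_upload(upload_fn, files):
    """Run an upload callable and log how long staging took.

    `upload_fn` is any callable that uploads the given files; the log
    message is illustrative, not the wording the Beam PR uses.
    """
    start = time.monotonic()           # monotonic clock: immune to wall-clock jumps
    upload_fn(files)
    elapsed = time.monotonic() - start
    logging.info('Staged %d files in %.1f seconds', len(files), elapsed)
    return elapsed
```

Using `time.monotonic()` rather than `time.time()` keeps the measurement correct even if the system clock is adjusted mid-upload.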
[jira] [Work logged] (BEAM-6197) Log time for Dataflow GCS upload of staged files
[ https://issues.apache.org/jira/browse/BEAM-6197?focusedWorklogId=175674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175674 ] ASF GitHub Bot logged work on BEAM-6197:

Author: ASF GitHub Bot
Created on: 15/Dec/18 05:30
Start Date: 15/Dec/18 05:30
Worklog Time Spent: 10m

Work Description: alanmyrvold commented on issue #7235: [BEAM-6197] Log time for Dataflow GCS upload of staged files + add a test program
URL: https://github.com/apache/beam/pull/7235#issuecomment-447539110
Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175674)
Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (BEAM-6239) Nexmark benchmark for raw sessionization then stream enrichment
[ https://issues.apache.org/jira/browse/BEAM-6239?focusedWorklogId=175672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175672 ] ASF GitHub Bot logged work on BEAM-6239:

Author: ASF GitHub Bot
Created on: 15/Dec/18 04:42
Start Date: 15/Dec/18 04:42
Worklog Time Spent: 10m

Work Description: kennknowles commented on issue #7287: [BEAM-6239] Add session side input join to Nexmark
URL: https://github.com/apache/beam/pull/7287#issuecomment-447537032
run java precommit

Issue Time Tracking
---
Worklog Id: (was: 175672)
Time Spent: 40m (was: 0.5h)

> Nexmark benchmark for raw sessionization then stream enrichment
> ---
> Key: BEAM-6239
> URL: https://issues.apache.org/jira/browse/BEAM-6239
> Project: Beam
> Issue Type: New Feature
> Components: examples-nexmark
> Reporter: Kenneth Knowles
> Assignee: Kenneth Knowles
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> We have BOUNDED_SIDE_INPUT_JOIN that just enriches a stream. Another use case is to sessionize first. I am curious about the difference in perf, and how this plays out in SQL.
[jira] [Work logged] (BEAM-6239) Nexmark benchmark for raw sessionization then stream enrichment
[ https://issues.apache.org/jira/browse/BEAM-6239?focusedWorklogId=175659&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175659 ] ASF GitHub Bot logged work on BEAM-6239:

Author: ASF GitHub Bot
Created on: 15/Dec/18 02:14
Start Date: 15/Dec/18 02:14
Worklog Time Spent: 10m

Work Description: kennknowles commented on issue #7287: [BEAM-6239] Add session side input join to Nexmark
URL: https://github.com/apache/beam/pull/7287#issuecomment-447529180
R: @akedin @echauchot I know you are a bit busy but tagging you so you see this.

Issue Time Tracking
---
Worklog Id: (was: 175659)
Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (BEAM-6239) Nexmark benchmark for raw sessionization then stream enrichment
[ https://issues.apache.org/jira/browse/BEAM-6239?focusedWorklogId=175657&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175657 ] ASF GitHub Bot logged work on BEAM-6239:

Author: ASF GitHub Bot
Created on: 15/Dec/18 02:12
Start Date: 15/Dec/18 02:12
Worklog Time Spent: 10m

Work Description: kennknowles opened a new pull request #7288: [BEAM-6239] Add SQL sessionize then side input join as a benchmark
URL: https://github.com/apache/beam/pull/7288
This is my attempt at #7287 in SQL. I believe these features may not be supported by Calcite, and have reached out to their mailing list.

Follow this checklist to help us incorporate your contribution quickly and easily:
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
- [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it.
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | ---

Issue Time Tracking
---
Worklog Id: (was: 175657)
Time Spent: 20m (was: 10m)
[jira] [Work logged] (BEAM-6239) Nexmark benchmark for raw sessionization then stream enrichment
[ https://issues.apache.org/jira/browse/BEAM-6239?focusedWorklogId=175656&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175656 ] ASF GitHub Bot logged work on BEAM-6239:

Author: ASF GitHub Bot
Created on: 15/Dec/18 02:11
Start Date: 15/Dec/18 02:11
Worklog Time Spent: 10m

Work Description: kennknowles opened a new pull request #7287: [BEAM-6239] Add session side input join to Nexmark
URL: https://github.com/apache/beam/pull/7287
This is a new benchmark of sessionization then a join with a side input.

Follow this checklist to help us incorporate your contribution quickly and easily:
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
- [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it.
Issue Time Tracking
---
Worklog Id: (was: 175656)
Time Spent: 10m
Remaining Estimate: 0h
[jira] [Created] (BEAM-6239) Nexmark benchmark for raw sessionization then stream enrichment
Kenneth Knowles created BEAM-6239:

Summary: Nexmark benchmark for raw sessionization then stream enrichment
Key: BEAM-6239
URL: https://issues.apache.org/jira/browse/BEAM-6239
Project: Beam
Issue Type: New Feature
Components: examples-nexmark
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles

We have BOUNDED_SIDE_INPUT_JOIN that just enriches a stream. Another use case is to sessionize first. I am curious about the difference in perf, and how this plays out in SQL.
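The shape of the proposed benchmark — assign gap-based sessions per key, then enrich each session from a side input — can be modeled without Beam. A simplified sketch with an invented 10-unit gap (`sessionize` and `enrich` are illustrative stand-ins; real session windows in Beam/Nexmark involve watermarks and triggers this ignores):

```python
from collections import defaultdict

GAP = 10  # hypothetical gap duration: events farther apart start a new session


def sessionize(events, gap=GAP):
    """Assign gap-based sessions per key, in the spirit of Beam's Sessions
    windowing. `events` is a list of (key, timestamp) pairs; returns
    {key: [(session_start, session_end), ...]}."""
    by_key = defaultdict(list)
    for key, ts in events:
        by_key[key].append(ts)
    sessions = {}
    for key, stamps in by_key.items():
        stamps.sort()
        out = []
        start = end = stamps[0]
        for ts in stamps[1:]:
            if ts - end > gap:        # gap exceeded: close the current session
                out.append((start, end))
                start = ts
            end = ts
        out.append((start, end))
        sessions[key] = out
    return sessions


def enrich(sessions, side_input):
    """Join each session with a small in-memory side input, keyed the same
    way -- the 'stream enrichment' half of the benchmark."""
    return [
        (key, window, side_input.get(key))
        for key, windows in sessions.items()
        for window in windows
    ]
```

With events `[('u', 1), ('u', 3), ('u', 20)]` the 17-unit gap between 3 and 20 splits user `u` into two sessions, `(1, 3)` and `(20, 20)`, each of which is then paired with that user's side-input record.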
[jira] [Work logged] (BEAM-4594) Implement Beam Python User State and Timer API
[ https://issues.apache.org/jira/browse/BEAM-4594?focusedWorklogId=175650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175650 ] ASF GitHub Bot logged work on BEAM-4594:

Author: ASF GitHub Bot
Created on: 15/Dec/18 00:57
Start Date: 15/Dec/18 00:57
Worklog Time Spent: 10m

Work Description: youngoli commented on a change in pull request #7252: [BEAM-4594] Support state continuation tokens over the Fn API.
URL: https://github.com/apache/beam/pull/7252#discussion_r241929690

File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py

@@ -1122,9 +1124,25 @@
   def process_instruction_id(self, unused_instruction_id):
     yield

-  def blocking_get(self, state_key):
+  def blocking_get(self, state_key, continuation_token=None):
     with self._lock:
-      return b''.join(self._state[self._to_key(state_key)])
+      full_state = self._state[self._to_key(state_key)]
+      if self._use_continuation_tokens:
+        new_token = 'token_%d' % len(self._continuations)
+        if not continuation_token:
+          # Store (index, blobs).
+          self._continuations[new_token] = 0, tuple(full_state)
+          return b'', new_token
+        else:
+          ix, full_state = self._continuations[continuation_token]
+          if ix == len(full_state):
+            return b'', None
+          else:
+            self._continuations[new_token] = ix + 1, full_state
+            return full_state[ix], new_token

Review comment: So every time there's a valid continuation token and the iteration is advanced, that's stored in a new entry in the list of continuations. Is there any chance that this list may end up holding identical continuations under different tokens? Since this list seems to be only added to, I instinctively worry about it leaking memory like that. Or is it a non-issue based on the broader architecture?

Issue Time Tracking
---
Worklog Id: (was: 175650)
Time Spent: 6h 50m (was: 6h 40m)

> Implement Beam Python User State and Timer API
> ---
> Key: BEAM-4594
> URL: https://issues.apache.org/jira/browse/BEAM-4594
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Charles Chen
> Assignee: Charles Chen
> Priority: Major
> Labels: portability
> Time Spent: 6h 50m
> Remaining Estimate: 0h
>
> This issue tracks the implementation of the Beam Python User State and Timer API, described here: [https://s.apache.org/beam-python-user-state-and-timers].
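The paging scheme discussed in this review can be reproduced as a standalone sketch (`FakeStateStore` and `read_all` are invented names; the real code lives in `fn_api_runner.py`). It also makes the reviewer's memory question concrete: every advanced page mints a fresh token entry, so the continuation map only grows:

```python
class FakeStateStore:
    """Simplified sketch of paged ("continuation token") state reads.

    Mirrors the scheme in the diff above: the first request returns an
    empty blob plus a token; each follow-up request returns one stored
    blob and a fresh token, until the index runs past the end.
    """

    def __init__(self):
        self._state = {}           # key -> list of byte blobs
        self._continuations = {}   # token -> (next index, blobs snapshot)

    def put(self, key, blobs):
        self._state[key] = list(blobs)

    def get(self, key, continuation_token=None):
        new_token = 'token_%d' % len(self._continuations)
        if continuation_token is None:
            # First call: snapshot the blobs and hand back a token.
            self._continuations[new_token] = (0, tuple(self._state[key]))
            return b'', new_token
        ix, blobs = self._continuations[continuation_token]
        if ix == len(blobs):
            return b'', None       # Exhausted: no further token.
        # Every advanced page stores a new entry, so self._continuations
        # only grows -- the leak the review comment asks about.
        self._continuations[new_token] = (ix + 1, blobs)
        return blobs[ix], new_token


def read_all(store, key):
    """Drive the paging loop to reassemble the full value."""
    chunks = []
    data, token = store.get(key)
    chunks.append(data)
    while token is not None:
        data, token = store.get(key, token)
        chunks.append(data)
    return b''.join(chunks)
```

For a key holding `[b'abc', b'def']`, `read_all` issues four `get` calls (initial, two pages, and a final empty page with no token) and reassembles `b'abcdef'`.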
[jira] [Work logged] (BEAM-6197) Log time for Dataflow GCS upload of staged files
[ https://issues.apache.org/jira/browse/BEAM-6197?focusedWorklogId=175638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175638 ] ASF GitHub Bot logged work on BEAM-6197:

Author: ASF GitHub Bot
Created on: 15/Dec/18 00:08
Start Date: 15/Dec/18 00:08
Worklog Time Spent: 10m

Work Description: aaltay commented on issue #7235: [BEAM-6197] Log time for Dataflow GCS upload of staged files + add a test program
URL: https://github.com/apache/beam/pull/7235#issuecomment-447515309
Thank you. I can merge it after the last test passes.

Issue Time Tracking
---
Worklog Id: (was: 175638)
Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (BEAM-6197) Log time for Dataflow GCS upload of staged files
[ https://issues.apache.org/jira/browse/BEAM-6197?focusedWorklogId=175635&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175635 ] ASF GitHub Bot logged work on BEAM-6197:

Author: ASF GitHub Bot
Created on: 15/Dec/18 00:07
Start Date: 15/Dec/18 00:07
Worklog Time Spent: 10m

Work Description: alanmyrvold removed a comment on issue #7235: [BEAM-6197] Log time for Dataflow GCS upload of staged files + add a test program
URL: https://github.com/apache/beam/pull/7235#issuecomment-447511793
Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175635)
Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (BEAM-6197) Log time for Dataflow GCS upload of staged files
[ https://issues.apache.org/jira/browse/BEAM-6197?focusedWorklogId=175636&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175636 ] ASF GitHub Bot logged work on BEAM-6197:

Author: ASF GitHub Bot
Created on: 15/Dec/18 00:07
Start Date: 15/Dec/18 00:07
Worklog Time Spent: 10m

Work Description: alanmyrvold commented on issue #7235: [BEAM-6197] Log time for Dataflow GCS upload of staged files + add a test program
URL: https://github.com/apache/beam/pull/7235#issuecomment-447515127
Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175636)
Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (BEAM-6197) Log time for Dataflow GCS upload of staged files
[ https://issues.apache.org/jira/browse/BEAM-6197?focusedWorklogId=175622&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175622 ] ASF GitHub Bot logged work on BEAM-6197:

Author: ASF GitHub Bot
Created on: 14/Dec/18 23:45
Start Date: 14/Dec/18 23:45
Worklog Time Spent: 10m

Work Description: alanmyrvold commented on issue #7235: [BEAM-6197] Log time for Dataflow GCS upload of staged files + add a test program
URL: https://github.com/apache/beam/pull/7235#issuecomment-447511793
Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175622)
Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-6056) Migrate gRPC to use vendoring library and vendoring format
[ https://issues.apache.org/jira/browse/BEAM-6056?focusedWorklogId=175619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175619 ] ASF GitHub Bot logged work on BEAM-6056:

Author: ASF GitHub Bot
Created on: 14/Dec/18 23:35
Start Date: 14/Dec/18 23:35
Worklog Time Spent: 10m

Work Description: aaltay commented on issue #7255: WIP [BEAM-6056] Rename v1_13_1 to v1p13p1.
URL: https://github.com/apache/beam/pull/7255#issuecomment-447510252
Run Java PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175619)
Time Spent: 2h 20m (was: 2h 10m)

> Migrate gRPC to use vendoring library and vendoring format
> ---
> Key: BEAM-6056
> URL: https://issues.apache.org/jira/browse/BEAM-6056
> Project: Beam
> Issue Type: Sub-task
> Components: build-system
> Reporter: Luke Cwik
> Assignee: Luke Cwik
> Priority: Minor
> Labels: portability
> Fix For: 2.10.0
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> This thread discusses the work: https://lists.apache.org/thread.html/4c12db35b40a6d56e170cd6fc8bb0ac4c43a99aa3cb7dbae54176815@%3Cdev.beam.apache.org%3E
[jira] [Work logged] (BEAM-6188) Create unbounded synthetic source
[ https://issues.apache.org/jira/browse/BEAM-6188?focusedWorklogId=175618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175618 ] ASF GitHub Bot logged work on BEAM-6188:

Author: ASF GitHub Bot
Created on: 14/Dec/18 23:32
Start Date: 14/Dec/18 23:32
Worklog Time Spent: 10m

Work Description: pabloem edited a comment on issue #7226: [BEAM-6188] Unbouded synthetic source
URL: https://github.com/apache/beam/pull/7226#issuecomment-447509801
It is all looking good so far. The only classes I have left to check are SyntheticUnboundedSource and SyntheticWatermark. One question: what do you think about using `SyntheticIO.unbounded(options)`/`SyntheticIO.bounded(options)` vs `new UnboundedSyntheticSource(options)`/`new BoundedSyntheticSource(options)`?

Issue Time Tracking
---
Worklog Id: (was: 175618)
Time Spent: 2h 20m (was: 2h 10m)

> Create unbounded synthetic source
> ---
> Key: BEAM-6188
> URL: https://issues.apache.org/jira/browse/BEAM-6188
> Project: Beam
> Issue Type: Sub-task
> Components: testing
> Reporter: Lukasz Gajowy
> Assignee: Lukasz Gajowy
> Priority: Major
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> It is needed for streaming scenarios. It should provide ways to reason about time and recovering from checkpoints.
[jira] [Work logged] (BEAM-6188) Create unbounded synthetic source
[ https://issues.apache.org/jira/browse/BEAM-6188?focusedWorklogId=175617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175617 ] ASF GitHub Bot logged work on BEAM-6188:

Author: ASF GitHub Bot
Created on: 14/Dec/18 23:32
Start Date: 14/Dec/18 23:32
Worklog Time Spent: 10m

Work Description: pabloem commented on issue #7226: [BEAM-6188] Unbouded synthetic source
URL: https://github.com/apache/beam/pull/7226#issuecomment-447509801
It is all looking good so far. The only classes I have left to check are SyntheticUnboundedSource and SyntheticWatermark. One question: what do you think about using `SyntheticIO.unbounded(options)` vs `new UnboundedSyntheticSource(options)`/`new BoundedSyntheticSource(options)`?

Issue Time Tracking
---
Worklog Id: (was: 175617)
Time Spent: 2h 10m (was: 2h)
[jira] [Created] (BEAM-6238) ULR shows error when successfully finishing a job: "StatusRuntimeException: CANCELLED: cancelled before receiving half close"
Daniel Oliveira created BEAM-6238: - Summary: ULR shows error when successfully finishing a job: "StatusRuntimeException: CANCELLED: cancelled before receiving half close" Key: BEAM-6238 URL: https://issues.apache.org/jira/browse/BEAM-6238 Project: Beam Issue Type: Bug Components: runner-direct Reporter: Daniel Oliveira Assignee: Daniel Oliveira When the ULR finishes running a job, it seems to always end with the following error: {noformat} [grpc-default-executor-0] ERROR org.apache.beam.repackaged.beam_runners_direct_java.sdk.fn.data.BeamFnDataGrpcMultiplexer - Failed to handle for unknown endpoint org.apache.beam.vendor.grpc.v1_13_1.io.grpc.StatusRuntimeException: CANCELLED: cancelled before receiving half close at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.Status.asRuntimeException(Status.java:517) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onCancel(ServerCalls.java:272) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.PartialForwardingServerCallListener.onCancel(PartialForwardingServerCallListener.java:40) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.ForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:23) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:40) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.Contexts$ContextualizedServerCallListener.onCancel(Contexts.java:96) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.closed(ServerCallImpl.java:293) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1Closed.runInContext(ServerImpl.java:738) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) org.apache.beam.vendor.grpc.v1_13_1.io.grpc.StatusRuntimeException: CANCELLED: cancelled before receiving half close at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.Status.asRuntimeException(Status.java:517) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onCancel(ServerCalls.java:272) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.PartialForwardingServerCallListener.onCancel(PartialForwardingServerCallListener.java:40) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.ForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:23) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:40) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.Contexts$ContextualizedServerCallListener.onCancel(Contexts.java:96) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.closed(ServerCallImpl.java:293) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1Closed.runInContext(ServerImpl.java:738) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) [grpc-default-executor-38] WARN org.apache.beam.repackaged.beam_runners_direct_java.runners.fnexecution.logging.GrpcLoggingService - Beam Fn Logging client failed to be complete. 
org.apache.beam.vendor.grpc.v1_13_1.io.grpc.StatusRuntimeException: CANCELLED: call already cancelled at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.Status.asRuntimeException(Status.java:517) at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onCompleted(ServerCalls.java:356) at org.apache.beam.repackaged.beam_runners_direct_java.runners.fnexecution.logging.GrpcLoggingService.completeIfNotNull(GrpcLoggingService.java:78) at org.apache.beam.repackaged.beam_runners_direct_java.runners.fnexecution.logging.GrpcLoggingService.access$400(GrpcLoggingService.java:33) at org.apache.beam.repackaged.beam_runners_direct_java.runners.fnexecution.logging.GrpcLoggingService$InboundObserver.onError(GrpcLoggingService.java:105) at
[jira] [Resolved] (BEAM-6225) Setup Jenkins VR job for new bundle processing code
[ https://issues.apache.org/jira/browse/BEAM-6225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyuan Zhang resolved BEAM-6225. Resolution: Fixed Fix Version/s: 2.10.0 > Setup Jenkins VR job for new bundle processing code > --- > > Key: BEAM-6225 > URL: https://issues.apache.org/jira/browse/BEAM-6225 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.10.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-5953) Support DataflowRunner on Python 3
[ https://issues.apache.org/jira/browse/BEAM-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702207#comment-16702207 ] Mark Liu edited comment on BEAM-5953 at 12/14/18 10:34 PM: --- Thanks Valentyn. If I use google-cloud-storage instead of apitools to stage files, the error goes away. https://github.com/apache/beam/pull/7051 is the code change, but unfortunately, the transitive dependency google-cloud-core has version conflicts if we use apitools and google-cloud-storage at the same time. The PR has been rolled back. was (Author: markflyhigh): Thanks Valentyn. apitools is replaced by google-cloud-storage for staging files and the error is gone. > Support DataflowRunner on Python 3 > -- > > Key: BEAM-5953 > URL: https://issues.apache.org/jira/browse/BEAM-5953 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6225) Setup Jenkins VR job for new bundle processing code
[ https://issues.apache.org/jira/browse/BEAM-6225?focusedWorklogId=175593=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175593 ] ASF GitHub Bot logged work on BEAM-6225: Author: ASF GitHub Bot Created on: 14/Dec/18 21:47 Start Date: 14/Dec/18 21:47 Worklog Time Spent: 10m Work Description: swegner commented on issue #7271: [BEAM-6225] Setup Jenkins Job to Run VR with ExecutableStage URL: https://github.com/apache/beam/pull/7271#issuecomment-447488924 Seed Job passes; LGTM, merging This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175593) Time Spent: 3.5h (was: 3h 20m) > Setup Jenkins VR job for new bundle processing code > --- > > Key: BEAM-6225 > URL: https://issues.apache.org/jira/browse/BEAM-6225 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6225) Setup Jenkins VR job for new bundle processing code
[ https://issues.apache.org/jira/browse/BEAM-6225?focusedWorklogId=175594=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175594 ] ASF GitHub Bot logged work on BEAM-6225: Author: ASF GitHub Bot Created on: 14/Dec/18 21:47 Start Date: 14/Dec/18 21:47 Worklog Time Spent: 10m Work Description: swegner closed pull request #7271: [BEAM-6225] Setup Jenkins Job to Run VR with ExecutableStage URL: https://github.com/apache/beam/pull/7271 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_DataflowPortabilityExecutableStage.groovy b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_DataflowPortabilityExecutableStage.groovy new file mode 100644 index ..62e73617b74e --- /dev/null +++ b/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_DataflowPortabilityExecutableStage.groovy @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import CommonJobProperties as commonJobProperties +import PostcommitJobBuilder + + +// This job runs the suite of ValidatesRunner tests against the Dataflow +// runner. +PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_ValidatesRunner_DataflowPortabilityExecutableStage', + 'Run Dataflow Portability ExecutableStage ValidatesRunner', 'Google Cloud Dataflow Runner PortabilityApi ExecutableStage ValidatesRunner Tests', this) { + + description('Runs the ValidatesRunner suite on the Dataflow PortabilityApi runner with ExecutableStage code path enabled.') + + // Set common parameters. Sets a 3 hour timeout. + commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 400) + + // Publish all test results to Jenkins + publishers { +archiveJunit('**/build/test-results/**/*.xml') + } + + // Gradle goals for this job. + steps { +gradle { + rootBuildScriptDir(commonJobProperties.checkoutDir) + tasks(':beam-runners-google-cloud-dataflow-java:validatesRunnerFnApiWorkerExecutableStageTest') + // Increase parallel worker threads above processor limit since most time is + // spent waiting on Dataflow jobs. ValidatesRunner tests on Dataflow are slow + // because each one launches a Dataflow job with about 3 mins of overhead. + // 3 x num_cores strikes a good balance between maxing out parallelism without + // overloading the machines. + commonJobProperties.setGradleSwitches(delegate, 3 * Runtime.runtime.availableProcessors()) +} + } + + // [BEAM-6236] "use_executable_stage_bundle_execution" hasn't been rolled out. 
+ disabled() +} diff --git a/runners/google-cloud-dataflow-java/build.gradle b/runners/google-cloud-dataflow-java/build.gradle index c0b831c6e547..9c6aaf4f773a 100644 --- a/runners/google-cloud-dataflow-java/build.gradle +++ b/runners/google-cloud-dataflow-java/build.gradle @@ -241,17 +241,55 @@ task validatesRunnerFnApiWorkerTest(type: Test) { } } +task validatesRunnerFnApiWorkerExecutableStageTest(type: Test) { +group = "Verification" +description "Validates Dataflow PortabilityApi runner" +dependsOn ":beam-runners-google-cloud-dataflow-java-fn-api-worker:shadowJar" +dependsOn buildAndPushDockerContainer + +systemProperty "beamTestPipelineOptions", JsonOutput.toJson([ +"--runner=TestDataflowRunner", +"--project=${dataflowProject}", +"--tempRoot=${dataflowPostCommitTempRoot}", +"--dataflowWorkerJar=${dataflowFnApiWorkerJar}", + "--workerHarnessContainerImage=${dockerImageContainer}:${dockerTag}", +"--experiments=beam_fn_api,use_executable_stage_bundle_execution"] +) + +// Increase test parallelism up to the number of Gradle workers. By default this is equal +// to the number of CPU cores, but can be increased by setting --max-workers=N. +maxParallelForks Integer.MAX_VALUE +
[jira] [Comment Edited] (BEAM-5419) Build multiple versions of the Flink Runner against different Flink versions
[ https://issues.apache.org/jira/browse/BEAM-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721719#comment-16721719 ] Giorgos Stamatakis edited comment on BEAM-5419 at 12/14/18 8:56 PM: Thank you for your time, but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar, an error still pops up: java.lang.NoClassDefFoundError: org/apache/flink/streaming/api/CheckpointingMode

mvn package exec:java -Dexec.mainClass=myPipeline -Dexec.args="--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner

Trying to run on a 1.6.2 cluster. The pom.xml flink profile is the following:

<profile>
  <id>flink-runner</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-runners-flink-1.6</artifactId>
      <version>2.10.0</version>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
</profile>

If there is a sample pom.xml, I'm interested.

was (Author: gstamatakis): Thank you for your time, but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar, an error still pops up: java.lang.NoClassDefFoundError: org/apache/flink/streaming/api/CheckpointingMode

mvn package exec:java -Dexec.mainClass=myPipeline -Dexec.args="--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner

Trying to run on a 1.6.2 cluster. The pom.xml flink profile is the following:

<profile>
  <id>flink-runner</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-runners-flink-1.6</artifactId>
      <version>2.10.0</version>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
</profile>

> Build multiple versions of the Flink Runner against different Flink versions > > > Key: BEAM-5419 > URL: https://issues.apache.org/jira/browse/BEAM-5419 > Project: Beam > Issue Type: New Feature > Components: build-system, runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.10.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Following up on a discussion on the mailing list. 
> We want to keep the Flink version stable across different versions to avoid > upgrade pain for long-term users. At the same time, there are users out there > with newer Flink clusters and developers also want to utilize new Flink > features. > It would be great to build multiple versions of the Flink Runner against > different Flink versions. > When the upgrade is as simple as changing the version property in the build > script, this should be pretty straight-forward. If not, having a "base > version" and applying a patch during the build could be an option. We should > avoid duplicating any Runner code. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6150) Provide alternatives to SerializableFunction and SimpleFunction that may declare exceptions
[ https://issues.apache.org/jira/browse/BEAM-6150?focusedWorklogId=175569=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175569 ] ASF GitHub Bot logged work on BEAM-6150: Author: ASF GitHub Bot Created on: 14/Dec/18 20:13 Start Date: 14/Dec/18 20:13 Worklog Time Spent: 10m Work Description: kennknowles closed pull request #7160: [BEAM-6150] Superinterface for SerializableFunction allowing declared exceptions URL: https://github.com/apache/beam/pull/7160 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Contextful.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Contextful.java index 7e788cf05866..97a994f3727e 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Contextful.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Contextful.java @@ -104,11 +104,11 @@ public String toString() { } /** - * Wraps a {@link SerializableFunction} as a {@link Contextful} of {@link Fn} with empty {@link + * Wraps a {@link ProcessFunction} as a {@link Contextful} of {@link Fn} with empty {@link * Requirements}. 
 */
  public static <T, OutputT> Contextful<Fn<T, OutputT>> fn(
-      final SerializableFunction<T, OutputT> fn) {
+      final ProcessFunction<T, OutputT> fn) {
    return new Contextful<>((element, c) -> fn.apply(element), Requirements.empty());
  }
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Filter.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Filter.java
index 4bffeb6be3d0..aa9d2cd38100 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Filter.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Filter.java
@@ -32,7 +32,7 @@
  /**
   * Returns a {@code PTransform} that takes an input {@code PCollection<T>} and returns a {@code
   * PCollection<T>} with elements that satisfy the given predicate. The predicate must be a {@code
-  * SerializableFunction<T, Boolean>}.
+  * ProcessFunction<T, Boolean>}.
   *
   * <p>Example of use:
   *
@@ -46,7 +46,7 @@
   * #greaterThanEq}, which return elements satisfying various inequalities with the specified value
   * based on the elements' natural ordering.
   */
-  public static <T, PredicateT extends SerializableFunction<T, Boolean>> Filter<T> by(
+  public static <T, PredicateT extends ProcessFunction<T, Boolean>> Filter<T> by(
      PredicateT predicate) {
    return new Filter<>(predicate);
  }
@@ -71,7 +71,7 @@
   * <p>See also {@link #by}, which returns elements that satisfy the given predicate.
   */
  public static <T extends Comparable<T>> Filter<T> lessThan(final T value) {
-    return by((SerializableFunction<T, Boolean>) input -> input.compareTo(value) < 0)
+    return by((ProcessFunction<T, Boolean>) input -> input.compareTo(value) < 0)
        .described(String.format("x < %s", value));
  }
@@ -95,7 +95,7 @@
   * <p>See also {@link #by}, which returns elements that satisfy the given predicate.
   */
  public static <T extends Comparable<T>> Filter<T> greaterThan(final T value) {
-    return by((SerializableFunction<T, Boolean>) input -> input.compareTo(value) > 0)
+    return by((ProcessFunction<T, Boolean>) input -> input.compareTo(value) > 0)
        .described(String.format("x > %s", value));
  }
@@ -119,7 +119,7 @@
   * <p>See also {@link #by}, which returns elements that satisfy the given predicate.
   */
  public static <T extends Comparable<T>> Filter<T> lessThanEq(final T value) {
-    return by((SerializableFunction<T, Boolean>) input -> input.compareTo(value) <= 0)
+    return by((ProcessFunction<T, Boolean>) input -> input.compareTo(value) <= 0)
        .described(String.format("x ≤ %s", value));
  }
@@ -143,7 +143,7 @@
   * <p>See also {@link #by}, which returns elements that satisfy the given predicate.
   */
  public static <T extends Comparable<T>> Filter<T> greaterThanEq(final T value) {
-    return by((SerializableFunction<T, Boolean>) input -> input.compareTo(value) >= 0)
+    return by((ProcessFunction<T, Boolean>) input -> input.compareTo(value) >= 0)
        .described(String.format("x ≥ %s", value));
  }
@@ -166,20 +166,20 @@
   * <p>See also {@link #by}, which returns elements that satisfy the given predicate.
   */
  public static <T extends Comparable<T>> Filter<T> equal(final T value) {
-    return by((SerializableFunction<T, Boolean>) input -> input.compareTo(value) == 0)
+    return by((ProcessFunction<T, Boolean>) input -> input.compareTo(value) == 0)
        .described(String.format("x == %s", value));
  }

  ///

-  private SerializableFunction<T, Boolean> predicate;
+  private ProcessFunction<T, Boolean> predicate;
  private String predicateDescription;

-  private Filter(SerializableFunction<T, Boolean> predicate) {
+  private Filter(ProcessFunction<T, Boolean> predicate) {
    this(predicate, "Filter.predicate");
  }

-  private Filter(SerializableFunction<T, Boolean> predicate, String
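The Contextful.fn change in the diff above adapts a plain one-argument function into the two-argument (element, context) shape that the rest of the machinery expects. A minimal sketch of that adapter pattern in Python (illustrative names, not the Beam Python API):

```python
def contextful_fn(fn):
    """Adapt a one-argument function into an (element, context) function,
    discarding the context -- analogous to Contextful.fn wrapping a
    ProcessFunction into a Contextful<Fn<...>> with empty Requirements."""
    def wrapped(element, context):
        return fn(element)
    return wrapped
```

Callers that only need the element can then be passed wherever a context-aware function is required, which is exactly why widening `fn` to accept the `ProcessFunction` superinterface is safe: the wrapper never relies on anything beyond `apply`.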
[jira] [Work logged] (BEAM-6150) Provide alternatives to SerializableFunction and SimpleFunction that may declare exceptions
[ https://issues.apache.org/jira/browse/BEAM-6150?focusedWorklogId=175570=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175570 ] ASF GitHub Bot logged work on BEAM-6150: Author: ASF GitHub Bot Created on: 14/Dec/18 20:13 Start Date: 14/Dec/18 20:13 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #7160: [BEAM-6150] Superinterface for SerializableFunction allowing declared exceptions URL: https://github.com/apache/beam/pull/7160#issuecomment-447441634 Done This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175570) Time Spent: 1h 40m (was: 1.5h) > Provide alternatives to SerializableFunction and SimpleFunction that may > declare exceptions > --- > > Key: BEAM-6150 > URL: https://issues.apache.org/jira/browse/BEAM-6150 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Jeff Klukas >Assignee: Jeff Klukas >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > Contextful.Fn allows subclasses to declare checked exceptions, but neither > SerializableFunction nor SimpleFunction do. We want to add a new entry in > each of those hierarchies where checked exceptions are allowed. We can then > change existing method signatures to accept the new superinterfaces in > contexts where allowing user code to throw checked exceptions is acceptable, > such as in ProcessElement methods. > Discussed on the dev mailing list: > https://lists.apache.org/thread.html/eecd8dea8b47710098ec67d73b87cf9b4e2926c444c3fee1a6b9a743@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6150) Provide alternatives to SerializableFunction and SimpleFunction that may declare exceptions
[ https://issues.apache.org/jira/browse/BEAM-6150?focusedWorklogId=175568=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175568 ] ASF GitHub Bot logged work on BEAM-6150: Author: ASF GitHub Bot Created on: 14/Dec/18 20:12 Start Date: 14/Dec/18 20:12 Worklog Time Spent: 10m Work Description: jklukas commented on issue #7160: [BEAM-6150] Superinterface for SerializableFunction allowing declared exceptions URL: https://github.com/apache/beam/pull/7160#issuecomment-447441228 > It looks like the commits should be squashed Yes, that would be lovely. Do you need me to do that, or are you happy to use GitHub's squash and merge? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175568) Time Spent: 1h 20m (was: 1h 10m) > Provide alternatives to SerializableFunction and SimpleFunction that may > declare exceptions > --- > > Key: BEAM-6150 > URL: https://issues.apache.org/jira/browse/BEAM-6150 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Jeff Klukas >Assignee: Jeff Klukas >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > Contextful.Fn allows subclasses to declare checked exceptions, but neither > SerializableFunction nor SimpleFunction do. We want to add a new entry in > each of those hierarchies where checked exceptions are allowed. We can then > change existing method signatures to accept the new superinterfaces in > contexts where allowing user code to throw checked exceptions is acceptable, > such as in ProcessElement methods. > Discussed on the dev mailing list: > https://lists.apache.org/thread.html/eecd8dea8b47710098ec67d73b87cf9b4e2926c444c3fee1a6b9a743@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6150) Provide alternatives to SerializableFunction and SimpleFunction that may declare exceptions
[ https://issues.apache.org/jira/browse/BEAM-6150?focusedWorklogId=175567=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175567 ] ASF GitHub Bot logged work on BEAM-6150: Author: ASF GitHub Bot Created on: 14/Dec/18 20:10 Start Date: 14/Dec/18 20:10 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #7160: [BEAM-6150] Superinterface for SerializableFunction allowing declared exceptions URL: https://github.com/apache/beam/pull/7160#issuecomment-447440631 Nice. It looks like the commits should be squashed, but I don't want to assume - is that what you intended? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175567) Time Spent: 1h 10m (was: 1h) > Provide alternatives to SerializableFunction and SimpleFunction that may > declare exceptions > --- > > Key: BEAM-6150 > URL: https://issues.apache.org/jira/browse/BEAM-6150 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Jeff Klukas >Assignee: Jeff Klukas >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > Contextful.Fn allows subclasses to declare checked exceptions, but neither > SerializableFunction nor SimpleFunction do. We want to add a new entry in > each of those hierarchies where checked exceptions are allowed. We can then > change existing method signatures to accept the new superinterfaces in > contexts where allowing user code to throw checked exceptions is acceptable, > such as in ProcessElement methods. > Discussed on the dev mailing list: > https://lists.apache.org/thread.html/eecd8dea8b47710098ec67d73b87cf9b4e2926c444c3fee1a6b9a743@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6179) Batch size estimation failing
[ https://issues.apache.org/jira/browse/BEAM-6179?focusedWorklogId=175562=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175562 ] ASF GitHub Bot logged work on BEAM-6179: Author: ASF GitHub Bot Created on: 14/Dec/18 19:36 Start Date: 14/Dec/18 19:36 Worklog Time Spent: 10m Work Description: angoenka closed pull request #7280: [BEAM-6179] Fixing bundle estimation when all xs are same URL: https://github.com/apache/beam/pull/7280 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/python/apache_beam/transforms/util.py b/sdks/python/apache_beam/transforms/util.py index 38839f57842d..046be0b417c2 100644 --- a/sdks/python/apache_beam/transforms/util.py +++ b/sdks/python/apache_beam/transforms/util.py @@ -281,6 +281,9 @@ def linear_regression_no_numpy(xs, ys): n = float(len(xs)) xbar = sum(xs) / n ybar = sum(ys) / n +if [xs[0]] * len(xs) == xs: + # Simply use the mean if all values in xs are same. + return 0, ybar/xbar b = (sum([(x - xbar) * (y - ybar) for x, y in zip(xs, ys)]) / sum([(x - xbar)**2 for x in xs])) a = ybar - b * xbar @@ -291,13 +294,16 @@ def linear_regression_numpy(xs, ys): # pylint: disable=wrong-import-order, wrong-import-position import numpy as np from numpy import sum +n = len(xs) +if [xs[0]] * n == xs: + # If all values of xs are same then fallback to linear_regression_no_numpy + return _BatchSizeEstimator.linear_regression_no_numpy(xs, ys) xs = np.asarray(xs, dtype=float) ys = np.asarray(ys, dtype=float) # First do a simple least squares fit for y = a + bx over all points. 
b, a = np.polyfit(xs, ys, 1) -n = len(xs) if n < 10: return a, b else: diff --git a/sdks/python/apache_beam/transforms/util_test.py b/sdks/python/apache_beam/transforms/util_test.py index e592f938e175..f0296c0a0f12 100644 --- a/sdks/python/apache_beam/transforms/util_test.py +++ b/sdks/python/apache_beam/transforms/util_test.py @@ -18,6 +18,7 @@ """Unit tests for the transform.util classes.""" from __future__ import absolute_import +from __future__ import division import logging import random @@ -160,6 +161,13 @@ def _run_regression_test(self, linear_regression_fn, test_outliers): self.assertAlmostEqual(a, 5, delta=0.01) self.assertAlmostEqual(b, 7, delta=0.01) +# Test repeated xs +xs = [1 + random.random()] * 100 +ys = [7 * x + 5 + 0.01 * random.random() for x in xs] +a, b = linear_regression_fn(xs, ys) +self.assertAlmostEqual(a, 0, delta=0.01) +self.assertAlmostEqual(b, sum(ys)/(len(ys) * xs[0]), delta=0.01) + if test_outliers: xs = [1 + random.random() for _ in range(100)] ys = [2*x + 1 for x in xs] This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175562) Time Spent: 2h (was: 1h 50m) > Batch size estimation failing > - > > Key: BEAM-6179 > URL: https://issues.apache.org/jira/browse/BEAM-6179 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-py-harness >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Batch size estimation is failing on flink when running 13MB input pipeline > with error > ValueError: On entry to DLASCL parameter number 4 had an illegal value > java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error > received from SDK harness for instruction 48: Traceback (most recent call > last): > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 135, in _execute > response = task() > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 170, in > self._execute(lambda: worker.do_instruction(work), work) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 221, in do_instruction > request.instruction_id) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 237, in process_bundle > bundle_processor.process_bundle(instruction_id) > File >
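The fix in the diff above guards ordinary least squares against the degenerate case where every x is identical: the variance term in the slope's denominator becomes zero, which is what produced the DLASCL error. A standalone sketch of the same logic:

```python
def linear_regression_no_numpy(xs, ys):
    """Least-squares fit of y = a + b*x, with a fallback when all xs are
    equal (zero variance would divide by zero). Simplified standalone
    version of the patched Beam helper above."""
    n = float(len(xs))
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    if all(x == xs[0] for x in xs):
        # Degenerate case: fit y = b*x through the mean point instead.
        return 0, ybar / xbar
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum(
        (x - xbar) ** 2 for x in xs)
    a = ybar - b * xbar
    return a, b
```

With distinct xs this recovers the usual intercept/slope; with repeated xs it returns the ratio of means, matching the unit test added in the PR.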
[jira] [Created] (BEAM-6237) ULR ValidatesRunner tests not deleting artifacts.
Daniel Oliveira created BEAM-6237: - Summary: ULR ValidatesRunner tests not deleting artifacts. Key: BEAM-6237 URL: https://issues.apache.org/jira/browse/BEAM-6237 Project: Beam Issue Type: Bug Components: runner-direct Reporter: Daniel Oliveira Assignee: Daniel Oliveira When running ValidatesRunner tests with the ULR, artifacts are never deleted. Since a new job is run per test, this uses up massive amounts of disk storage quickly (over 20 Gigabytes per execution). This often causes the machine running these tests to run out of disk space which means tests start failing. The ULR should be modified to delete these artifacts after they have been staged to avoid this issue. Flink already does this, so the infrastructure exists. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
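The cleanup the ticket asks for — delete staged artifacts once the job is done with them — can be sketched as a stage/run/cleanup wrapper (a hypothetical helper to illustrate the pattern, not the actual runner code):

```python
import shutil
import tempfile


def run_job_with_artifacts(stage_and_run):
    """Stage artifacts into a fresh temporary directory, run the job, and
    always delete the directory afterwards -- even if the job fails.
    This is the behavior BEAM-6237 asks the ULR to adopt."""
    staging_dir = tempfile.mkdtemp(prefix='beam-artifacts-')
    try:
        return stage_and_run(staging_dir)
    finally:
        # Unconditional cleanup keeps per-test disk usage bounded instead
        # of accumulating ~20 GB of staged artifacts per suite execution.
        shutil.rmtree(staging_dir, ignore_errors=True)
```

Tying deletion to a `finally` block (rather than a separate cleanup pass) is what makes the per-job disk footprint bounded regardless of how the job exits.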
[jira] [Work logged] (BEAM-6179) Batch size estimation failing
[ https://issues.apache.org/jira/browse/BEAM-6179?focusedWorklogId=175561=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175561 ] ASF GitHub Bot logged work on BEAM-6179: Author: ASF GitHub Bot Created on: 14/Dec/18 19:35 Start Date: 14/Dec/18 19:35 Worklog Time Spent: 10m Work Description: angoenka commented on issue #7280: [BEAM-6179] Fixing bundle estimation when all xs are same URL: https://github.com/apache/beam/pull/7280#issuecomment-447431662 Thanks robertwb! Merging it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175561) Time Spent: 1h 50m (was: 1h 40m) > Batch size estimation failing > - > > Key: BEAM-6179 > URL: https://issues.apache.org/jira/browse/BEAM-6179 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-py-harness >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > Batch size estimation is failing on flink when running 13MB input pipeline > with error > ValueError: On entry to DLASCL parameter number 4 had an illegal value > java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error > received from SDK harness for instruction 48: Traceback (most recent call > last): > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 135, in _execute > response = task() > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 170, in > self._execute(lambda: worker.do_instruction(work), work) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 221, in do_instruction > request.instruction_id) > File > 
"/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 237, in process_bundle > bundle_processor.process_bundle(instruction_id) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 480, in process_bundle > ].process_encoded(data.data) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 125, in process_encoded > self.output(decoded_value) > File "apache_beam/runners/worker/operations.py", line 182, in > apache_beam.runners.worker.operations.Operation.output > def output(self, windowed_value, output_index=0): > File "apache_beam/runners/worker/operations.py", line 183, in > apache_beam.runners.worker.operations.Operation.output > cython.cast(Receiver, > self.receivers[output_index]).receive(windowed_value) > File "apache_beam/runners/worker/operations.py", line 89, in > apache_beam.runners.worker.operations.ConsumerSet.receive > cython.cast(Operation, consumer).process(windowed_value) > File "apache_beam/runners/worker/operations.py", line 497, in > apache_beam.runners.worker.operations.DoOperation.process > with self.scoped_process_state: > File "apache_beam/runners/worker/operations.py", line 498, in > apache_beam.runners.worker.operations.DoOperation.process > self.dofn_receiver.receive(o) > File "apache_beam/runners/common.py", line 680, in > apache_beam.runners.common.DoFnRunner.receive > self.process(windowed_value) > File "apache_beam/runners/common.py", line 686, in > apache_beam.runners.common.DoFnRunner.process > self._reraise_augmented(exn) > File "apache_beam/runners/common.py", line 709, in > apache_beam.runners.common.DoFnRunner._reraise_augmented > raise > File "apache_beam/runners/common.py", line 684, in > apache_beam.runners.common.DoFnRunner.process > self.do_fn_invoker.invoke_process(windowed_value) > File "apache_beam/runners/common.py", line 420, in > apache_beam.runners.common.SimpleInvoker.invoke_process > 
output_processor.process_outputs( > File "apache_beam/runners/common.py", line 794, in > apache_beam.runners.common._OutputProcessor.process_outputs > self.main_receivers.receive(windowed_value) > File "apache_beam/runners/worker/operations.py", line 89, in > apache_beam.runners.worker.operations.ConsumerSet.receive > cython.cast(Operation, consumer).process(windowed_value) > File "apache_beam/runners/worker/operations.py", line 497, in > apache_beam.runners.worker.operations.DoOperation.process > with self.scoped_process_state: > File
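The linked PR fixes bundle estimation "when all xs are same". As an illustration of the failure mode: when every observed x is identical, the least-squares system is rank-deficient and LAPACK scaling routines such as DLASCL can be handed an illegal parameter, so the degenerate case has to be short-circuited. This is a minimal sketch assuming a plain numpy fit; the function name and the constant-model fallback are hypothetical, not Beam's actual estimator code:

```python
import numpy as np

def estimate_linear(xs, ys):
    """Fit ys ~ intercept + slope * xs, guarding the degenerate case.

    When every x is identical the slope is undefined, so fall back to a
    constant model (the mean of the ys) instead of calling the fitter.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    if np.all(xs == xs[0]):           # all xs the same: rank-deficient fit
        return 0.0, float(ys.mean())  # constant model
    slope, intercept = np.polyfit(xs, ys, 1)
    return float(slope), float(intercept)
```

With the guard in place, `estimate_linear([2, 2, 2], [4.0, 6.0, 8.0])` returns a constant estimate rather than triggering a LAPACK error.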
[jira] [Comment Edited] (BEAM-5419) Build multiple versions of the Flink Runner against different Flink versions
[ https://issues.apache.org/jira/browse/BEAM-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721719#comment-16721719 ] Giorgos Stamatakis edited comment on BEAM-5419 at 12/14/18 7:25 PM: Thank you for your time but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar an error still pops up: Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionMode mvn package exec:java Dexec.mainClass=myPipeline "Dexec.args-=--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner Trying to run on a 1.6.2 cluster.The pom.xml flink profile is the following: flink-runner org.apache.beam beam-runners-flink-1.6 2.10.0 runtime was (Author: gstamatakis): Thank you for your time but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar an error still pops up: Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionMode mvn package exec:java Dexec.mainClass=myPipeline "-Dexec.args=--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner Trying to run on a 1.6.2 cluster.The pom.xml flink profile is the following: flink-runner org.apache.beam beam-runners-flink-1.6 2.10.0 runtime > Build multiple versions of the Flink Runner against different Flink versions > > > Key: BEAM-5419 > URL: https://issues.apache.org/jira/browse/BEAM-5419 > Project: Beam > Issue Type: New Feature > Components: build-system, runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.10.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Following up on a discussion on the mailing list. > We want to keep the Flink version stable across different versions to avoid > upgrade pain for long-term users. 
At the same time, there are users out there > with newer Flink clusters and developers also want to utilize new Flink > features. > It would be great to build multiple versions of the Flink Runner against > different Flink versions. > When the upgrade is as simple as changing the version property in the build > script, this should be pretty straight-forward. If not, having a "base > version" and applying a patch during the build could be an option. We should > avoid duplicating any Runner code. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5419) Build multiple versions of the Flink Runner against different Flink versions
[ https://issues.apache.org/jira/browse/BEAM-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721719#comment-16721719 ] Giorgos Stamatakis commented on BEAM-5419: -- Thank you for your time but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar an error still pops up: Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionMode mvn package exec:java -Dexec.mainClass=myPipeline "-Dexec.args=--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner Trying to run on a 1.6.2 cluster.The pom.xml flink profile is the following: flink-runner org.apache.beam beam-runners-flink-1.6 2.10.0 runtime > Build multiple versions of the Flink Runner against different Flink versions > > > Key: BEAM-5419 > URL: https://issues.apache.org/jira/browse/BEAM-5419 > Project: Beam > Issue Type: New Feature > Components: build-system, runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.10.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Following up on a discussion on the mailing list. > We want to keep the Flink version stable across different versions to avoid > upgrade pain for long-term users. At the same time, there are users out there > with newer Flink clusters and developers also want to utilize new Flink > features. > It would be great to build multiple versions of the Flink Runner against > different Flink versions. > When the upgrade is as simple as changing the version property in the build > script, this should be pretty straight-forward. If not, having a "base > version" and applying a patch during the build could be an option. We should > avoid duplicating any Runner code. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6197) Log time for Dataflow GCS upload of staged files
[ https://issues.apache.org/jira/browse/BEAM-6197?focusedWorklogId=175557=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175557 ] ASF GitHub Bot logged work on BEAM-6197: Author: ASF GitHub Bot Created on: 14/Dec/18 19:20 Start Date: 14/Dec/18 19:20 Worklog Time Spent: 10m Work Description: alanmyrvold commented on issue #7235: [BEAM-6197] Log time for Dataflow GCS upload of staged files + add a test program URL: https://github.com/apache/beam/pull/7235#issuecomment-447427556 > @alanmyrvold do you know why tests are failing? Yes. checkstyle was failing due to the format of the class comment. Fixed the comment and re-trying This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175557) Time Spent: 1h (was: 50m) > Log time for Dataflow GCS upload of staged files > > > Key: BEAM-6197 > URL: https://issues.apache.org/jira/browse/BEAM-6197 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Alan Myrvold >Assignee: Alan Myrvold >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > Would be nice to collect timing in the logs of Dataflow GCS upload of staged > files -- This message was sent by Atlassian JIRA (v7.6.3#76005)
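The BEAM-6197 change is pure instrumentation: measure and log how long staging each file takes. A generic sketch of that pattern (the helper and its `upload_fn` parameter are stand-ins, not the Dataflow runner's real staging API):

```python
import logging
import time

def stage_files(upload_fn, file_paths):
    """Upload each file, logging per-file and total elapsed time.

    upload_fn performs the actual upload; monotonic clocks are used so
    the measurements are immune to wall-clock adjustments.
    """
    batch_start = time.monotonic()
    for path in file_paths:
        start = time.monotonic()
        upload_fn(path)
        logging.info("Staged %s in %.2f seconds", path, time.monotonic() - start)
    logging.info("Staged %d files in %.2f seconds",
                 len(file_paths), time.monotonic() - batch_start)
```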
[jira] [Comment Edited] (BEAM-5419) Build multiple versions of the Flink Runner against different Flink versions
[ https://issues.apache.org/jira/browse/BEAM-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721719#comment-16721719 ] Giorgos Stamatakis edited comment on BEAM-5419 at 12/14/18 7:27 PM: Thank you for your time but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar an error still pops up: java.lang.NoClassDefFoundError: org/apache/flink/streaming/api/CheckpointingMode mvn package exec:java Dexec.mainClass=myPipeline "Dexec.args-=--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner Trying to run on a 1.6.2 cluster.The pom.xml flink profile is the following: flink-runner org.apache.beam beam-runners-flink-1.6 2.10.0 runtime was (Author: gstamatakis): Thank you for your time but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar an error still pops up: Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionMode mvn package exec:java Dexec.mainClass=myPipeline "Dexec.args-=--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner Trying to run on a 1.6.2 cluster.The pom.xml flink profile is the following: flink-runner org.apache.beam beam-runners-flink-1.6 2.10.0 runtime > Build multiple versions of the Flink Runner against different Flink versions > > > Key: BEAM-5419 > URL: https://issues.apache.org/jira/browse/BEAM-5419 > Project: Beam > Issue Type: New Feature > Components: build-system, runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.10.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Following up on a discussion on the mailing list. > We want to keep the Flink version stable across different versions to avoid > upgrade pain for long-term users. 
[remainder of the quoted issue description trimmed; identical to the copy quoted above] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-3211) Add an integration test for TextIO ReadAll transform and dynamic writes
[ https://issues.apache.org/jira/browse/BEAM-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chamikara Jayalath resolved BEAM-3211. -- Resolution: Fixed Fix Version/s: Not applicable > Add an integration test for TextIO ReadAll transform and dynamic writes > --- > > Key: BEAM-3211 > URL: https://issues.apache.org/jira/browse/BEAM-3211 > Project: Beam > Issue Type: Test > Components: sdk-java-core >Reporter: Chamikara Jayalath >Assignee: Lukasz Gajowy >Priority: Major > Fix For: Not applicable > > > We should add a small scale version of performance test available in > following file run as a part of 'beam_PostCommit_Java_MavenInstall' and > 'beam_PostCommit_Java_ValidatesRunner*' Jenkins test suites. > https://github.com/apache/beam/blob/master/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6179) Batch size estimation failing
[ https://issues.apache.org/jira/browse/BEAM-6179?focusedWorklogId=175558=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175558 ] ASF GitHub Bot logged work on BEAM-6179: Author: ASF GitHub Bot Created on: 14/Dec/18 19:25 Start Date: 14/Dec/18 19:25 Worklog Time Spent: 10m Work Description: angoenka commented on issue #7280: [BEAM-6179] Fixing bundle estimation when all xs are same URL: https://github.com/apache/beam/pull/7280#issuecomment-447428867 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175558) Time Spent: 1h 40m (was: 1.5h) > Batch size estimation failing > - > > Key: BEAM-6179 > URL: https://issues.apache.org/jira/browse/BEAM-6179 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-py-harness >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > Batch size estimation is failing on flink when running 13MB input pipeline > with error > ValueError: On entry to DLASCL parameter number 4 had an illegal value > java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error > received from SDK harness for instruction 48: Traceback (most recent call > last): > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 135, in _execute > response = task() > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 170, in > self._execute(lambda: worker.do_instruction(work), work) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 221, in do_instruction > request.instruction_id) > File > 
[remainder of the quoted traceback trimmed; identical to the traceback quoted above]
[jira] [Commented] (BEAM-3211) Add an integration test for TextIO ReadAll transform and dynamic writes
[ https://issues.apache.org/jira/browse/BEAM-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721721#comment-16721721 ] Chamikara Jayalath commented on BEAM-3211: -- Yeah, we can close this. > Add an integration test for TextIO ReadAll transform and dynamic writes > --- > > Key: BEAM-3211 > URL: https://issues.apache.org/jira/browse/BEAM-3211 > Project: Beam > Issue Type: Test > Components: sdk-java-core >Reporter: Chamikara Jayalath >Assignee: Lukasz Gajowy >Priority: Major > > We should add a small scale version of performance test available in > following file run as a part of 'beam_PostCommit_Java_MavenInstall' and > 'beam_PostCommit_Java_ValidatesRunner*' Jenkins test suites. > https://github.com/apache/beam/blob/master/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-5419) Build multiple versions of the Flink Runner against different Flink versions
[ https://issues.apache.org/jira/browse/BEAM-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721719#comment-16721719 ] Giorgos Stamatakis edited comment on BEAM-5419 at 12/14/18 7:24 PM: Thank you for your time but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar an error still pops up: Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionMode mvn package exec:java Dexec.mainClass=myPipeline "-Dexec.args=--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner Trying to run on a 1.6.2 cluster.The pom.xml flink profile is the following: flink-runner org.apache.beam beam-runners-flink-1.6 2.10.0 runtime was (Author: gstamatakis): Thank you for your time but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar an error still pops up: Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionMode mvn package exec:java -Dexec.mainClass=myPipeline "-Dexec.args-=--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner Trying to run on a 1.6.2 cluster.The pom.xml flink profile is the following: flink-runner org.apache.beam beam-runners-flink-1.6 2.10.0 runtime > Build multiple versions of the Flink Runner against different Flink versions > > > Key: BEAM-5419 > URL: https://issues.apache.org/jira/browse/BEAM-5419 > Project: Beam > Issue Type: New Feature > Components: build-system, runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.10.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Following up on a discussion on the mailing list. > We want to keep the Flink version stable across different versions to avoid > upgrade pain for long-term users. 
[remainder of the quoted issue description trimmed; identical to the copy quoted above] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-5419) Build multiple versions of the Flink Runner against different Flink versions
[ https://issues.apache.org/jira/browse/BEAM-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721719#comment-16721719 ] Giorgos Stamatakis edited comment on BEAM-5419 at 12/14/18 7:24 PM: Thank you for your time but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar an error still pops up: Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionMode mvn package exec:java -Dexec.mainClass=myPipeline "-Dexec.args-=--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner Trying to run on a 1.6.2 cluster.The pom.xml flink profile is the following: flink-runner org.apache.beam beam-runners-flink-1.6 2.10.0 runtime was (Author: gstamatakis): Thank you for your time but it appears that even after installing (and including in the pom.xml) the flink-1.6 ~3MB jar an error still pops up: Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionMode mvn package exec:java -Dexec.mainClass=myPipeline "-Dexec.args=--runner=FlinkRunner --flinkMaster=localhost:8081 --streaming=true --parallelism=4 --windowSize=10 --filesToStage=target/XXX-bundled-flink.jar" -Pflink-runner Trying to run on a 1.6.2 cluster.The pom.xml flink profile is the following: flink-runner org.apache.beam beam-runners-flink-1.6 2.10.0 runtime > Build multiple versions of the Flink Runner against different Flink versions > > > Key: BEAM-5419 > URL: https://issues.apache.org/jira/browse/BEAM-5419 > Project: Beam > Issue Type: New Feature > Components: build-system, runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.10.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Following up on a discussion on the mailing list. > We want to keep the Flink version stable across different versions to avoid > upgrade pain for long-term users. 
[remainder of the quoted issue description trimmed; identical to the copy quoted above] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6150) Provide alternatives to SerializableFunction and SimpleFunction that may declare exceptions
[ https://issues.apache.org/jira/browse/BEAM-6150?focusedWorklogId=175556=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175556 ] ASF GitHub Bot logged work on BEAM-6150: Author: ASF GitHub Bot Created on: 14/Dec/18 19:15 Start Date: 14/Dec/18 19:15 Worklog Time Spent: 10m Work Description: jklukas commented on issue #7160: [BEAM-6150] Superinterface for SerializableFunction allowing declared exceptions URL: https://github.com/apache/beam/pull/7160#issuecomment-447426035 Thanks for the review, @kennknowles. > But instead of porting tests to InferableFunction can you duplicate them, or some of them. Done. This should be ready for another review. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175556) Time Spent: 1h (was: 50m) > Provide alternatives to SerializableFunction and SimpleFunction that may > declare exceptions > --- > > Key: BEAM-6150 > URL: https://issues.apache.org/jira/browse/BEAM-6150 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Jeff Klukas >Assignee: Jeff Klukas >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > Contextful.Fn allows subclasses to declare checked exceptions, but neither > SerializableFunction nor SimpleFunction do. We want to add a new entry in > each of those hierarchies where checked exceptions are allowed. We can then > change existing method signatures to accept the new superinterfaces in > contexts where allowing user code to throw checked exceptions is acceptable, > such as in ProcessElement methods. > Discussed on the dev mailing list: > https://lists.apache.org/thread.html/eecd8dea8b47710098ec67d73b87cf9b4e2926c444c3fee1a6b9a743@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6150) Provide alternatives to SerializableFunction and SimpleFunction that may declare exceptions
[ https://issues.apache.org/jira/browse/BEAM-6150?focusedWorklogId=17=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-17 ] ASF GitHub Bot logged work on BEAM-6150: Author: ASF GitHub Bot Created on: 14/Dec/18 19:14 Start Date: 14/Dec/18 19:14 Worklog Time Spent: 10m Work Description: jklukas commented on a change in pull request #7160: [BEAM-6150] Superinterface for SerializableFunction allowing declared exceptions URL: https://github.com/apache/beam/pull/7160#discussion_r241860714 ## File path: website/src/contribute/ptransform-style-guide.md ## @@ -395,8 +395,8 @@ If the transform has an aspect of behavior to be customized by a user's code, ma Do: -* If possible, just use PTransform composition as an extensibility device - i.e. if the same effect can be achieved by the user applying the transform in their pipeline and composing it with another `PTransform`, then the transform itself should not be extensible. E.g., a transform that writes JSON objects to a third-party system should take a `PCollection` (assuming it is possible to provide a `Coder` for `JsonObject`), rather than taking a generic `PCollection` and a `SerializableFunction` (anti-example that should be fixed: `TextIO`). -* If extensibility by user code is necessary inside the transform, pass the user code as a `SerializableFunction` or define your own serializable function-like type (ideally single-method, for interoperability with Java 8 lambdas). Because Java erases the types of lambdas, you should be sure to have adequate type information even if a raw-type `SerializableFunction` is provided by the user. See `MapElements` and `FlatMapElements` for examples of how to use `SimpleFunction` and `SerializableFunction` in tandem to support Java 7 and Java 8 well. +* If possible, just use PTransform composition as an extensibility device - i.e. 
if the same effect can be achieved by the user applying the transform in their pipeline and composing it with another `PTransform`, then the transform itself should not be extensible. E.g., a transform that writes JSON objects to a third-party system should take a `PCollection` (assuming it is possible to provide a `Coder` for `JsonObject`), rather than taking a generic `PCollection` and a `ProcessFunction` (anti-example that should be fixed: `TextIO`). +* If extensibility by user code is necessary inside the transform, pass the user code as a `ProcessFunction` or define your own serializable function-like type (ideally single-method, for interoperability with Java 8 lambdas). Because Java erases the types of lambdas, you should be sure to have adequate type information even if a raw-type `ProcessFunction` is provided by the user. See `MapElements` and `FlatMapElements` for examples of how to use `ProcessFunction` and `InferableFunction` in tandem to provide good support for both lambdas and concrete subclasses with type information. Review comment: Changed wording of the last sentence here to remove reference to Java 7 and instead discuss type inferability. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 17) Time Spent: 50m (was: 40m) > Provide alternatives to SerializableFunction and SimpleFunction that may > declare exceptions > --- > > Key: BEAM-6150 > URL: https://issues.apache.org/jira/browse/BEAM-6150 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Jeff Klukas >Assignee: Jeff Klukas >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > Contextful.Fn allows subclasses to declare checked exceptions, but neither > SerializableFunction nor SimpleFunction do. 
[remainder of the quoted issue description trimmed; identical to the copy quoted above] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6150) Provide alternatives to SerializableFunction and SimpleFunction that may declare exceptions
[ https://issues.apache.org/jira/browse/BEAM-6150?focusedWorklogId=175554=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175554 ] ASF GitHub Bot logged work on BEAM-6150: Author: ASF GitHub Bot Created on: 14/Dec/18 19:13 Start Date: 14/Dec/18 19:13 Worklog Time Spent: 10m Work Description: jklukas commented on a change in pull request #7160: [BEAM-6150] Superinterface for SerializableFunction allowing declared exceptions URL: https://github.com/apache/beam/pull/7160#discussion_r241860482 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/MapElementsTest.java ## @@ -166,12 +231,31 @@ public Integer apply(KV input) { } /** - * Basic test of {@link MapElements} with a {@link SerializableFunction}. This style is generally - * discouraged in Java 7, in favor of {@link SimpleFunction}. + * Test of {@link MapElements} coder propagation with a parametric {@link InferableFunction} where + * the type variable occurs nested within other concrete type constructors. Review comment: This looks out of place in the diff. The removed lines are the docstring for `testMapBasicSerializableFunction` which has been renamed to `testMapBasicProcessFunction` and appears as the next function below. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175554) Time Spent: 40m (was: 0.5h) > Provide alternatives to SerializableFunction and SimpleFunction that may > declare exceptions > --- > > Key: BEAM-6150 > URL: https://issues.apache.org/jira/browse/BEAM-6150 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Jeff Klukas >Assignee: Jeff Klukas >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > Contextful.Fn allows subclasses to declare checked exceptions, but neither > SerializableFunction nor SimpleFunction do. We want to add a new entry in > each of those hierarchies where checked exceptions are allowed. We can then > change existing method signatures to accept the new superinterfaces in > contexts where allowing user code to throw checked exceptions is acceptable, > such as in ProcessElement methods. > Discussed on the dev mailing list: > https://lists.apache.org/thread.html/eecd8dea8b47710098ec67d73b87cf9b4e2926c444c3fee1a6b9a743@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6190) "Processing stuck" messages should be visible on Pantheon
[ https://issues.apache.org/jira/browse/BEAM-6190?focusedWorklogId=175549&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175549 ]

ASF GitHub Bot logged work on BEAM-6190:
Author: ASF GitHub Bot
Created on: 14/Dec/18 19:07
Start Date: 14/Dec/18 19:07
Worklog Time Spent: 10m

Work Description: pabloem commented on issue #7240: [BEAM-6190] Add processing stuck message to Pantheon.
URL: https://github.com/apache/beam/pull/7240#issuecomment-447423853
I don't know what your LDAP is. Mine is pabloem@ - talk to me in hangouts?

Issue Time Tracking
---
Worklog Id: (was: 175549)
Time Spent: 1h 10m (was: 1h)
Remaining Estimate: 22h 50m (was: 23h)

> "Processing stuck" messages should be visible on Pantheon
> ---------------------------------------------------------
>
> Key: BEAM-6190
> URL: https://issues.apache.org/jira/browse/BEAM-6190
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Affects Versions: 2.8.0
> Environment: Running on Google Cloud Dataflow
> Reporter: Dustin Rhodes
> Assignee: Dustin Rhodes
> Priority: Minor
> Fix For: Not applicable
>
> Original Estimate: 24h
> Time Spent: 1h 10m
> Remaining Estimate: 22h 50m
>
> When user processing results in an exception, it is clearly visible on the Pantheon landing page for a streaming Dataflow job. But when user processing becomes stuck, there is no indication, even though the worker logs it. Most users don't check worker logs, and doing so is inconvenient. Ideally a stuck worker would result in a visible error on the Pantheon landing page.
[jira] [Work logged] (BEAM-6225) Setup Jenkins VR job for new bundle processing code
[ https://issues.apache.org/jira/browse/BEAM-6225?focusedWorklogId=175541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175541 ]

ASF GitHub Bot logged work on BEAM-6225:
Author: ASF GitHub Bot
Created on: 14/Dec/18 18:44
Start Date: 14/Dec/18 18:44
Worklog Time Spent: 10m

Work Description: swegner commented on issue #7271: [BEAM-6225] Setup Jenkins Job to Run VR with ExecutableStage
URL: https://github.com/apache/beam/pull/7271#issuecomment-447417017
Run Seed Job

Issue Time Tracking
---
Worklog Id: (was: 175541)
Time Spent: 3h 20m (was: 3h 10m)

> Setup Jenkins VR job for new bundle processing code
> ---------------------------------------------------
>
> Key: BEAM-6225
> URL: https://issues.apache.org/jira/browse/BEAM-6225
> Project: Beam
> Issue Type: Task
> Components: runner-dataflow
> Reporter: Boyuan Zhang
> Assignee: Boyuan Zhang
> Priority: Major
> Time Spent: 3h 20m
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-6225) Setup Jenkins VR job for new bundle processing code
[ https://issues.apache.org/jira/browse/BEAM-6225?focusedWorklogId=175523&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175523 ]

ASF GitHub Bot logged work on BEAM-6225:
Author: ASF GitHub Bot
Created on: 14/Dec/18 18:32
Start Date: 14/Dec/18 18:32
Worklog Time Spent: 10m

Work Description: boyuanzz commented on issue #7271: [BEAM-6225] Setup Jenkins Job to Run VR with ExecutableStage
URL: https://github.com/apache/beam/pull/7271#issuecomment-447413580

> You mentioned that Dataflow won't use the new `use_executable_stage_bundle_execution` until there is a service release. If that's the case, I don't think we should begin running these tests:
>
> 1. It's misleading because it looks like we're testing something we're not.
> 2. It's not transparent when the service will be released, and it might cause test failures that will be hard to diagnose.
> 3. Until the experiment is enabled, the tests are redundant with existing suites and waste resources.
>
> How about checking this in but disabling the job until the Dataflow service is ready (track with a JIRA ticket)? The [Jenkins job DSL](https://jenkinsci.github.io/job-dsl-plugin/#path/job-disabled) includes a `disabled()` method.

Filed JIRA: https://issues.apache.org/jira/browse/BEAM-6236

Issue Time Tracking
---
Worklog Id: (was: 175523)
Time Spent: 3h 10m (was: 3h)
[jira] [Work logged] (BEAM-6179) Batch size estimation failing
[ https://issues.apache.org/jira/browse/BEAM-6179?focusedWorklogId=175527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175527 ]

ASF GitHub Bot logged work on BEAM-6179:
Author: ASF GitHub Bot
Created on: 14/Dec/18 18:36
Start Date: 14/Dec/18 18:36
Worklog Time Spent: 10m

Work Description: angoenka commented on issue #7280: [BEAM-6179] Fixing bundle estimation when all xs are same
URL: https://github.com/apache/beam/pull/7280#issuecomment-447414722
Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175527)
Time Spent: 0.5h (was: 20m)

> Batch size estimation failing
> -----------------------------
>
> Key: BEAM-6179
> URL: https://issues.apache.org/jira/browse/BEAM-6179
> Project: Beam
> Issue Type: Bug
> Components: runner-flink, sdk-py-harness
> Reporter: Ankur Goenka
> Assignee: Ankur Goenka
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Batch size estimation is failing on Flink when running a 13MB input pipeline with the error:
>
> ValueError: On entry to DLASCL parameter number 4 had an illegal value
>
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction 48: Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 135, in _execute
>     response = task()
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 170, in <lambda>
>     self._execute(lambda: worker.do_instruction(work), work)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 221, in do_instruction
>     request.instruction_id)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 237, in process_bundle
>     bundle_processor.process_bundle(instruction_id)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 480, in process_bundle
>     ].process_encoded(data.data)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 125, in process_encoded
>     self.output(decoded_value)
>   File "apache_beam/runners/worker/operations.py", line 182, in apache_beam.runners.worker.operations.Operation.output
>     def output(self, windowed_value, output_index=0):
>   File "apache_beam/runners/worker/operations.py", line 183, in apache_beam.runners.worker.operations.Operation.output
>     cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
>   File "apache_beam/runners/worker/operations.py", line 89, in apache_beam.runners.worker.operations.ConsumerSet.receive
>     cython.cast(Operation, consumer).process(windowed_value)
>   File "apache_beam/runners/worker/operations.py", line 497, in apache_beam.runners.worker.operations.DoOperation.process
>     with self.scoped_process_state:
>   File "apache_beam/runners/worker/operations.py", line 498, in apache_beam.runners.worker.operations.DoOperation.process
>     self.dofn_receiver.receive(o)
>   File "apache_beam/runners/common.py", line 680, in apache_beam.runners.common.DoFnRunner.receive
>     self.process(windowed_value)
>   File "apache_beam/runners/common.py", line 686, in apache_beam.runners.common.DoFnRunner.process
>     self._reraise_augmented(exn)
>   File "apache_beam/runners/common.py", line 709, in apache_beam.runners.common.DoFnRunner._reraise_augmented
>     raise
>   File "apache_beam/runners/common.py", line 684, in apache_beam.runners.common.DoFnRunner.process
>     self.do_fn_invoker.invoke_process(windowed_value)
>   File "apache_beam/runners/common.py", line 420, in apache_beam.runners.common.SimpleInvoker.invoke_process
>     output_processor.process_outputs(
>   File "apache_beam/runners/common.py", line 794, in apache_beam.runners.common._OutputProcessor.process_outputs
>     self.main_receivers.receive(windowed_value)
>   File "apache_beam/runners/worker/operations.py", line 89, in apache_beam.runners.worker.operations.ConsumerSet.receive
>     cython.cast(Operation, consumer).process(windowed_value)
>   File "apache_beam/runners/worker/operations.py", line 497, in apache_beam.runners.worker.operations.DoOperation.process
>     with self.scoped_process_state:
>   File "apache_beam/runners/worker/operations.py",
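The PR title above ("Fixing bundle estimation when all xs are same") points at a degenerate-input problem: an ordinary least-squares fit has an undefined slope when every x value is identical, which is one way a LAPACK scaling routine like DLASCL can end up rejecting its input. The sketch below illustrates the failure mode and a guard in plain stdlib Python; `fit_slope` is a hypothetical stand-in, not Beam's actual batch-size estimator.

```python
# Sketch of fitting processing time against batch size, with a guard for
# the degenerate case where all observed batch sizes (xs) are identical.
from statistics import mean

def fit_slope(xs, ys):
    """Return (slope, intercept) of the least-squares line through
    (xs, ys), falling back to a flat line through mean(ys) when the
    regression is undefined because all xs are the same."""
    mx, my = mean(xs), mean(ys)
    var_x = sum((x - mx) ** 2 for x in xs)
    if var_x == 0:  # all xs identical: slope denominator is zero
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var_x
    return slope, my - slope * mx

# Degenerate case: every bundle had the same size, so only the mean
# processing time is recoverable.
print(fit_slope([5, 5, 5], [1.0, 2.0, 3.0]))  # (0.0, 2.0)
# Well-conditioned case: slope and intercept are recovered normally.
print(fit_slope([1, 2, 3], [2.0, 4.0, 6.0]))  # (2.0, 0.0)
```

Without such a guard, handing the zero-variance design matrix to a numerical solver is exactly the kind of input that surfaces as an "illegal value" error deep inside LAPACK rather than as a readable Python exception.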
[jira] [Work logged] (BEAM-6179) Batch size estimation failing
[ https://issues.apache.org/jira/browse/BEAM-6179?focusedWorklogId=175535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175535 ]

ASF GitHub Bot logged work on BEAM-6179:
Author: ASF GitHub Bot
Created on: 14/Dec/18 18:38
Start Date: 14/Dec/18 18:38
Worklog Time Spent: 10m

Work Description: angoenka commented on issue #7280: [BEAM-6179] Fixing bundle estimation when all xs are same
URL: https://github.com/apache/beam/pull/7280#issuecomment-447415359
Run Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175535)
Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (BEAM-6186) Cleanup FnApiRunner optimization phases.
[ https://issues.apache.org/jira/browse/BEAM-6186?focusedWorklogId=175532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175532 ]

ASF GitHub Bot logged work on BEAM-6186:
Author: ASF GitHub Bot
Created on: 14/Dec/18 18:38
Start Date: 14/Dec/18 18:38
Worklog Time Spent: 10m

Work Description: angoenka commented on issue #7281: [BEAM-6186] Finish moving optimization phases.
URL: https://github.com/apache/beam/pull/7281#issuecomment-447415196
Run RAT PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175532)
Time Spent: 1h 10m (was: 1h)

> Cleanup FnApiRunner optimization phases.
> ----------------------------------------
>
> Key: BEAM-6186
> URL: https://issues.apache.org/jira/browse/BEAM-6186
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Robert Bradshaw
> Assignee: Ahmet Altay
> Priority: Minor
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> They are currently expressed as functions with closures. It would be good to pull them out with explicit dependencies, both to make the code easier to follow and to be able to test and reuse the phases.
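The refactoring BEAM-6186 asks for — pulling optimization phases out of closures into functions with explicit dependencies — can be sketched as below. `Stage`, `make_optimizer`, and `fuse_stages` are hypothetical stand-ins for illustration, not the FnApiRunner's actual code.

```python
# Contrast a closure-style optimization phase (captures its pipeline from
# the enclosing scope) with a phase that takes explicit inputs.
from dataclasses import dataclass, replace
from typing import Callable, List

@dataclass(frozen=True)
class Stage:
    name: str
    fused: bool = False

# Closure style: the phase captures `stages` from the enclosing scope, so it
# cannot be unit-tested or reused without constructing the whole optimizer.
def make_optimizer(stages: List[Stage]) -> Callable[[], List[Stage]]:
    def fuse():
        return [replace(s, fused=True) for s in stages]
    return fuse

# Explicit-dependency style: a plain function over its inputs, directly
# testable and reusable in any phase pipeline.
def fuse_stages(stages: List[Stage]) -> List[Stage]:
    return [replace(s, fused=True) for s in stages]

pipeline = [Stage("read"), Stage("map")]
# Both styles produce the same result; only testability differs.
assert make_optimizer(pipeline)() == fuse_stages(pipeline)
```

The explicit version also makes the data flow between phases visible at the call site, which is the "better able to follow the code" part of the ticket.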
[jira] [Work logged] (BEAM-6179) Batch size estimation failing
[ https://issues.apache.org/jira/browse/BEAM-6179?focusedWorklogId=175534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175534 ]

ASF GitHub Bot logged work on BEAM-6179:
Author: ASF GitHub Bot
Created on: 14/Dec/18 18:38
Start Date: 14/Dec/18 18:38
Worklog Time Spent: 10m

Work Description: angoenka commented on issue #7280: [BEAM-6179] Fixing bundle estimation when all xs are same
URL: https://github.com/apache/beam/pull/7280#issuecomment-447415322
Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175534)
Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-6179) Batch size estimation failing
[ https://issues.apache.org/jira/browse/BEAM-6179?focusedWorklogId=175537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175537 ]

ASF GitHub Bot logged work on BEAM-6179:
Author: ASF GitHub Bot
Created on: 14/Dec/18 18:38
Start Date: 14/Dec/18 18:38
Worklog Time Spent: 10m

Work Description: angoenka commented on issue #7280: [BEAM-6179] Fixing bundle estimation when all xs are same
URL: https://github.com/apache/beam/pull/7280#issuecomment-447415445
Run Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175537)
Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (BEAM-6179) Batch size estimation failing
[ https://issues.apache.org/jira/browse/BEAM-6179?focusedWorklogId=175533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175533 ]

ASF GitHub Bot logged work on BEAM-6179:
Author: ASF GitHub Bot
Created on: 14/Dec/18 18:38
Start Date: 14/Dec/18 18:38
Worklog Time Spent: 10m

Work Description: angoenka commented on issue #7280: [BEAM-6179] Fixing bundle estimation when all xs are same
URL: https://github.com/apache/beam/pull/7280#issuecomment-447415256
Run RAT PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175533)
Time Spent: 1h (was: 50m)
[jira] [Work logged] (BEAM-6179) Batch size estimation failing
[ https://issues.apache.org/jira/browse/BEAM-6179?focusedWorklogId=175529&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175529 ]

ASF GitHub Bot logged work on BEAM-6179:
Author: ASF GitHub Bot
Created on: 14/Dec/18 18:36
Start Date: 14/Dec/18 18:36
Worklog Time Spent: 10m

Work Description: angoenka commented on issue #7280: [BEAM-6179] Fixing bundle estimation when all xs are same
URL: https://github.com/apache/beam/pull/7280#issuecomment-447414758
Run Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 175529)
Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (BEAM-6179) Batch size estimation failing
[ https://issues.apache.org/jira/browse/BEAM-6179?focusedWorklogId=175530=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175530 ] ASF GitHub Bot logged work on BEAM-6179: Author: ASF GitHub Bot Created on: 14/Dec/18 18:36 Start Date: 14/Dec/18 18:36 Worklog Time Spent: 10m Work Description: angoenka commented on issue #7280: [BEAM-6179] Fixing bundle estimation when all xs are same URL: https://github.com/apache/beam/pull/7280#issuecomment-447414805 Run RAT PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175530) Time Spent: 50m (was: 40m) > Batch size estimation failing > - > > Key: BEAM-6179 > URL: https://issues.apache.org/jira/browse/BEAM-6179 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-py-harness >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Batch size estimation is failing on flink when running 13MB input pipeline > with error > ValueError: On entry to DLASCL parameter number 4 had an illegal value > java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error > received from SDK harness for instruction 48: Traceback (most recent call > last): > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 135, in _execute > response = task() > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 170, in > self._execute(lambda: worker.do_instruction(work), work) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 221, in do_instruction > request.instruction_id) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 237, 
in process_bundle > bundle_processor.process_bundle(instruction_id) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 480, in process_bundle > ].process_encoded(data.data) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 125, in process_encoded > self.output(decoded_value) > File "apache_beam/runners/worker/operations.py", line 182, in > apache_beam.runners.worker.operations.Operation.output > def output(self, windowed_value, output_index=0): > File "apache_beam/runners/worker/operations.py", line 183, in > apache_beam.runners.worker.operations.Operation.output > cython.cast(Receiver, > self.receivers[output_index]).receive(windowed_value) > File "apache_beam/runners/worker/operations.py", line 89, in > apache_beam.runners.worker.operations.ConsumerSet.receive > cython.cast(Operation, consumer).process(windowed_value) > File "apache_beam/runners/worker/operations.py", line 497, in > apache_beam.runners.worker.operations.DoOperation.process > with self.scoped_process_state: > File "apache_beam/runners/worker/operations.py", line 498, in > apache_beam.runners.worker.operations.DoOperation.process > self.dofn_receiver.receive(o) > File "apache_beam/runners/common.py", line 680, in > apache_beam.runners.common.DoFnRunner.receive > self.process(windowed_value) > File "apache_beam/runners/common.py", line 686, in > apache_beam.runners.common.DoFnRunner.process > self._reraise_augmented(exn) > File "apache_beam/runners/common.py", line 709, in > apache_beam.runners.common.DoFnRunner._reraise_augmented > raise > File "apache_beam/runners/common.py", line 684, in > apache_beam.runners.common.DoFnRunner.process > self.do_fn_invoker.invoke_process(windowed_value) > File "apache_beam/runners/common.py", line 420, in > apache_beam.runners.common.SimpleInvoker.invoke_process > output_processor.process_outputs( > File "apache_beam/runners/common.py", line 794, in > 
apache_beam.runners.common._OutputProcessor.process_outputs > self.main_receivers.receive(windowed_value) > File "apache_beam/runners/worker/operations.py", line 89, in > apache_beam.runners.worker.operations.ConsumerSet.receive > cython.cast(Operation, consumer).process(windowed_value) > File "apache_beam/runners/worker/operations.py", line 497, in > apache_beam.runners.worker.operations.DoOperation.process > with self.scoped_process_state: > File "apache_beam/runners/worker/operations.py", line 498, in
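The "DLASCL parameter number 4 had an illegal value" failure above is the symptom of a degenerate least-squares fit: when every sampled x value is identical, the variance of x is zero and the LAPACK solver underneath NumPy is handed an illegal scale factor. A minimal pure-Python sketch of the kind of guard the PR title ("Fixing bundle estimation when all xs are same") describes — the function name and the flat-line fallback here are illustrative assumptions, not Beam's actual estimator code:

```python
# Hypothetical sketch: slope of an ordinary least-squares line is
# cov(x, y) / var(x). When all xs are equal, var(x) == 0 and the fit is
# undefined, so we fall back instead of calling into the solver.

def fit_line(xs, ys):
    """Return (slope, intercept); fall back to a flat line when var(x) == 0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:  # all xs identical: regression is undefined
        return 0.0, mean_y  # flat line through the mean of the ys
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x

print(fit_line([5, 5, 5], [1.0, 2.0, 3.0]))  # (0.0, 2.0)
print(fit_line([1, 2, 3], [2.0, 4.0, 6.0]))  # (2.0, 0.0)
```

Checking for zero variance before invoking the solver avoids the DLASCL error entirely, and the fallback keeps the estimator producing finite batch sizes.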
[jira] [Work logged] (BEAM-6225) Setup Jenkins VR job for new bundle processing code
[ https://issues.apache.org/jira/browse/BEAM-6225?focusedWorklogId=175522=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175522 ] ASF GitHub Bot logged work on BEAM-6225: Author: ASF GitHub Bot Created on: 14/Dec/18 18:31 Start Date: 14/Dec/18 18:31 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #7271: [BEAM-6225] Setup Jenkins Job to Run VR with ExecutableStage URL: https://github.com/apache/beam/pull/7271#issuecomment-447413447 > Please validate that the seed job passes before merging. > > Otherwise LGTM once above comments are addressed. All comments above are addressed in commit https://github.com/apache/beam/pull/7271/commits/bcba9ead665cab0ff7d7e8036b47b3321a2771d2 . This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175522) Time Spent: 3h (was: 2h 50m) > Setup Jenkins VR job for new bundle processing code > --- > > Key: BEAM-6225 > URL: https://issues.apache.org/jira/browse/BEAM-6225 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6236) Enable VR test job for ExecutableStage after Dataflow service roll out
Boyuan Zhang created BEAM-6236: -- Summary: Enable VR test job for ExecutableStage after Dataflow service roll out Key: BEAM-6236 URL: https://issues.apache.org/jira/browse/BEAM-6236 Project: Beam Issue Type: Task Components: runner-dataflow Reporter: Boyuan Zhang Assignee: Boyuan Zhang -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5723) CassandraIO is broken because of use of bad relocation of guava
[ https://issues.apache.org/jira/browse/BEAM-5723?focusedWorklogId=175520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175520 ] ASF GitHub Bot logged work on BEAM-5723: Author: ASF GitHub Bot Created on: 14/Dec/18 18:28 Start Date: 14/Dec/18 18:28 Worklog Time Spent: 10m Work Description: swegner commented on issue #7237: [BEAM-5723] Changed shadow plugin configuration to avoid relocating g… URL: https://github.com/apache/beam/pull/7237#issuecomment-447412413 Thanks for jumping in @kennknowles. If you don't mind, I'm going to remove myself from this review and let you carry it forward. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175520) Time Spent: 2h 50m (was: 2h 40m) > CassandraIO is broken because of use of bad relocation of guava > --- > > Key: BEAM-5723 > URL: https://issues.apache.org/jira/browse/BEAM-5723 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Arun sethia >Assignee: João Cabrita >Priority: Major > Fix For: 2.10.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > While using Apache Beam to run a Dataflow job to read data from BigQuery and > Store/Write to Cassandra with the following libraries: > # beam-sdks-java-io-cassandra - 2.6.0 > # beam-sdks-java-io-jdbc - 2.6.0 > # beam-sdks-java-io-google-cloud-platform - 2.6.0 > # beam-sdks-java-core - 2.6.0 > # google-cloud-dataflow-java-sdk-all - 2.5.0 > # google-api-client -1.25.0 > > I am getting the following error when inserting/saving data to Cassandra. 
> {code:java} > [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332) > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6225) Setup Jenkins VR job for new bundle processing code
[ https://issues.apache.org/jira/browse/BEAM-6225?focusedWorklogId=175509=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175509 ] ASF GitHub Bot logged work on BEAM-6225: Author: ASF GitHub Bot Created on: 14/Dec/18 18:09 Start Date: 14/Dec/18 18:09 Worklog Time Spent: 10m Work Description: swegner commented on issue #7271: [BEAM-6225] Setup Jenkins Job to Run VR with ExecutableStage URL: https://github.com/apache/beam/pull/7271#issuecomment-447406616 Please validate that the seed job passes before merging. Otherwise LGTM once above comments are addressed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175509) Time Spent: 2h 50m (was: 2h 40m) > Setup Jenkins VR job for new bundle processing code > --- > > Key: BEAM-6225 > URL: https://issues.apache.org/jira/browse/BEAM-6225 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6225) Setup Jenkins VR job for new bundle processing code
[ https://issues.apache.org/jira/browse/BEAM-6225?focusedWorklogId=175508=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175508 ] ASF GitHub Bot logged work on BEAM-6225: Author: ASF GitHub Bot Created on: 14/Dec/18 18:09 Start Date: 14/Dec/18 18:09 Worklog Time Spent: 10m Work Description: swegner commented on issue #7271: [BEAM-6225] Setup Jenkins Job to Run VR with ExecutableStage URL: https://github.com/apache/beam/pull/7271#issuecomment-447406437 You mentioned that Dataflow won't use the new `use_executable_stage_bundle_execution` until there is a service release. If that's the case, I don't think we should begin running these tests: 1. It's misleading because it looks like we're testing something we're not. 2. It's not transparent when the service will be released, and it might cause test failures that will be hard to diagnose. 3. Until the experiment is enabled, the tests are redundant with existing suites and waste resources. How about checking this in but disabling the job until the Dataflow service is ready (track with a JIRA ticket)? The [Jenkins job DSL](https://jenkinsci.github.io/job-dsl-plugin/#path/job-disabled) includes a `disabled()` method. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175508) Time Spent: 2h 40m (was: 2.5h) > Setup Jenkins VR job for new bundle processing code > --- > > Key: BEAM-6225 > URL: https://issues.apache.org/jira/browse/BEAM-6225 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6225) Setup Jenkins VR job for new bundle processing code
[ https://issues.apache.org/jira/browse/BEAM-6225?focusedWorklogId=175504=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175504 ] ASF GitHub Bot logged work on BEAM-6225: Author: ASF GitHub Bot Created on: 14/Dec/18 17:50 Start Date: 14/Dec/18 17:50 Worklog Time Spent: 10m Work Description: swegner commented on a change in pull request #7271: [BEAM-6225] Setup Jenkins Job to Run VR with ExecutableStage URL: https://github.com/apache/beam/pull/7271#discussion_r241837408 ## File path: .test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_PortabilityApi_ExecutableStage_Dataflow.groovy ## @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import CommonJobProperties as commonJobProperties +import PostcommitJobBuilder + + +// This job runs the suite of ValidatesRunner tests against the Dataflow +// runner. +PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_ValidatesRunner_PortabilityApi_ExecutableStage Dataflow_Gradle', Review comment: We should remove the _Gradle suffix from the job names as well. This is an artifact from when we were migrating from Maven -> Gradle and had both job types. 
I'll go through and remove _Gradle from existing jobs; please remove from this job to be consistent. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175504) Time Spent: 2h 20m (was: 2h 10m) > Setup Jenkins VR job for new bundle processing code > --- > > Key: BEAM-6225 > URL: https://issues.apache.org/jira/browse/BEAM-6225 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6191) Redundant error messages for failures in Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-6191?focusedWorklogId=175502=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175502 ] ASF GitHub Bot logged work on BEAM-6191: Author: ASF GitHub Bot Created on: 14/Dec/18 17:44 Start Date: 14/Dec/18 17:44 Worklog Time Spent: 10m Work Description: swegner closed pull request #7220: [BEAM-6191] Remove redundant error logging for Dataflow exception handling URL: https://github.com/apache/beam/pull/7220 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClient.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClient.java index 2d840e3f4356..4473c04f3da8 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClient.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClient.java @@ -117,17 +117,20 @@ public synchronized WorkItemServiceState reportError(Throwable e) throws IOExcep Status error = new Status(); error.setCode(2); // Code.UNKNOWN. TODO: Replace with a generated definition. // TODO: Attach the stack trace as exception details, not to the message. +String logPrefix = String.format("Failure processing work item %s", uniqueWorkId()); if (isOutOfMemoryError(t)) { String message = "An OutOfMemoryException occurred. 
Consider specifying higher memory " + "instances in PipelineOptions.\n"; - LOG.error(message); + LOG.error("{}: {}", logPrefix, message); error.setMessage(message + DataflowWorkerLoggingHandler.formatException(t)); } else { - LOG.error("Uncaught exception occurred during work unit execution. This will be retried.", t); + LOG.error( + "{}: Uncaught exception occurred during work unit execution. This will be retried.", + logPrefix, + t); error.setMessage(DataflowWorkerLoggingHandler.formatException(t)); } -LOG.warn("Failure processing work item {}", uniqueWorkId()); status.setErrors(ImmutableList.of(error)); return execute(status); diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/common/worker/MapTaskExecutor.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/common/worker/MapTaskExecutor.java index d11e72fe95dd..5690814f956d 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/common/worker/MapTaskExecutor.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/common/worker/MapTaskExecutor.java @@ -84,7 +84,7 @@ public void execute() throws Exception { op.finish(); } } catch (Exception | Error exn) { -LOG.warn("Aborting operations", exn); +LOG.debug("Aborting operations", exn); for (Operation op : operations) { try { op.abort(); This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175502) Time Spent: 1h (was: 50m) > Redundant error messages for failures in Dataflow runner > > > Key: BEAM-6191 > URL: https://issues.apache.org/jira/browse/BEAM-6191 > Project: Beam > Issue Type: New Feature > Components: runner-dataflow >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Minor > Fix For: 2.10.0 > > Time Spent: 1h > Remaining Estimate: 0h > > The Dataflow runner harness has redundant error logging from a couple > different components, which creates log spam and confusion when failures do > occur. We should dedupe redundant logs. > From a typical user-code exception, we see at least 3 error logs from the > worker: > http://screen/QZxsJOVnvt6 > "Aborting operations" > "Uncaught exception occurred during work unit execution. This will be > retried." > "Failure processing work item" -- This message
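The diff in this change folds the three overlapping worker logs ("Aborting operations", "Uncaught exception occurred...", "Failure processing work item") into a single error log that carries the work-item id as a prefix. A hedged Python sketch of the same deduplication pattern — the names and message wording here are illustrative, not Beam's:

```python
# Illustrative sketch: build one message carrying all the context, instead
# of emitting several overlapping log lines for the same failure.
import logging

logging.basicConfig(level=logging.ERROR)
log = logging.getLogger("worker")

def format_failure(work_id, exc):
    # Single message combining the old "Failure processing work item" and
    # "Uncaught exception ... will be retried" logs.
    return ("Failure processing work item %s: uncaught exception during "
            "work unit execution. This will be retried. Cause: %s"
            % (work_id, exc))

def report_error(work_id, exc):
    log.error(format_failure(work_id, exc))  # one log line, full context

report_error("work-123", RuntimeError("boom"))
```

One consolidated line keeps the work-item id attached to the stack-trace log, so operators no longer have to correlate three separate entries for a single failure.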
[jira] [Work logged] (BEAM-5959) Add Cloud KMS support to GCS copies
[ https://issues.apache.org/jira/browse/BEAM-5959?focusedWorklogId=175506=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175506 ] ASF GitHub Bot logged work on BEAM-5959: Author: ASF GitHub Bot Created on: 14/Dec/18 17:51 Start Date: 14/Dec/18 17:51 Worklog Time Spent: 10m Work Description: udim commented on a change in pull request #7266: [BEAM-5959] Add performance testing for writing many files URL: https://github.com/apache/beam/pull/7266#discussion_r241837665 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java ## @@ -758,7 +758,10 @@ final void moveToOutputFiles( } // During a failure case, files may have been deleted in an earlier step. Thus // we ignore missing files here. + long startTime = System.nanoTime(); FileSystems.rename(srcFiles, dstFiles, StandardMoveOptions.IGNORE_MISSING_FILES); + long endTime = System.nanoTime(); + LOG.info("Renamed {} files in {} seconds.", srcFiles.size(), (endTime - startTime) / 1e9); Review comment: I'm going to change the code that does batch copies and I needed a way to verify that there are no performance regressions. This PR should do that, but if there is a regression this log line could tell me if the batch operation is slower. This log line does appear in the terminal if I run the test directly using `gradlew`. Is there a way to export this metric to the dashboard? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175506) Time Spent: 7h (was: 6h 50m) > Add Cloud KMS support to GCS copies > --- > > Key: BEAM-5959 > URL: https://issues.apache.org/jira/browse/BEAM-5959 > Project: Beam > Issue Type: Bug > Components: io-java-gcp, sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 7h > Remaining Estimate: 0h > > Beam SDK currently uses the CopyTo GCS API call, which doesn't support > copying objects that use Customer Managed Encryption Keys (CMEK). > CMEKs are managed in Cloud KMS. > Items (for Java and Python SDKs): > - Update clients to versions that support KMS keys. > - Change copyTo API calls to use rewriteTo (Python - directly, Java - > possibly convert copyTo API call to use client library) > - Add unit tests. > - Add basic tests (DirectRunner and GCS buckets with CMEK). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
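The review snippet above brackets `FileSystems.rename` with `System.nanoTime()` calls to log how long the bulk rename took. A rough Python analogue of that timing pattern, using a monotonic clock (the helper and its label are hypothetical, not Beam code):

```python
# Hypothetical sketch: wrap an operation with a monotonic clock and log
# the elapsed wall time, mirroring the nanoTime() bracketing in the diff.
import time

def timed(label, fn, *args, **kwargs):
    start = time.perf_counter()          # monotonic, suitable for intervals
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print("%s took %.3f seconds" % (label, elapsed))
    return result

# Stand-in workload for the bulk rename being measured.
total = timed("Renamed 3 files", sum, [1, 2, 3])
```

A monotonic clock (`perf_counter`, like `nanoTime` in the Java diff) is the right choice here because wall-clock time can jump backwards under NTP adjustments, which would corrupt the measured interval.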
[jira] [Work logged] (BEAM-6225) Setup Jenkins VR job for new bundle processing code
[ https://issues.apache.org/jira/browse/BEAM-6225?focusedWorklogId=175505=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175505 ] ASF GitHub Bot logged work on BEAM-6225: Author: ASF GitHub Bot Created on: 14/Dec/18 17:50 Start Date: 14/Dec/18 17:50 Worklog Time Spent: 10m Work Description: swegner commented on a change in pull request #7271: [BEAM-6225] Setup Jenkins Job to Run VR with ExecutableStage URL: https://github.com/apache/beam/pull/7271#discussion_r241837527 ## File path: runners/google-cloud-dataflow-java/build.gradle ## @@ -241,6 +241,38 @@ task validatesRunnerFnApiWorkerTest(type: Test) { } } +task validatesRunnerFnApiWorkerExecutableStageTest(type: Test) { +group = "Verification" +dependsOn ":beam-runners-google-cloud-dataflow-java-fn-api-worker:shadowJar" +dependsOn buildAndPushDockerContainer +fnApiPipelineOptions.remove(2) Review comment: That works, thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175505) Time Spent: 2.5h (was: 2h 20m) > Setup Jenkins VR job for new bundle processing code > --- > > Key: BEAM-6225 > URL: https://issues.apache.org/jira/browse/BEAM-6225 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-6191) Redundant error messages for failures in Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner resolved BEAM-6191. Resolution: Fixed Fix Version/s: 2.10.0 > Redundant error messages for failures in Dataflow runner > > > Key: BEAM-6191 > URL: https://issues.apache.org/jira/browse/BEAM-6191 > Project: Beam > Issue Type: New Feature > Components: runner-dataflow >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Minor > Fix For: 2.10.0 > > Time Spent: 1h > Remaining Estimate: 0h > > The Dataflow runner harness has redundant error logging from a couple > different components, which creates log spam and confusion when failures do > occur. We should dedupe redundant logs. > From a typical user-code exception, we see at least 3 error logs from the > worker: > http://screen/QZxsJOVnvt6 > "Aborting operations" > "Uncaught exception occurred during work unit execution. This will be > retried." > "Failure processing work item" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6191) Redundant error messages for failures in Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-6191?focusedWorklogId=175501=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175501 ] ASF GitHub Bot logged work on BEAM-6191: Author: ASF GitHub Bot Created on: 14/Dec/18 17:43 Start Date: 14/Dec/18 17:43 Worklog Time Spent: 10m Work Description: swegner commented on issue #7220: [BEAM-6191] Remove redundant error logging for Dataflow exception handling URL: https://github.com/apache/beam/pull/7220#issuecomment-447398897 > Sorry! What's the notification mechanism used here so I know to watch for it? I get GitHub notifications in my email. Note that it might be going to your personal email account, depending on how you have GitHub configured. [Here's the docs](https://help.github.com/articles/about-notifications/). > Does that OOM message actually come through? Don't think I've ever seen it, but it'd sure be handy! I don't know for sure-- I'm new to this code. I would imagine it should come through. Some reasons it wouldn't: (a) If OOM's typically manifest from some other place, or (b) if when we OOM we don't flush Dataflow logs to Stackdriver before the VM goes down. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175501) Time Spent: 50m (was: 40m) > Redundant error messages for failures in Dataflow runner > > > Key: BEAM-6191 > URL: https://issues.apache.org/jira/browse/BEAM-6191 > Project: Beam > Issue Type: New Feature > Components: runner-dataflow >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > The Dataflow runner harness has redundant error logging from a couple > different components, which creates log spam and confusion when failures do > occur. We should dedupe redundant logs. 
> From a typical user-code exception, we see at least 3 error logs from the > worker: > http://screen/QZxsJOVnvt6 > "Aborting operations" > "Uncaught exception occurred during work unit execution. This will be > retried." > "Failure processing work item" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6235) Upgrade AutoValue to version 1.6.3
[ https://issues.apache.org/jira/browse/BEAM-6235?focusedWorklogId=175492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175492 ] ASF GitHub Bot logged work on BEAM-6235: Author: ASF GitHub Bot Created on: 14/Dec/18 17:35 Start Date: 14/Dec/18 17:35 Worklog Time Spent: 10m Work Description: swegner commented on issue #7285: [BEAM-6235] Upgrade AutoValue to version 1.6.3 URL: https://github.com/apache/beam/pull/7285#issuecomment-447396536 I see [FindBugs errors](https://scans.gradle.com/s/n53asllrml3be/failure?openFailures=WzBd=WzFd#top=0) in the Java pre-commit. LGTM after the failures are addressed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175492) Time Spent: 20m (was: 10m) > Upgrade AutoValue to version 1.6.3 > -- > > Key: BEAM-6235 > URL: https://issues.apache.org/jira/browse/BEAM-6235 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > AutoValue 1.6 now ships the annotations in a separate artifact: > auto-value-annotations. This allows users to specify the annotations in > compile scope and the processor in an annotation-processing scope, without > leaking the processor into a release binary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6206) Dataflow template which reads from BigQuery fails if used more than once
[ https://issues.apache.org/jira/browse/BEAM-6206?focusedWorklogId=175489=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175489 ] ASF GitHub Bot logged work on BEAM-6206: Author: ASF GitHub Bot Created on: 14/Dec/18 17:30 Start Date: 14/Dec/18 17:30 Worklog Time Spent: 10m Work Description: swegner closed pull request #7270: [BEAM-6206] Add CustomHttpErrors a tool to allow adding custom errors… URL: https://github.com/apache/beam/pull/7270 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/CustomHttpErrors.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/CustomHttpErrors.java new file mode 100644 index ..db46d981400f --- /dev/null +++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/CustomHttpErrors.java @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.util; + +import com.google.auto.value.AutoValue; +import java.io.Serializable; +import java.util.ArrayList; +import java.util.List; + +/** + * An optional component to use with the {@code RetryHttpRequestInitializer} in order to provide + * custom errors for failing http calls. This class allows you to specify custom error messages + * which match specific error codes and containing strings in the URL. The first matcher to match + * the request and response will be used to provide the custom error. + * + * The intended use case here is to examine one of the logs emitted by a failing call made by the + * RetryHttpRequestInitializer, and then adding a custom error message which matches the URL and + * code for it. + * + * Usage: See more in CustomHttpErrorsTest. + * + * {@code + * CustomHttpErrors.Builder builder = new CustomHttpErrors.Builder(); + * builder.addErrorForCodeAndUrlContains(403,"/tables?", "Custom Error Msg"); + * CustomHttpErrors customErrors = builder.build(); + * + * + * RetryHttpRequestInitializer initializer = ... + * initializer.setCustomErrors(customErrors); + * } + * + * Suggestions for future enhancements to anyone upgrading this file: + * + * + * This class is left open for extension, to allow different functions for HttpCallMatcher and + * HttpCallCustomError to match and log errors. For example, new functionality may include + * matching an error based on the HttpResponse body. Additionally, extracting and logging + * strings from the HttpResponse body may make useful functionality. + * Add a methods to add custom errors based on inspecting the contents of the HttpRequest and + * HttpResponse + * Be sure to update the HttpRequestWrapper and HttpResponseWrapper with any new getters that + * you may use. The wrappers were introduced to add a layer of indirection which could be + * mocked mocked out in tests. 
This was unfortunately needed because mockito cannot mock final + * classes and its non trivial to just construct HttpRequest and HttpResponse objects. + * Making matchers composable with an AND operator may simplify enhancing this code, if + * several different matchers are used. + * + * + * + */ +public class CustomHttpErrors { + + /** + * A simple Tuple class for creating a list of HttpResponseMatcher and HttpResponseCustomError to + * print for the responses. + */ + @AutoValue + public abstract static class MatcherAndError implements Serializable { +static MatcherAndError create(HttpCallMatcher matcher, HttpCallCustomError customError) { + return new AutoValue_CustomHttpErrors_MatcherAndError(matcher, customError); +} + +public abstract HttpCallMatcher getMatcher(); + +public abstract HttpCallCustomError getCustomError(); + } + + /** A Builder which allows building
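The first-match-wins design described in the javadoc above can be sketched in plain Java. This is an illustrative sketch only, not Beam's actual `CustomHttpErrors` API: the class and method names here are hypothetical stand-ins for the matcher-plus-custom-error tuple idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Illustrative sketch (NOT Beam's actual API) of the design described above:
// a list of (matcher, custom error) pairs, where the first matcher to match
// a failed call's status code and URL supplies the custom error message.
class CustomErrorSketch {
    // Pairs a matcher over (status code, URL) with the message to emit.
    private static class MatcherAndError {
        final BiPredicate<Integer, String> matcher;
        final String customError;

        MatcherAndError(BiPredicate<Integer, String> matcher, String customError) {
            this.matcher = matcher;
            this.customError = customError;
        }
    }

    private final List<MatcherAndError> entries = new ArrayList<>();

    void addErrorForCodeAndUrlContains(int code, String urlFragment, String message) {
        entries.add(new MatcherAndError(
                (c, url) -> c == code && url.contains(urlFragment), message));
    }

    // First match wins; null means "no custom error registered for this call".
    String customErrorFor(int code, String url) {
        for (MatcherAndError e : entries) {
            if (e.matcher.test(code, url)) {
                return e.customError;
            }
        }
        return null;
    }
}
```

Registering a 403-on-`/tables?` message and then probing with a non-matching code or URL shows the first-match/no-match behavior the javadoc describes.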
[jira] [Work logged] (BEAM-4454) Provide automatic schema registration for AVROs
[ https://issues.apache.org/jira/browse/BEAM-4454?focusedWorklogId=175470=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175470 ] ASF GitHub Bot logged work on BEAM-4454: Author: ASF GitHub Bot Created on: 14/Dec/18 17:16 Start Date: 14/Dec/18 17:16 Worklog Time Spent: 10m Work Description: reuvenlax closed pull request #7267: [BEAM-4454] Support Avro POJO objects URL: https://github.com/apache/beam/pull/7267 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroSpecificRecordSchema.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java similarity index 67% rename from sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroSpecificRecordSchema.java rename to sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java index d8e4bda342f8..29bf51a06a77 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroSpecificRecordSchema.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java @@ -17,35 +17,34 @@ */ package org.apache.beam.sdk.schemas; -import org.apache.avro.specific.SpecificRecord; -import org.apache.beam.sdk.schemas.utils.AvroSpecificRecordTypeInformationFactory; import org.apache.beam.sdk.schemas.utils.AvroUtils; import org.apache.beam.sdk.values.TypeDescriptor; /** - * A {@link SchemaProvider} for AVRO generated SpecificRecords. + * A {@link SchemaProvider} for AVRO generated SpecificRecords and POJOs. * - * This provider infers a schema from generates SpecificRecord objects, and creates schemas and - * rows that bind to the appropriate fields. 
+ * This provider infers a schema from generated SpecificRecord objects, and creates schemas and + * rows that bind to the appropriate fields. This provider also infers schemas from Java POJO + * objects, creating a schema that matches that inferred by the AVRO libraries. */ -public class AvroSpecificRecordSchema extends GetterBasedSchemaProvider { +public class AvroRecordSchema extends GetterBasedSchemaProvider { @Override public Schema schemaFor(TypeDescriptor typeDescriptor) { -return AvroUtils.getSchema((Class) typeDescriptor.getRawType()); +return AvroUtils.getSchema(typeDescriptor.getRawType()); } @Override public FieldValueGetterFactory fieldValueGetterFactory() { -return new AvroSpecificRecordGetterFactory(); +return AvroUtils::getGetters; } @Override public UserTypeCreatorFactory schemaTypeCreatorFactory() { -return new AvroSpecificRecordUserTypeCreatorFactory(); +return AvroUtils::getCreator; } @Override public FieldValueTypeInformationFactory fieldValueTypeInformationFactory() { -return new AvroSpecificRecordTypeInformationFactory(); +return AvroUtils::getFieldTypes; } } diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroSpecificRecordGetterFactory.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroSpecificRecordGetterFactory.java deleted file mode 100644 index fcb85f4dd664.. --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroSpecificRecordGetterFactory.java +++ /dev/null @@ -1,30 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.schemas; - -import java.util.List; -import org.apache.avro.specific.SpecificRecord; -import org.apache.beam.sdk.schemas.utils.AvroUtils; - -/** A {@link FieldValueGetterFactory} for AVRO-generated specific records. */ -public class AvroSpecificRecordGetterFactory implements FieldValueGetterFactory { - @Override - public List create(Class targetClass, Schema schema) { -return AvroUtils.getGetters((Class) targetClass, schema); - } -} diff --git
[jira] [Work logged] (BEAM-4454) Provide automatic schema registration for AVROs
[ https://issues.apache.org/jira/browse/BEAM-4454?focusedWorklogId=175457=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175457 ] ASF GitHub Bot logged work on BEAM-4454: Author: ASF GitHub Bot Created on: 14/Dec/18 16:59 Start Date: 14/Dec/18 16:59 Worklog Time Spent: 10m Work Description: kanterov commented on issue #7267: [BEAM-4454] Support Avro POJO objects URL: https://github.com/apache/beam/pull/7267#issuecomment-447386484 LGTM Great, the code became much cleaner after refactoring! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175457) Time Spent: 11h 10m (was: 11h) > Provide automatic schema registration for AVROs > --- > > Key: BEAM-4454 > URL: https://issues.apache.org/jira/browse/BEAM-4454 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 11h 10m > Remaining Estimate: 0h > > Need to make sure this is a compatible change -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-6227) FlinkRunner errors if GroupByKey contains null values (streaming mode only)
[ https://issues.apache.org/jira/browse/BEAM-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Weise resolved BEAM-6227. Resolution: Fixed > FlinkRunner errors if GroupByKey contains null values (streaming mode only) > --- > > Key: BEAM-6227 > URL: https://issues.apache.org/jira/browse/BEAM-6227 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.9.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.10.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Apparently this passed ValidatesRunner in streaming mode although this is a > quite common operation: > {noformat} > FlinkPipelineOptions options = > PipelineOptionsFactory.as(FlinkPipelineOptions.class); > options.setRunner(FlinkRunner.class); > // force streaming mode > options.setStreaming(true); > Pipeline pipeline = Pipeline.create(options); > pipeline > .apply(GenerateSequence.from(0).to(100)) > .apply(Window.into(FixedWindows.of(Duration.millis(10 > .apply(ParDo.of( > new DoFn>() { > @ProcessElement > public void processElement(ProcessContext pc) { > pc.output(KV.of("hello", null)); > } > } > )) > .apply(GroupByKey.create()); > pipeline.run(); > {noformat} > Throws: > {noformat} > Caused by: java.lang.RuntimeException: Error adding to bag state. > at > org.apache.beam.runners.flink.translation.wrappers.streaming.state.FlinkStateInternals$FlinkBagState.add(FlinkStateInternals.java:299) > at > org.apache.beam.runners.core.SystemReduceFn.processValue(SystemReduceFn.java:115) > at > org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:608) > at > org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349) > at > org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136) > Caused by: java.lang.NullPointerException: You cannot add null to a ListState. 
> at > org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:75) > at > org.apache.flink.runtime.state.heap.HeapListState.add(HeapListState.java:89) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.state.FlinkStateInternals$FlinkBagState.add(FlinkStateInternals.java:297) > at > org.apache.beam.runners.core.SystemReduceFn.processValue(SystemReduceFn.java:115) > at > org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:608) > at > org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349) > at > org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136) > at > org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn$DoFnInvoker.invokeProcessElement(Unknown > Source) > at > org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:275) > at > org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:240) > at > org.apache.beam.runners.core.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:80) > at > org.apache.beam.runners.flink.metrics.DoFnRunnerWithMetricsUpdate.processElement(DoFnRunnerWithMetricsUpdate.java:63) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.processElement(DoFnOperator.java:460) > at > org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:104) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Will do a follow-up for running ValidatesRunner in streaming mode. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
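One pipeline-level way to avoid the `NullPointerException` above is to never hand raw nulls to the `GroupByKey`, wrapping nullable values (for example in `java.util.Optional`) before grouping and unwrapping afterwards. A minimal sketch of the wrapping idea in plain Java; note that in an actual Beam pipeline a coder for the wrapper type would also be needed, which this sketch does not show:

```java
import java.util.Optional;

// Sketch of the workaround: carry nullable values as Optional so the values
// handed to state backends that reject nulls are themselves never null.
class NullableValues {
    static Optional<String> wrap(String maybeNull) {
        return Optional.ofNullable(maybeNull);
    }

    static String unwrap(Optional<String> wrapped) {
        return wrapped.orElse(null);
    }
}
```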
[jira] [Work logged] (BEAM-6227) FlinkRunner errors if GroupByKey contains null values (streaming mode only)
[ https://issues.apache.org/jira/browse/BEAM-6227?focusedWorklogId=175447=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175447 ] ASF GitHub Bot logged work on BEAM-6227: Author: ASF GitHub Bot Created on: 14/Dec/18 16:41 Start Date: 14/Dec/18 16:41 Worklog Time Spent: 10m Work Description: tweise closed pull request #7282: [BEAM-6227] Fix GroupByKey with null values in Flink Runner URL: https://github.com/apache/beam/pull/7282 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/state/FlinkStateInternals.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/state/FlinkStateInternals.java index 02a2ebee74d0..b2f5aede9dfd 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/state/FlinkStateInternals.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/state/FlinkStateInternals.java @@ -17,13 +17,16 @@ */ package org.apache.beam.runners.flink.translation.wrappers.streaming.state; +import com.google.common.base.Preconditions; import com.google.common.collect.ImmutableList; import com.google.common.collect.Iterables; import com.google.common.collect.Lists; import java.nio.ByteBuffer; import java.util.Collections; import java.util.HashMap; +import java.util.Iterator; import java.util.Map; +import javax.annotation.Nonnull; import org.apache.beam.runners.core.StateInternals; import org.apache.beam.runners.core.StateNamespace; import org.apache.beam.runners.core.StateTag; @@ -31,6 +34,7 @@ import org.apache.beam.sdk.coders.Coder; import org.apache.beam.sdk.coders.CoderException; import 
org.apache.beam.sdk.coders.InstantCoder; +import org.apache.beam.sdk.coders.VoidCoder; import org.apache.beam.sdk.state.BagState; import org.apache.beam.sdk.state.CombiningState; import org.apache.beam.sdk.state.MapState; @@ -49,6 +53,7 @@ import org.apache.beam.sdk.transforms.windowing.TimestampCombiner; import org.apache.beam.sdk.util.CoderUtils; import org.apache.beam.sdk.util.CombineContextFactory; +import org.apache.flink.api.common.state.ListState; import org.apache.flink.api.common.state.ListStateDescriptor; import org.apache.flink.api.common.state.MapStateDescriptor; import org.apache.flink.api.common.state.ValueStateDescriptor; @@ -274,6 +279,7 @@ public int hashCode() { private final String stateId; private final ListStateDescriptor flinkStateDescriptor; private final KeyedStateBackend flinkStateBackend; +private final boolean storesVoidValues; FlinkBagState( KeyedStateBackend flinkStateBackend, @@ -284,17 +290,24 @@ public int hashCode() { this.namespace = namespace; this.stateId = stateId; this.flinkStateBackend = flinkStateBackend; - - flinkStateDescriptor = new ListStateDescriptor<>(stateId, new CoderTypeSerializer<>(coder)); + this.storesVoidValues = coder instanceof VoidCoder; + this.flinkStateDescriptor = + new ListStateDescriptor<>(stateId, new CoderTypeSerializer<>(coder)); } @Override public void add(T input) { try { -flinkStateBackend -.getPartitionedState( -namespace.stringKey(), StringSerializer.INSTANCE, flinkStateDescriptor) -.add(input); +ListState partitionedState = +flinkStateBackend.getPartitionedState( +namespace.stringKey(), StringSerializer.INSTANCE, flinkStateDescriptor); +if (storesVoidValues) { + Preconditions.checkState(input == null, "Expected to a null value but was: %s", input); + // Flink does not allow storing null values + // If we have null values, we use the structural null value + input = (T) VoidCoder.of().structuralValue((Void) input); +} +partitionedState.add(input); } catch (Exception e) { throw new 
RuntimeException("Error adding to bag state.", e); } @@ -306,14 +319,35 @@ public void add(T input) { } @Override +@Nonnull public Iterable read() { try { -Iterable result = -flinkStateBackend -.getPartitionedState( -namespace.stringKey(), StringSerializer.INSTANCE, flinkStateDescriptor) -.get(); - +ListState partitionedState = +flinkStateBackend.getPartitionedState( +namespace.stringKey(), StringSerializer.INSTANCE, flinkStateDescriptor); +Iterable result =
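The fix in the diff above substitutes a non-null stand-in (the `VoidCoder` structural value) for null before calling `ListState.add`, and maps it back to null on read. The substitution idea in isolation, with a hypothetical sentinel object in place of the coder's structural value:

```java
// Sketch of the sentinel substitution used by the fix above: a dedicated
// marker object stands in for null inside a store that rejects nulls,
// and reads map the marker back to null. Names here are illustrative.
class NullSentinel {
    private static final Object NULL_MARKER = new Object();

    static Object encode(Object value) {
        return value == null ? NULL_MARKER : value;
    }

    static Object decode(Object stored) {
        return stored == NULL_MARKER ? null : stored;
    }
}
```

The round trip is lossless for null while guaranteeing the stored value itself is never null, which is exactly the property Flink's `ListState` requires.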
[jira] [Work logged] (BEAM-6235) Upgrade AutoValue to version 1.6.3
[ https://issues.apache.org/jira/browse/BEAM-6235?focusedWorklogId=175438=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175438 ] ASF GitHub Bot logged work on BEAM-6235: Author: ASF GitHub Bot Created on: 14/Dec/18 16:31 Start Date: 14/Dec/18 16:31 Worklog Time Spent: 10m Work Description: iemejia opened a new pull request #7285: [BEAM-6235] Upgrade AutoValue to version 1.6.3 URL: https://github.com/apache/beam/pull/7285 AutoValue 1.6 has now the annotations in a separate artifact: auto-value-annotations. This allows users to specify the annotations in compile scope and the processor in an annotation processing scope, without leaking the processor to a release binary. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175438) Time Spent: 10m Remaining Estimate: 0h > Upgrade AutoValue to version 1.6.3 > -- > > Key: BEAM-6235 > URL: https://issues.apache.org/jira/browse/BEAM-6235 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > AutoValue 1.6 has now the annotations in a separate artifact: > auto-value-annotations. This allows users to specify the annotations in > compile scope and the processor in an annotation processing scope, without > leaking the processor to a release binary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6235) Upgrade AutoValue to version 1.6.3
Ismaël Mejía created BEAM-6235: -- Summary: Upgrade AutoValue to version 1.6.3 Key: BEAM-6235 URL: https://issues.apache.org/jira/browse/BEAM-6235 Project: Beam Issue Type: Improvement Components: build-system Reporter: Ismaël Mejía Assignee: Ismaël Mejía AutoValue 1.6 now ships its annotations in a separate artifact: auto-value-annotations. This allows users to specify the annotations in compile scope and the processor in an annotation processing scope, without leaking the processor to a release binary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
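In a Gradle build, the annotations/processor split described above typically looks like the following sketch (the coordinates are the usual Maven ones for AutoValue 1.6.3; scope names may differ per module):

```groovy
dependencies {
    // Annotations only: safe to expose at compile time without leaking
    // the processor into the published artifact.
    compileOnly "com.google.auto.value:auto-value-annotations:1.6.3"
    // The processor runs during annotation processing only.
    annotationProcessor "com.google.auto.value:auto-value:1.6.3"
}
```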
[jira] [Commented] (BEAM-4678) Support portable combiner lifting in Java Flink Runner
[ https://issues.apache.org/jira/browse/BEAM-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721570#comment-16721570 ] Maximilian Michels commented on BEAM-4678: -- Has been implemented for Python. AFAIK Go and Java do not support it yet. > Support portable combiner lifting in Java Flink Runner > -- > > Key: BEAM-4678 > URL: https://issues.apache.org/jira/browse/BEAM-4678 > Project: Beam > Issue Type: Sub-task > Components: runner-flink >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Adjust Flink Runner to support portable combiner lifting as described in the > following doc: > https://s.apache.org/beam-runner-api-combine-model -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5723) CassandraIO is broken because of use of bad relocation of guava
[ https://issues.apache.org/jira/browse/BEAM-5723?focusedWorklogId=175426=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175426 ] ASF GitHub Bot logged work on BEAM-5723: Author: ASF GitHub Bot Created on: 14/Dec/18 16:07 Start Date: 14/Dec/18 16:07 Worklog Time Spent: 10m Work Description: kennknowles commented on a change in pull request #7237: [BEAM-5723] Changed shadow plugin configuration to avoid relocating g… URL: https://github.com/apache/beam/pull/7237#discussion_r241805762 ## File path: sdks/java/io/cassandra/build.gradle ## @@ -17,7 +17,24 @@ */ apply plugin: org.apache.beam.gradle.BeamModulePlugin -applyJavaNature() +applyJavaNature(shadowClosure: { +dependencies { +include(dependency(project.library.java.guava)) +} +// guava uses the com.google.common and com.google.thirdparty package namespaces +relocate("com.google.common", project.getJavaRelocatedPath("com.google.common")) { +// com.google.common is too generic, need to exclude guava-testlib +exclude "com.google.common.collect.testing.**" +exclude "com.google.common.escape.testing.**" +exclude "com.google.common.testing.**" +exclude "com.google.common.util.concurrent.testing.**" +// don't relocate because the cassandra driver's public API uses it Review comment: Yes, exactly This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175426) Time Spent: 2h 40m (was: 2.5h) > CassandraIO is broken because of use of bad relocation of guava > --- > > Key: BEAM-5723 > URL: https://issues.apache.org/jira/browse/BEAM-5723 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Arun sethia >Assignee: João Cabrita >Priority: Major > Fix For: 2.10.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > While using apache beam to run dataflow job to read data from BigQuery and > Store/Write to Cassandra with following libaries: > # beam-sdks-java-io-cassandra - 2.6.0 > # beam-sdks-java-io-jdbc - 2.6.0 > # beam-sdks-java-io-google-cloud-platform - 2.6.0 > # beam-sdks-java-core - 2.6.0 > # google-cloud-dataflow-java-sdk-all - 2.5.0 > # google-api-client -1.25.0 > > I am getting following error at the time insert/save data to Cassandra. > {code:java} > [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332) > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313) > at 
org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3472) Create a callback triggered at the end of a batch in flink runner
[ https://issues.apache.org/jira/browse/BEAM-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721543#comment-16721543 ] Etienne Chauchot commented on BEAM-3472: Thanks [~mxm] for reviving this subject. I worked around the absence of a callback by regularly watching the pipeline state in a thread, doing a final push, and then stopping the thread. But IMHO I still think such a callback could be useful. > Create a callback triggered at the end of a batch in flink runner > - > > Key: BEAM-3472 > URL: https://issues.apache.org/jira/browse/BEAM-3472 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Etienne Chauchot >Priority: Major > > In the future we might add new features to the runners for which we might > need to do some processing at the end of a batch. Currently there is no > unique place (a callback) to add this processing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
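The watcher-thread workaround described in the comment above can be sketched in plain Java. The `State` enum here is a hypothetical stand-in for Beam's `PipelineResult.State`, and the polling interval is arbitrary:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch of the workaround: a thread polls the pipeline's state and runs a
// final action (e.g. a last metrics push) once the batch job finishes.
class BatchEndWatcher {
    enum State { RUNNING, DONE }

    static Thread watch(Supplier<State> stateSupplier, Runnable onFinish) {
        Thread t = new Thread(() -> {
            try {
                while (stateSupplier.get() != State.DONE) {
                    TimeUnit.MILLISECONDS.sleep(10); // poll interval
                }
                onFinish.run(); // final push, then the watcher stops itself
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

A proper end-of-batch callback in the runner would make this polling (and its inherent delay) unnecessary, which is the improvement the issue asks for.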
[jira] [Work logged] (BEAM-6234) [Flink Runner] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-6234?focusedWorklogId=175411=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175411 ] ASF GitHub Bot logged work on BEAM-6234: Author: ASF GitHub Bot Created on: 14/Dec/18 15:49 Start Date: 14/Dec/18 15:49 Worklog Time Spent: 10m Work Description: djhworld opened a new pull request #7284: [BEAM-6234] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions URL: https://github.com/apache/beam/pull/7284 The configuration setting failOnCheckpointingErrors [1] is available in Flink to allow engineers to specify whether the job should fail in the event of checkpoint failure. This should be exposed in FlinkPipelineOptions and FlinkExecutionEnvironments to allow users to configure this. Follow this checklist to help us incorporate your contribution quickly and easily: - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. 
[jira] [Updated] (BEAM-6234) [Flink Runner] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Harper updated BEAM-6234: Description: The configuration setting {{failOnCheckpointingErrors}} [1] is available in Flink to allow engineers to specify whether the job should fail in the event of checkpoint failure. This should be exposed in {{FlinkPipelineOptions}} and {{FlinkExecutionEnvironments}} to allow users to configure this. The default for this value in Flink is true [2] [1] https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L249 [2] https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L73 was: The configuration setting {{minPauseBetweenCheckpoints}} [1] is available in Flink to allow a grace period when checkpoints runtime is > checkpoint interval. This should be exposed in {{FlinkPipelineOptions}} and {{FlinkExecutionEnvironments}} to allow users to configure this. 
The default for this value in Flink is 0ms [2] [1] https://ci.apache.org/projects/flink/flink-docs-release-1.5/api/java/org/apache/flink/streaming/api/environment/CheckpointConfig.html#setMinPauseBetweenCheckpoints-long- [2] https://ci.apache.org/projects/flink/flink-docs-release-1.5/api/java/constant-values.html#org.apache.flink.streaming.api.environment.CheckpointConfig.DEFAULT_MIN_PAUSE_BETWEEN_CHECKPOINTS > [Flink Runner] Make failOnCheckpointingErrors setting available in > FlinkPipelineOptions > --- > > Key: BEAM-6234 > URL: https://issues.apache.org/jira/browse/BEAM-6234 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Daniel Harper >Assignee: Daniel Harper >Priority: Trivial > Fix For: 2.8.0 > > > The configuration setting {{failOnCheckpointingErrors}} [1] is available in > Flink to allow engineers to specify whether the job should fail in the event > of checkpoint failure. > This should be exposed in {{FlinkPipelineOptions}} and > {{FlinkExecutionEnvironments}} to allow users to configure this. > The default for this value in Flink is true [2] > [1] > https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L249 > [2] > https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L73 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6234) [Flink Runner] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions
Daniel Harper created BEAM-6234: --- Summary: [Flink Runner] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions Key: BEAM-6234 URL: https://issues.apache.org/jira/browse/BEAM-6234 Project: Beam Issue Type: Improvement Components: runner-flink Reporter: Daniel Harper Assignee: Daniel Harper Fix For: 2.8.0 The configuration setting {{minPauseBetweenCheckpoints}} [1] is available in Flink to allow a grace period when checkpoints runtime is > checkpoint interval. This should be exposed in {{FlinkPipelineOptions}} and {{FlinkExecutionEnvironments}} to allow users to configure this. The default for this value in Flink is 0ms [2] [1] https://ci.apache.org/projects/flink/flink-docs-release-1.5/api/java/org/apache/flink/streaming/api/environment/CheckpointConfig.html#setMinPauseBetweenCheckpoints-long- [2] https://ci.apache.org/projects/flink/flink-docs-release-1.5/api/java/constant-values.html#org.apache.flink.streaming.api.environment.CheckpointConfig.DEFAULT_MIN_PAUSE_BETWEEN_CHECKPOINTS -- This message was sent by Atlassian JIRA (v7.6.3#76005)
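Both BEAM-6234 messages above describe the same plumbing pattern: surface a knob from Flink's {{CheckpointConfig}} on {{FlinkPipelineOptions}} and forward it when the execution environment is built. The sketch below models that flow with self-contained stand-in classes; {{FakeCheckpointConfig}} and {{FakeFlinkPipelineOptions}} are hypothetical simplifications for illustration, not the real Flink or Beam APIs.

```java
// Hedged sketch of the option-forwarding pattern proposed in BEAM-6234.
// Both classes are invented stand-ins; the real types are Flink's
// CheckpointConfig and Beam's FlinkPipelineOptions interface.
public class CheckpointOptionsSketch {

    // Stand-in for Flink's CheckpointConfig: failOnCheckpointingErrors defaults to true.
    static class FakeCheckpointConfig {
        private boolean failOnCheckpointingErrors = true;

        void setFailOnCheckpointingErrors(boolean value) {
            failOnCheckpointingErrors = value;
        }

        boolean isFailOnCheckpointingErrors() {
            return failOnCheckpointingErrors;
        }
    }

    // Stand-in for the proposed pipeline option; null means "not set by the user".
    static class FakeFlinkPipelineOptions {
        private Boolean failOnCheckpointingErrors;

        void setFailOnCheckpointingErrors(Boolean value) {
            failOnCheckpointingErrors = value;
        }

        Boolean getFailOnCheckpointingErrors() {
            return failOnCheckpointingErrors;
        }
    }

    // What FlinkExecutionEnvironments would do: override Flink's default only
    // when the user explicitly configured the option.
    static FakeCheckpointConfig configure(FakeFlinkPipelineOptions options) {
        FakeCheckpointConfig config = new FakeCheckpointConfig();
        if (options.getFailOnCheckpointingErrors() != null) {
            config.setFailOnCheckpointingErrors(options.getFailOnCheckpointingErrors());
        }
        return config;
    }
}
```

In the real implementation the getter would live on the {{FlinkPipelineOptions}} interface (with Beam's {{@Description}}/{{@Default}} annotations) and the forwarding would happen inside {{FlinkExecutionEnvironments}}; the null-check keeps Flink's default of {{true}} when the flag is absent.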
[jira] [Work logged] (BEAM-6229) BigQuery returns value error while retrieving load test metrics
[ https://issues.apache.org/jira/browse/BEAM-6229?focusedWorklogId=175374&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175374 ] ASF GitHub Bot logged work on BEAM-6229: Author: ASF GitHub Bot Created on: 14/Dec/18 15:24 Start Date: 14/Dec/18 15:24 Worklog Time Spent: 10m Work Description: lgajowy closed pull request #7283: [BEAM-6229] Fix LoadTestResult to store proper timestamp and runtime URL: https://github.com/apache/beam/pull/7283 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/ConsoleResultPublisher.java b/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/ConsoleResultPublisher.java index c9eba52ca86a..789ae51d1617 100644 --- a/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/ConsoleResultPublisher.java +++ b/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/ConsoleResultPublisher.java @@ -22,6 +22,6 @@ static void publish(LoadTestResult result) { System.out.println(String.format("Total bytes: %s", result.getTotalBytesCount())); -System.out.println(String.format("Total time (millis): %s", result.getRuntime())); +System.out.println(String.format("Total time (sec): %s", result.getRuntime())); } } diff --git a/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTestResult.java b/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTestResult.java index b09ce882617f..705d14eccd41 100644 --- a/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTestResult.java +++ b/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTestResult.java @@ -39,12 +39,12 @@ private 
LoadTestResult(Long timestamp, Long runtime, Long totalBytesCount) { } /** Constructs {@link LoadTestResult} from {@link PipelineResult}. */ - static LoadTestResult create(PipelineResult result, String namespace, long now) { + static LoadTestResult create(PipelineResult result, String namespace, long nowInMillis) { MetricsReader reader = new MetricsReader(result, namespace); return new LoadTestResult( -now, -reader.getEndTimeMetric("runtime") - reader.getStartTimeMetric("runtime"), +nowInMillis / 1000, +(reader.getEndTimeMetric("runtime") - reader.getStartTimeMetric("runtime")) / 1000, reader.getCounterMetric("totalBytes.count")); } @@ -61,7 +61,7 @@ public Long getTotalBytesCount() { return ImmutableMap.builder() .put("timestamp", timestamp) .put("runtime", runtime) -.put("totalBytesCount", totalBytesCount) +.put("total_bytes_count", totalBytesCount) .build(); } } This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175374) Time Spent: 0.5h (was: 20m) > BigQuery returns value error while retrieving load test metrics > --- > > Key: BEAM-6229 > URL: https://issues.apache.org/jira/browse/BEAM-6229 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Kasia Kucharczyk >Assignee: Lukasz Gajowy >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > GroupByKeyLoadTest run on Dataflow saves the timestamp metric in the > wrong format: > {code} > Cannot return an invalid timestamp value of 154472066651564 microseconds > relative to the Unix epoch. The range of valid timestamp values is [0001-01-1 > 00:00:00, 9999-12-31 23:59:59.999999]; error in writing field timestamp > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
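The diff in the worklog above fixes the BigQuery error quoted at the bottom: the test published epoch milliseconds into a field that BigQuery interprets as epoch seconds. The two conversions can be isolated as pure helpers; the class and method names here are illustrative, not the actual {{LoadTestResult}} API.

```java
// Minimal sketch of the unit fix from the diff above: both the publish
// timestamp and the measured runtime are converted from milliseconds to
// seconds before being written to BigQuery.
public class LoadTestUnits {

    // nowInMillis / 1000, as in the patched LoadTestResult.create(...).
    static long toEpochSeconds(long nowInMillis) {
        return nowInMillis / 1000;
    }

    // (end - start) / 1000, matching the patched "runtime" metric.
    static long runtimeSeconds(long startMillis, long endMillis) {
        return (endMillis - startMillis) / 1000;
    }
}
```

A millisecond value such as 1544720666515 becomes 1544720666 seconds, which falls inside BigQuery's valid timestamp range, whereas the unconverted value is rejected as shown in the quoted error.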
[jira] [Work logged] (BEAM-5985) Create jenkins jobs to run the load tests for Java SDK
[ https://issues.apache.org/jira/browse/BEAM-5985?focusedWorklogId=175376&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175376 ] ASF GitHub Bot logged work on BEAM-5985: Author: ASF GitHub Bot Created on: 14/Dec/18 15:28 Start Date: 14/Dec/18 15:28 Worklog Time Spent: 10m Work Description: lgajowy commented on issue #7184: [BEAM-5985] Create jenkins jobs to run the load tests for Java SDK URL: https://github.com/apache/beam/pull/7184#issuecomment-447358869 @kkucharc please rebase before running the tests again - I provided fixes for BigQuery publishing code (https://github.com/apache/beam/pull/7283). Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175376) Time Spent: 7h 50m (was: 7h 40m) > Create jenkins jobs to run the load tests for Java SDK > -- > > Key: BEAM-5985 > URL: https://issues.apache.org/jira/browse/BEAM-5985 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Lukasz Gajowy >Assignee: Kasia Kucharczyk >Priority: Major > Time Spent: 7h 50m > Remaining Estimate: 0h > > How/how often/in what cases we run those tests is yet to be decided (this is > part of the task) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-3949) IOIT's setup() and teardown() db connection attempt sometimes fail resulting in test flakiness
[ https://issues.apache.org/jira/browse/BEAM-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukasz Gajowy closed BEAM-3949. --- Resolution: Fixed Fix Version/s: 2.6.0 > IOIT's setup() and teardown() db connection attempt sometimes fail resulting > in test flakiness > -- > > Key: BEAM-3949 > URL: https://issues.apache.org/jira/browse/BEAM-3949 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Lukasz Gajowy >Assignee: Kasia Kucharczyk >Priority: Major > Fix For: 2.6.0 > > Time Spent: 10.5h > Remaining Estimate: 0h > > setup() and teardown() methods sometimes have trouble connecting database in > Performance tests. It results in test flakiness. > Example logs: > [https://builds.apache.org/view/A-D/view/Beam/job/beam_PerformanceTests_HadoopInputFormat/65/console] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-4691) Rename (and reorganize?) jenkins jobs
[ https://issues.apache.org/jira/browse/BEAM-4691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukasz Gajowy resolved BEAM-4691. - Resolution: Fixed Fix Version/s: Not applicable Done only renaming. Reorganizing to different directories does not make much sense if we cannot import common .groovy files and have structure only 1 level deep. See this discussion for more context: [https://github.com/apache/beam/pull/5831] > Rename (and reorganize?) jenkins jobs > - > > Key: BEAM-4691 > URL: https://issues.apache.org/jira/browse/BEAM-4691 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Minor > Fix For: Not applicable > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Link to discussion: > [https://lists.apache.org/thread.html/ebe220ec1cebc73c8fb7190cf115fb9b23165fdbf950d58e05db544d@%3Cdev.beam.apache.org%3E] > Since jobs are Groovy files their names should be CamelCase. We could also > place them in subdirectories instead of prefixing job names. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6229) BigQuery returns value error while retrieving load test metrics
[ https://issues.apache.org/jira/browse/BEAM-6229?focusedWorklogId=175373&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175373 ] ASF GitHub Bot logged work on BEAM-6229: Author: ASF GitHub Bot Created on: 14/Dec/18 15:24 Start Date: 14/Dec/18 15:24 Worklog Time Spent: 10m Work Description: lgajowy commented on issue #7283: [BEAM-6229] Fix LoadTestResult to store proper timestamp and runtime URL: https://github.com/apache/beam/pull/7283#issuecomment-447357639 Thanks! Merging This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175373) Time Spent: 20m (was: 10m) > BigQuery returns value error while retrieving load test metrics > --- > > Key: BEAM-6229 > URL: https://issues.apache.org/jira/browse/BEAM-6229 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Kasia Kucharczyk >Assignee: Lukasz Gajowy >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > GroupByKeyLoadTest run on Dataflow saves the timestamp metric in the > wrong format: > {code} > Cannot return an invalid timestamp value of 154472066651564 microseconds > relative to the Unix epoch. The range of valid timestamp values is [0001-01-1 > 00:00:00, 9999-12-31 23:59:59.999999]; error in writing field timestamp > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3211) Add an integration test for TextIO ReadAll transform and dynamic writes
[ https://issues.apache.org/jira/browse/BEAM-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721516#comment-16721516 ] Lukasz Gajowy commented on BEAM-3211: - is this still needed or should we close this ticket? We have TextIO running for ~15Mb > Add an integration test for TextIO ReadAll transform and dynamic writes > --- > > Key: BEAM-3211 > URL: https://issues.apache.org/jira/browse/BEAM-3211 > Project: Beam > Issue Type: Test > Components: sdk-java-core >Reporter: Chamikara Jayalath >Assignee: Lukasz Gajowy >Priority: Major > > We should add a small scale version of performance test available in > following file run as a part of 'beam_PostCommit_Java_MavenInstall' and > 'beam_PostCommit_Java_ValidatesRunner*' Jenkins test suites. > https://github.com/apache/beam/blob/master/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-981) Not possible to directly submit a pipeline on spark cluster
[ https://issues.apache.org/jira/browse/BEAM-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukasz Gajowy reassigned BEAM-981: -- Assignee: Amit Sela (was: Lukasz Gajowy) > Not possible to directly submit a pipeline on spark cluster > --- > > Key: BEAM-981 > URL: https://issues.apache.org/jira/browse/BEAM-981 > Project: Beam > Issue Type: Bug > Components: runner-spark >Affects Versions: 0.6.0 >Reporter: Jean-Baptiste Onofré >Assignee: Amit Sela >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > It's not possible to directly run a pipeline on the spark runner (for > instance using {{mvn exec:java}}. It fails with: > {code} > [appclient-register-master-threadpool-0] INFO > org.apache.spark.deploy.client.AppClient$ClientEndpoint - Connecting to > master spark://10.200.118.197:7077... > [shuffle-client-0] ERROR org.apache.spark.network.client.TransportClient - > Failed to send RPC 6813731522650020739 to /10.200.118.197:7077: > java.lang.AbstractMethodError: > org.apache.spark.network.protocol.MessageWithHeader.touch(Ljava/lang/Object;)Lio/netty/util/ReferenceCounted; > java.lang.AbstractMethodError: > org.apache.spark.network.protocol.MessageWithHeader.touch(Ljava/lang/Object;)Lio/netty/util/ReferenceCounted; > at io.netty.util.ReferenceCountUtil.touch(ReferenceCountUtil.java:73) > at > io.netty.channel.DefaultChannelPipeline.touch(DefaultChannelPipeline.java:107) > at > io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:820) > at > io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:733) > at > io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:111) > at > io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:748) > at > io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:740) > at > 
io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:826) > at > io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:733) > at > io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:284) > at > io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:748) > at > io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:740) > at > io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:38) > at > io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1101) > at > io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1148) > at > io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1090) > at > io.netty.util.concurrent.SingleThreadEventExecutor.safeExecute(SingleThreadEventExecutor.java:451) > at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:418) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:401) > at > io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:877) > at java.lang.Thread.run(Thread.java:745) > [appclient-register-master-threadpool-0] WARN > org.apache.spark.deploy.client.AppClient$ClientEndpoint - Failed to connect > to master 10.200.118.197:7077 > java.io.IOException: Failed to send RPC 6813731522650020739 to > /10.200.118.197:7077: java.lang.AbstractMethodError: > org.apache.spark.network.protocol.MessageWithHeader.touch(Ljava/lang/Object;)Lio/netty/util/ReferenceCounted; > at > org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239) > at > org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226) > at > 
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:514) > at > io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:507) > at > io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:486) > at > io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:427) > at > io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:129) > at > io.netty.channel.AbstractChannelHandlerContext.notifyOutboundHandlerException(AbstractChannelHandlerContext.java:845) > at >
[jira] [Resolved] (BEAM-3747) Performance tests flaky due to database connection problems
[ https://issues.apache.org/jira/browse/BEAM-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukasz Gajowy resolved BEAM-3747. - Resolution: Fixed Fix Version/s: Not applicable Not flaky any more thanks to solutions provided in subtasks > Performance tests flaky due to database connection problems > --- > > Key: BEAM-3747 > URL: https://issues.apache.org/jira/browse/BEAM-3747 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Lukasz Gajowy >Priority: Major > Fix For: Not applicable > > > [https://builds.apache.org/view/A-D/view/Beam/job/beam_PerformanceTests_JDBC/] > Latest failure is > [https://builds.apache.org/view/A-D/view/Beam/job/beam_PerformanceTests_JDBC/262/] > [ERROR] org.apache.beam.sdk.io.jdbc.JdbcIOIT Time elapsed: 0 s <<< ERROR! > org.postgresql.util.PSQLException: The connection attempt failed. > Łukasz can you take a look ? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3216) Add an integration test for HadoopInputFormatIO Read transform
[ https://issues.apache.org/jira/browse/BEAM-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721518#comment-16721518 ] Lukasz Gajowy commented on BEAM-3216: - is this ticket still valid? > Add an integration test for HadoopInputFormatIO Read transform > -- > > Key: BEAM-3216 > URL: https://issues.apache.org/jira/browse/BEAM-3216 > Project: Beam > Issue Type: Test > Components: io-java-hadoop >Reporter: Chamikara Jayalath >Assignee: Lukasz Gajowy >Priority: Major > > We should add small scale integration tests for HadoopInputFormatIO that can > be run as a part of 'beam_PostCommit_Java_MavenInstall' and > 'beam_PostCommit_Java_ValidatesRunner*' Jenkins test suites. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-1603) Enable programmatic execution of spark pipelines.
[ https://issues.apache.org/jira/browse/BEAM-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukasz Gajowy closed BEAM-1603. --- Resolution: Fixed Fix Version/s: Not applicable This is possible since https://issues.apache.org/jira/browse/BEAM-3371 was done > Enable programmatic execution of spark pipelines. > - > > Key: BEAM-1603 > URL: https://issues.apache.org/jira/browse/BEAM-1603 > Project: Beam > Issue Type: Bug > Components: runner-spark, testing >Reporter: Jason Kuster >Assignee: Lukasz Gajowy >Priority: Major > Fix For: Not applicable > > > In order to enable execution of Spark Integration Tests against a cluster, it > is necessary to have the ability to execute Spark pipelines via maven, rather > than spark-submit. The minimum necessary is to enable this in the > TestSparkRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-5037) HashFunction is not intialized in SyntheticOptions
[ https://issues.apache.org/jira/browse/BEAM-5037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukasz Gajowy closed BEAM-5037. --- Resolution: Fixed Fix Version/s: 2.7.0 > HashFunction is not intialized in SyntheticOptions > -- > > Key: BEAM-5037 > URL: https://issues.apache.org/jira/browse/BEAM-5037 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Major > Fix For: 2.7.0 > > Time Spent: 1h > Remaining Estimate: 0h > > This is due to fact that the field is transient hence not getting serialized > and then initialized again after deserialization. We need some other way of > initializing it, immune to the field's transiency. > Stacktrace: > {code:java} > Class org.apache.beam.sdk.io.synthetic.GroupByKeyLoadIT > all > org.apache.beam.sdk.io.synthetic > GroupByKeyLoadIT > 1 > tests > 1 > failures > 0 > ignored > 0.050s > duration > 0% > successful > Failed tests > Tests > Standard error > groupByKeyLoadTest > java.lang.IllegalArgumentException: hashFunction hasn't been initialized. 
> at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:122) > at > org.apache.beam.sdk.io.synthetic.SyntheticOptions.validate(SyntheticOptions.java:301) > at > org.apache.beam.sdk.io.synthetic.SyntheticBoundedIO$SyntheticSourceOptions.validate(SyntheticBoundedIO.java:285) > at > org.apache.beam.sdk.io.synthetic.SyntheticBoundedIO$SyntheticBoundedSource.validate(SyntheticBoundedIO.java:119) > at org.apache.beam.sdk.io.Read$Bounded.expand(Read.java:95) > at org.apache.beam.sdk.io.Read$Bounded.expand(Read.java:85) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537) > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:471) > at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:44) > at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:167) > at > org.apache.beam.sdk.io.synthetic.GroupByKeyLoadIT.groupByKeyLoadTest(GroupByKeyLoadIT.java:81) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) > at > org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at >
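The root cause described in BEAM-5037 is a general Java serialization pitfall: transient fields are skipped during serialization and come back null after deserialization, so an eagerly initialized transient {{hashFunction}} disappears once the options object crosses a worker boundary. The self-contained demo below reproduces the effect and shows one way to be immune to the field's transiency, lazy re-initialization on first use; it is a sketch of the pitfall, not Beam's actual {{SyntheticOptions}} code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

// Demonstrates the BEAM-5037 failure mode: a transient field is not serialized,
// so after a serialization round-trip it is null unless re-initialized.
public class TransientFieldDemo implements Serializable {

    // Eagerly initialized, but transient: lost on deserialization.
    private transient Function<String, Integer> hashFunction = s -> s.length();

    // Immune accessor: re-derives the function lazily when it is missing,
    // instead of assuming the field initializer ran.
    int hash(String s) {
        if (hashFunction == null) {
            hashFunction = t -> t.length();
        }
        return hashFunction.apply(s);
    }

    // Serialize and deserialize the object, as shipping it to a worker would.
    static TransientFieldDemo roundTrip(TransientFieldDemo in) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new ObjectOutputStream(bos).writeObject(in);
            return (TransientFieldDemo) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Calling {{hashFunction.apply(...)}} directly on the deserialized object would throw, which is exactly the "hashFunction hasn't been initialized" precondition failure in the stack trace above; the lazy check sidesteps it.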
[jira] [Resolved] (BEAM-6076) NEXMark flakiness: NPE thrown from BigQuery client library once in a while
[ https://issues.apache.org/jira/browse/BEAM-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukasz Gajowy resolved BEAM-6076. - Resolution: Fixed Fix Version/s: 2.9.0 > NEXMark flakiness: NPE thrown from BigQuery client library once in a while > -- > > Key: BEAM-6076 > URL: https://issues.apache.org/jira/browse/BEAM-6076 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Major > Fix For: 2.9.0 > > Time Spent: 1h > Remaining Estimate: 0h > > It seems that once in a while the library that is used to connect to BigQuery > throws exceptions like this: > {code:java} > Exception in thread "main" > java.lang.NullPointerException > at > com.google.cloud.bigquery.StandardTableDefinition$StreamingBuffer.fromPb(StandardTableDefinition.java:116) > at > com.google.cloud.bigquery.StandardTableDefinition.fromPb(StandardTableDefinition.java:225) > at com.google.cloud.bigquery.TableDefinition.fromPb(TableDefinition.java:155) > at com.google.cloud.bigquery.TableInfo$BuilderImpl.(TableInfo.java:183) > at com.google.cloud.bigquery.Table.fromPb(Table.java:593) > at com.google.cloud.bigquery.BigQueryImpl.getTable(BigQueryImpl.java:410) > at > org.apache.beam.sdk.testutils.publishing.BigQueryClient.createTableIfNotExists(BigQueryClient.java:74) > at org.apache.beam.sdk.nexmark.Main.savePerfsToBigQuery(Main.java:184) > at org.apache.beam.sdk.nexmark.Main.runAll(Main.java:148) > at org.apache.beam.sdk.nexmark.Main.runAll(Main.java:98) > at org.apache.beam.sdk.nexmark.Main.main(Main.java:423){code} > Similar error found in the network: > [https://github.com/googleapis/google-cloud-java/issues/1689] > +Example logs:+ > > [https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_Nexmark_Spark/1085/console] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5723) CassandraIO is broken because of use of bad relocation of guava
[ https://issues.apache.org/jira/browse/BEAM-5723?focusedWorklogId=175359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175359 ] ASF GitHub Bot logged work on BEAM-5723: Author: ASF GitHub Bot Created on: 14/Dec/18 15:00 Start Date: 14/Dec/18 15:00 Worklog Time Spent: 10m Work Description: iemejia commented on a change in pull request #7237: [BEAM-5723] Changed shadow plugin configuration to avoid relocating g… URL: https://github.com/apache/beam/pull/7237#discussion_r241782155 ## File path: sdks/java/io/cassandra/build.gradle ## @@ -17,7 +17,24 @@ */ apply plugin: org.apache.beam.gradle.BeamModulePlugin -applyJavaNature() +applyJavaNature(shadowClosure: { Review comment: This seems like the most restricted (and I would say proper way), my doubt is if there are some other uses of Cassandra's guava that we could be missing and that can bite us later. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175359) Time Spent: 2.5h (was: 2h 20m) > CassandraIO is broken because of use of bad relocation of guava > --- > > Key: BEAM-5723 > URL: https://issues.apache.org/jira/browse/BEAM-5723 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Arun sethia >Assignee: João Cabrita >Priority: Major > Fix For: 2.10.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > While using apache beam to run dataflow job to read data from BigQuery and > Store/Write to Cassandra with following libaries: > # beam-sdks-java-io-cassandra - 2.6.0 > # beam-sdks-java-io-jdbc - 2.6.0 > # beam-sdks-java-io-google-cloud-platform - 2.6.0 > # beam-sdks-java-core - 2.6.0 > # google-cloud-dataflow-java-sdk-all - 2.5.0 > # google-api-client -1.25.0 > > I am getting following error at the time insert/save data to Cassandra. > {code:java} > [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332) > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313) > at 
org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work started] (BEAM-6229) BigQuery returns value error while retrieving load test metrics
[ https://issues.apache.org/jira/browse/BEAM-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-6229 started by Lukasz Gajowy. --- > BigQuery returns value error while retrieving load test metrics > --- > > Key: BEAM-6229 > URL: https://issues.apache.org/jira/browse/BEAM-6229 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Kasia Kucharczyk >Assignee: Lukasz Gajowy >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > GroupByKeyLoadTest run on Dataflow saves the timestamp metric in the > wrong format: > {code} > Cannot return an invalid timestamp value of 154472066651564 microseconds > relative to the Unix epoch. The range of valid timestamp values is [0001-01-1 > 00:00:00, 9999-12-31 23:59:59.999999]; error in writing field timestamp > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5723) CassandraIO is broken because of use of bad relocation of guava
[ https://issues.apache.org/jira/browse/BEAM-5723?focusedWorklogId=175357&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175357 ] ASF GitHub Bot logged work on BEAM-5723: Author: ASF GitHub Bot Created on: 14/Dec/18 14:55 Start Date: 14/Dec/18 14:55 Worklog Time Spent: 10m Work Description: kewne commented on a change in pull request #7237: [BEAM-5723] Changed shadow plugin configuration to avoid relocating g… URL: https://github.com/apache/beam/pull/7237#discussion_r241780606 ## File path: sdks/java/io/cassandra/build.gradle ## @@ -17,7 +17,24 @@ */ apply plugin: org.apache.beam.gradle.BeamModulePlugin -applyJavaNature() +applyJavaNature(shadowClosure: { Review comment: I've actually gone in the opposite direction and replaced this with a blanket "exclude everything", since guava is now provided by the beam vendored version (except where it needs to use `ListenableFuture` from the driver, which comes from regular guava). This seems acceptable to me because the default rule is "include guava" (and exclude everything else) but let me know if you disagree. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175357) Time Spent: 2h 20m (was: 2h 10m) > CassandraIO is broken because of use of bad relocation of guava > --- > > Key: BEAM-5723 > URL: https://issues.apache.org/jira/browse/BEAM-5723 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Arun sethia >Assignee: João Cabrita >Priority: Major > Fix For: 2.10.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > While using apache beam to run dataflow job to read data from BigQuery and > Store/Write to Cassandra with following libaries: > # beam-sdks-java-io-cassandra - 2.6.0 > # beam-sdks-java-io-jdbc - 2.6.0 > # beam-sdks-java-io-google-cloud-platform - 2.6.0 > # beam-sdks-java-core - 2.6.0 > # google-cloud-dataflow-java-sdk-all - 2.5.0 > # google-api-client -1.25.0 > > I am getting following error at the time insert/save data to Cassandra. > {code:java} > [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332) > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313) > at 
org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
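The `NoSuchMethodError` above is a descriptor mismatch: JVM linkage resolves a method by its full descriptor, return type included, so a call site compiled against the relocated `org/apache/beam/repackaged/.../ListenableFuture` cannot link against the driver's `saveAsync`, which still returns plain guava's `ListenableFuture`. Reflection makes the return type in the descriptor visible at runtime; the sketch below is illustrative only and uses a JDK class as a stand-in, since the Cassandra driver is not assumed to be on the classpath (with the driver you would inspect `Mapper#saveAsync` the same way):

```java
import java.lang.reflect.Method;

public class ReturnTypeCheck {
    public static void main(String[] args) {
        // Print the (erased) return type of every public overload named
        // "thenApply" -- the part of the method descriptor that shading
        // relocation changes and that NoSuchMethodError complains about.
        for (Method m : java.util.concurrent.CompletableFuture.class.getMethods()) {
            if (m.getName().equals("thenApply")) {
                System.out.println(m.getReturnType().getName());
            }
        }
    }
}
```

If the printed class name carries a `repackaged` prefix on one side of a call but not the other, the two artifacts were shaded inconsistently.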
[jira] [Work logged] (BEAM-6229) BigQuery returns value error while retrieving load test metrics
[ https://issues.apache.org/jira/browse/BEAM-6229?focusedWorklogId=175356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175356 ] ASF GitHub Bot logged work on BEAM-6229: Author: ASF GitHub Bot Created on: 14/Dec/18 14:47 Start Date: 14/Dec/18 14:47 Worklog Time Spent: 10m Work Description: lgajowy opened a new pull request #7283: [BEAM-6229] Fix LoadTestResult to store proper timestamp and runtime URL: https://github.com/apache/beam/pull/7283 Fixes the bug and changes total runtime to seconds to be consistent with the Python suites (we don't need greater resolution here). R: @kkucharc Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. 
Post-Commit Tests Status (on master branch): [build status badge table omitted] This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175356) Time Spent: 10m Remaining Estimate: 0h > BigQuery returns value
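The runtime change in the pull request above amounts to collapsing a millisecond interval down to whole seconds; a minimal sketch of that conversion (the helper name is hypothetical, not the actual `LoadTestResult` code):

```java
public class RuntimeSeconds {
    // Convert a start/end pair measured in epoch milliseconds into a
    // whole-second runtime, the coarser resolution used by the Python suites.
    static long runtimeSeconds(long startMillis, long endMillis) {
        return (endMillis - startMillis) / 1000;
    }

    public static void main(String[] args) {
        // 125 500 ms elapsed -> 125 whole seconds
        System.out.println(runtimeSeconds(1_544_796_000_000L, 1_544_796_125_500L));
    }
}
```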
[jira] [Closed] (BEAM-6187) Drop Scala suffix of FlinkRunner artifacts
[ https://issues.apache.org/jira/browse/BEAM-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed BEAM-6187. Resolution: Fixed The suffix has been dropped as part of upgrading to 1.6 with BEAM-5267. The suffix will be removed from the 1.5 version as soon as support for it is dropped. > Drop Scala suffix of FlinkRunner artifacts > -- > > Key: BEAM-6187 > URL: https://issues.apache.org/jira/browse/BEAM-6187 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.10.0 > > > With BEAM-5419 we will build multiple versions of the Flink Runner against > different Flink versions. The new artifacts will lead to confusing names like > {{beam-runners-flink1.5_2.11}}. I think it is time to drop the Scala suffix > and just build against the most stable Flink Scala version. > Projects like Scio have the option to cross-compile to different Scala > versions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6227) FlinkRunner errors if GroupByKey contains null values (streaming mode only)
[ https://issues.apache.org/jira/browse/BEAM-6227?focusedWorklogId=175352&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175352 ] ASF GitHub Bot logged work on BEAM-6227: Author: ASF GitHub Bot Created on: 14/Dec/18 14:32 Start Date: 14/Dec/18 14:32 Worklog Time Spent: 10m Work Description: mxm commented on issue #7282: [BEAM-6227] Fix GroupByKey with null values in Flink Runner URL: https://github.com/apache/beam/pull/7282#issuecomment-447342445 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 175352) Time Spent: 20m (was: 10m) > FlinkRunner errors if GroupByKey contains null values (streaming mode only) > --- > > Key: BEAM-6227 > URL: https://issues.apache.org/jira/browse/BEAM-6227 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.9.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.10.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Apparently this passed ValidatesRunner in streaming mode although this is a > quite common operation: > {noformat} > FlinkPipelineOptions options = > PipelineOptionsFactory.as(FlinkPipelineOptions.class); > options.setRunner(FlinkRunner.class); > // force streaming mode > options.setStreaming(true); > Pipeline pipeline = Pipeline.create(options); > pipeline > .apply(GenerateSequence.from(0).to(100)) > .apply(Window.into(FixedWindows.of(Duration.millis(10)))) > .apply(ParDo.of( > new DoFn<Long, KV<String, String>>() { > @ProcessElement > public void processElement(ProcessContext pc) { > pc.output(KV.of("hello", null)); > } > } > )) > .apply(GroupByKey.create()); > pipeline.run(); > {noformat} > Throws: > {noformat} > Caused by: java.lang.RuntimeException: Error adding to bag state. 
> at > org.apache.beam.runners.flink.translation.wrappers.streaming.state.FlinkStateInternals$FlinkBagState.add(FlinkStateInternals.java:299) > at > org.apache.beam.runners.core.SystemReduceFn.processValue(SystemReduceFn.java:115) > at > org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:608) > at > org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349) > at > org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136) > Caused by: java.lang.NullPointerException: You cannot add null to a ListState. > at > org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:75) > at > org.apache.flink.runtime.state.heap.HeapListState.add(HeapListState.java:89) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.state.FlinkStateInternals$FlinkBagState.add(FlinkStateInternals.java:297) > at > org.apache.beam.runners.core.SystemReduceFn.processValue(SystemReduceFn.java:115) > at > org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:608) > at > org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349) > at > org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136) > at > org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn$DoFnInvoker.invokeProcessElement(Unknown > Source) > at > org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:275) > at > org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:240) > at > org.apache.beam.runners.core.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:80) > at > org.apache.beam.runners.flink.metrics.DoFnRunnerWithMetricsUpdate.processElement(DoFnRunnerWithMetricsUpdate.java:63) > at > 
org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.processElement(DoFnOperator.java:460) > at > org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:104) > at >
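The root cause is the innermost frame: Flink's `ListState` rejects nulls outright (`You cannot add null to a ListState`). One way a runner-side fix can work is to encode null as a sentinel on write and decode it on read. The sketch below illustrates only that idea using plain collections; `NullSafeBag` is a made-up name and a stand-in for the real `FlinkBagState` wrapping of `ListState`, whose actual implementation may differ:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative null-sentinel encoding: a backing store that rejects nulls
// (like Flink's ListState) can still round-trip them if the caller maps
// null to a private sentinel object on add() and back to null on read().
public class NullSafeBag<T> {
    private static final Object NULL_SENTINEL = new Object();
    private final List<Object> backing = new ArrayList<>(); // stand-in for ListState

    public void add(T value) {
        backing.add(value == null ? NULL_SENTINEL : value);
    }

    @SuppressWarnings("unchecked")
    public List<T> read() {
        List<T> out = new ArrayList<>();
        for (Object o : backing) {
            out.add(o == NULL_SENTINEL ? null : (T) o);
        }
        return out;
    }

    public static void main(String[] args) {
        NullSafeBag<String> bag = new NullSafeBag<>();
        bag.add("hello");
        bag.add(null); // would throw NullPointerException on a raw ListState
        System.out.println(bag.read()); // prints [hello, null]
    }
}
```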
[jira] [Commented] (BEAM-3472) Create a callback triggered at the end of a batch in flink runner
[ https://issues.apache.org/jira/browse/BEAM-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721464#comment-16721464 ] Maximilian Michels commented on BEAM-3472: -- Is this still relevant? > Create a callback triggered at the end of a batch in flink runner > - > > Key: BEAM-3472 > URL: https://issues.apache.org/jira/browse/BEAM-3472 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Etienne Chauchot >Priority: Major > > In the future we might add new features to the runners for which we might > need to do some processing at the end of a batch. Currently there is no > single place (a callback) to add this processing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6218) TensorFlow Model Analysis Fails when using the portable Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721459#comment-16721459 ] Maximilian Michels commented on BEAM-6218: -- This looks like an error from the Python SDK harness. > TensorFlow Model Analysis Fails when using the portable Flink runner > > > Key: BEAM-6218 > URL: https://issues.apache.org/jira/browse/BEAM-6218 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-py-harness >Reporter: Andrew Packer >Priority: Major > > Running a simple model analysis pipeline, trying to use the portable Flink > runner against a local cluster: > {code:python} > import apache_beam as beam > import tensorflow_model_analysis as tfma > from apache_beam.options.pipeline_options import PipelineOptions > from apache_beam.runners.portability import portable_runner > def pipeline(root): > data_location = './dataset/' > data = root | 'ReadData' >> beam.io.ReadFromTFRecord(data_location) > results = data | 'ExtractEvaluateAndWriteResults' >> > tfma.EvaluateAndWriteResults( > eval_saved_model_path='./model/15427633886/', > output_path='./output/', > display_only_data_location=data_location) > def run(argv=None): > runner = portable_runner.PortableRunner() > pipeline_options = > PipelineOptions(experiments=['beam_fn_api'],sdk_location='container',job_endpoint='localhost:8099',setup_file='./setup.py') > runner.run(pipeline, pipeline_options) > if __name__ == '__main__': > run() > {code} > Versions: > Apache Beam 2.8.0 > TensorFlow Model Analysis: 0.9.2 > Apache Flink: 1.5.3 > > Stack Trace: > {code} > [flink-runner-job-server] ERROR > org.apache.beam.runners.flink.FlinkJobInvocation - Error during job > invocation > BeamApp-apacker-1212082216-2dd571ba_359d85b7-4e08-49f3-bdc7-34cdb0e779bf. > org.apache.flink.client.program.ProgramInvocationException: Job > 22e7e9d229977f3f0518c37f507f5e07 failed. 
> at > org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:265) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:452) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427) > at > org.apache.flink.client.RemoteExecutor.executePlanWithJars(RemoteExecutor.java:216) > at > org.apache.flink.client.RemoteExecutor.executePlan(RemoteExecutor.java:193) > at > org.apache.flink.api.java.RemoteEnvironment.execute(RemoteEnvironment.java:173) > at > org.apache.beam.runners.flink.FlinkJobInvocation.runPipeline(FlinkJobInvocation.java:121) > at > org.apache.beam.repackaged.beam_runners_flink_2.11.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111) > at > org.apache.beam.repackaged.beam_runners_flink_2.11.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58) > at > org.apache.beam.repackaged.beam_runners_flink_2.11.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.flink.runtime.client.JobExecutionException: Job > completed with illegal application status: UNKNOWN. > at > org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:150) > at > org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:262) > ... 
13 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error received from SDK harness for instruction > 22: Traceback (most recent call last): > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 131, in _execute > response = task() > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 166, in > self._execute(lambda: worker.do_instruction(work), work) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 212, in do_instruction > request.instruction_id) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 231, in process_bundle > self.data_channel_factory) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 343, in __init__ >
[jira] [Updated] (BEAM-6218) TensorFlow Model Analysis Fails when using the portable Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels updated BEAM-6218: - Component/s: sdk-py-harness > TensorFlow Model Analysis Fails when using the portable Flink runner > (issue description and stack trace identical to the quote in the comment above)