[jira] [Commented] (BEAM-3201) ElasticsearchIO should allow the user to optionally pass id, type and index per document

2018-03-21 Thread Chet Aldrich (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409079#comment-16409079
 ] 

Chet Aldrich commented on BEAM-3201:


Hey all, sorry I kind of vanished; I've just been really busy. I'll get back on this. 
I'll open a PR as-is to start, and we can go from there. 

> ElasticsearchIO should allow the user to optionally pass id, type and index 
> per document
> 
>
> Key: BEAM-3201
> URL: https://issues.apache.org/jira/browse/BEAM-3201
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Etienne Chauchot
>Assignee: Chet Aldrich
>Priority: Major
>
> *Dynamic document ids*: Today the ESIO only inserts the payload of the ES 
> documents. Elasticsearch generates a document id for each record inserted, so 
> each new insertion is considered a new document. Users want to be able to 
> update documents using the IO. So, for the write part of the IO, users should 
> be able to provide a document id so that they can update already stored 
> documents. Providing an id for the documents could also help the user with 
> idempotency.
> *Dynamic ES type and ES index*: In some cases (streaming pipeline with high 
> throughput) partitioning the PCollection to allow plugging into different 
> ESIO instances (pointing to different index/type) is not very practical; the 
> users would like to be able to set the ES index/type per document.
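For reference, what "per-document id/type/index" means at the Elasticsearch level: in the bulk API, each document's action line can carry its own `_index`, `_type`, and `_id`. A minimal Python sketch (a hypothetical helper to illustrate the payload format, not ElasticsearchIO code):

```python
import json

def to_bulk_payload(docs):
  """Serialize docs into Elasticsearch bulk-API format: one action line
  (carrying per-document _index/_type/_id) followed by one source line
  per document. Supplying _id is what enables updates and idempotent
  re-inserts instead of duplicate documents."""
  lines = []
  for doc in docs:
    lines.append(json.dumps(
        {'index': {'_index': doc['index'],
                   '_type': doc['type'],
                   '_id': doc['id']}}))
    lines.append(json.dumps(doc['source']))
  return '\n'.join(lines) + '\n'

payload = to_bulk_payload([
    {'index': 'logs-2018', 'type': 'doc', 'id': '42',
     'source': {'message': 'hello'}},
])
```

Re-sending the same payload overwrites document 42 in place, which is the update/idempotency behavior the issue asks for.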



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3339) Create post-release testing of the nightly snapshots

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3339?focusedWorklogId=83071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83071
 ]

ASF GitHub Bot logged work on BEAM-3339:


Author: ASF GitHub Bot
Created on: 22/Mar/18 04:56
Start Date: 22/Mar/18 04:56
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4788: [BEAM-3339] Mobile 
gaming automation for Java nightly snapshot on core runners
URL: https://github.com/apache/beam/pull/4788#issuecomment-375179867
 
 
   Run Seed Job


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 83071)
Time Spent: 72h 50m  (was: 72h 40m)

> Create post-release testing of the nightly snapshots
> 
>
> Key: BEAM-3339
> URL: https://issues.apache.org/jira/browse/BEAM-3339
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Alan Myrvold
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 72h 50m
>  Remaining Estimate: 0h
>
> The nightly java snapshots in 
> https://repository.apache.org/content/groups/snapshots/org/apache/beam should 
> be verified by following the 
> https://beam.apache.org/get-started/quickstart-java/ instructions, to verify 
> that the release is usable.





[jira] [Work logged] (BEAM-3738) Enable Py3 linting in Jenkins

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3738?focusedWorklogId=83064&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83064
 ]

ASF GitHub Bot logged work on BEAM-3738:


Author: ASF GitHub Bot
Created on: 22/Mar/18 04:26
Start Date: 22/Mar/18 04:26
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #4877: 
[BEAM-3738] Enable py3 lint and cleanup tox.ini.
URL: https://github.com/apache/beam/pull/4877#discussion_r176305597
 
 

 ##
 File path: sdks/python/tox.ini
 ##
 @@ -17,7 +17,8 @@
 
 [tox]
 # new environments will be excluded by default unless explicitly added to envlist.
-envlist = py27,py27-{gcp,cython,lint},py3-lint,docs
+#envlist = py27,py27-{gcp,cython,lint},py3-lint,docs
+envlist = py27-cython{,2,3}
 
 Review comment:
   left over comment?




Issue Time Tracking
---

Worklog Id: (was: 83064)
Time Spent: 8h 10m  (was: 8h)

> Enable Py3 linting in Jenkins
> -
>
> Key: BEAM-3738
> URL: https://issues.apache.org/jira/browse/BEAM-3738
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, testing
>Reporter: holdenk
>Assignee: Ahmet Altay
>Priority: Major
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> After BEAM-3671 is finished enable linting.





[jira] [Work logged] (BEAM-3738) Enable Py3 linting in Jenkins

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3738?focusedWorklogId=83063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83063
 ]

ASF GitHub Bot logged work on BEAM-3738:


Author: ASF GitHub Bot
Created on: 22/Mar/18 04:26
Start Date: 22/Mar/18 04:26
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #4877: 
[BEAM-3738] Enable py3 lint and cleanup tox.ini.
URL: https://github.com/apache/beam/pull/4877#discussion_r176305583
 
 

 ##
 File path: sdks/python/tox.ini
 ##
 @@ -27,44 +28,78 @@ select = E3
 
 # Shared environment options.
 [testenv]
-# Set [] options for pip install, e.g., pip install apache-beam[test].
+# Set [] options for pip installation of apache-beam tarball.
 extras = test
-# Always recreate the virtual environment.
-recreate = True
-# Pass these environment variables to the test environment.
-passenv = TRAVIS*
 # Don't warn that these commands aren't installed.
 whitelist_externals =
   find
   time
 
 [testenv:py27]
 commands =
+  python --version
+  pip --version
   {toxinidir}/run_tox_cleanup.sh
   python apache_beam/examples/complete/autocomplete_test.py
-  python setup.py test
+  #python setup.py test
   {toxinidir}/run_tox_cleanup.sh
 
 [testenv:py27-cython]
 
 Review comment:
   What are py27-cython, py27-cython2, and py27-cython3?
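For context on the question (standard tox behavior, not from the PR discussion): braces in an envlist are generative, so a single entry expands into several environments.

```ini
# tox expands the generative envlist
envlist = py27-cython{,2,3}
# into three environments:
#   py27-cython, py27-cython2, py27-cython3
```

Each expanded name then picks up its own `[testenv:...]` section, which is why the PR defines py27-cython, py27-cython2, and py27-cython3 separately.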




Issue Time Tracking
---

Worklog Id: (was: 83063)
Time Spent: 8h  (was: 7h 50m)

> Enable Py3 linting in Jenkins
> -
>
> Key: BEAM-3738
> URL: https://issues.apache.org/jira/browse/BEAM-3738
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, testing
>Reporter: holdenk
>Assignee: Ahmet Altay
>Priority: Major
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> After BEAM-3671 is finished enable linting.





[jira] [Work logged] (BEAM-3738) Enable Py3 linting in Jenkins

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3738?focusedWorklogId=83062&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83062
 ]

ASF GitHub Bot logged work on BEAM-3738:


Author: ASF GitHub Bot
Created on: 22/Mar/18 04:26
Start Date: 22/Mar/18 04:26
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #4877: 
[BEAM-3738] Enable py3 lint and cleanup tox.ini.
URL: https://github.com/apache/beam/pull/4877#discussion_r176305503
 
 

 ##
 File path: sdks/python/tox.ini
 ##
 @@ -27,44 +28,78 @@ select = E3
 
 # Shared environment options.
 [testenv]
-# Set [] options for pip install, e.g., pip install apache-beam[test].
+# Set [] options for pip installation of apache-beam tarball.
 extras = test
-# Always recreate the virtual environment.
-recreate = True
-# Pass these environment variables to the test environment.
-passenv = TRAVIS*
 # Don't warn that these commands aren't installed.
 whitelist_externals =
   find
   time
 
 [testenv:py27]
 commands =
+  python --version
+  pip --version
   {toxinidir}/run_tox_cleanup.sh
   python apache_beam/examples/complete/autocomplete_test.py
-  python setup.py test
+  #python setup.py test
   {toxinidir}/run_tox_cleanup.sh
 
 [testenv:py27-cython]
+deps =
+  nose==1.3.7
+  grpcio-tools==1.3.5
+  cython==0.25.2
+commands =
+  python --version
+  pip --version
+  time {toxinidir}/run_pylint.sh
+
+[testenv:py27-cython2]
+# cython tests are only expected to work in linux (2.x and 3.x)
+# If we want to add other platforms in the future, it should be:
+# `platform = linux2|darwin|...`
+# See https://docs.python.org/2/library/sys.html#sys.platform for platform codes
+platform = linux2
+deps =
+  nose==1.3.7
+  grpcio-tools==1.3.5
+  cython==0.25.2
+commands =
+  python --version
+  pip --version
+  {toxinidir}/run_tox_cleanup.sh
+  python apache_beam/examples/complete/autocomplete_test.py
+  #python setup.py test
+  {toxinidir}/run_tox_cleanup.sh
+# TODO: remove if unneeded
+passenv = TRAVIS*
+
+[testenv:py27-cython3]
 # cython tests are only expected to work in linux (2.x and 3.x)
 # If we want to add other platforms in the future, it should be:
 # `platform = linux2|darwin|...`
 # See https://docs.python.org/2/library/sys.html#sys.platform for platform codes
 platform = linux2
 deps =
-  cython==0.26.1
+  cython==0.25.2
 commands =
+  python --version
+  pip --version
   {toxinidir}/run_tox_cleanup.sh
   python apache_beam/examples/complete/autocomplete_test.py
-  python setup.py test
+  #python setup.py test
   {toxinidir}/run_tox_cleanup.sh
+# TODO: remove if unneeded
+passenv = TRAVIS*
 
 [testenv:py27-gcp]
 extras = test,gcp
 commands =
+  python --version
+  pip --version
   {toxinidir}/run_tox_cleanup.sh
   python apache_beam/examples/complete/autocomplete_test.py
-  python setup.py test
+  #python setup.py test
 
 Review comment:
   left over?




Issue Time Tracking
---

Worklog Id: (was: 83062)
Time Spent: 8h  (was: 7h 50m)

> Enable Py3 linting in Jenkins
> -
>
> Key: BEAM-3738
> URL: https://issues.apache.org/jira/browse/BEAM-3738
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, testing
>Reporter: holdenk
>Assignee: Ahmet Altay
>Priority: Major
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> After BEAM-3671 is finished enable linting.





[jira] [Work logged] (BEAM-3738) Enable Py3 linting in Jenkins

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3738?focusedWorklogId=83061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83061
 ]

ASF GitHub Bot logged work on BEAM-3738:


Author: ASF GitHub Bot
Created on: 22/Mar/18 04:21
Start Date: 22/Mar/18 04:21
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #4877: 
[BEAM-3738] Enable py3 lint and cleanup tox.ini.
URL: https://github.com/apache/beam/pull/4877#discussion_r176305244
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -113,11 +113,8 @@ def get_version():
 'futures>=3.1.1,<4.0.0',
 ]
 
-REQUIRED_SETUP_PACKAGES = [
-'nose>=1.0',
 
 Review comment:
   Is there a reason for us to raise the required version?
   
   > Alternatively, should the process do pip install --upgrade pip?
   Which process? As far as I know, virtualenv brings a new version of pip with 
it.




Issue Time Tracking
---

Worklog Id: (was: 83061)
Time Spent: 7h 50m  (was: 7h 40m)

> Enable Py3 linting in Jenkins
> -
>
> Key: BEAM-3738
> URL: https://issues.apache.org/jira/browse/BEAM-3738
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, testing
>Reporter: holdenk
>Assignee: Ahmet Altay
>Priority: Major
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> After BEAM-3671 is finished enable linting.





[jira] [Work logged] (BEAM-3861) Build test infra for end-to-end streaming test in Python SDK

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3861?focusedWorklogId=83058&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83058
 ]

ASF GitHub Bot logged work on BEAM-3861:


Author: ASF GitHub Bot
Created on: 22/Mar/18 04:19
Start Date: 22/Mar/18 04:19
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #4930: 
[BEAM-3861] Complete streaming wordcount test in Python SDK
URL: https://github.com/apache/beam/pull/4930#discussion_r176304644
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/tests/pubsub_matcher.py
 ##
 @@ -0,0 +1,108 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PubSub verifier used for end-to-end test."""
+
+import logging
+import time
+from collections import Counter
+
+from hamcrest.core.base_matcher import BaseMatcher
+
+__all__ = ['PubSubMessageMatcher']
+
+
+# Protect against environments where pubsub library is not available.
+# pylint: disable=wrong-import-order, wrong-import-position
+try:
+  from google.cloud import pubsub
+except ImportError:
+  pubsub = None
+# pylint: enable=wrong-import-order, wrong-import-position
+
+DEFAULT_TIMEOUT = 5 * 60
+MAX_MESSAGES_IN_ONE_PULL = 50
+
+
+class PubSubMessageMatcher(BaseMatcher):
+  """Matcher that verifies messages from given subscription.
+
+  This matcher can block the test and keep pulling messages from given
+  subscription until all expected messages are shown or timeout.
+  """
+
+  def __init__(self, project, sub_name, expected_msg, timeout=DEFAULT_TIMEOUT):
+    """Initialize PubSubMessageMatcher object.
+
+    Args:
+      project: A name string of project.
+      sub_name: A name string of subscription which is attached to output.
+      expected_msg: A string list that contains expected message data pulled
+        from the subscription.
+      timeout: Timeout in seconds to wait for all expected messages appears.
+    """
+    if pubsub is None:
+      raise ImportError(
+          'PubSub dependencies are not installed.')
+    if not project:
+      raise ValueError('Invalid project %s.' % project)
+    if not sub_name:
+      raise ValueError('Invalid subscription %s.' % sub_name)
+    if not expected_msg or not isinstance(expected_msg, list):
+      raise ValueError('Invalid expected messages %s.' % expected_msg)
+
+    self.project = project
+    self.sub_name = sub_name
+    self.expected_msg = expected_msg
+    self.timeout = timeout
+    self.messages = None
+
+  def _matches(self, _):
+    if not self.messages:
+      subscription = (pubsub
+                      .Client(project=self.project)
+                      .subscription(self.sub_name))
+      self.messages = self._wait_for_messages(subscription,
 
 Review comment:
   Should this return `False` in case of a timeout, instead of raising an 
exception?
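The alternative the reviewer raises (return `False` on timeout so hamcrest reports a mismatch, rather than raising from the wait loop) could be sketched roughly like this; `pull_once`, `now`, and `sleep` are injected stand-ins for testability, not Beam or Pub/Sub APIs:

```python
def matches_with_timeout(pull_once, expected, timeout_s, now, sleep):
  """Keep pulling until every expected message has arrived; on timeout
  return False (a plain mismatch) instead of raising. pull_once/now/sleep
  are injected so the loop can be exercised without Pub/Sub."""
  received = []
  deadline = now() + timeout_s
  while now() <= deadline:
    received.extend(pull_once())
    if sorted(received) == sorted(expected):
      return True
    sleep(1)
  return False  # timed out; the matcher's describe_mismatch does the reporting
```

With this shape, a timeout surfaces through the matcher's mismatch description rather than as a RuntimeError stack trace.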




Issue Time Tracking
---

Worklog Id: (was: 83058)

> Build test infra for end-to-end streaming test in Python SDK
> 
>
> Key: BEAM-3861
> URL: https://issues.apache.org/jira/browse/BEAM-3861
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-3861) Build test infra for end-to-end streaming test in Python SDK

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3861?focusedWorklogId=83057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83057
 ]

ASF GitHub Bot logged work on BEAM-3861:


Author: ASF GitHub Bot
Created on: 22/Mar/18 04:19
Start Date: 22/Mar/18 04:19
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #4930: 
[BEAM-3861] Complete streaming wordcount test in Python SDK
URL: https://github.com/apache/beam/pull/4930#discussion_r176304742
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/tests/pubsub_matcher.py
 ##
 @@ -0,0 +1,108 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PubSub verifier used for end-to-end test."""
+
+import logging
+import time
+from collections import Counter
+
+from hamcrest.core.base_matcher import BaseMatcher
+
+__all__ = ['PubSubMessageMatcher']
+
+
+# Protect against environments where pubsub library is not available.
+# pylint: disable=wrong-import-order, wrong-import-position
+try:
+  from google.cloud import pubsub
+except ImportError:
+  pubsub = None
+# pylint: enable=wrong-import-order, wrong-import-position
+
+DEFAULT_TIMEOUT = 5 * 60
+MAX_MESSAGES_IN_ONE_PULL = 50
+
+
+class PubSubMessageMatcher(BaseMatcher):
+  """Matcher that verifies messages from given subscription.
+
+  This matcher can block the test and keep pulling messages from given
+  subscription until all expected messages are shown or timeout.
+  """
+
+  def __init__(self, project, sub_name, expected_msg, timeout=DEFAULT_TIMEOUT):
+    """Initialize PubSubMessageMatcher object.
+
+    Args:
+      project: A name string of project.
+      sub_name: A name string of subscription which is attached to output.
+      expected_msg: A string list that contains expected message data pulled
+        from the subscription.
+      timeout: Timeout in seconds to wait for all expected messages appears.
+    """
+    if pubsub is None:
+      raise ImportError(
+          'PubSub dependencies are not installed.')
+    if not project:
+      raise ValueError('Invalid project %s.' % project)
+    if not sub_name:
+      raise ValueError('Invalid subscription %s.' % sub_name)
+    if not expected_msg or not isinstance(expected_msg, list):
+      raise ValueError('Invalid expected messages %s.' % expected_msg)
+
+    self.project = project
+    self.sub_name = sub_name
+    self.expected_msg = expected_msg
+    self.timeout = timeout
+    self.messages = None
+
+  def _matches(self, _):
+    if not self.messages:
+      subscription = (pubsub
+                      .Client(project=self.project)
+                      .subscription(self.sub_name))
+      self.messages = self._wait_for_messages(subscription,
+                                              len(self.expected_msg),
+                                              self.timeout)
+    return Counter(self.messages) == Counter(self.expected_msg)
+
+  def _wait_for_messages(self, subscription, expected_num, timeout):
+    """Wait for messages from given subscription."""
+    logging.debug('Start pulling messages from %s', subscription.full_name)
+    total_messages = []
+    start_time = time.time()
+    while time.time() - start_time <= timeout:
+      pulled = subscription.pull(max_messages=MAX_MESSAGES_IN_ONE_PULL)
+      for ack_id, message in pulled:
+        total_messages.append(message.data)
+        subscription.acknowledge([ack_id])
+      if len(total_messages) >= expected_num:
 
 Review comment:
   This is a little strange. This matcher is only useful when testing for 
the full output. It would not be useful for an "elements x, y, z should be 
part of the whole output" type of use case.
   
   Is this what you intended to do?
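The partial-output check the reviewer describes could be expressed as multiset containment instead of exact equality; a sketch of that relaxed check (an alternative being discussed, not what the PR implements):

```python
from collections import Counter

def contains_expected(expected, actual):
  """Multiset containment: every expected message must appear in
  `actual` at least as many times as it appears in `expected`;
  extra messages in `actual` are allowed."""
  return not (Counter(expected) - Counter(actual))
```

Counter subtraction drops non-positive counts, so the difference is empty exactly when nothing expected is missing.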




Issue Time Tracking
---

Worklog Id: (was: 83057)

> Build test 

[jira] [Work logged] (BEAM-3861) Build test infra for end-to-end streaming test in Python SDK

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3861?focusedWorklogId=83060&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83060
 ]

ASF GitHub Bot logged work on BEAM-3861:


Author: ASF GitHub Bot
Created on: 22/Mar/18 04:19
Start Date: 22/Mar/18 04:19
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #4930: 
[BEAM-3861] Complete streaming wordcount test in Python SDK
URL: https://github.com/apache/beam/pull/4930#discussion_r176304852
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/tests/pubsub_matcher.py
 ##
 @@ -0,0 +1,108 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PubSub verifier used for end-to-end test."""
+
+import logging
+import time
+from collections import Counter
+
+from hamcrest.core.base_matcher import BaseMatcher
+
+__all__ = ['PubSubMessageMatcher']
+
+
+# Protect against environments where pubsub library is not available.
+# pylint: disable=wrong-import-order, wrong-import-position
+try:
+  from google.cloud import pubsub
+except ImportError:
+  pubsub = None
+# pylint: enable=wrong-import-order, wrong-import-position
+
+DEFAULT_TIMEOUT = 5 * 60
+MAX_MESSAGES_IN_ONE_PULL = 50
+
+
+class PubSubMessageMatcher(BaseMatcher):
+  """Matcher that verifies messages from given subscription.
+
+  This matcher can block the test and keep pulling messages from given
+  subscription until all expected messages are shown or timeout.
+  """
+
+  def __init__(self, project, sub_name, expected_msg, timeout=DEFAULT_TIMEOUT):
+    """Initialize PubSubMessageMatcher object.
+
+    Args:
+      project: A name string of project.
+      sub_name: A name string of subscription which is attached to output.
+      expected_msg: A string list that contains expected message data pulled
+        from the subscription.
+      timeout: Timeout in seconds to wait for all expected messages appears.
+    """
+    if pubsub is None:
+      raise ImportError(
+          'PubSub dependencies are not installed.')
+    if not project:
+      raise ValueError('Invalid project %s.' % project)
+    if not sub_name:
+      raise ValueError('Invalid subscription %s.' % sub_name)
+    if not expected_msg or not isinstance(expected_msg, list):
+      raise ValueError('Invalid expected messages %s.' % expected_msg)
+
+    self.project = project
+    self.sub_name = sub_name
+    self.expected_msg = expected_msg
+    self.timeout = timeout
+    self.messages = None
+
+  def _matches(self, _):
+    if not self.messages:
+      subscription = (pubsub
+                      .Client(project=self.project)
+                      .subscription(self.sub_name))
+      self.messages = self._wait_for_messages(subscription,
+                                              len(self.expected_msg),
+                                              self.timeout)
+    return Counter(self.messages) == Counter(self.expected_msg)
+
+  def _wait_for_messages(self, subscription, expected_num, timeout):
+    """Wait for messages from given subscription."""
+    logging.debug('Start pulling messages from %s', subscription.full_name)
+    total_messages = []
+    start_time = time.time()
+    while time.time() - start_time <= timeout:
+      pulled = subscription.pull(max_messages=MAX_MESSAGES_IN_ONE_PULL)
+      for ack_id, message in pulled:
+        total_messages.append(message.data)
+        subscription.acknowledge([ack_id])
+      if len(total_messages) >= expected_num:
+        return total_messages
+      time.sleep(1)
+
+    raise RuntimeError('Timeout after %d sec. Received %d messages from %s.' %
+                       (timeout, len(total_messages), subscription.full_name))
+
+  def describe_to(self, description):
+    description.append_text(
+        'Expected %d messages.' % len(self.expected_msg))
+
+  def describe_mismatch(self, _, mismatch_description):
+    diff = set(self.expected_msg) - set(self.messages)
 
 Review comment:
   Don't you want to use a counter here too? That is what you used for matching. 
(For example if you are checking for messages x, x and only x appears, it will 
mismatch (correctly), however 
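The point being made: `set` subtraction cannot see missing duplicates, while `Counter` subtraction (already used in `_matches`) can. A small illustration:

```python
from collections import Counter

expected = ['x', 'x']   # message x expected twice
received = ['x']        # but seen only once

set_diff = set(expected) - set(received)
counter_diff = Counter(expected) - Counter(received)

# set_diff is empty, so a set-based mismatch description would show
# nothing missing, while counter_diff still records the lost duplicate.
```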

[jira] [Work logged] (BEAM-3861) Build test infra for end-to-end streaming test in Python SDK

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3861?focusedWorklogId=83056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83056
 ]

ASF GitHub Bot logged work on BEAM-3861:


Author: ASF GitHub Bot
Created on: 22/Mar/18 04:19
Start Date: 22/Mar/18 04:19
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #4930: 
[BEAM-3861] Complete streaming wordcount test in Python SDK
URL: https://github.com/apache/beam/pull/4930#discussion_r176304163
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/tests/pubsub_matcher.py
 ##
 @@ -0,0 +1,108 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PubSub verifier used for end-to-end test."""
+
+import logging
+import time
+from collections import Counter
+
+from hamcrest.core.base_matcher import BaseMatcher
+
+__all__ = ['PubSubMessageMatcher']
+
+
+# Protect against environments where pubsub library is not available.
+# pylint: disable=wrong-import-order, wrong-import-position
+try:
+  from google.cloud import pubsub
+except ImportError:
+  pubsub = None
+# pylint: enable=wrong-import-order, wrong-import-position
+
+DEFAULT_TIMEOUT = 5 * 60
+MAX_MESSAGES_IN_ONE_PULL = 50
+
+
+class PubSubMessageMatcher(BaseMatcher):
+  """Matcher that verifies messages from given subscription.
+
+  This matcher can block the test and keep pulling messages from given
+  subscription until all expected messages are shown or timeout.
+  """
+
+  def __init__(self, project, sub_name, expected_msg, timeout=DEFAULT_TIMEOUT):
+    """Initialize PubSubMessageMatcher object.
+
+    Args:
+      project: A name string of project.
+      sub_name: A name string of subscription which is attached to output.
+      expected_msg: A string list that contains expected message data pulled
+        from the subscription.
+      timeout: Timeout in seconds to wait for all expected messages appears.
+    """
+    if pubsub is None:
+      raise ImportError(
+          'PubSub dependencies are not installed.')
+    if not project:
+      raise ValueError('Invalid project %s.' % project)
+    if not sub_name:
+      raise ValueError('Invalid subscription %s.' % sub_name)
+    if not expected_msg or not isinstance(expected_msg, list):
 
 Review comment:
   `if not isinstance(expected_msg, list):` should be enough?
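What the two checks differ on, concretely: `not expected_msg` also rejects an empty list, which `isinstance(expected_msg, list)` alone would let through. A minimal illustration (hypothetical helper mirroring the two validation styles under discussion):

```python
def is_valid(expected_msg, allow_empty):
  """With allow_empty=True only the type is checked (the reviewer's
  simpler form); otherwise an empty list is rejected too (the PR's
  current form)."""
  if allow_empty:
    return isinstance(expected_msg, list)
  return bool(expected_msg) and isinstance(expected_msg, list)
```

So dropping the `not expected_msg` clause is only safe if an empty expectation list should be a legal input to the matcher.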




Issue Time Tracking
---

Worklog Id: (was: 83056)
Time Spent: 5h  (was: 4h 50m)

> Build test infra for end-to-end streaming test in Python SDK
> 
>
> Key: BEAM-3861
> URL: https://issues.apache.org/jira/browse/BEAM-3861
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-3861) Build test infra for end-to-end streaming test in Python SDK

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3861?focusedWorklogId=83055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83055
 ]

ASF GitHub Bot logged work on BEAM-3861:


Author: ASF GitHub Bot
Created on: 22/Mar/18 04:19
Start Date: 22/Mar/18 04:19
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #4930: 
[BEAM-3861] Complete streaming wordcount test in Python SDK
URL: https://github.com/apache/beam/pull/4930#discussion_r176305077
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py
 ##
 @@ -44,34 +44,41 @@ def run_pipeline(self, pipeline):
 
     self.result = super(TestDataflowRunner, self).run_pipeline(pipeline)
     if self.result.has_job:
-      project = pipeline._options.view_as(GoogleCloudOptions).project
-      region_id = pipeline._options.view_as(GoogleCloudOptions).region
-      job_id = self.result.job_id()
       # TODO(markflyhigh)(BEAM-1890): Use print since Nose dosen't show logs
       # in some cases.
-      print (
-          'Found: https://console.cloud.google.com/dataflow/jobsDetail'
-          '/locations/%s/jobs/%s?project=%s' % (region_id, job_id, project))
+      print('Found: %s.' % self.build_console_url(pipeline.options))
 
     if not options.view_as(StandardOptions).streaming:
       self.result.wait_until_finish()
     else:
-      # TODO: Ideally, we want to wait until workers start successfully.
-      self.wait_until_running()
+      self.wait_until_in_state(PipelineState.RUNNING)
 
     if on_success_matcher:
       from hamcrest import assert_that as hc_assert_that
       hc_assert_that(self.result, pickler.loads(on_success_matcher))
 
+    if options.view_as(StandardOptions).streaming:
+      self.result.cancel()
 
 Review comment:
   Do you want to cancel the job immediately after it reaches the running 
state? Should we wait a bit first?
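One way to act on "should we wait a bit first" is an explicit grace period between reaching RUNNING and cancelling. A sketch with injected callables (a hypothetical helper for illustration, not TestDataflowRunner code; the 60s default is an assumption):

```python
def run_streaming_then_cancel(wait_until_running, verify, cancel, sleep,
                              grace_period_s=60):
  """Wait for the streaming job to reach RUNNING, give workers a grace
  period to actually process input, run the success matcher, then cancel
  (a streaming job never finishes on its own)."""
  wait_until_running()
  sleep(grace_period_s)
  try:
    return verify()
  finally:
    cancel()  # cancel even if verification fails, to avoid leaked jobs
```

The try/finally ordering also guarantees the job is cancelled when the matcher assertion raises.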




Issue Time Tracking
---

Worklog Id: (was: 83055)
Time Spent: 4h 50m  (was: 4h 40m)

> Build test infra for end-to-end streaming test in Python SDK
> 
>
> Key: BEAM-3861
> URL: https://issues.apache.org/jira/browse/BEAM-3861
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-3861) Build test infra for end-to-end streaming test in Python SDK

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3861?focusedWorklogId=83059=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83059
 ]

ASF GitHub Bot logged work on BEAM-3861:


Author: ASF GitHub Bot
Created on: 22/Mar/18 04:19
Start Date: 22/Mar/18 04:19
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #4930: 
[BEAM-3861] Complete streaming wordcount test in Python SDK
URL: https://github.com/apache/beam/pull/4930#discussion_r176303844
 
 

 ##
 File path: sdks/python/apache_beam/examples/streaming_wordcount_it_test.py
 ##
 @@ -47,24 +50,32 @@ class StreamingWordCountIT(unittest.TestCase):
 
   def setUp(self):
 self.test_pipeline = TestPipeline(is_integration_test=True)
+self.project = self.test_pipeline.get_option('project')
+self.identifier = self._generate_identifier()
 
 Review comment:
   You could use uuid instead.
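The `uuid` suggestion amounts to replacing the hand-rolled `_generate_identifier` with the standard library. A generic sketch under that assumption (the function name is illustrative, not the one in the PR):

```python
import uuid

def generate_identifier():
    # uuid4() is a random 128-bit id; .hex gives 32 lowercase hex chars,
    # safe for embedding in Pub/Sub topic or job names.
    return uuid.uuid4().hex

a, b = generate_identifier(), generate_identifier()
print(len(a), a != b)  # → 32 True
```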




Issue Time Tracking
---

Worklog Id: (was: 83059)
Time Spent: 5h  (was: 4h 50m)

> Build test infra for end-to-end streaming test in Python SDK
> 
>
> Key: BEAM-3861
> URL: https://issues.apache.org/jira/browse/BEAM-3861
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-3098) Upgrade Java grpc version

2018-03-21 Thread Chamikara Jayalath (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409015#comment-16409015
 ] 

Chamikara Jayalath commented on BEAM-3098:
--

Yeah, the netty-shaded gRPC is not available for 1.2.0, so it won't help in this case.  

> Upgrade Java grpc version
> -
>
> Key: BEAM-3098
> URL: https://issues.apache.org/jira/browse/BEAM-3098
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Solomon Duskis
>Priority: Major
>
> Beam Java currently depends on grpc 1.2, which was released in March.  It 
> would be great if the dependency could be updated to something newer, like 
> grpc 1.7.0





[jira] [Work logged] (BEAM-3824) Use WriteToBigQuery in Python mobile gaming examples.

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3824?focusedWorklogId=83053=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83053
 ]

ASF GitHub Bot logged work on BEAM-3824:


Author: ASF GitHub Bot
Created on: 22/Mar/18 04:02
Start Date: 22/Mar/18 04:02
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #4932: [BEAM-3824] Convert 
big query writes to beam.io.WriteToBigQuery in mobile gaming example
URL: https://github.com/apache/beam/pull/4932#issuecomment-375172664
 
 
   retest this please




Issue Time Tracking
---

Worklog Id: (was: 83053)
Time Spent: 40m  (was: 0.5h)

> Use  WriteToBigQuery in Python mobile gaming examples. 
> ---
>
> Key: BEAM-3824
> URL: https://issues.apache.org/jira/browse/BEAM-3824
> Project: Beam
>  Issue Type: Bug
>  Components: examples-python
>Reporter: Valentyn Tymofieiev
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> python -m apache_beam.examples.complete.game.hourly_team_score 
> --project=$PROJECT --dataset=beam_release_2_4_0 
> --input=gs://$BUCKET/mobile/first_5000_gaming_data.csv
> The pipeline fails with:
> INFO:root:finish  output_tags=['out'], 
> receivers=[ConsumerSet[WriteTeamScoreSums/WriteToBigQuery.out0, 
> coder=WindowedValueCoder[FastPrimitivesCoder], len(consumers)=0]]> 
> Traceback (most recent call last):
>  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
>  "__main__", fname, loader, pkg_name) 
>  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
>  exec code in run_globals 
>  File 
> "/tmp/release_testing/r2.4.0_env/lib/python2.7/site-packages/apache_beam/examples/complete/game/hourly_team_score.py",
>  line 276, in <
> module> 
>  run() 
>  File 
> "/tmp/release_testing/r2.4.0_env/lib/python2.7/site-packages/apache_beam/examples/complete/game/hourly_team_score.py",
>  line 270, in r
> un 
>  write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)) 
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 389, in __exit__
>  self.run().wait_until_finish() 
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 369, in run
>  self.to_runner_api(), self.runner, self._options).run(False) 
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 382, in run
>  return self.runner.run_pipeline(self) 
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/direct/direct_runner.py",
>  line 129, in run_pip
> eline 
>  return runner.run_pipeline(pipeline)
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 215, in ru
> n_pipeline 
>  return self.run_via_runner_api(pipeline.to_runner_api())
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 218, in ru
> n_via_runner_api 
>  return self.run_stages(*self.create_stages(pipeline_proto))
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 837, in ru
> n_stages 
>  pcoll_buffers, safe_coders).process_bundle.metrics
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 938, in ru
> n_stage 
>  self._progress_frequency).process_bundle(data_input, data_output)
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 1110, in p
> rocess_bundle 
>  result_future = self._controller.control_handler.push(process_bundle)
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 1003, in p
> ush 
>  response = self.worker.do_instruction(request)
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 185, in do_instruc
> tion 
>  request.instruction_id) 
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 202, in process_bu
> ndle 
>  processor.process_bundle(instruction_id)
>  File 
> 

[jira] [Work logged] (BEAM-3824) Use WriteToBigQuery in Python mobile gaming examples.

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3824?focusedWorklogId=83052=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83052
 ]

ASF GitHub Bot logged work on BEAM-3824:


Author: ASF GitHub Bot
Created on: 22/Mar/18 04:02
Start Date: 22/Mar/18 04:02
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #4932: [BEAM-3824] Convert 
big query writes to beam.io.WriteToBigQuery in mobile gaming example
URL: https://github.com/apache/beam/pull/4932#issuecomment-375172664
 
 
   retest this please




Issue Time Tracking
---

Worklog Id: (was: 83052)
Time Spent: 0.5h  (was: 20m)

> Use  WriteToBigQuery in Python mobile gaming examples. 
> ---
>
> Key: BEAM-3824
> URL: https://issues.apache.org/jira/browse/BEAM-3824
> Project: Beam
>  Issue Type: Bug
>  Components: examples-python
>Reporter: Valentyn Tymofieiev
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> python -m apache_beam.examples.complete.game.hourly_team_score 
> --project=$PROJECT --dataset=beam_release_2_4_0 
> --input=gs://$BUCKET/mobile/first_5000_gaming_data.csv
> The pipeline fails with:
> INFO:root:finish  output_tags=['out'], 
> receivers=[ConsumerSet[WriteTeamScoreSums/WriteToBigQuery.out0, 
> coder=WindowedValueCoder[FastPrimitivesCoder], len(consumers)=0]]> 
> Traceback (most recent call last):
>  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
>  "__main__", fname, loader, pkg_name) 
>  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
>  exec code in run_globals 
>  File 
> "/tmp/release_testing/r2.4.0_env/lib/python2.7/site-packages/apache_beam/examples/complete/game/hourly_team_score.py",
>  line 276, in <
> module> 
>  run() 
>  File 
> "/tmp/release_testing/r2.4.0_env/lib/python2.7/site-packages/apache_beam/examples/complete/game/hourly_team_score.py",
>  line 270, in r
> un 
>  write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)) 
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 389, in __exit__
>  self.run().wait_until_finish() 
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 369, in run
>  self.to_runner_api(), self.runner, self._options).run(False) 
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 382, in run
>  return self.runner.run_pipeline(self) 
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/direct/direct_runner.py",
>  line 129, in run_pip
> eline 
>  return runner.run_pipeline(pipeline)
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 215, in ru
> n_pipeline 
>  return self.run_via_runner_api(pipeline.to_runner_api())
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 218, in ru
> n_via_runner_api 
>  return self.run_stages(*self.create_stages(pipeline_proto))
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 837, in ru
> n_stages 
>  pcoll_buffers, safe_coders).process_bundle.metrics
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 938, in ru
> n_stage 
>  self._progress_frequency).process_bundle(data_input, data_output)
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 1110, in p
> rocess_bundle 
>  result_future = self._controller.control_handler.push(process_bundle)
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 1003, in p
> ush 
>  response = self.worker.do_instruction(request)
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 185, in do_instruc
> tion 
>  request.instruction_id) 
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 202, in process_bu
> ndle 
>  processor.process_bundle(instruction_id)
>  File 
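The issue title asks for `beam.io.WriteToBigQuery`. Independent of Beam itself, that transform accepts its schema as a comma-separated `name:TYPE` string, which can be built with a small helper; the field names below mirror the mobile-gaming example but are illustrative:

```python
def bq_schema(fields):
    """Render [('name', 'TYPE'), ...] into the comma-separated
    'name:TYPE' schema string that beam.io.WriteToBigQuery accepts."""
    return ','.join('%s:%s' % (name, ftype) for name, ftype in fields)

schema = bq_schema([('team', 'STRING'),
                    ('total_score', 'INTEGER'),
                    ('window_start', 'STRING')])
print(schema)  # → team:STRING,total_score:INTEGER,window_start:STRING
```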

Jenkins build became unstable: beam_PostCommit_Java_MavenInstall #6265

2018-03-21 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-3851) Support element timestamps while publishing to Kafka.

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3851?focusedWorklogId=83048=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83048
 ]

ASF GitHub Bot logged work on BEAM-3851:


Author: ASF GitHub Bot
Created on: 22/Mar/18 03:26
Start Date: 22/Mar/18 03:26
Worklog Time Spent: 10m 
  Work Description: rangadi commented on issue #4868: [BEAM-3851] Option to 
preserve element timestamp while publishing to Kafka.
URL: https://github.com/apache/beam/pull/4868#issuecomment-375167740
 
 
   @XuMingmin, yep, they seem unrelated. 




Issue Time Tracking
---

Worklog Id: (was: 83048)
Time Spent: 1h 20m  (was: 1h 10m)

> Support element timestamps while publishing to Kafka.
> -
>
> Key: BEAM-3851
> URL: https://issues.apache.org/jira/browse/BEAM-3851
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Affects Versions: 2.3.0
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> KafkaIO sink should support using input element timestamp for the message 
> published to Kafka. Otherwise there is no way for user to influence the 
> timestamp of the messages in Kafka sink.





Build failed in Jenkins: beam_PerformanceTests_Spark #1495

2018-03-21 Thread Apache Jenkins Server
See 


Changes:

[Pablo] Fixing check for sideinput_io_metrics experiment flag.

[herohde] [BEAM-3897] Add Go wordcount example with multi output DoFns

[iemejia] Remove testing package-info from main package for GCP IO

[iemejia] Update maven failsafe/surefire plugin to version 2.21.0

[iemejia] [BEAM-3873] Update commons-compress to version 1.16.1 (fix

[iemejia] Remove maven warnings

[tgroh] Add Side Inputs to ExecutableStage

[herohde] [BEAM-3866] Remove windowed value requirement for External

[markliu] Clean up terminal state check in TestDataflowRunner

[tgroh] Make RemoteEnvironment public

--
[...truncated 83.99 KB...]
'apache-beam-testing:bqjob_r82ec824d5dabf3_01624bb113f2_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)

2018-03-22 03:12:00,936 e351253c MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-03-22 03:12:22,805 e351253c MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-03-22 03:12:25,781 e351253c MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1
STDOUT: Upload complete.
Waiting on bqjob_r75bac8462de3dfe1_01624bb174db_1 ... (0s) Current status: 
RUNNING 
 Waiting on bqjob_r75bac8462de3dfe1_01624bb174db_1 ... (0s) 
Current status: DONE   
BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r75bac8462de3dfe1_01624bb174db_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)

2018-03-22 03:12:25,782 e351253c MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-03-22 03:12:41,849 e351253c MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-03-22 03:12:44,967 e351253c MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1
STDOUT: Upload complete.
Waiting on bqjob_r23c073770ec81244_01624bb1bfba_1 ... (0s) Current status: 
RUNNING 
 Waiting on bqjob_r23c073770ec81244_01624bb1bfba_1 ... (0s) 
Current status: DONE   
BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r23c073770ec81244_01624bb1bfba_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)

2018-03-22 03:12:44,967 e351253c MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-03-22 03:13:00,003 e351253c MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-03-22 03:13:02,835 e351253c MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 
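The repeated load failures above come from `bq load --autodetect` inferring FLOAT for a float-epoch `timestamp` field where the existing column is TIMESTAMP. One hedged workaround sketch (not what the Jenkins job actually does) is to serialize the field as an RFC 3339 string before upload, which autodetect maps back to TIMESTAMP:

```python
import datetime

def epoch_to_rfc3339(epoch_seconds):
    # A string in this form is inferred as TIMESTAMP by BigQuery's
    # schema autodetection, avoiding the TIMESTAMP -> FLOAT conflict.
    dt = datetime.datetime.utcfromtimestamp(epoch_seconds)
    return dt.strftime('%Y-%m-%dT%H:%M:%SZ')

print(epoch_to_rfc3339(0))  # → 1970-01-01T00:00:00Z
```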

[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=83045=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83045
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 22/Mar/18 03:04
Start Date: 22/Mar/18 03:04
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#4901: [BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r176297607
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/pubsub.py
 ##
 @@ -44,8 +68,8 @@
   pubsub_pb2 = None
 
 
-__all__ = ['PubsubMessage', 'ReadMessagesFromPubSub', 'ReadStringsFromPubSub',
-   'WriteStringsToPubSub']
+__all__ = ['PubsubMessage', 'ReadMessagesFromPubSub', 'ReadPayloadsFromPubSub',
 
 Review comment:
   No, we don't need it. It was simply a tentative API choice at the time.




Issue Time Tracking
---

Worklog Id: (was: 83045)
Time Spent: 7h 20m  (was: 7h 10m)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.





Build failed in Jenkins: beam_PerformanceTests_Python #1051

2018-03-21 Thread Apache Jenkins Server
See 


Changes:

[Pablo] Fixing check for sideinput_io_metrics experiment flag.

[herohde] [BEAM-3897] Add Go wordcount example with multi output DoFns

[iemejia] Remove testing package-info from main package for GCP IO

[iemejia] Update maven failsafe/surefire plugin to version 2.21.0

[iemejia] [BEAM-3873] Update commons-compress to version 1.16.1 (fix

[iemejia] Remove maven warnings

[tgroh] Add Side Inputs to ExecutableStage

[herohde] [BEAM-3866] Remove windowed value requirement for External

[markliu] Clean up terminal state check in TestDataflowRunner

[tgroh] Make RemoteEnvironment public

--
[...truncated 1.05 KB...]
Commit message: "[BEAM-3897] Add Go wordcount example with multi output DoFns"
 > git rev-list --no-walk ebf7f91a6c94fd73c66b097c6ed45d2626acf08f # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins6987772529404678364.sh
+ rm -rf PerfKitBenchmarker
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins3971058154825014745.sh
+ rm -rf .env
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins8568209132913073680.sh
+ virtualenv .env --system-site-packages
New python executable in .env/bin/python
Installing setuptools, pip...done.
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins9024817220414058226.sh
+ .env/bin/pip install --upgrade setuptools pip
Downloading/unpacking setuptools from 
https://pypi.python.org/packages/20/d7/04a0b689d3035143e2ff288f4b9ee4bf6ed80585cc121c90bfd85a1a8c2e/setuptools-39.0.1-py2.py3-none-any.whl#md5=ca299c7acd13a72e1171a3697f2b99bc
Downloading/unpacking pip from 
https://pypi.python.org/packages/ac/95/a05b56bb975efa78d3557efa36acaf9cf5d2fd0ee0062060493687432e03/pip-9.0.3-py2.py3-none-any.whl#md5=d512ceb964f38ba31addb8142bc657cb
Installing collected packages: setuptools, pip
  Found existing installation: setuptools 2.2
Uninstalling setuptools:
  Successfully uninstalled setuptools
  Found existing installation: pip 1.5.4
Uninstalling pip:
  Successfully uninstalled pip
Successfully installed setuptools pip
Cleaning up...
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins3176182257469235520.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
Cloning into 'PerfKitBenchmarker'...
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins1486540663133030853.sh
+ .env/bin/pip install -r PerfKitBenchmarker/requirements.txt
Requirement already satisfied: absl-py in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied: jinja2>=2.7 in 
/usr/local/lib/python2.7/dist-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied: setuptools in ./.env/lib/python2.7/site-packages 
(from -r PerfKitBenchmarker/requirements.txt (line 16))
Requirement already satisfied: colorlog[windows]==2.6.0 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 17))
Requirement already satisfied: blinker>=1.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 18))
Requirement already satisfied: futures>=3.0.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 19))
Requirement already satisfied: PyYAML==3.12 in 
/usr/local/lib/python2.7/dist-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 20))
Requirement already satisfied: pint>=0.7 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 21))
Requirement already satisfied: numpy==1.13.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 22))
Requirement already satisfied: functools32 in 
/usr/local/lib/python2.7/dist-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 23))
Requirement already satisfied: contextlib2>=0.5.1 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 24))
Requirement already satisfied: pywinrm in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: six in /usr/local/lib/python2.7/dist-packages 
(from absl-py->-r PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied: MarkupSafe>=0.23 in 
/usr/local/lib/python2.7/dist-packages (from jinja2>=2.7->-r 

Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #5197

2018-03-21 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-3098) Upgrade Java grpc version

2018-03-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16408970#comment-16408970
 ] 

Ismaël Mejía commented on BEAM-3098:


Actually the issue is that grpc is leaking netty on all the runners, and this 
conflicts with Spark at runtime because Spark uses netty 4.0 by default instead 
of 4.1. This is the case in BEAM-3519.

I saw your PR for that, but I have the impression that both issues are 
intertwined, because using the shaded version of gRPC would require upgrading 
it, no?

> Upgrade Java grpc version
> -
>
> Key: BEAM-3098
> URL: https://issues.apache.org/jira/browse/BEAM-3098
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Solomon Duskis
>Priority: Major
>
> Beam Java currently depends on grpc 1.2, which was released in March.  It 
> would be great if the dependency could be updated to something newer, like 
> grpc 1.7.0





Jenkins build is back to normal : beam_PerformanceTests_Compressed_TextIOIT #280

2018-03-21 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-3098) Upgrade Java grpc version

2018-03-21 Thread Chamikara Jayalath (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16408963#comment-16408963
 ] 

Chamikara Jayalath commented on BEAM-3098:
--

BTW there is a gRPC version with netty shaded now: 
https://mvnrepository.com/artifact/io.grpc/grpc-netty-shaded
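For reference, pulling the shaded artifact in with Maven would look like the fragment below; the version number is a placeholder, since the shaded variant is not available for the 1.2.x line used by Beam at the time:

```xml
<dependency>
  <groupId>io.grpc</groupId>
  <artifactId>grpc-netty-shaded</artifactId>
  <!-- placeholder version: any release line that ships the shaded variant -->
  <version>1.7.0</version>
</dependency>
```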

> Upgrade Java grpc version
> -
>
> Key: BEAM-3098
> URL: https://issues.apache.org/jira/browse/BEAM-3098
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Solomon Duskis
>Priority: Major
>
> Beam Java currently depends on grpc 1.2, which was released in March.  It 
> would be great if the dependency could be updated to something newer, like 
> grpc 1.7.0





[jira] [Commented] (BEAM-3098) Upgrade Java grpc version

2018-03-21 Thread Chamikara Jayalath (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16408952#comment-16408952
 ] 

Chamikara Jayalath commented on BEAM-3098:
--

This is something that has been on our radar but has been de-prioritized due to 
other high-priority issues. It looks like the pom.xml in the runners/spark 
module does not specify gRPC as a dependency. Am I correct to assume that this 
is only an issue if a user uses both a Beam module that depends on gRPC (for 
example, io/google-cloud-platform) and runners/spark?

> Upgrade Java grpc version
> -
>
> Key: BEAM-3098
> URL: https://issues.apache.org/jira/browse/BEAM-3098
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Solomon Duskis
>Priority: Major
>
> Beam Java currently depends on grpc 1.2, which was released in March.  It 
> would be great if the dependency could be updated to something newer, like 
> grpc 1.7.0





[jira] [Work logged] (BEAM-3731) Enable tests to run in Python 3

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3731?focusedWorklogId=83041=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83041
 ]

ASF GitHub Bot logged work on BEAM-3731:


Author: ASF GitHub Bot
Created on: 22/Mar/18 02:36
Start Date: 22/Mar/18 02:36
Worklog Time Spent: 10m 
  Work Description: luke-zhu closed pull request #4730: [BEAM-3731] Enable 
tests to run in Python 3
URL: https://github.com/apache/beam/pull/4730
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/coders/coder_impl.pxd 
b/sdks/python/apache_beam/coders/coder_impl.pxd
index 8af394b6686..91768432d91 100644
--- a/sdks/python/apache_beam/coders/coder_impl.pxd
+++ b/sdks/python/apache_beam/coders/coder_impl.pxd
@@ -68,7 +68,6 @@ cdef class DeterministicFastPrimitivesCoderImpl(CoderImpl):
   cdef bint _check_safe(self, value) except -1
 
 
-cdef object NoneType
 cdef char UNKNOWN_TYPE, NONE_TYPE, INT_TYPE, FLOAT_TYPE, BOOL_TYPE
 cdef char STR_TYPE, UNICODE_TYPE, LIST_TYPE, TUPLE_TYPE, DICT_TYPE, SET_TYPE
 
diff --git a/sdks/python/apache_beam/coders/coder_impl.py 
b/sdks/python/apache_beam/coders/coder_impl.py
index b5b17899601..4bfb19a52a1 100644
--- a/sdks/python/apache_beam/coders/coder_impl.py
+++ b/sdks/python/apache_beam/coders/coder_impl.py
@@ -28,7 +28,7 @@
 """
 from __future__ import absolute_import
 
-from types import NoneType
+import six
 
 from apache_beam.coders import observable
 from apache_beam.utils import windowed_value
@@ -197,7 +197,7 @@ def __init__(self, coder, step_label):
 self._step_label = step_label
 
   def _check_safe(self, value):
-if isinstance(value, (str, unicode, long, int, float)):
+if isinstance(value, (str, six.text_type, long, int, float)):
   pass
 elif value is None:
   pass
@@ -277,7 +277,7 @@ def get_estimated_size_and_observables(self, value, 
nested=False):
 
   def encode_to_stream(self, value, stream, nested):
 t = type(value)
-if t is NoneType:
+if value is None:
   stream.write_byte(NONE_TYPE)
 elif t is int:
   stream.write_byte(INT_TYPE)
@@ -288,7 +288,7 @@ def encode_to_stream(self, value, stream, nested):
 elif t is str:
   stream.write_byte(STR_TYPE)
   stream.write(value, nested)
-elif t is unicode:
+elif t is six.text_type:
   unicode_value = value  # for typing
   stream.write_byte(UNICODE_TYPE)
   stream.write(unicode_value.encode('utf-8'), nested)
@@ -302,7 +302,7 @@ def encode_to_stream(self, value, stream, nested):
   dict_value = value  # for typing
   stream.write_byte(DICT_TYPE)
   stream.write_var_int64(len(dict_value))
-  for k, v in dict_value.iteritems():
+  for k, v in six.iteritems(dict_value):
 self.encode_to_stream(k, stream, True)
 self.encode_to_stream(v, stream, True)
 elif t is bool:
diff --git a/sdks/python/apache_beam/coders/coders.py 
b/sdks/python/apache_beam/coders/coders.py
index f7662586987..f3c99f730f3 100644
--- a/sdks/python/apache_beam/coders/coders.py
+++ b/sdks/python/apache_beam/coders/coders.py
@@ -22,17 +22,22 @@
 from __future__ import absolute_import
 
 import base64
-import cPickle as pickle
 
+# pylint: disable=ungrouped-imports
 import google.protobuf
+import six
 from google.protobuf import wrappers_pb2
 
+import six.moves.cPickle as pickle
 from apache_beam.coders import coder_impl
 from apache_beam.portability import common_urns
 from apache_beam.portability import python_urns
 from apache_beam.portability.api import beam_runner_api_pb2
 from apache_beam.utils import proto_utils
 
+# pylint: enable=ungrouped-imports
+
+
 # pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports
 try:
   from .stream import get_varint_size
@@ -309,7 +314,7 @@ class ToStringCoder(Coder):
   """A default string coder used if no sink coder is specified."""
 
   def encode(self, value):
-if isinstance(value, unicode):
+if isinstance(value, six.text_type):
   return value.encode('utf-8')
 elif isinstance(value, str):
   return value
diff --git a/sdks/python/apache_beam/coders/fast_coders_test.py 
b/sdks/python/apache_beam/coders/fast_coders_test.py
index a13334a2c26..32089ced0e0 100644
--- a/sdks/python/apache_beam/coders/fast_coders_test.py
+++ b/sdks/python/apache_beam/coders/fast_coders_test.py
@@ -20,7 +20,6 @@
 import logging
 import unittest
 
-
 # Run all the standard coder test cases.
 from apache_beam.coders.coders_test_common import *
 
diff --git a/sdks/python/apache_beam/coders/slow_coders_test.py 
b/sdks/python/apache_beam/coders/slow_coders_test.py
index 97aa39ca094..ff4693e16d7 100644
--- 
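The substitutions in the diff above (`six.text_type` for `unicode`, `six.iteritems` for `dict.iteritems`, an `is None` check instead of `NoneType`) can be illustrated with stdlib-only stand-ins. This is a sketch of what `six` provides, not Beam's actual code:

```python
import sys

# Stdlib-only stand-ins for the six helpers used in the diff
# (illustration only; the real patch imports the `six` library).
if sys.version_info[0] >= 3:
    text_type = str                 # six.text_type

    def iteritems(d):               # six.iteritems
        return iter(d.items())
else:
    text_type = unicode             # noqa: F821 (Python 2 only)

    def iteritems(d):
        return d.iteritems()


def encode_value(value):
    # Mirrors the ToStringCoder branch in the diff: text becomes
    # UTF-8 bytes, everything else passes through unchanged.
    if isinstance(value, text_type):
        return value.encode('utf-8')
    return value
```

On Python 3, `text_type` resolves to `str`, so the same `isinstance` check works on both interpreters, which is the point of the `six` substitutions.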

[jira] [Comment Edited] (BEAM-3098) Upgrade Java grpc version

2018-03-21 Thread Chamikara Jayalath (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16406856#comment-16406856
 ] 

Chamikara Jayalath edited comment on BEAM-3098 at 3/22/18 2:31 AM:
---

DataflowRunner currently overrides and shades the gRPC version internally. So at 
least for that component we have to maintain the current version until the 
runner is upgraded.


was (Author: chamikara):
DataflowRunner currently overrides and shards gRPC version internally. So at 
least for that component we have to maintain the current version until the 
runner is upgraded.

> Upgrade Java grpc version
> -
>
> Key: BEAM-3098
> URL: https://issues.apache.org/jira/browse/BEAM-3098
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Solomon Duskis
>Priority: Major
>
> Beam Java currently depends on grpc 1.2, which was released in March.  It 
> would be great if the dependency could be updated to something newer, like 
> grpc 1.7.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Python_Verify #4478

2018-03-21 Thread Apache Jenkins Server
See 


Changes:

[herohde] [BEAM-3897] Add Go wordcount example with multi output DoFns

[herohde] [BEAM-3866] Remove windowed value requirement for External

[markliu] Clean up terminal state check in TestDataflowRunner

[tgroh] Make RemoteEnvironment public

--
[...truncated 1.13 MB...]
root: INFO: 2018-03-22T02:22:43.709Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/InitializeWrite into write/Write/WriteImpl/DoOnce/Read
root: INFO: 2018-03-22T02:22:43.741Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/FinalizeWrite/MapToVoidKey2 into 
write/Write/WriteImpl/PreFinalize/PreFinalize
root: INFO: 2018-03-22T02:22:43.775Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/FinalizeWrite/MapToVoidKey2 into 
write/Write/WriteImpl/PreFinalize/PreFinalize
root: INFO: 2018-03-22T02:22:43.810Z: JOB_MESSAGE_DEBUG: Workflow config is 
missing a default resource spec.
root: INFO: 2018-03-22T02:22:43.842Z: JOB_MESSAGE_DEBUG: Adding StepResource 
setup and teardown to workflow graph.
root: INFO: 2018-03-22T02:22:43.876Z: JOB_MESSAGE_DEBUG: Adding workflow start 
and stop steps.
root: INFO: 2018-03-22T02:22:43.911Z: JOB_MESSAGE_DEBUG: Assigning stage ids.
root: INFO: 2018-03-22T02:22:44.091Z: JOB_MESSAGE_DEBUG: Executing wait step 
start26
root: INFO: 2018-03-22T02:22:44.177Z: JOB_MESSAGE_BASIC: Executing operation 
write/Write/WriteImpl/DoOnce/Read+write/Write/WriteImpl/InitializeWrite+write/Write/WriteImpl/PreFinalize/MapToVoidKey0+write/Write/WriteImpl/FinalizeWrite/MapToVoidKey0+write/Write/WriteImpl/WriteBundles/MapToVoidKey0+write/Write/WriteImpl/PreFinalize/MapToVoidKey0+write/Write/WriteImpl/FinalizeWrite/MapToVoidKey0+write/Write/WriteImpl/WriteBundles/MapToVoidKey0
root: INFO: 2018-03-22T02:22:44.209Z: JOB_MESSAGE_BASIC: Executing operation 
write/Write/WriteImpl/GroupByKey/Create
root: INFO: 2018-03-22T02:22:44.223Z: JOB_MESSAGE_DEBUG: Starting worker pool 
setup.
root: INFO: 2018-03-22T02:22:44.225Z: JOB_MESSAGE_BASIC: Executing operation 
group/Create
root: INFO: 2018-03-22T02:22:44.246Z: JOB_MESSAGE_BASIC: Starting 1 workers in 
us-central1-f...
root: INFO: Job 2018-03-21_19_22_38-2399826686858835030 is in state 
JOB_STATE_RUNNING
root: INFO: 2018-03-22T02:22:51.766Z: JOB_MESSAGE_DETAILED: Autoscaling: Raised 
the number of workers to 0 based on the rate of progress in the currently 
running step(s).
root: INFO: 2018-03-22T02:22:51.840Z: JOB_MESSAGE_DEBUG: Value 
"write/Write/WriteImpl/GroupByKey/Session" materialized.
root: INFO: 2018-03-22T02:22:51.863Z: JOB_MESSAGE_DEBUG: Value "group/Session" 
materialized.
root: INFO: 2018-03-22T02:22:51.924Z: JOB_MESSAGE_BASIC: Executing operation 
read/Read+split+pair_with_one+group/Reify+group/Write
root: INFO: 2018-03-22T02:22:57.073Z: JOB_MESSAGE_BASIC: Autoscaling: Resizing 
worker pool from 1 to 2.
root: INFO: 2018-03-22T02:23:09.044Z: JOB_MESSAGE_DETAILED: Autoscaling: Raised 
the number of workers to 1 based on the rate of progress in the currently 
running step(s).
root: INFO: 2018-03-22T02:23:09.075Z: JOB_MESSAGE_DETAILED: Resized worker pool 
to 1, though goal was 2.  This could be a quota issue.
root: INFO: 2018-03-22T02:23:22.955Z: JOB_MESSAGE_DETAILED: Workers have 
started successfully.
root: INFO: 2018-03-22T02:23:24.974Z: JOB_MESSAGE_DETAILED: Autoscaling: Raised 
the number of workers to 2 based on the rate of progress in the currently 
running step(s).
root: INFO: 2018-03-22T02:27:45.472Z: JOB_MESSAGE_BASIC: Executing operation 
group/Close
root: INFO: 2018-03-22T02:27:48.714Z: JOB_MESSAGE_DEBUG: Value 
"write/Write/WriteImpl/DoOnce/Read.out" materialized.
root: INFO: 2018-03-22T02:27:48.735Z: JOB_MESSAGE_DEBUG: Value 
"write/Write/WriteImpl/PreFinalize/MapToVoidKey0.out" materialized.
root: INFO: 2018-03-22T02:27:48.758Z: JOB_MESSAGE_DEBUG: Value 
"write/Write/WriteImpl/FinalizeWrite/MapToVoidKey0.out" materialized.
root: INFO: 2018-03-22T02:27:48.789Z: JOB_MESSAGE_DEBUG: Value 
"write/Write/WriteImpl/WriteBundles/MapToVoidKey0.out" materialized.
root: INFO: 2018-03-22T02:27:48.824Z: JOB_MESSAGE_BASIC: Executing operation 
write/Write/WriteImpl/PreFinalize/_DataflowIterableSideInput(MapToVoidKey0.out.0)
root: INFO: 2018-03-22T02:27:48.856Z: JOB_MESSAGE_BASIC: Executing operation 
write/Write/WriteImpl/FinalizeWrite/_DataflowIterableSideInput(MapToVoidKey0.out.0)
root: INFO: 2018-03-22T02:27:48.891Z: JOB_MESSAGE_BASIC: Executing operation 
write/Write/WriteImpl/WriteBundles/_DataflowIterableSideInput(MapToVoidKey0.out.0)
root: INFO: 2018-03-22T02:27:48.930Z: JOB_MESSAGE_DEBUG: Value 
"write/Write/WriteImpl/PreFinalize/_DataflowIterableSideInput(MapToVoidKey0.out.0).output"
 materialized.
root: INFO: 2018-03-22T02:27:48.961Z: JOB_MESSAGE_DEBUG: Value 
"write/Write/WriteImpl/FinalizeWrite/_DataflowIterableSideInput(MapToVoidKey0.out.0).output"
 materialized.
root: INFO: 

[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=83038=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83038
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 22/Mar/18 02:25
Start Date: 22/Mar/18 02:25
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#4901: [BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r176293458
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/pubsub.py
 ##
 @@ -44,8 +68,8 @@
   pubsub_pb2 = None
 
 
-__all__ = ['PubsubMessage', 'ReadMessagesFromPubSub', 'ReadStringsFromPubSub',
-   'WriteStringsToPubSub']
+__all__ = ['PubsubMessage', 'ReadMessagesFromPubSub', 'ReadPayloadsFromPubSub',
 
 Review comment:
  @lukecwik @charlesccychen can you comment on whether we actually need 
'ReadStringsFromPubSub' for some reason that we missed here?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 83038)
Time Spent: 7h 10m  (was: 7h)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.





[jira] [Work logged] (BEAM-3706) Update CombinePayload to improved model for Portability

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3706?focusedWorklogId=83037=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83037
 ]

ASF GitHub Bot logged work on BEAM-3706:


Author: ASF GitHub Bot
Created on: 22/Mar/18 02:08
Start Date: 22/Mar/18 02:08
Worklog Time Spent: 10m 
  Work Description: youngoli commented on a change in pull request #4924: 
[BEAM-3706] Removing side inputs from CombinePayload proto.
URL: https://github.com/apache/beam/pull/4924#discussion_r176291758
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CombineTranslation.java
 ##
 @@ -81,10 +73,14 @@ public String getUrn(Combine.PerKey transform) {
 public FunctionSpec translate(
 AppliedPTransform> transform, 
SdkComponents components)
 throws IOException {
-  return FunctionSpec.newBuilder()
-  .setUrn(COMBINE_TRANSFORM_URN)
-  .setPayload(payloadForCombine((AppliedPTransform) transform, 
components).toByteString())
-  .build();
+  if (transform.getTransform().getSideInputs().isEmpty()) {
+return FunctionSpec.newBuilder()
+.setUrn(COMBINE_TRANSFORM_URN)
+.setPayload(payloadForCombine((AppliedPTransform) transform, 
components).toByteString())
+.build();
+  } else {
+return null;
 
 Review comment:
   Done.




Issue Time Tracking
---

Worklog Id: (was: 83037)
Time Spent: 50m  (was: 40m)

> Update CombinePayload to improved model for Portability
> ---
>
> Key: BEAM-3706
> URL: https://issues.apache.org/jira/browse/BEAM-3706
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Minor
>  Labels: portability
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This will mean changing the proto definition in beam_runner_api, most likely 
> trimming out fields that are no longer necessary and adding any new ones that 
> could be useful. The majority of work will probably be in investigating if 
> some existing fields can actually be removed (SideInputs and Parameters for 
> example).





[jira] [Work logged] (BEAM-3706) Update CombinePayload to improved model for Portability

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3706?focusedWorklogId=83034=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83034
 ]

ASF GitHub Bot logged work on BEAM-3706:


Author: ASF GitHub Bot
Created on: 22/Mar/18 02:00
Start Date: 22/Mar/18 02:00
Worklog Time Spent: 10m 
  Work Description: youngoli commented on a change in pull request #4924: 
[BEAM-3706] Removing side inputs from CombinePayload proto.
URL: https://github.com/apache/beam/pull/4924#discussion_r176290781
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java
 ##
 @@ -196,10 +196,13 @@ private PTransformTranslation() {}
 transformBuilder.setSpec(spec);
   }
 } else if (KNOWN_PAYLOAD_TRANSLATORS.containsKey(transform.getClass())) {
-  transformBuilder.setSpec(
+  FunctionSpec spec =
   KNOWN_PAYLOAD_TRANSLATORS
   .get(transform.getClass())
-  .translate(appliedPTransform, components));
+  .translate(appliedPTransform, components);
+  if (spec != null) {
 
 Review comment:
   There is, setting the spec to null causes an exception, whereas leaving it 
missing is fine.




Issue Time Tracking
---

Worklog Id: (was: 83034)
Time Spent: 40m  (was: 0.5h)

> Update CombinePayload to improved model for Portability
> ---
>
> Key: BEAM-3706
> URL: https://issues.apache.org/jira/browse/BEAM-3706
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Minor
>  Labels: portability
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This will mean changing the proto definition in beam_runner_api, most likely 
> trimming out fields that are no longer necessary and adding any new ones that 
> could be useful. The majority of work will probably be in investigating if 
> some existing fields can actually be removed (SideInputs and Parameters for 
> example).





[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=83026=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83026
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 22/Mar/18 01:50
Start Date: 22/Mar/18 01:50
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #4901: 
[BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r176270878
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/pubsub.py
 ##
 @@ -44,8 +68,8 @@
   pubsub_pb2 = None
 
 
-__all__ = ['PubsubMessage', 'ReadMessagesFromPubSub', 'ReadStringsFromPubSub',
-   'WriteStringsToPubSub']
+__all__ = ['PubsubMessage', 'ReadMessagesFromPubSub', 'ReadPayloadsFromPubSub',
 
 Review comment:
   I am fine with supporting only 2 transforms: one for payload and the other 
for payload with metadata.
   If that's the case, then I argue we should get rid of ReadStringsFromPubSub, 
since ReadPayloadsFromPubSub is more useful (and you could always add a decode 
step to your pipeline).
   
   If we agree on this, I'll remove ReadStringsFromPubSub and rename 
WriteStringsToPubSub to WritePayloadsToPubSub in a subsequent PR (which will 
also include a PTransform to write payloads with attributes).
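The "add a decode step" suggestion above can be sketched outside Beam with a plain function applied over a list standing in for a PCollection; in a real pipeline this would be something like `pcoll | beam.Map(decode_payload)`. The function name is hypothetical:

```python
def decode_payload(payload, encoding='utf-8'):
    # The decode step a user would append after reading raw payloads,
    # turning bytes (as ReadPayloadsFromPubSub would emit) into text.
    return payload.decode(encoding)


# Raw payloads as a payload-only read would emit them.
raw_payloads = [b'hello', b'\xe4\xb8\x96\xe7\x95\x8c']
strings = [decode_payload(p) for p in raw_payloads]
```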




Issue Time Tracking
---

Worklog Id: (was: 83026)
Time Spent: 6h  (was: 5h 50m)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.





[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=83030=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83030
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 22/Mar/18 01:50
Start Date: 22/Mar/18 01:50
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #4901: 
[BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r176273891
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/transform_evaluator.py
 ##
 @@ -428,24 +428,41 @@ def _read_from_pubsub(self):
 self._subscription, return_immediately=True,
 max_messages=10) as results:
   def _get_element(message):
-if self.source.with_attributes:
-  return PubsubMessage._from_message(message)
+parsed_message = PubsubMessage._from_message(message)
+if timestamp_attribute:
+  try:
+rfc3339_or_milli = parsed_message.attributes[timestamp_attribute]
+  except KeyError:
+raise KeyError('Timestamp attribute not found: %s' %
 
 Review comment:
   done.




Issue Time Tracking
---

Worklog Id: (was: 83030)
Time Spent: 6h 40m  (was: 6.5h)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.





[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=83031=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83031
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 22/Mar/18 01:50
Start Date: 22/Mar/18 01:50
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #4901: 
[BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r176274594
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/pubsub_test.py
 ##
 @@ -276,16 +283,27 @@ def topic(self, name):
 return FakePubsubTopic(name, self)
 
 
+def create_client_message(payload, message_id, attributes, publish_time):
+  """Returns a message as it would be returned from Pubsub client."""
 
 Review comment:
   done




Issue Time Tracking
---

Worklog Id: (was: 83031)
Time Spent: 6h 40m  (was: 6.5h)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.





[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=83028=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83028
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 22/Mar/18 01:50
Start Date: 22/Mar/18 01:50
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #4901: 
[BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r176271020
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/pubsub.py
 ##
 @@ -280,25 +276,15 @@ def parse_subscription(full_subscription):
 class _PubSubSource(dataflow_io.NativeSource):
   """Source for the payload of a message as bytes from a Cloud Pub/Sub topic.
 
+  This ``PTransform`` is overridden by a native Pubsub implementation.
 
 Review comment:
   done.




Issue Time Tracking
---

Worklog Id: (was: 83028)
Time Spent: 6h 20m  (was: 6h 10m)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.





[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=83032=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83032
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 22/Mar/18 01:50
Start Date: 22/Mar/18 01:50
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #4901: 
[BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r176272905
 
 

 ##
 File path: sdks/python/apache_beam/utils/timestamp.py
 ##
 @@ -38,6 +40,12 @@ class Timestamp(object):
   """
 
   def __init__(self, seconds=0, micros=0):
+if not isinstance(seconds, (int, float, long)):
+  raise TypeError('Cannot interpret %s %s as seconds.' % (
+seconds, type(seconds)))
 
 Review comment:
   Do you want the type before the value? I think it's fine the way it is.
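The type check under review above can be sketched as a simplified class (not Beam's actual `Timestamp`; `long` is omitted since it only exists on Python 2):

```python
class Timestamp(object):
    # Simplified sketch of the validation under review, not Beam's class.

    def __init__(self, seconds=0, micros=0):
        # Reject non-numeric inputs (e.g. strings, datetime objects)
        # early, with the value and its type in the message.
        if not isinstance(seconds, (int, float)):
            raise TypeError('Cannot interpret %s %s as seconds.'
                            % (seconds, type(seconds)))
        if not isinstance(micros, (int, float)):
            raise TypeError('Cannot interpret %s %s as micros.'
                            % (micros, type(micros)))
        self.micros = int(seconds * 1000000) + int(micros)
```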




Issue Time Tracking
---

Worklog Id: (was: 83032)
Time Spent: 6h 50m  (was: 6h 40m)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.





[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=83027=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83027
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 22/Mar/18 01:50
Start Date: 22/Mar/18 01:50
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #4901: 
[BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r176273904
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/transform_evaluator.py
 ##
 @@ -428,24 +428,41 @@ def _read_from_pubsub(self):
 self._subscription, return_immediately=True,
 max_messages=10) as results:
   def _get_element(message):
-if self.source.with_attributes:
-  return PubsubMessage._from_message(message)
+parsed_message = PubsubMessage._from_message(message)
+if timestamp_attribute:
+  try:
+rfc3339_or_milli = parsed_message.attributes[timestamp_attribute]
+  except KeyError:
+raise KeyError('Timestamp attribute not found: %s' %
+   self.source.timestamp_attribute)
+  try:
+timestamp = Timestamp.from_rfc3339(rfc3339_or_milli)
+  except ValueError:
+try:
+  timestamp = Timestamp(micros=int(rfc3339_or_milli) * 1000)
+except ValueError:
+  raise ValueError('Invalid timestamp value: %s', rfc3339_or_milli)
 
 Review comment:
   done.
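The fallback logic quoted above — try RFC 3339 first, then interpret the attribute as epoch milliseconds — can be sketched as follows. This is a simplification that only handles the UTC 'Z'-suffixed RFC 3339 forms and returns epoch microseconds:

```python
from datetime import datetime

_EPOCH = datetime(1970, 1, 1)


def parse_timestamp_attribute(value):
    # Try RFC 3339 (with and without fractional seconds) first,
    # matching the order of the fallbacks in the diff above.
    for fmt in ('%Y-%m-%dT%H:%M:%S.%fZ', '%Y-%m-%dT%H:%M:%SZ'):
        try:
            delta = datetime.strptime(value, fmt) - _EPOCH
            return int(delta.total_seconds() * 1000000)
        except ValueError:
            pass
    # Fall back to treating the attribute as milliseconds since epoch.
    try:
        return int(value) * 1000  # milliseconds -> microseconds
    except ValueError:
        raise ValueError('Invalid timestamp value: %s' % value)
```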




Issue Time Tracking
---

Worklog Id: (was: 83027)
Time Spent: 6h 10m  (was: 6h)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.





[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=83029=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83029
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 22/Mar/18 01:50
Start Date: 22/Mar/18 01:50
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #4901: 
[BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r176275145
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/pubsub.py
 ##
 @@ -338,7 +325,7 @@ def display_data(self):
 
   def reader(self):
 raise NotImplementedError(
-'PubSubPayloadSource is not supported in local execution.')
+'PubSubSource is not supported in local execution.')
 
 Review comment:
   It's a bug if this exception is raised. Comment removed.




Issue Time Tracking
---

Worklog Id: (was: 83029)
Time Spent: 6.5h  (was: 6h 20m)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.





[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=83033=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83033
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 22/Mar/18 01:50
Start Date: 22/Mar/18 01:50
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #4901: 
[BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r176274830
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -742,7 +742,7 @@ def run_Read(self, transform_node):
   standard_options = (
   transform_node.inputs[0].pipeline.options.view_as(StandardOptions))
   if not standard_options.streaming:
-raise ValueError('PubSubPayloadSource is currently available for use '
+raise ValueError('PubSubSource is currently available for use '
 
 Review comment:
   done.




Issue Time Tracking
---

Worklog Id: (was: 83033)
Time Spent: 7h  (was: 6h 50m)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.





Build failed in Jenkins: beam_PostCommit_Python_ValidatesRunner_Dataflow #1159

2018-03-21 Thread Apache Jenkins Server
See 


Changes:

[herohde] [BEAM-3897] Add Go wordcount example with multi output DoFns

[herohde] [BEAM-3866] Remove windowed value requirement for External

[markliu] Clean up terminal state check in TestDataflowRunner

[tgroh] Make RemoteEnvironment public

--
[...truncated 773.04 KB...]
"serialized_fn": "", 
"user_name": "assert_that/Match"
  }
}, 
{
  "kind": "ParallelDo", 
  "name": "s16", 
  "properties": {
"display_data": [
  {
"key": "fn", 
"label": "Transform Function", 
"namespace": "apache_beam.transforms.core.CallableWrapperDoFn", 
"type": "STRING", 
"value": ""
  }, 
  {
"key": "fn", 
"label": "Transform Function", 
"namespace": "apache_beam.transforms.core.ParDo", 
"shortValue": "CallableWrapperDoFn", 
"type": "STRING", 
"value": "apache_beam.transforms.core.CallableWrapperDoFn"
  }
], 
"non_parallel_inputs": {}, 
"output_info": [
  {
"encoding": {
  "@type": "kind:windowed_value", 
  "component_encodings": [
{
  "@type": "kind:pair", 
  "component_encodings": [
{
  "@type": "kind:bytes"
}, 
{
  "@type": 
"VarIntCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxhiUWeeSXOIA5XIYNmYyFjbSFTkh4A89cR+g==",
 
  "component_encodings": []
}
  ], 
  "is_pair_like": true
}, 
{
  "@type": "kind:global_window"
}
  ], 
  "is_wrapper": true
}, 
"output_name": "out", 
"user_name": "compute/MapToVoidKey0.out"
  }
], 
"parallel_input": {
  "@type": "OutputReference", 
  "output_name": "out", 
  "step_name": "s2"
}, 
"serialized_fn": "", 
"user_name": "compute/MapToVoidKey0"
  }
}
  ], 
  "type": "JOB_TYPE_BATCH"
}
root: INFO: Create job: 
root: INFO: Created job with id: [2018-03-21_18_30_45-4564196773643158513]
root: INFO: To access the Dataflow monitoring console, please navigate to 
https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-21_18_30_45-4564196773643158513?project=apache-beam-testing
root: INFO: Job 2018-03-21_18_30_45-4564196773643158513 is in state 
JOB_STATE_PENDING
root: INFO: 2018-03-22T01:30:45.099Z: JOB_MESSAGE_WARNING: Job 
2018-03-21_18_30_45-4564196773643158513 might autoscale up to 250 workers.
root: INFO: 2018-03-22T01:30:45.119Z: JOB_MESSAGE_DETAILED: Autoscaling is 
enabled for job 2018-03-21_18_30_45-4564196773643158513. The number of workers 
will be between 1 and 250.
root: INFO: 2018-03-22T01:30:45.139Z: JOB_MESSAGE_DETAILED: Autoscaling was 
automatically enabled for job 2018-03-21_18_30_45-4564196773643158513.
root: INFO: 2018-03-22T01:30:48.816Z: JOB_MESSAGE_DETAILED: Checking required 
Cloud APIs are enabled.
root: INFO: 2018-03-22T01:30:48.984Z: JOB_MESSAGE_DETAILED: Checking 
permissions granted to controller Service Account.
root: INFO: 2018-03-22T01:30:49.703Z: JOB_MESSAGE_DETAILED: Expanding 
CoGroupByKey operations into optimizable parts.
root: INFO: 2018-03-22T01:30:49.733Z: JOB_MESSAGE_DEBUG: Combiner lifting 
skipped for step assert_that/Group/GroupByKey: GroupByKey not followed by a 
combiner.
root: INFO: 2018-03-22T01:30:49.756Z: JOB_MESSAGE_DETAILED: Expanding 
GroupByKey operations into optimizable parts.
root: INFO: 2018-03-22T01:30:49.789Z: JOB_MESSAGE_DETAILED: Lifting 
ValueCombiningMappingFns into MergeBucketsMappingFns
root: INFO: 2018-03-22T01:30:49.818Z: JOB_MESSAGE_DEBUG: Annotating graph with 
Autotuner information.
root: INFO: 2018-03-22T01:30:49.859Z: JOB_MESSAGE_DETAILED: Fusing adjacent 
ParDo, Read, Write, and Flatten operations
root: INFO: 2018-03-22T01:30:49.890Z: JOB_MESSAGE_DETAILED: Unzipping flatten 
s11 for input s10.out
root: INFO: 2018-03-22T01:30:49.918Z: JOB_MESSAGE_DETAILED: Fusing unzipped 
copy of assert_that/Group/GroupByKey/Reify, through flatten 
assert_that/Group/Flatten, into producer assert_that/Group/pair_with_1
root: INFO: 2018-03-22T01:30:49.940Z: JOB_MESSAGE_DETAILED: Fusing consumer 
assert_that/Group/GroupByKey/GroupByWindow into 
assert_that/Group/GroupByKey/Read
root: INFO: 2018-03-22T01:30:49.976Z: JOB_MESSAGE_DETAILED: Fusing consumer 
assert_that/Unkey into assert_that/Group/Map(_merge_tagged_vals_under_key)
root: INFO: 2018-03-22T01:30:50.008Z: JOB_MESSAGE_DETAILED: Fusing consumer 
assert_that/Match into assert_that/Unkey
root: INFO: 

[jira] [Commented] (BEAM-3098) Upgrade Java grpc version

2018-03-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408904#comment-16408904
 ] 

Ismaël Mejía commented on BEAM-3098:


Any chance this can happen anytime soon? I am particularly interested in 
version 1.10.0 because it perfectly aligns netty with the version used by the 
Spark runner. Of course this is not a long-term (or real) fix to the real 
problem of shading, but it will help a bit to ease the pain.

> Upgrade Java grpc version
> -
>
> Key: BEAM-3098
> URL: https://issues.apache.org/jira/browse/BEAM-3098
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Solomon Duskis
>Priority: Major
>
> Beam Java currently depends on grpc 1.2, which was released in March.  It 
> would be great if the dependency could be update to something newer, like 
> grpc 1.7.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3824) Use WriteToBigQuery in Python mobile gaming examples.

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3824?focusedWorklogId=83015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83015
 ]

ASF GitHub Bot logged work on BEAM-3824:


Author: ASF GitHub Bot
Created on: 22/Mar/18 01:13
Start Date: 22/Mar/18 01:13
Worklog Time Spent: 10m 
  Work Description: aaltay opened a new pull request #4932: [BEAM-3824] 
Convert big query writes to beam.io.WriteToBigQuery in mobile gaming example
URL: https://github.com/apache/beam/pull/4932
 
 
   - Replaced `BigQuerySink` with `beam.io.WriteToBigQuery`. Examples could be 
cleaned up a little more, removing the example wrapper for `WriteToBigQuery` 
but this is not as important as removing the use of native sink.
   - Removed the warnings related to DataflowRunner as this is no longer 
applicable.
   
   Tested 4 examples on both Direct and Dataflow runners.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 83015)
Time Spent: 20m  (was: 10m)

> Use  WriteToBigQuery in Python mobile gaming examples. 
> ---
>
> Key: BEAM-3824
> URL: https://issues.apache.org/jira/browse/BEAM-3824
> Project: Beam
>  Issue Type: Bug
>  Components: examples-python
>Reporter: Valentyn Tymofieiev
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> python -m apache_beam.examples.complete.game.hourly_team_score 
> --project=$PROJECT --dataset=beam_release_2_4_0 
> --input=gs://$BUCKET/mobile/first_5000_gaming_data.csv
> The pipeline fails with:
> INFO:root:finish  output_tags=['out'], 
> receivers=[ConsumerSet[WriteTeamScoreSums/WriteToBigQuery.out0, 
> coder=WindowedValueCoder[FastPrimitivesCoder], len(consumers)=0]]> 
> Traceback (most recent call last):
>  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
>  "__main__", fname, loader, pkg_name) 
>  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
>  exec code in run_globals 
>  File 
> "/tmp/release_testing/r2.4.0_env/lib/python2.7/site-packages/apache_beam/examples/complete/game/hourly_team_score.py",
>  line 276, in <module>
>  run() 
>  File 
> "/tmp/release_testing/r2.4.0_env/lib/python2.7/site-packages/apache_beam/examples/complete/game/hourly_team_score.py",
>  line 270, in run
>  write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)) 
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 389, in __exit__
>  self.run().wait_until_finish() 
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 369, in run
>  self.to_runner_api(), self.runner, self._options).run(False) 
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 382, in run
>  return self.runner.run_pipeline(self) 
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/direct/direct_runner.py",
>  line 129, in run_pipeline
>  return runner.run_pipeline(pipeline)
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 215, in run_pipeline
>  return self.run_via_runner_api(pipeline.to_runner_api())
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 218, in run_via_runner_api
>  return self.run_stages(*self.create_stages(pipeline_proto))
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 837, in run_stages
>  pcoll_buffers, safe_coders).process_bundle.metrics
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 938, in run_stage
>  self._progress_frequency).process_bundle(data_input, data_output)
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 1110, in process_bundle
>  result_future = self._controller.control_handler.push(process_bundle)
>  File 
> "/tmp/release_testing/r2.4.0_env/local/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 1003, in push
>  response = self.worker.do_instruction(request)
>  File 
> 

[jira] [Work logged] (BEAM-3909) Add tests for Flink DoFnOperator side-input checkpointing

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3909?focusedWorklogId=83014&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83014
 ]

ASF GitHub Bot logged work on BEAM-3909:


Author: ASF GitHub Bot
Created on: 22/Mar/18 01:12
Start Date: 22/Mar/18 01:12
Worklog Time Spent: 10m 
  Work Description: aljoscha opened a new pull request #4931: [BEAM-3909] 
Add tests for Flink DoFnOperator side-input checkpointing
URL: https://github.com/apache/beam/pull/4931
 
 
   We test both checkpointing of pushed-back data and of side-input data.
   
   This extends `PushbackSideInputDoFnRunner` to allow resetting any cached 
information we keep about ready windows. Otherwise, it will not consider side 
inputs that might have become ready in the same bundle.
   
   Other than that, this only adds tests. The structure of the tests is:
- Feed in some data
- Checkpoint the operator and throw it away
- Restore the operator
- Send in some new data and verify that we get what we expect
   
   In the first two tests we first send in side-input data, which tests that 
side-inputs are correctly checkpointed. In the second set of tests we first 
send in main-input data, which is retained because side inputs are not ready.




Issue Time Tracking
---

Worklog Id: (was: 83014)
Time Spent: 10m
Remaining Estimate: 0h

> Add tests for Flink DoFnOperator side-input checkpointing
> -
>
> Key: BEAM-3909
> URL: https://issues.apache.org/jira/browse/BEAM-3909
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-3855) Add Go SDK support for protobuf coder

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3855?focusedWorklogId=83013&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83013
 ]

ASF GitHub Bot logged work on BEAM-3855:


Author: ASF GitHub Bot
Created on: 22/Mar/18 01:09
Start Date: 22/Mar/18 01:09
Worklog Time Spent: 10m 
  Work Description: herohde commented on issue #4908: BEAM-3855: Add 
Protocol Buffer support
URL: https://github.com/apache/beam/pull/4908#issuecomment-375145473
 
 
   Yep




Issue Time Tracking
---

Worklog Id: (was: 83013)
Time Spent: 1h 40m  (was: 1.5h)

> Add Go SDK support for protobuf coder
> -
>
> Key: BEAM-3855
> URL: https://issues.apache.org/jira/browse/BEAM-3855
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Willy Lulciuc
>Assignee: Bill Neubauer
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This JIRA is for something functional. We might want to use the coder 
> registry for a more general solution, when implemented. 





[jira] [Created] (BEAM-3910) Support floating point values in Go SDK

2018-03-21 Thread Bill Neubauer (JIRA)
Bill Neubauer created BEAM-3910:
---

 Summary: Support floating point values in Go SDK
 Key: BEAM-3910
 URL: https://issues.apache.org/jira/browse/BEAM-3910
 Project: Beam
  Issue Type: New Feature
  Components: sdk-go
Reporter: Bill Neubauer
Assignee: Bill Neubauer


The Go SDK supports all the integer types of the language, but does not support 
floats.

My plan for coding is to use the same technique the gob package uses, which 
results in a compact encoding for simple values.

[https://golang.org/src/encoding/gob/encode.go?#L210|https://golang.org/src/encoding/gob/encode.go#L210]
 with rationale explained in 
https://golang.org/pkg/encoding/gob/#hdr-Encoding_Details

The resulting uint is then encoded using the existing coders in coderx.
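As a rough cross-language illustration of the gob byte-reversal trick (sketched here in Python rather than Go, and not the actual coderx code), reversing the eight IEEE-754 bytes moves a simple value's significant bytes to the low end, so a varint-style coder can drop the zero high bytes:

```python
import struct

def float_bits(f):
    # gob's trick: byte-reverse the IEEE-754 representation. Packing
    # little-endian and re-reading the bytes big-endian is equivalent
    # to reversing the eight bytes.
    return int.from_bytes(struct.pack("<d", f), "big")

# 17.0 is 0x4031000000000000 in IEEE-754; byte-reversed it is 0x3140,
# so only two significant bytes need to be written instead of eight.
print(hex(float_bits(17.0)))  # -> 0x3140
```

The reversed value round-trips by packing it big-endian and unpacking little-endian, which is the kind of uint the existing coderx coders could then encode compactly.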





[jira] [Work logged] (BEAM-3855) Add Go SDK support for protobuf coder

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3855?focusedWorklogId=83012&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83012
 ]

ASF GitHub Bot logged work on BEAM-3855:


Author: ASF GitHub Bot
Created on: 22/Mar/18 01:03
Start Date: 22/Mar/18 01:03
Worklog Time Spent: 10m 
  Work Description: wcn3 commented on issue #4908: BEAM-3855: Add Protocol 
Buffer support
URL: https://github.com/apache/beam/pull/4908#issuecomment-375144582
 
 
   Got it! Said differently, Concrete will just mean the set of types that can 
be encoded "natively" and the new type, say UserCoded, will capture the concept 
that's been introduced into IsConcrete.
   




Issue Time Tracking
---

Worklog Id: (was: 83012)
Time Spent: 1.5h  (was: 1h 20m)

> Add Go SDK support for protobuf coder
> -
>
> Key: BEAM-3855
> URL: https://issues.apache.org/jira/browse/BEAM-3855
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Willy Lulciuc
>Assignee: Bill Neubauer
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This JIRA is for something functional. We might want to use the coder 
> registry for a more general solution, when implemented. 





[jira] [Created] (BEAM-3909) Add tests for Flink DoFnOperator side-input checkpointing

2018-03-21 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created BEAM-3909:
--

 Summary: Add tests for Flink DoFnOperator side-input checkpointing
 Key: BEAM-3909
 URL: https://issues.apache.org/jira/browse/BEAM-3909
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek








[jira] [Work logged] (BEAM-3855) Add Go SDK support for protobuf coder

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3855?focusedWorklogId=83010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83010
 ]

ASF GitHub Bot logged work on BEAM-3855:


Author: ASF GitHub Bot
Created on: 22/Mar/18 00:59
Start Date: 22/Mar/18 00:59
Worklog Time Spent: 10m 
  Work Description: herohde commented on issue #4908: BEAM-3855: Add 
Protocol Buffer support
URL: https://github.com/apache/beam/pull/4908#issuecomment-375144050
 
 
   LGTM.
   
   By "non-concrete codeable types", I meant types that require a user-provided 
coder to work (= are currently classified as Invalid). Protobuf is one example. 
I expect some interfaces will also fall into this category. While protobufs 
might play well with JSON as fields, it won't be the case in general. That's 
why I'm thinking we may want to add a new class to handle such types. But that 
is something to consider as a part of a proper coder registry.




Issue Time Tracking
---

Worklog Id: (was: 83010)
Time Spent: 1h 20m  (was: 1h 10m)

> Add Go SDK support for protobuf coder
> -
>
> Key: BEAM-3855
> URL: https://issues.apache.org/jira/browse/BEAM-3855
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Willy Lulciuc
>Assignee: Bill Neubauer
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This JIRA is for something functional. We might want to use the coder 
> registry for a more general solution, when implemented. 





[beam] 01/01: [BEAM-3897] Add Go wordcount example with multi output DoFns

2018-03-21 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit c45a2b2cfba282ec59479791c7931372a28e9e56
Merge: 9a94ade c23ed34
Author: Lukasz Cwik 
AuthorDate: Wed Mar 21 17:43:24 2018 -0700

[BEAM-3897] Add Go wordcount example with multi output DoFns

 sdks/go/examples/multiout/multiout.go | 79 +++
 1 file changed, 79 insertions(+)

-- 
To stop receiving notification emails like this one, please contact
lc...@apache.org.


[jira] [Work logged] (BEAM-3897) Flink runners fails on multioutput portable pipeline

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3897?focusedWorklogId=83005&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83005
 ]

ASF GitHub Bot logged work on BEAM-3897:


Author: ASF GitHub Bot
Created on: 22/Mar/18 00:43
Start Date: 22/Mar/18 00:43
Worklog Time Spent: 10m 
  Work Description: lukecwik closed pull request #4913: [BEAM-3897] Add Go 
wordcount example with multi output DoFns
URL: https://github.com/apache/beam/pull/4913
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/go/examples/multiout/multiout.go 
b/sdks/go/examples/multiout/multiout.go
new file mode 100644
index 000..43b480dbf5d
--- /dev/null
+++ b/sdks/go/examples/multiout/multiout.go
@@ -0,0 +1,79 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// multiout is a wordcount variation that uses a multi-output DoFn
+// and writes 2 output files.
+package main
+
+import (
+   "context"
+   "flag"
+   "fmt"
+   "log"
+   "regexp"
+
+   "github.com/apache/beam/sdks/go/pkg/beam"
+   "github.com/apache/beam/sdks/go/pkg/beam/io/textio"
+   "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats"
+   "github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
+)
+
+var (
+   input = flag.String("input", 
"gs://apache-beam-samples/shakespeare/kinglear.txt", "File(s) to read.")
+   small = flag.String("small", "", "Output file for small words 
(required).")
+   big   = flag.String("big", "", "Output file for big words (required).")
+)
+
+var wordRE = regexp.MustCompile(`[a-zA-Z]+('[a-z])?`)
+
+func splitFn(line string, big, small func(string)) {
+   for _, word := range wordRE.FindAllString(line, -1) {
+   if len(word) > 5 {
+   big(word)
+   } else {
+   small(word)
+   }
+   }
+}
+
+func formatFn(w string, c int) string {
+   return fmt.Sprintf("%s: %v", w, c)
+}
+
+func writeCounts(s beam.Scope, col beam.PCollection, filename string) {
+   counted := stats.Count(s, col)
+   textio.Write(s, filename, beam.ParDo(s, formatFn, counted))
+}
+
+func main() {
+   flag.Parse()
+   beam.Init()
+
+   if *small == "" || *big == "" {
+   log.Fatal("No outputs provided")
+   }
+
+   p := beam.NewPipeline()
+   s := p.Root()
+
+   lines := textio.Read(s, *input)
+   bcol, scol := beam.ParDo2(s, splitFn, lines)
+   writeCounts(s, bcol, *big)
+   writeCounts(s, scol, *small)
+
+   if err := beamx.Run(context.Background(), p); err != nil {
+   log.Fatalf("Failed to execute job: %v", err)
+   }
+}


 




Issue Time Tracking
---

Worklog Id: (was: 83005)
Time Spent: 0.5h  (was: 20m)

> Flink runners fails on multioutput portable pipeline
> 
>
> Key: BEAM-3897
> URL: https://issues.apache.org/jira/browse/BEAM-3897
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Henning Rohde
>Assignee: Ben Sidhom
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When trying a Go pipeline with a 2-output DoFn on the hacking branch, I see:
> [flink-runner-job-server] ERROR 
> org.apache.beam.runners.flink.FlinkJobInvocation - Error during job 
> invocation go-job-1521582585657843000_-2121541089_763230090.
> java.lang.RuntimeException: Pipeline execution failed
>   at 

[beam] branch master updated (9a94ade -> c45a2b2)

2018-03-21 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 9a94ade  Clean up terminal state check in TestDataflowRunner
 add c23ed34  [BEAM-3897] Add Go wordcount example with multi output DoFns
 new c45a2b2  [BEAM-3897] Add Go wordcount example with multi output DoFns

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../{wordcap/wordcap.go => multiout/multiout.go}   | 58 --
 1 file changed, 31 insertions(+), 27 deletions(-)
 copy sdks/go/examples/{wordcap/wordcap.go => multiout/multiout.go} (54%)



Jenkins build is back to stable : beam_PostCommit_Java_MavenInstall #6264

2018-03-21 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-3855) Add Go SDK support for protobuf coder

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3855?focusedWorklogId=83004&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83004
 ]

ASF GitHub Bot logged work on BEAM-3855:


Author: ASF GitHub Bot
Created on: 22/Mar/18 00:39
Start Date: 22/Mar/18 00:39
Worklog Time Spent: 10m 
  Work Description: wcn3 commented on issue #4908: BEAM-3855: Add Protocol 
Buffer support
URL: https://github.com/apache/beam/pull/4908#issuecomment-375141079
 
 
   This version has been tested internally to Google. The decode routine did 
have a bug where the instantiated type wasn't correct. This is resolved and 
I've validated the messages are serialized and deserialized using the proto 
codings.
   
   I'll still continue to develop a Cloud Dataflow suite, but if we want to 
submit this now, I feel confident in this code.




Issue Time Tracking
---

Worklog Id: (was: 83004)
Time Spent: 1h 10m  (was: 1h)

> Add Go SDK support for protobuf coder
> -
>
> Key: BEAM-3855
> URL: https://issues.apache.org/jira/browse/BEAM-3855
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Willy Lulciuc
>Assignee: Bill Neubauer
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This JIRA is for something functional. We might want to use the coder 
> registry for a more general solution, when implemented. 





[jira] [Work logged] (BEAM-3861) Build test infra for end-to-end streaming test in Python SDK

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3861?focusedWorklogId=83001&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83001
 ]

ASF GitHub Bot logged work on BEAM-3861:


Author: ASF GitHub Bot
Created on: 22/Mar/18 00:33
Start Date: 22/Mar/18 00:33
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #4930: [BEAM-3861] 
Complete streaming wordcount test in Python SDK
URL: https://github.com/apache/beam/pull/4930#issuecomment-375140129
 
 
   +R: @aaltay PTAL




Issue Time Tracking
---

Worklog Id: (was: 83001)
Time Spent: 4.5h  (was: 4h 20m)

> Build test infra for end-to-end streaming test in Python SDK
> 
>
> Key: BEAM-3861
> URL: https://issues.apache.org/jira/browse/BEAM-3861
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-3861) Build test infra for end-to-end streaming test in Python SDK

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3861?focusedWorklogId=83002&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83002
 ]

ASF GitHub Bot logged work on BEAM-3861:


Author: ASF GitHub Bot
Created on: 22/Mar/18 00:33
Start Date: 22/Mar/18 00:33
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #4930: [BEAM-3861] 
Complete streaming wordcount test in Python SDK
URL: https://github.com/apache/beam/pull/4930#issuecomment-375140129
 
 
   +R: @aaltay Ready to review. PTAL




Issue Time Tracking
---

Worklog Id: (was: 83002)
Time Spent: 4h 40m  (was: 4.5h)

> Build test infra for end-to-end streaming test in Python SDK
> 
>
> Key: BEAM-3861
> URL: https://issues.apache.org/jira/browse/BEAM-3861
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-3861) Build test infra for end-to-end streaming test in Python SDK

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3861?focusedWorklogId=83000&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83000
 ]

ASF GitHub Bot logged work on BEAM-3861:


Author: ASF GitHub Bot
Created on: 22/Mar/18 00:30
Start Date: 22/Mar/18 00:30
Worklog Time Spent: 10m 
  Work Description: markflyhigh opened a new pull request #4930: 
[BEAM-3861] Complete streaming wordcount test in Python SDK
URL: https://github.com/apache/beam/pull/4930
 
 
   Complete Python end-to-end testing framework to support streaming pipelines 
that use PubSub. 
   
   Improvements are:
   - Add `PubSubMessageMatcher` which is a customized hamcrest matcher to pull 
messages from given subscription and verify content.
   - Auto-cancel streaming pipeline by test framework after all matchers are 
verified.
   - Few minor improvements and cleanup in `TestDataflowRunner`
   
   `StreamingWordCountITTest` can verify output messages and cancel pipeline 
after this change. The overall test workflow is:
   
   1. Set up PubSub topics and subscriptions.
   1. Build test options including required pipeline options and test verifiers.
   1. Inject data to input topic.
   1. Call `StreamingWordCount.run` with above options to start the pipeline.
   1. Wait until pipeline start running.
   1. Run verifiers.
   1. Cancel streaming job.
   1. Cleanup PubSub topics and subscriptions.
   
   Note: If other two improvement PRs 
([PR-4921](https://github.com/apache/beam/pull/4921), 
[PR-4922](https://github.com/apache/beam/pull/4922)) are merged before this 
one, rebase is required before merging.
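A dependency-free sketch of the matcher idea (names and structure here are illustrative only; the real PubSubMessageMatcher wraps an actual PubSub subscription and hamcrest): it polls a pull callable until the expected messages have all arrived or a deadline passes, comparing order-insensitively since streaming delivery order is not guaranteed.

```python
import time

class StreamOutputMatcher:
    """Toy stand-in for a streaming-output matcher.

    `pull` is a zero-argument callable returning the list of message
    strings received so far; real code would pull from a subscription.
    """

    def __init__(self, pull, expected, timeout=10.0, poll_interval=0.1):
        self.pull = pull
        self.expected = sorted(expected)
        self.timeout = timeout
        self.poll_interval = poll_interval

    def matches(self):
        deadline = time.time() + self.timeout
        while True:
            # Order-insensitive comparison: streaming delivery order
            # is not deterministic.
            if sorted(self.pull()) == self.expected:
                return True
            if time.time() >= deadline:
                return False
            time.sleep(self.poll_interval)

# Simulate a subscription that already holds the expected output.
received = ["beam: 3", "python: 2"]
matcher = StreamOutputMatcher(lambda: received, ["python: 2", "beam: 3"], timeout=1.0)
print(matcher.matches())  # -> True
```

The same shape also motivates the auto-cancel step: once matches() returns, the test framework can cancel the streaming job instead of leaving it running.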
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [x] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [x] Write a pull request description that is detailed enough to 
understand:
  - [ ] What the pull request does
  - [ ] Why it does it
  - [ ] How it does it
  - [ ] Why this approach
- [ ] Each commit in the pull request should have a meaningful subject line 
and body.
- [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   




Issue Time Tracking
---

Worklog Id: (was: 83000)
Time Spent: 4h 20m  (was: 4h 10m)

> Build test infra for end-to-end streaming test in Python SDK
> 
>
> Key: BEAM-3861
> URL: https://issues.apache.org/jira/browse/BEAM-3861
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-3908) Leaderboard / gamestats leaking Dataflow Jobs

2018-03-21 Thread Luke Cwik (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-3908:

Description: 
I found that the leaderboard/gamestats Dataflow streaming jobs weren't being 
cleaned up by the test infrastructure, which led to quota issues because all 
the VMs/disks/memory were being consumed, causing other Jenkins jobs to fail.

I manually stopped all the jobs that had been running for more than 12 hrs. 
There were about 20 jobs like this.

Example links:
https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-06_22_19_25-7861256924404398606?project=apache-beam-testing=433637338589
https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-06_18_49_54-7185486205606862436?project=apache-beam-testing=433637338589
https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-06_23_14_26-6599347078760080693?project=apache-beam-testing=433637338589
https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-06_18_32_33-7276493109541122240?project=apache-beam-testing=433637338589


  was:
I found that the leaderboard/gamestats Dataflow streaming jobs weren't being 
cleaned up by the test infrastructure, which led to quota issues because all 
the VMs/disks/memory were being consumed, causing other Jenkins jobs to fail.

I manually stopped all the jobs that had been running for more than 12 hrs.

Example links:
https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-06_22_19_25-7861256924404398606?project=apache-beam-testing=433637338589
https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-06_18_49_54-7185486205606862436?project=apache-beam-testing=433637338589
https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-06_23_14_26-6599347078760080693?project=apache-beam-testing=433637338589
https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-06_18_32_33-7276493109541122240?project=apache-beam-testing=433637338589



> Leaderboard / gamestats leaking Dataflow Jobs
> -
>
> Key: BEAM-3908
> URL: https://issues.apache.org/jira/browse/BEAM-3908
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, testing
>Affects Versions: Not applicable
>Reporter: Luke Cwik
>Assignee: Alan Myrvold
>Priority: Critical
>
> I found that the leaderboard/gamestats Dataflow streaming jobs weren't being 
> cleaned up by the test infrastructure, which led to quota issues because all 
> of the VMs/disks/memory were being consumed, causing other Jenkins jobs to fail.
> I manually stopped all the jobs that had been running for more than 12 hrs. 
> There were about 20 jobs like this.
> Example links:
> https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-06_22_19_25-7861256924404398606?project=apache-beam-testing=433637338589
> https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-06_18_49_54-7185486205606862436?project=apache-beam-testing=433637338589
> https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-06_23_14_26-6599347078760080693?project=apache-beam-testing=433637338589
> https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-06_18_32_33-7276493109541122240?project=apache-beam-testing=433637338589



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-3908) Leaderboard / gamestats leaking Dataflow Jobs

2018-03-21 Thread Luke Cwik (JIRA)
Luke Cwik created BEAM-3908:
---

 Summary: Leaderboard / gamestats leaking Dataflow Jobs
 Key: BEAM-3908
 URL: https://issues.apache.org/jira/browse/BEAM-3908
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow, testing
Affects Versions: Not applicable
Reporter: Luke Cwik
Assignee: Alan Myrvold


I found that the leaderboard/gamestats Dataflow streaming jobs weren't being 
cleaned up by the test infrastructure, which led to quota issues because all 
of the VMs/disks/memory were being consumed, causing other Jenkins jobs to fail.

I manually stopped all the jobs that had been running for more than 12 hrs.

Example links:
https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-06_22_19_25-7861256924404398606?project=apache-beam-testing=433637338589
https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-06_18_49_54-7185486205606862436?project=apache-beam-testing=433637338589
https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-06_23_14_26-6599347078760080693?project=apache-beam-testing=433637338589
https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-06_18_32_33-7276493109541122240?project=apache-beam-testing=433637338589




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-3907) Clarify how watermark is estimated for watchForNewFiles() transforms

2018-03-21 Thread Chamikara Jayalath (JIRA)
Chamikara Jayalath created BEAM-3907:


 Summary: Clarify how watermark is estimated for watchForNewFiles() 
transforms
 Key: BEAM-3907
 URL: https://issues.apache.org/jira/browse/BEAM-3907
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Chamikara Jayalath
Assignee: Eugene Kirpichov


For example 
[https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java#L488]

It's not clear how the watermark will be estimated/incremented when using these 
transforms. 

Other source implementations seem to describe this. For example:
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSource.java#L89
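To make the ambiguity concrete, here is one plausible watermark policy for a file-watching source, sketched in Python. The function name and the policy itself are hypothetical illustrations, not what AvroIO's watchForNewFiles() is documented to implement: hold the watermark at the timestamp of the oldest matched-but-unprocessed file, and advance it to the last poll time once the backlog is empty.

```python
def estimate_watermark(pending_file_timestamps, last_poll_time):
    """Hypothetical watermark policy for a file-watching source.

    While matched files are still waiting to be processed, the watermark
    is held at the oldest pending timestamp; once the backlog is empty,
    it advances to the time of the last successful poll.
    """
    if pending_file_timestamps:
        return min(pending_file_timestamps)
    return last_poll_time


# Files matched at t=10 and t=30 are still pending: watermark holds at 10.
print(estimate_watermark([10, 30], last_poll_time=50))  # 10
# Backlog drained: watermark jumps to the last poll time.
print(estimate_watermark([], last_poll_time=50))  # 50
```

Whether the actual transform uses this policy, a different heuristic, or something tied to Watch's PollResult timestamps is exactly what the Javadoc should spell out.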



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=82996&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82996
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 21/Mar/18 23:46
Start Date: 21/Mar/18 23:46
Worklog Time Spent: 10m 
  Work Description: tgroh commented on issue #4777: [BEAM-3565] Add 
FusedPipeline#toPipeline
URL: https://github.com/apache/beam/pull/4777#issuecomment-375132231
 
 
   Done to all


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82996)
Time Spent: 18h  (was: 17h 50m)

> Add utilities for producing a collection of PTransforms that can execute in a 
> single SDK Harness
> 
>
> Key: BEAM-3565
> URL: https://issues.apache.org/jira/browse/BEAM-3565
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
> Fix For: 2.4.0
>
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> An SDK Harness executes some ("fused") collection of PTransforms. The java 
> runner libraries should provide some way to take a Pipeline that executes in 
> both a runner and an environment and construct a collection of transforms 
> which can execute within a single environment.
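The grouping described above can be sketched as a small greedy pass over a topologically ordered pipeline (an illustration only, far simpler than Beam's actual GreedyPipelineFuser): keep runner-executed transforms as their own stages, and merge consecutive transforms that share an SDK environment into one fused stage. The transform/environment names below are made up for the example.

```python
def greedy_fuse(transforms):
    """transforms: list of (name, environment) pairs in topological order;
    environment is None for transforms the runner executes itself."""
    stages = []
    current, current_env = [], None
    for name, env in transforms:
        if env is None:
            # Runner-executed transform (e.g. a GroupByKey): flush any
            # in-progress fused stage, then emit a runner stage of its own.
            if current:
                stages.append(("fused", current_env, current))
                current, current_env = [], None
            stages.append(("runner", None, [name]))
        elif env == current_env:
            # Same environment as the stage being built: fuse into it.
            current.append(name)
        else:
            # Environment change: close the previous stage, start a new one.
            if current:
                stages.append(("fused", current_env, current))
            current, current_env = [name], env
    if current:
        stages.append(("fused", current_env, current))
    return stages


# Mirrors the impulse -> map -> key -> gbk -> values pipeline from the test:
# yields 2 runner stages and 2 fused stages.
pipeline = [("impulse", None), ("map", "python"), ("key", "python"),
            ("gbk", None), ("values", "python")]
print(greedy_fuse(pipeline))
```

A real fuser also has to account for side inputs, flattens, and materialization boundaries, which is where most of the complexity in the PR under review lives.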



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-3906) Get Python Wheel Validation Automated

2018-03-21 Thread yifan zou (JIRA)
yifan zou created BEAM-3906:
---

 Summary: Get Python Wheel Validation Automated
 Key: BEAM-3906
 URL: https://issues.apache.org/jira/browse/BEAM-3906
 Project: Beam
  Issue Type: Sub-task
  Components: examples-python, testing
Reporter: yifan zou
Assignee: yifan zou






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=82987&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82987
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 21/Mar/18 23:18
Start Date: 21/Mar/18 23:18
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #4777: 
[BEAM-3565] Add FusedPipeline#toPipeline
URL: https://github.com/apache/beam/pull/4777#discussion_r176269191
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/FusedPipelineTest.java
 ##
 @@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.core.construction.graph;
+
+import static com.google.common.base.Preconditions.checkState;
+import static org.hamcrest.Matchers.allOf;
+import static org.hamcrest.Matchers.equalTo;
+import static org.hamcrest.Matchers.hasItem;
+import static org.hamcrest.Matchers.hasItems;
+import static org.hamcrest.Matchers.hasKey;
+import static org.hamcrest.Matchers.not;
+import static org.hamcrest.Matchers.startsWith;
+import static org.junit.Assert.assertThat;
+
+import com.google.common.collect.ImmutableList;
+import java.io.Serializable;
+import java.util.HashSet;
+import java.util.Set;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
+import org.apache.beam.runners.core.construction.PipelineTranslation;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.transforms.GroupByKey;
+import org.apache.beam.sdk.transforms.Impulse;
+import org.apache.beam.sdk.transforms.MapElements;
+import org.apache.beam.sdk.transforms.Values;
+import org.apache.beam.sdk.transforms.WithKeys;
+import org.apache.beam.sdk.values.TypeDescriptors;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link FusedPipeline}. */
+@RunWith(JUnit4.class)
+public class FusedPipelineTest implements Serializable {
+  @Test
+  public void testToProto() {
+Pipeline p = Pipeline.create();
+p.apply("impulse", Impulse.create())
+.apply("map", MapElements.into(TypeDescriptors.integers()).via(bytes 
-> bytes.length))
+.apply("key", WithKeys.of("foo"))
+.apply("gbk", GroupByKey.create())
+.apply("values", Values.create());
+
+RunnerApi.Pipeline protoPipeline = PipelineTranslation.toProto(p);
+checkState(
+protoPipeline
+.getRootTransformIdsList()
+.containsAll(ImmutableList.of("impulse", "map", "key", "gbk", 
"values")),
+"Unexpected Root Transform IDs %s",
+protoPipeline.getRootTransformIdsList());
+
+FusedPipeline fused = GreedyPipelineFuser.fuse(protoPipeline);
+checkState(
+fused.getRunnerExecutedTransforms().size() == 2,
+"Unexpected number of runner transforms %s",
+fused.getRunnerExecutedTransforms());
+checkState(
+fused.getFusedStages().size() == 2,
+"Unexpected number of fused stages %s",
+fused.getFusedStages());
+RunnerApi.Pipeline fusedProto = 
fused.toPipeline(protoPipeline.getComponents());
+
+assertThat(
+"Root Transforms should all be present in the Pipeline Components",
+fusedProto.getComponents().getTransformsMap().keySet(),
+hasItems(fusedProto.getRootTransformIdsList().toArray(new String[0])));
+assertThat(
+"Should contain Impulse, GroupByKey, and two Environment Stages",
+fusedProto.getRootTransformIdsCount(),
+equalTo(4));
+assertThat(fusedProto.getRootTransformIdsList(), hasItems("impulse", 
"gbk"));
+assertRootsInTopologicalOrder(fusedProto);
+// Since MapElements, WithKeys, and Values are all composites of a ParDo, 
we do prefix matching
+// instead of looking at the inside of their expansions
+assertThat(
+"Fused transforms should be present in the components",
+

[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=82986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82986
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 21/Mar/18 23:18
Start Date: 21/Mar/18 23:18
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #4777: 
[BEAM-3565] Add FusedPipeline#toPipeline
URL: https://github.com/apache/beam/pull/4777#discussion_r176267337
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/FusedPipeline.java
 ##
 @@ -36,25 +36,45 @@
 @AutoValue
 public abstract class FusedPipeline {
   static FusedPipeline of(
-      Set<ExecutableStage> environmentalStages, Set<PTransformNode> runnerStages) {
-    return new AutoValue_FusedPipeline(environmentalStages, runnerStages);
+      Components components,
+      Set<ExecutableStage> environmentalStages,
+      Set<PTransformNode> runnerStages) {
+    return new AutoValue_FusedPipeline(components, environmentalStages, runnerStages);
   }
 
+  abstract Components getComponents();
+
   /** The {@link ExecutableStage executable stages} that are executed by SDK harnesses. */
   public abstract Set<ExecutableStage> getFusedStages();
 
   /** The {@link PTransform PTransforms} that a runner is responsible for executing. */
   public abstract Set<PTransformNode> getRunnerExecutedTransforms();
 
-  public RunnerApi.Pipeline toPipeline(Components initialComponents) {
-    Map<String, PTransform> executableTransforms = getExecutableTransforms(initialComponents);
-    Components fusedComponents = initialComponents.toBuilder()
-        .putAllTransforms(executableTransforms)
-        .putAllTransforms(getFusedTransforms())
-        .build();
+  /**
+   * Returns the {@link RunnerApi.Pipeline} representation of this {@link 
FusedPipeline}.
+   *
+   * The {@link Components} of the returned pipeline will contain all of 
the {@link PTransform
+   * PTransforms} present in the original Pipeline that this {@link 
FusedPipeline} was created from,
+   * plus all of the {@link ExecutableStage ExecutableStages} contained within 
this {@link
+   * FusedPipeline}. The Root Transform IDs will contain all of the runner 
executed transforms and
 
 Review comment:
   The upper casing on Root Transform IDs is strange, would you rather link the 
Pipeline root transform ids method?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82986)
Time Spent: 17h 40m  (was: 17.5h)

> Add utilities for producing a collection of PTransforms that can execute in a 
> single SDK Harness
> 
>
> Key: BEAM-3565
> URL: https://issues.apache.org/jira/browse/BEAM-3565
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
> Fix For: 2.4.0
>
>  Time Spent: 17h 40m
>  Remaining Estimate: 0h
>
> An SDK Harness executes some ("fused") collection of PTransforms. The java 
> runner libraries should provide some way to take a Pipeline that executes in 
> both a runner and an environment and construct a collection of transforms 
> which can execute within a single environment.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=82984&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82984
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 21/Mar/18 23:18
Start Date: 21/Mar/18 23:18
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #4777: 
[BEAM-3565] Add FusedPipeline#toPipeline
URL: https://github.com/apache/beam/pull/4777#discussion_r176266684
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/FusedPipeline.java
 ##
 @@ -36,25 +36,45 @@
 @AutoValue
 public abstract class FusedPipeline {
   static FusedPipeline of(
-      Set<ExecutableStage> environmentalStages, Set<PTransformNode> runnerStages) {
-    return new AutoValue_FusedPipeline(environmentalStages, runnerStages);
+      Components components,
+      Set<ExecutableStage> environmentalStages,
+      Set<PTransformNode> runnerStages) {
+    return new AutoValue_FusedPipeline(components, environmentalStages, runnerStages);
   }
 
+  abstract Components getComponents();
+
   /** The {@link ExecutableStage executable stages} that are executed by SDK harnesses. */
   public abstract Set<ExecutableStage> getFusedStages();
 
   /** The {@link PTransform PTransforms} that a runner is responsible for executing. */
   public abstract Set<PTransformNode> getRunnerExecutedTransforms();
 
-  public RunnerApi.Pipeline toPipeline(Components initialComponents) {
-    Map<String, PTransform> executableTransforms = getExecutableTransforms(initialComponents);
-    Components fusedComponents = initialComponents.toBuilder()
-        .putAllTransforms(executableTransforms)
-        .putAllTransforms(getFusedTransforms())
-        .build();
+  /**
+   * Returns the {@link RunnerApi.Pipeline} representation of this {@link 
FusedPipeline}.
+   *
+   * The {@link Components} of the returned pipeline will contain all of 
the {@link PTransform
+   * PTransforms} present in the original Pipeline that this {@link 
FusedPipeline} was created from,
+   * plus all of the {@link ExecutableStage ExecutableStages} contained within 
this {@link
+   * FusedPipeline}. The Root Transform IDs will contain all of the runner 
executed transforms and
+   * all of the ExecutableStages contained within the Pipeline.
 
 Review comment:
   `ExecutableStages` -> `{@link ExecutableStage executable stages}`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82984)
Time Spent: 17h 20m  (was: 17h 10m)

> Add utilities for producing a collection of PTransforms that can execute in a 
> single SDK Harness
> 
>
> Key: BEAM-3565
> URL: https://issues.apache.org/jira/browse/BEAM-3565
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
> Fix For: 2.4.0
>
>  Time Spent: 17h 20m
>  Remaining Estimate: 0h
>
> An SDK Harness executes some ("fused") collection of PTransforms. The java 
> runner libraries should provide some way to take a Pipeline that executes in 
> both a runner and an environment and construct a collection of transforms 
> which can execute within a single environment.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #6263

2018-03-21 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=82985&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82985
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 21/Mar/18 23:18
Start Date: 21/Mar/18 23:18
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #4777: 
[BEAM-3565] Add FusedPipeline#toPipeline
URL: https://github.com/apache/beam/pull/4777#discussion_r176265936
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/FusedPipeline.java
 ##
 @@ -19,54 +19,83 @@
 package org.apache.beam.runners.core.construction.graph;
 
 import com.google.auto.value.AutoValue;
+import com.google.common.collect.Sets;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
 import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.StreamSupport;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
 import org.apache.beam.model.pipeline.v1.RunnerApi.Components;
-import org.apache.beam.model.pipeline.v1.RunnerApi.Components.Builder;
 import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
 import org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline;
 import 
org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode;
 
-/**
- * A {@link Pipeline} which has been separated into collections of executable 
components.
- */
+/** A {@link Pipeline} which has been separated into collections of executable 
components. */
 @AutoValue
 public abstract class FusedPipeline {
   static FusedPipeline of(
       Set<ExecutableStage> environmentalStages, Set<PTransformNode> runnerStages) {
     return new AutoValue_FusedPipeline(environmentalStages, runnerStages);
   }
 
-  /**
-   * The {@link ExecutableStage executable stages} that are executed by SDK 
harnesses.
-   */
+  /** The {@link ExecutableStage executable stages} that are executed by SDK 
harnesses. */
   public abstract Set<ExecutableStage> getFusedStages();
 
-  /**
-   * The {@link PTransform PTransforms} that a runner is responsible for 
executing.
-   */
+  /** The {@link PTransform PTransforms} that a runner is responsible for 
executing. */
   public abstract Set<PTransformNode> getRunnerExecutedTransforms();
 
+  public RunnerApi.Pipeline toPipeline(Components initialComponents) {
+    Map<String, PTransform> executableTransforms = getExecutableTransforms(initialComponents);
+    Components fusedComponents = initialComponents.toBuilder()
+        .putAllTransforms(executableTransforms)
+        .putAllTransforms(getFusedTransforms())
+        .build();
+    List<String> rootTransformIds =
+        StreamSupport.stream(
+                QueryablePipeline.forTransforms(executableTransforms.keySet(), fusedComponents)
+                    .getTopologicallyOrderedTransforms()
+                    .spliterator(),
+                false)
+            .map(PTransformNode::getId)
+            .collect(Collectors.toList());
+    return Pipeline.newBuilder()
+        .setComponents(fusedComponents)
+        .addAllRootTransformIds(rootTransformIds)
+        .build();
+  }
+
   /**
-   * Return a {@link Components} like the {@code base} components, but with 
the only transforms
-   * equal to this fused pipeline.
+   * Return a {@link Components} like the {@code base} components, but with 
the set of transforms to
 
 Review comment:
   This comment still seems out of date since it refers to `base` components.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82985)
Time Spent: 17.5h  (was: 17h 20m)

> Add utilities for producing a collection of PTransforms that can execute in a 
> single SDK Harness
> 
>
> Key: BEAM-3565
> URL: https://issues.apache.org/jira/browse/BEAM-3565
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
> Fix For: 2.4.0
>
>  Time Spent: 17.5h
>  Remaining Estimate: 0h
>
> An SDK Harness executes some ("fused") collection of PTransforms. The java 
> runner libraries should provide some way to take a Pipeline that executes in 
> both a runner and an environment and construct a collection of transforms 
> which can execute within a single environment.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #5196

2018-03-21 Thread Apache Jenkins Server
See 




[beam] 01/01: Merge pull request #4927: Make RemoteEnvironment public

2018-03-21 Thread tgroh
This is an automated email from the ASF dual-hosted git repository.

tgroh pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit dff8f9f4887285c3c6fa316ee982e9eee25fdfa0
Merge: 75c58a0 bab17ba
Author: Thomas Groh 
AuthorDate: Wed Mar 21 16:03:18 2018 -0700

Merge pull request #4927: Make RemoteEnvironment public

 .../apache/beam/runners/fnexecution/environment/RemoteEnvironment.java  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
To stop receiving notification emails like this one, please contact
tg...@apache.org.


[jira] [Created] (BEAM-3905) Update Flink Runner to Flink 1.5.0

2018-03-21 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created BEAM-3905:
--

 Summary: Update Flink Runner to Flink 1.5.0
 Key: BEAM-3905
 URL: https://issues.apache.org/jira/browse/BEAM-3905
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (75c58a0 -> dff8f9f)

2018-03-21 Thread tgroh
This is an automated email from the ASF dual-hosted git repository.

tgroh pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 75c58a0  Merge pull request #4919: Remove windowed value requirement 
for Go SDK External
 add bab17ba  Make RemoteEnvironment public
 new dff8f9f  Merge pull request #4927: Make RemoteEnvironment public

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../apache/beam/runners/fnexecution/environment/RemoteEnvironment.java  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
To stop receiving notification emails like this one, please contact
tg...@apache.org.


Build failed in Jenkins: beam_PostCommit_Python_Verify #4477

2018-03-21 Thread Apache Jenkins Server
See 


Changes:

[Pablo] Fixing check for sideinput_io_metrics experiment flag.

[iemejia] Remove testing package-info from main package for GCP IO

[iemejia] Update maven failsafe/surefire plugin to version 2.21.0

[iemejia] [BEAM-3873] Update commons-compress to version 1.16.1 (fix

[iemejia] Remove maven warnings

[tgroh] Add Side Inputs to ExecutableStage

--
[...truncated 1.12 MB...]
 stageStates: []
 steps: []
 tempFiles: []
 type: TypeValueValuesEnum(JOB_TYPE_BATCH, 1)>
root: INFO: Created job with id: [2018-03-21_15_53_16-214525430219642121]
root: INFO: To access the Dataflow monitoring console, please navigate to 
https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-21_15_53_16-214525430219642121?project=apache-beam-testing
root: INFO: Job 2018-03-21_15_53_16-214525430219642121 is in state 
JOB_STATE_PENDING
root: INFO: 2018-03-21T22:53:16.074Z: JOB_MESSAGE_WARNING: Job 
2018-03-21_15_53_16-214525430219642121 might autoscale up to 250 workers.
root: INFO: 2018-03-21T22:53:16.111Z: JOB_MESSAGE_DETAILED: Autoscaling is 
enabled for job 2018-03-21_15_53_16-214525430219642121. The number of workers 
will be between 1 and 250.
root: INFO: 2018-03-21T22:53:16.139Z: JOB_MESSAGE_DETAILED: Autoscaling was 
automatically enabled for job 2018-03-21_15_53_16-214525430219642121.
root: INFO: 2018-03-21T22:53:19.047Z: JOB_MESSAGE_DETAILED: Checking required 
Cloud APIs are enabled.
root: INFO: 2018-03-21T22:53:19.158Z: JOB_MESSAGE_DETAILED: Checking 
permissions granted to controller Service Account.
root: INFO: 2018-03-21T22:53:19.459Z: JOB_MESSAGE_DETAILED: Expanding 
CoGroupByKey operations into optimizable parts.
root: INFO: 2018-03-21T22:53:19.475Z: JOB_MESSAGE_DEBUG: Combiner lifting 
skipped for step write/Write/WriteImpl/GroupByKey: GroupByKey not followed by a 
combiner.
root: INFO: 2018-03-21T22:53:19.513Z: JOB_MESSAGE_DEBUG: Combiner lifting 
skipped for step group: GroupByKey not followed by a combiner.
root: INFO: 2018-03-21T22:53:19.549Z: JOB_MESSAGE_DETAILED: Expanding 
GroupByKey operations into optimizable parts.
root: INFO: 2018-03-21T22:53:19.571Z: JOB_MESSAGE_DETAILED: Lifting 
ValueCombiningMappingFns into MergeBucketsMappingFns
root: INFO: 2018-03-21T22:53:19.605Z: JOB_MESSAGE_DEBUG: Annotating graph with 
Autotuner information.
root: INFO: 2018-03-21T22:53:19.649Z: JOB_MESSAGE_DETAILED: Fusing adjacent 
ParDo, Read, Write, and Flatten operations
root: INFO: 2018-03-21T22:53:19.678Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/PreFinalize/MapToVoidKey1 into 
write/Write/WriteImpl/Extract
root: INFO: 2018-03-21T22:53:19.710Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/FinalizeWrite/MapToVoidKey1 into 
write/Write/WriteImpl/Extract
root: INFO: 2018-03-21T22:53:19.732Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/PreFinalize/MapToVoidKey1 into 
write/Write/WriteImpl/Extract
root: INFO: 2018-03-21T22:53:19.756Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/FinalizeWrite/MapToVoidKey1 into 
write/Write/WriteImpl/Extract
root: INFO: 2018-03-21T22:53:19.789Z: JOB_MESSAGE_DETAILED: Fusing consumer 
pair_with_one into split
root: INFO: 2018-03-21T22:53:19.829Z: JOB_MESSAGE_DETAILED: Fusing consumer 
group/Reify into pair_with_one
root: INFO: 2018-03-21T22:53:19.859Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/GroupByKey/Reify into 
write/Write/WriteImpl/WindowInto(WindowIntoFn)
root: INFO: 2018-03-21T22:53:19.889Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/WindowInto(WindowIntoFn) into write/Write/WriteImpl/Pair
root: INFO: 2018-03-21T22:53:19.916Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/WriteBundles/WriteBundles into format
root: INFO: 2018-03-21T22:53:19.948Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/Pair into write/Write/WriteImpl/WriteBundles/WriteBundles
root: INFO: 2018-03-21T22:53:19.972Z: JOB_MESSAGE_DETAILED: Fusing consumer 
split into read/Read
root: INFO: 2018-03-21T22:53:20.009Z: JOB_MESSAGE_DETAILED: Fusing consumer 
count into group/GroupByWindow
root: INFO: 2018-03-21T22:53:20.051Z: JOB_MESSAGE_DETAILED: Fusing consumer 
format into count
root: INFO: 2018-03-21T22:53:20.085Z: JOB_MESSAGE_DETAILED: Fusing consumer 
group/Write into group/Reify
root: INFO: 2018-03-21T22:53:20.118Z: JOB_MESSAGE_DETAILED: Fusing consumer 
group/GroupByWindow into group/Read
root: INFO: 2018-03-21T22:53:20.151Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/Extract into 
write/Write/WriteImpl/GroupByKey/GroupByWindow
root: INFO: 2018-03-21T22:53:20.187Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/GroupByKey/Write into 
write/Write/WriteImpl/GroupByKey/Reify
root: INFO: 2018-03-21T22:53:20.220Z: JOB_MESSAGE_DETAILED: Fusing consumer 

Build failed in Jenkins: beam_PostCommit_Python_ValidatesRunner_Dataflow #1158

2018-03-21 Thread Apache Jenkins Server
See 


Changes:

[Pablo] Fixing check for sideinput_io_metrics experiment flag.

[iemejia] Remove testing package-info from main package for GCP IO

[iemejia] Update maven failsafe/surefire plugin to version 2.21.0

[iemejia] [BEAM-3873] Update commons-compress to version 1.16.1 (fix

[iemejia] Remove maven warnings

[tgroh] Add Side Inputs to ExecutableStage

--
[...truncated 758.22 KB...]
  {
"key": "fn", 
"label": "Transform Function", 
"namespace": "apache_beam.transforms.core.CallableWrapperDoFn", 
"type": "STRING", 
"value": ""
  }, 
  {
"key": "fn", 
"label": "Transform Function", 
"namespace": "apache_beam.transforms.core.ParDo", 
"shortValue": "CallableWrapperDoFn", 
"type": "STRING", 
"value": "apache_beam.transforms.core.CallableWrapperDoFn"
  }
], 
"non_parallel_inputs": {}, 
"output_info": [
  {
"encoding": {
  "@type": "kind:windowed_value", 
  "component_encodings": [
{
  "@type": "kind:pair", 
  "component_encodings": [
{
  "@type": "kind:bytes"
}, 
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": [
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": []
}, 
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": []
}
  ], 
  "is_pair_like": true
}
  ], 
  "is_pair_like": true
}, 
{
  "@type": "kind:global_window"
}
  ], 
  "is_wrapper": true
}, 
"output_name": "out", 
"user_name": "compute/MapToVoidKey0.out"
  }
], 
"parallel_input": {
  "@type": "OutputReference", 
  "output_name": "out", 
  "step_name": "s2"
}, 
"serialized_fn": "", 
"user_name": "compute/MapToVoidKey0"
  }
}
  ], 
  "type": "JOB_TYPE_BATCH"
}
root: INFO: Create job: 
root: INFO: Created job with id: [2018-03-21_15_47_38-5872482624041273330]
root: INFO: To access the Dataflow monitoring console, please navigate to 
https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-21_15_47_38-5872482624041273330?project=apache-beam-testing
root: INFO: Job 2018-03-21_15_47_38-5872482624041273330 is in state 
JOB_STATE_PENDING
root: INFO: 2018-03-21T22:47:38.816Z: JOB_MESSAGE_WARNING: Job 
2018-03-21_15_47_38-5872482624041273330 might autoscale up to 250 workers.
root: INFO: 2018-03-21T22:47:38.844Z: JOB_MESSAGE_DETAILED: Autoscaling is 
enabled for job 2018-03-21_15_47_38-5872482624041273330. The number of workers 
will be between 1 and 250.
root: INFO: 2018-03-21T22:47:38.873Z: JOB_MESSAGE_DETAILED: Autoscaling was 
automatically enabled for job 2018-03-21_15_47_38-5872482624041273330.
root: INFO: 2018-03-21T22:47:41.640Z: JOB_MESSAGE_DETAILED: Checking required 
Cloud APIs are enabled.
root: INFO: 2018-03-21T22:47:41.817Z: JOB_MESSAGE_DETAILED: Checking 
permissions granted to controller Service Account.
root: INFO: 2018-03-21T22:47:42.618Z: JOB_MESSAGE_DETAILED: Expanding 
CoGroupByKey operations into optimizable parts.
root: INFO: 2018-03-21T22:47:42.648Z: JOB_MESSAGE_DEBUG: Combiner lifting 
skipped for step assert_that/Group/GroupByKey: GroupByKey not followed by a 
combiner.
root: INFO: 2018-03-21T22:47:42.670Z: JOB_MESSAGE_DETAILED: Expanding 
GroupByKey operations into optimizable parts.
root: INFO: 2018-03-21T22:47:42.694Z: JOB_MESSAGE_DETAILED: Lifting 
ValueCombiningMappingFns into MergeBucketsMappingFns
root: INFO: 2018-03-21T22:47:42.724Z: JOB_MESSAGE_DEBUG: Annotating graph with 
Autotuner information.
root: INFO: 2018-03-21T22:47:42.758Z: JOB_MESSAGE_DETAILED: Fusing adjacent 
ParDo, Read, Write, and Flatten operations
root: INFO: 2018-03-21T22:47:42.787Z: JOB_MESSAGE_DETAILED: Fusing consumer 
compute/MapToVoidKey0 into side/Read
root: INFO: 

[jira] [Work logged] (BEAM-3702) Support system properties source for pipeline options

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3702?focusedWorklogId=82979=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82979
 ]

ASF GitHub Bot logged work on BEAM-3702:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:52
Start Date: 21/Mar/18 22:52
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #4683: [BEAM-3702] adding 
fromJvm to create pipelineoptions from the system properties
URL: https://github.com/apache/beam/pull/4683#issuecomment-375121943
 
 
   Please fix the javadoc errors:
   ```
   [ERROR] 
/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_MavenInstall/src/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java:273:
 Redundant <p> tag. [JavadocParagraph]
   Audit done.
   2018-03-21T21:43:26.135 [INFO] There is 1 error reported by Checkstyle 8.7 
with beam/checkstyle.xml ruleset.
   2018-03-21T21:43:26.138 [ERROR] 
src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java:[273] 
(javadoc) JavadocParagraph: Redundant <p> tag.
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82979)
Time Spent: 10h 50m  (was: 10h 40m)

> Support system properties source for pipeline options
> -
>
> Key: BEAM-3702
> URL: https://issues.apache.org/jira/browse/BEAM-3702
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Romain Manni-Bucau
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3851) Support element timestamps while publishing to Kafka.

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3851?focusedWorklogId=82978=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82978
 ]

ASF GitHub Bot logged work on BEAM-3851:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:49
Start Date: 21/Mar/18 22:49
Worklog Time Spent: 10m 
  Work Description: XuMingmin commented on issue #4868: [BEAM-3851] Option 
to preserve element timestamp while publishing to Kafka.
URL: https://github.com/apache/beam/pull/4868#issuecomment-375121299
 
 
   Seems the failure is not related to this change, @rangadi can you help to 
double check?




Issue Time Tracking
---

Worklog Id: (was: 82978)
Time Spent: 1h 10m  (was: 1h)

> Support element timestamps while publishing to Kafka.
> -
>
> Key: BEAM-3851
> URL: https://issues.apache.org/jira/browse/BEAM-3851
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Affects Versions: 2.3.0
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> KafkaIO sink should support using input element timestamp for the message 
> published to Kafka. Otherwise there is no way for user to influence the 
> timestamp of the messages in Kafka sink.





[jira] [Work logged] (BEAM-3866) Move Go SDK to not use WindowedValue for PCollections

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3866?focusedWorklogId=82977=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82977
 ]

ASF GitHub Bot logged work on BEAM-3866:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:47
Start Date: 21/Mar/18 22:47
Worklog Time Spent: 10m 
  Work Description: tgroh closed pull request #4919: [BEAM-3866] Remove 
windowed value requirement for Go SDK External
URL: https://github.com/apache/beam/pull/4919
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/go/README.md b/sdks/go/README.md
index ed0a669dd53..f9a08e0b6db 100644
--- a/sdks/go/README.md
+++ b/sdks/go/README.md
@@ -37,39 +37,34 @@ are parameterized by Go flags. For example, to run 
wordcount do:
 $ pwd
 [...]/sdks/go
 $ go run examples/wordcount/wordcount.go --output=/tmp/result.txt
-2017/12/05 10:46:37 Pipeline:
-2017/12/05 10:46:37 Nodes: {1: W<[]uint8>/GW/W!GW}
-{2: W/GW/W!GW}
-{3: W/GW/W!GW}
-{4: W/GW/W!GW}
-{5: W/GW/W!GW}
-{6: W>/GW/W>!GW}
-{7: W>/GW/W>!GW}
-{8: W>/GW/W>!GW}
-{9: W/GW/W!GW}
-Edges: 1: Impulse [] -> [Out: W<[]uint8> -> {1: W<[]uint8>/GW/W!GW}]
-2: ParDo [In(Main): W<[]uint8> <- {1: W<[]uint8>/GW/W!GW}] -> [Out: 
W -> {2: W/GW/W!GW}]
-3: ParDo [In(Main): W <- {2: W/GW/W!GW}] -> [Out: 
W -> {3: W/GW/W!GW}]
-4: ParDo [In(Main): W <- {3: W/GW/W!GW}] -> [Out: 
W -> {4: W/GW/W!GW}]
-5: ParDo [In(Main): W <- {4: W/GW/W!GW}] -> [Out: 
W -> {5: W/GW/W!GW}]
-6: ParDo [In(Main): W <- {5: W/GW/W!GW}] -> [Out: 
W> -> {6: W>/GW/W>!GW}]
-7: GBK [In(Main): KV <- {6: 
W>/GW/W>!GW}] -> [Out: GBK -> {7: 
W>/GW/W>!GW}]
-8: Combine [In(Main): W <- {7: 
W>/GW/W>!GW}] -> [Out: W> 
-> {8: W>/GW/W>!GW}]
-9: ParDo [In(Main): W> <- {8: 
W>/GW/W>!GW}] -> [Out: W -> {9: 
W/GW/W!GW}]
-10: ParDo [In(Main): W <- {9: W/GW/W!GW}] -> []
-2017/12/05 10:46:37 Execution units:
-2017/12/05 10:46:37 1: Impulse[0]
-2017/12/05 10:46:37 2: ParDo[beam.createFn] Out:[3]
-2017/12/05 10:46:37 3: ParDo[textio.expandFn] Out:[4]
-2017/12/05 10:46:37 4: ParDo[textio.readFn] Out:[5]
-2017/12/05 10:46:37 5: ParDo[main.extractFn] Out:[6]
-2017/12/05 10:46:37 6: ParDo[stats.mapFn] Out:[7]
-2017/12/05 10:46:37 7: GBK. Out:8
-2017/12/05 10:46:37 8: Combine[stats.sumIntFn] Keyed:true (Use:false) Out:[9]
-2017/12/05 10:46:37 9: ParDo[main.formatFn] Out:[10]
-2017/12/05 10:46:37 10: ParDo[textio.writeFileFn] Out:[]
-2017/12/05 10:46:37 Reading from 
gs://apache-beam-samples/shakespeare/kinglear.txt
-2017/12/05 10:46:38 Writing to /tmp/result.txt
+[{6: KV/GW/KV}]
+[{10: KV/GW/KV}]
+2018/03/21 09:39:03 Pipeline:
+2018/03/21 09:39:03 Nodes: {1: []uint8/GW/bytes}
+{2: string/GW/bytes}
+{3: string/GW/bytes}
+{4: string/GW/bytes}
+{5: string/GW/bytes}
+{6: KV/GW/KV}
+{7: CoGBK/GW/CoGBK}
+{8: KV/GW/KV}
+{9: string/GW/bytes}
+{10: KV/GW/KV}
+{11: CoGBK/GW/CoGBK}
+Edges: 1: Impulse [] -> [Out: []uint8 -> {1: []uint8/GW/bytes}]
+2: ParDo [In(Main): []uint8 <- {1: []uint8/GW/bytes}] -> [Out: T -> {2: 
string/GW/bytes}]
+3: ParDo [In(Main): string <- {2: string/GW/bytes}] -> [Out: string -> {3: 
string/GW/bytes}]
+4: ParDo [In(Main): string <- {3: string/GW/bytes}] -> [Out: string -> {4: 
string/GW/bytes}]
+5: ParDo [In(Main): string <- {4: string/GW/bytes}] -> [Out: string -> {5: 
string/GW/bytes}]
+6: ParDo [In(Main): T <- {5: string/GW/bytes}] -> [Out: KV -> {6: 
KV/GW/KV}]
+7: CoGBK [In(Main): KV <- {6: 
KV/GW/KV}] -> [Out: CoGBK -> {7: 
CoGBK/GW/CoGBK}]
+8: Combine [In(Main): int <- {7: 
CoGBK/GW/CoGBK}] -> [Out: KV -> {8: 
KV/GW/KV}]
+9: ParDo [In(Main): KV <- {8: 
KV/GW/KV}] -> [Out: string -> {9: 
string/GW/bytes}]
+10: ParDo [In(Main): T <- {9: string/GW/bytes}] -> [Out: KV -> {10: 
KV/GW/KV}]
+11: CoGBK [In(Main): KV <- {10: 
KV/GW/KV}] -> [Out: CoGBK -> {11: 

[beam] 01/01: Merge pull request #4919: Remove windowed value requirement for Go SDK External

2018-03-21 Thread tgroh
This is an automated email from the ASF dual-hosted git repository.

tgroh pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 75c58a0b59854214ea0aa890e77052930e0ab74a
Merge: bb410e2 7b235f1
Author: Thomas Groh 
AuthorDate: Wed Mar 21 15:47:41 2018 -0700

Merge pull request #4919: Remove windowed value requirement for Go SDK 
External

[BEAM-3866]

 sdks/go/README.md| 61 
 sdks/go/pkg/beam/coder.go|  3 +--
 sdks/go/pkg/beam/create.go   |  8 +++---
 sdks/go/pkg/beam/external.go |  6 -
 sdks/go/pkg/beam/gbk.go  |  4 +--
 sdks/go/pkg/beam/impulse.go  |  4 +--
 sdks/go/pkg/beam/util.go |  1 +
 7 files changed, 38 insertions(+), 49 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
tg...@apache.org.


[beam] branch master updated (bb410e2 -> 75c58a0)

2018-03-21 Thread tgroh
This is an automated email from the ASF dual-hosted git repository.

tgroh pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from bb410e2  Merge pull request #4912: [BEAM-3042] Fixing check for 
sideinput_io_metrics experiment flag
 add 7b235f1  [BEAM-3866] Remove windowed value requirement for External
 new 75c58a0  Merge pull request #4919: Remove windowed value requirement 
for Go SDK External

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/go/README.md| 61 
 sdks/go/pkg/beam/coder.go|  3 +--
 sdks/go/pkg/beam/create.go   |  8 +++---
 sdks/go/pkg/beam/external.go |  6 -
 sdks/go/pkg/beam/gbk.go  |  4 +--
 sdks/go/pkg/beam/impulse.go  |  4 +--
 sdks/go/pkg/beam/util.go |  1 +
 7 files changed, 38 insertions(+), 49 deletions(-)



[jira] [Work logged] (BEAM-3866) Move Go SDK to not use WindowedValue for PCollections

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3866?focusedWorklogId=82976=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82976
 ]

ASF GitHub Bot logged work on BEAM-3866:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:45
Start Date: 21/Mar/18 22:45
Worklog Time Spent: 10m 
  Work Description: tgroh commented on issue #4919: [BEAM-3866] Remove 
windowed value requirement for Go SDK External
URL: https://github.com/apache/beam/pull/4919#issuecomment-375120481
 
 
   LGTM




Issue Time Tracking
---

Worklog Id: (was: 82976)
Time Spent: 1h 40m  (was: 1.5h)

> Move Go SDK to not use WindowedValue for PCollections
> -
>
> Key: BEAM-3866
> URL: https://issues.apache.org/jira/browse/BEAM-3866
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>  Labels: portability
> Fix For: 2.5.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The windowing information is part of the gRPC instructions. Dataflow still 
> expects the old way.





[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=82967=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82967
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:26
Start Date: 21/Mar/18 22:26
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#4901: [BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r176257008
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/transform_evaluator.py
 ##
 @@ -428,24 +428,41 @@ def _read_from_pubsub(self):
 self._subscription, return_immediately=True,
 max_messages=10) as results:
   def _get_element(message):
-if self.source.with_attributes:
-  return PubsubMessage._from_message(message)
+parsed_message = PubsubMessage._from_message(message)
+if timestamp_attribute:
+  try:
+rfc3339_or_milli = parsed_message.attributes[timestamp_attribute]
+  except KeyError:
+raise KeyError('Timestamp attribute not found: %s' %
+   self.source.timestamp_attribute)
+  try:
+timestamp = Timestamp.from_rfc3339(rfc3339_or_milli)
+  except ValueError:
+try:
+  timestamp = Timestamp(micros=int(rfc3339_or_milli) * 1000)
+except ValueError:
+  raise ValueError('Invalid timestamp value: %s', rfc3339_or_milli)
 
 Review comment:
   Raise/preserve original exception ?
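The reviewer's suggestion — keep the original exception when re-raising with a clearer message — can be sketched as below. This is an illustrative helper, not Beam's actual code, and the function name is hypothetical; in Python 3 the `raise ... from` syntax chains exceptions (the SDK at the time targeted Python 2, where `six.raise_from` plays the same role).

```python
def get_timestamp_attribute(attributes, timestamp_attribute):
    # Hypothetical helper mirroring the reviewed snippet: look up the
    # timestamp attribute and re-raise with context while preserving
    # the original exception as the cause.
    try:
        return attributes[timestamp_attribute]
    except KeyError as exc:
        raise KeyError(
            'Timestamp attribute not found: %s' % timestamp_attribute
        ) from exc
```

With chaining, the traceback shows both the friendly message and the original lookup failure, which is what the review asks to preserve.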




Issue Time Tracking
---

Worklog Id: (was: 82967)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.





[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=82973=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82973
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:26
Start Date: 21/Mar/18 22:26
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#4901: [BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r176254488
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/pubsub.py
 ##
 @@ -280,25 +276,15 @@ def parse_subscription(full_subscription):
 class _PubSubSource(dataflow_io.NativeSource):
   """Source for the payload of a message as bytes from a Cloud Pub/Sub topic.
 
+  This ``PTransform`` is overridden by a native Pubsub implementation.
 
 Review comment:
   This is not a PTransform.




Issue Time Tracking
---

Worklog Id: (was: 82973)
Time Spent: 5h 50m  (was: 5h 40m)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.





[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=82966=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82966
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:26
Start Date: 21/Mar/18 22:26
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#4901: [BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r176254992
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/pubsub_test.py
 ##
 @@ -276,16 +283,27 @@ def topic(self, name):
 return FakePubsubTopic(name, self)
 
 
+def create_client_message(payload, message_id, attributes, publish_time):
+  """Returns a message as it would be returned from Pubsub client."""
 
 Review comment:
   Cloud Pub/Sub client




Issue Time Tracking
---

Worklog Id: (was: 82966)
Time Spent: 5h 10m  (was: 5h)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.





[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=82965=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82965
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:26
Start Date: 21/Mar/18 22:26
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#4901: [BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r176254733
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/pubsub.py
 ##
 @@ -338,7 +325,7 @@ def display_data(self):
 
   def reader(self):
 raise NotImplementedError(
-'PubSubPayloadSource is not supported in local execution.')
+'PubSubSource is not supported in local execution.')
 
 Review comment:
   PubSubSource is an implementation detail. "Reading from Cloud Pub/Sub is not 
supported in local execution" ?




Issue Time Tracking
---

Worklog Id: (was: 82965)
Time Spent: 5h  (was: 4h 50m)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=82968=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82968
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:26
Start Date: 21/Mar/18 22:26
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#4901: [BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r176258113
 
 

 ##
 File path: sdks/python/apache_beam/utils/timestamp.py
 ##
 @@ -38,6 +40,12 @@ class Timestamp(object):
   """
 
   def __init__(self, seconds=0, micros=0):
+if not isinstance(seconds, (int, float, long)):
+  raise TypeError('Cannot interpret %s %s as seconds.' % (
+seconds, type(seconds)))
 
 Review comment:
   type is the first parameter ?
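One reading of the reviewer's question is that the offending type should come first in the message. A minimal standalone sketch of the check under discussion with that ordering (a hypothetical helper, not Beam's code):

```python
def check_seconds(seconds):
    # Reject non-numeric input, putting the offending type before the
    # value in the error message.
    if not isinstance(seconds, (int, float)):
        raise TypeError('Cannot interpret %s %s as seconds.' % (
            type(seconds).__name__, seconds))
    return seconds
```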




Issue Time Tracking
---

Worklog Id: (was: 82968)
Time Spent: 5h 20m  (was: 5h 10m)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.





[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=82964=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82964
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:26
Start Date: 21/Mar/18 22:26
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#4901: [BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r176249053
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/pubsub.py
 ##
 @@ -44,8 +68,8 @@
   pubsub_pb2 = None
 
 
-__all__ = ['PubsubMessage', 'ReadMessagesFromPubSub', 'ReadStringsFromPubSub',
-   'WriteStringsToPubSub']
+__all__ = ['PubsubMessage', 'ReadMessagesFromPubSub', 'ReadPayloadsFromPubSub',
 
 Review comment:
   +1 for simplifying to two transforms. BTW is implementation of 
'ReadStringsFromPubSub' more efficient than 'ReadMessagesFromPubSub' followed 
by a 'DoFn' to filter out the payload (for example, do we get a simplified 
stream from Pub/Sub in this case) ? If so we should mention that in py docs.




Issue Time Tracking
---

Worklog Id: (was: 82964)
Time Spent: 4h 50m  (was: 4h 40m)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.





[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=82969=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82969
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:26
Start Date: 21/Mar/18 22:26
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#4901: [BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r176256157
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -742,7 +742,7 @@ def run_Read(self, transform_node):
   standard_options = (
   transform_node.inputs[0].pipeline.options.view_as(StandardOptions))
   if not standard_options.streaming:
-raise ValueError('PubSubPayloadSource is currently available for use '
+raise ValueError('PubSubSource is currently available for use '
 
 Review comment:
   Ditto.




Issue Time Tracking
---

Worklog Id: (was: 82969)
Time Spent: 5.5h  (was: 5h 20m)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.





[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=82972=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82972
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:26
Start Date: 21/Mar/18 22:26
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#4901: [BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r176258172
 
 

 ##
 File path: sdks/python/apache_beam/utils/timestamp.py
 ##
 @@ -38,6 +40,12 @@ class Timestamp(object):
   """
 
   def __init__(self, seconds=0, micros=0):
+if not isinstance(seconds, (int, float, long)):
+  raise TypeError('Cannot interpret %s %s as seconds.' % (
+seconds, type(seconds)))
+if not isinstance(micros, (int, float, long)):
+  raise TypeError('Cannot interpret %s %s as micros.' % (
 
 Review comment:
   Ditto.




Issue Time Tracking
---

Worklog Id: (was: 82972)
Time Spent: 5h 50m  (was: 5h 40m)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.





[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=82970=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82970
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:26
Start Date: 21/Mar/18 22:26
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#4901: [BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r176256974
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/transform_evaluator.py
 ##
 @@ -428,24 +428,41 @@ def _read_from_pubsub(self):
 self._subscription, return_immediately=True,
 max_messages=10) as results:
   def _get_element(message):
-if self.source.with_attributes:
-  return PubsubMessage._from_message(message)
+parsed_message = PubsubMessage._from_message(message)
+if timestamp_attribute:
+  try:
+rfc3339_or_milli = parsed_message.attributes[timestamp_attribute]
+  except KeyError:
+raise KeyError('Timestamp attribute not found: %s' %
 
 Review comment:
   Raise/preserve original exception ?




Issue Time Tracking
---

Worklog Id: (was: 82970)
Time Spent: 5h 40m  (was: 5.5h)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.





[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=82971=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82971
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:26
Start Date: 21/Mar/18 22:26
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#4901: [BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r176258270
 
 

 ##
 File path: sdks/python/apache_beam/utils/timestamp.py
 ##
 @@ -53,12 +61,52 @@ def of(seconds):
   Corresponding Timestamp object.
 """
 
-if isinstance(seconds, Duration):
-  raise TypeError('Can\'t interpret %s as Timestamp.' % seconds)
+if not isinstance(seconds, (int, float, Timestamp)):
+  raise TypeError('Cannot interpret %s %s as Timestamp.' % (
+seconds, type(seconds)))
 
 Review comment:
   Ditto.
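The guard being reviewed accepts `int`, `float`, or an existing `Timestamp`. A minimal standalone sketch of that coercion, with the type included in the error message (the class here is a hypothetical stand-in, not Beam's actual `Timestamp`):

```python
class Ts(object):
    """Hypothetical stand-in for Beam's Timestamp class."""

    def __init__(self, seconds=0):
        self.seconds = seconds

    @staticmethod
    def of(seconds):
        # Pass an existing Ts through unchanged; otherwise require a number.
        if isinstance(seconds, Ts):
            return seconds
        if not isinstance(seconds, (int, float)):
            raise TypeError('Cannot interpret %s of type %s as Timestamp.' % (
                seconds, type(seconds).__name__))
        return Ts(seconds)
```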




Issue Time Tracking
---

Worklog Id: (was: 82971)
Time Spent: 5h 50m  (was: 5h 40m)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.





Jenkins build is back to stable : beam_PostCommit_Java_ValidatesRunner_Flink #5292

2018-03-21 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=82955=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82955
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:08
Start Date: 21/Mar/18 22:08
Worklog Time Spent: 10m 
  Work Description: tgroh commented on a change in pull request #4777: 
[BEAM-3565] Add FusedPipeline#toPipeline
URL: https://github.com/apache/beam/pull/4777#discussion_r176251543
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/FusedPipeline.java
 ##
 @@ -19,54 +19,83 @@
 package org.apache.beam.runners.core.construction.graph;
 
 import com.google.auto.value.AutoValue;
+import com.google.common.collect.Sets;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
 import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.StreamSupport;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
 import org.apache.beam.model.pipeline.v1.RunnerApi.Components;
-import org.apache.beam.model.pipeline.v1.RunnerApi.Components.Builder;
 import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
 import org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline;
 import 
org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode;
 
-/**
- * A {@link Pipeline} which has been separated into collections of executable 
components.
- */
+/** A {@link Pipeline} which has been separated into collections of executable 
components. */
 @AutoValue
 public abstract class FusedPipeline {
   static FusedPipeline of(
      Set<ExecutableStage> environmentalStages, Set<PTransformNode> runnerStages) {
 return new AutoValue_FusedPipeline(environmentalStages, runnerStages);
   }
 
-  /**
-   * The {@link ExecutableStage executable stages} that are executed by SDK 
harnesses.
-   */
+  /** The {@link ExecutableStage executable stages} that are executed by SDK 
harnesses. */
  public abstract Set<ExecutableStage> getFusedStages();
 
-  /**
-   * The {@link PTransform PTransforms} that a runner is responsible for 
executing.
-   */
+  /** The {@link PTransform PTransforms} that a runner is responsible for 
executing. */
  public abstract Set<PTransformNode> getRunnerExecutedTransforms();
 
+  public RunnerApi.Pipeline toPipeline(Components initialComponents) {
+    Map<String, PTransform> executableTransforms = getExecutableTransforms(initialComponents);
+Components fusedComponents = initialComponents.toBuilder()
+.putAllTransforms(executableTransforms)
+.putAllTransforms(getFusedTransforms())
+.build();
+    List<String> rootTransformIds =
+StreamSupport.stream(
+QueryablePipeline.forTransforms(executableTransforms.keySet(), 
fusedComponents)
+.getTopologicallyOrderedTransforms()
+.spliterator(),
+false)
+.map(PTransformNode::getId)
+.collect(Collectors.toList());
+return Pipeline.newBuilder()
+.setComponents(fusedComponents)
+.addAllRootTransformIds(rootTransformIds)
+.build();
+  }
+
   /**
-   * Return a {@link Components} like the {@code base} components, but with 
the only transforms
-   * equal to this fused pipeline.
+   * Return a {@link Components} like the {@code base} components, but with 
the set of transforms to
+   * be executed by the runner.
*
-   * The only composites will be the stages returned by {@link 
#getFusedStages()}.
+   * The transforms that are present in the returned map are the union of 
the results of {@link
+   * #getRunnerExecutedTransforms()} and {@link #getFusedStages()}, where each 
{@link
+   * ExecutableStage}.
 
 Review comment:
   RM'd


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82955)
Time Spent: 16h 50m  (was: 16h 40m)
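`toPipeline` in the diff above lists root transform IDs in topological order. That ordering step can be sketched with a plain Kahn's-algorithm pass over a toy producer/consumer graph (an illustration over a dict representation, not Beam's `QueryablePipeline` API):

```python
from collections import deque

def topological_order(transforms):
    """transforms: {transform_id: {'inputs': [...], 'outputs': [...]}}.

    Returns transform ids ordered so that every producer precedes
    its consumers (Kahn's algorithm).
    """
    # Map each collection to the transform that produces it.
    producer = {}
    for tid, t in transforms.items():
        for out in t['outputs']:
            producer[out] = tid
    # Edges run producer -> consumer, for inputs produced in this graph.
    consumers = {tid: [] for tid in transforms}
    indegree = {tid: 0 for tid in transforms}
    for tid, t in transforms.items():
        for pcoll in t['inputs']:
            if pcoll in producer:
                consumers[producer[pcoll]].append(tid)
                indegree[tid] += 1
    ready = deque(sorted(t for t, n in indegree.items() if n == 0))
    order = []
    while ready:
        tid = ready.popleft()
        order.append(tid)
        for c in consumers[tid]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return order
```

Any transform with no unconsumed inputs is a root; the queue then releases each consumer once all of its producers have been emitted.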

> Add utilities for producing a collection of PTransforms that can execute in a 
> single SDK Harness
> 
>
> Key: BEAM-3565
> URL: https://issues.apache.org/jira/browse/BEAM-3565
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
> Fix For: 2.4.0
>
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>
> An SDK Harness executes some ("fused") collection of PTransforms. The java 
> runner libraries should provide some way to take a Pipeline that executes in 
> both a runner and an environment and construct a collection of transforms 
> which can execute within a single environment.

[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=82958&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82958
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:08
Start Date: 21/Mar/18 22:08
Worklog Time Spent: 10m 
  Work Description: tgroh commented on a change in pull request #4777: 
[BEAM-3565] Add FusedPipeline#toPipeline
URL: https://github.com/apache/beam/pull/4777#discussion_r176251681
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/QueryablePipeline.java
 ##
 @@ -183,6 +195,19 @@ private static boolean isPrimitiveTransform(PTransform 
transform) {
 .collect(Collectors.toSet());
   }
 
+  /**
+   * Get the PCollections which are not consumed by any {@link PTransformNode} 
in this {@link
+   * QueryablePipeline}.
+   */
+  private Set<PCollectionNode> getLeafPCollections() {
 
 Review comment:
   Pulled in accidentally, I think.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82958)
Time Spent: 17h 10m  (was: 17h)
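The `getLeafPCollections` helper discussed above finds PCollections that are produced but never consumed. On a toy dict-based graph (illustrative only, not Beam's API) the idea reduces to a set difference:

```python
def leaf_pcollections(transforms):
    """transforms: {transform_id: {'inputs': [...], 'outputs': [...]}}.

    Returns output collections never consumed by any transform.
    """
    produced = set()
    consumed = set()
    for t in transforms.values():
        produced.update(t['outputs'])
        consumed.update(t['inputs'])
    # Leaves are outputs with no downstream consumer.
    return produced - consumed
```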

> Add utilities for producing a collection of PTransforms that can execute in a 
> single SDK Harness
> 
>
> Key: BEAM-3565
> URL: https://issues.apache.org/jira/browse/BEAM-3565
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
> Fix For: 2.4.0
>
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> An SDK Harness executes some ("fused") collection of PTransforms. The java 
> runner libraries should provide some way to take a Pipeline that executes in 
> both a runner and an environment and construct a collection of transforms 
> which can execute within a single environment.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3855) Add Go SDK support for protobuf coder

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3855?focusedWorklogId=82961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82961
 ]

ASF GitHub Bot logged work on BEAM-3855:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:12
Start Date: 21/Mar/18 22:12
Worklog Time Spent: 10m 
  Work Description: wcn3 commented on issue #4908: BEAM-3855: Add Protocol 
Buffer support
URL: https://github.com/apache/beam/pull/4908#issuecomment-375113157
 
 
   Added the registrations.
   
   I am still working on a test case running against Cloud Dataflow to verify 
the serialization actually works. The direct runner only requires the 
IsConcrete method to be working in order to handle protocol buffers; I need to 
test against a runner that will actually serialize data.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82961)
Time Spent: 1h  (was: 50m)

> Add Go SDK support for protobuf coder
> -
>
> Key: BEAM-3855
> URL: https://issues.apache.org/jira/browse/BEAM-3855
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Willy Lulciuc
>Assignee: Bill Neubauer
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This JIRA is for something functional. We might want to use the coder 
> registry for a more general solution, when implemented. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=82953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82953
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:08
Start Date: 21/Mar/18 22:08
Worklog Time Spent: 10m 
  Work Description: tgroh commented on a change in pull request #4777: 
[BEAM-3565] Add FusedPipeline#toPipeline
URL: https://github.com/apache/beam/pull/4777#discussion_r176251864
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/FusedPipelineTest.java
 ##
 @@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.core.construction.graph;
+
+import static com.google.common.base.Preconditions.checkState;
+import static org.hamcrest.Matchers.allOf;
+import static org.hamcrest.Matchers.equalTo;
+import static org.hamcrest.Matchers.hasItem;
+import static org.hamcrest.Matchers.hasItems;
+import static org.hamcrest.Matchers.hasKey;
+import static org.hamcrest.Matchers.not;
+import static org.hamcrest.Matchers.startsWith;
+import static org.junit.Assert.assertThat;
+
+import com.google.common.collect.ImmutableList;
+import java.io.Serializable;
+import java.util.HashSet;
+import java.util.Set;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
+import org.apache.beam.runners.core.construction.PipelineTranslation;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.transforms.GroupByKey;
+import org.apache.beam.sdk.transforms.Impulse;
+import org.apache.beam.sdk.transforms.MapElements;
+import org.apache.beam.sdk.transforms.Values;
+import org.apache.beam.sdk.transforms.WithKeys;
+import org.apache.beam.sdk.values.TypeDescriptors;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link FusedPipeline}. */
+@RunWith(JUnit4.class)
+public class FusedPipelineTest implements Serializable {
+  @Test
+  public void testToProto() {
+Pipeline p = Pipeline.create();
+p.apply("impulse", Impulse.create())
+.apply("map", MapElements.into(TypeDescriptors.integers()).via(bytes 
-> bytes.length))
+.apply("key", WithKeys.of("foo"))
+.apply("gbk", GroupByKey.create())
+.apply("values", Values.create());
+
+RunnerApi.Pipeline protoPipeline = PipelineTranslation.toProto(p);
+checkState(
+protoPipeline
+.getRootTransformIdsList()
+.containsAll(ImmutableList.of("impulse", "map", "key", "gbk", 
"values")),
+"Unexpected Root Transform IDs %s",
+protoPipeline.getRootTransformIdsList());
+
+FusedPipeline fused = GreedyPipelineFuser.fuse(protoPipeline);
+checkState(
+fused.getRunnerExecutedTransforms().size() == 2,
+"Unexpected number of runner transforms %s",
+fused.getRunnerExecutedTransforms());
+checkState(
+fused.getFusedStages().size() == 2,
+"Unexpected number of fused stages %s",
+fused.getFusedStages());
+RunnerApi.Pipeline fusedProto = 
fused.toPipeline(protoPipeline.getComponents());
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82953)
Time Spent: 16.5h  (was: 16h 20m)

> Add utilities for producing a collection of PTransforms that can execute in a 
> single SDK Harness
> 
>
> Key: BEAM-3565
> URL: https://issues.apache.org/jira/browse/BEAM-3565
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
> Fix For: 2.4.0
>
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>
> An SDK Harness executes some ("fused") collection of PTransforms. The java 
> runner libraries should provide some way to take a Pipeline that executes in 
> both a runner and an environment and construct a collection of transforms 
> which can execute within a single environment.

[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=82956&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82956
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:08
Start Date: 21/Mar/18 22:08
Worklog Time Spent: 10m 
  Work Description: tgroh commented on a change in pull request #4777: 
[BEAM-3565] Add FusedPipeline#toPipeline
URL: https://github.com/apache/beam/pull/4777#discussion_r176251523
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/FusedPipeline.java
 ##
 @@ -19,54 +19,83 @@
 package org.apache.beam.runners.core.construction.graph;
 
 import com.google.auto.value.AutoValue;
+import com.google.common.collect.Sets;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
 import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.StreamSupport;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
 import org.apache.beam.model.pipeline.v1.RunnerApi.Components;
-import org.apache.beam.model.pipeline.v1.RunnerApi.Components.Builder;
 import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
 import org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline;
 import 
org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode;
 
-/**
- * A {@link Pipeline} which has been separated into collections of executable 
components.
- */
+/** A {@link Pipeline} which has been separated into collections of executable 
components. */
 @AutoValue
 public abstract class FusedPipeline {
   static FusedPipeline of(
      Set<ExecutableStage> environmentalStages, Set<PTransformNode> runnerStages) {
 return new AutoValue_FusedPipeline(environmentalStages, runnerStages);
   }
 
-  /**
-   * The {@link ExecutableStage executable stages} that are executed by SDK 
harnesses.
-   */
+  /** The {@link ExecutableStage executable stages} that are executed by SDK 
harnesses. */
  public abstract Set<ExecutableStage> getFusedStages();
 
-  /**
-   * The {@link PTransform PTransforms} that a runner is responsible for 
executing.
-   */
+  /** The {@link PTransform PTransforms} that a runner is responsible for 
executing. */
  public abstract Set<PTransformNode> getRunnerExecutedTransforms();
 
+  public RunnerApi.Pipeline toPipeline(Components initialComponents) {
+    Map<String, PTransform> executableTransforms = getExecutableTransforms(initialComponents);
+Components fusedComponents = initialComponents.toBuilder()
+.putAllTransforms(executableTransforms)
+.putAllTransforms(getFusedTransforms())
+.build();
+    List<String> rootTransformIds =
+StreamSupport.stream(
+QueryablePipeline.forTransforms(executableTransforms.keySet(), 
fusedComponents)
+.getTopologicallyOrderedTransforms()
+.spliterator(),
+false)
+.map(PTransformNode::getId)
+.collect(Collectors.toList());
+return Pipeline.newBuilder()
+.setComponents(fusedComponents)
+.addAllRootTransformIds(rootTransformIds)
+.build();
+  }
+
   /**
-   * Return a {@link Components} like the {@code base} components, but with 
the only transforms
-   * equal to this fused pipeline.
+   * Return a {@link Components} like the {@code base} components, but with 
the set of transforms to
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82956)
Time Spent: 17h  (was: 16h 50m)

> Add utilities for producing a collection of PTransforms that can execute in a 
> single SDK Harness
> 
>
> Key: BEAM-3565
> URL: https://issues.apache.org/jira/browse/BEAM-3565
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
> Fix For: 2.4.0
>
>  Time Spent: 17h
>  Remaining Estimate: 0h
>
> An SDK Harness executes some ("fused") collection of PTransforms. The java 
> runner libraries should provide some way to take a Pipeline that executes in 
> both a runner and an environment and construct a collection of transforms 
> which can execute within a single environment.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=82957&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82957
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:08
Start Date: 21/Mar/18 22:08
Worklog Time Spent: 10m 
  Work Description: tgroh commented on a change in pull request #4777: 
[BEAM-3565] Add FusedPipeline#toPipeline
URL: https://github.com/apache/beam/pull/4777#discussion_r176250895
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/FusedPipeline.java
 ##
 @@ -19,54 +19,83 @@
 package org.apache.beam.runners.core.construction.graph;
 
 import com.google.auto.value.AutoValue;
+import com.google.common.collect.Sets;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
 import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.StreamSupport;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
 import org.apache.beam.model.pipeline.v1.RunnerApi.Components;
-import org.apache.beam.model.pipeline.v1.RunnerApi.Components.Builder;
 import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
 import org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline;
 import 
org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode;
 
-/**
- * A {@link Pipeline} which has been separated into collections of executable 
components.
- */
+/** A {@link Pipeline} which has been separated into collections of executable 
components. */
 @AutoValue
 public abstract class FusedPipeline {
   static FusedPipeline of(
      Set<ExecutableStage> environmentalStages, Set<PTransformNode> runnerStages) {
 return new AutoValue_FusedPipeline(environmentalStages, runnerStages);
   }
 
-  /**
-   * The {@link ExecutableStage executable stages} that are executed by SDK 
harnesses.
-   */
+  /** The {@link ExecutableStage executable stages} that are executed by SDK 
harnesses. */
  public abstract Set<ExecutableStage> getFusedStages();
 
-  /**
-   * The {@link PTransform PTransforms} that a runner is responsible for 
executing.
-   */
+  /** The {@link PTransform PTransforms} that a runner is responsible for 
executing. */
  public abstract Set<PTransformNode> getRunnerExecutedTransforms();
 
+  public RunnerApi.Pipeline toPipeline(Components initialComponents) {
 
 Review comment:
   Made into a property.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82957)
Time Spent: 17h 10m  (was: 17h)

> Add utilities for producing a collection of PTransforms that can execute in a 
> single SDK Harness
> 
>
> Key: BEAM-3565
> URL: https://issues.apache.org/jira/browse/BEAM-3565
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
> Fix For: 2.4.0
>
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> An SDK Harness executes some ("fused") collection of PTransforms. The java 
> runner libraries should provide some way to take a Pipeline that executes in 
> both a runner and an environment and construct a collection of transforms 
> which can execute within a single environment.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=82954&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82954
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:08
Start Date: 21/Mar/18 22:08
Worklog Time Spent: 10m 
  Work Description: tgroh commented on a change in pull request #4777: 
[BEAM-3565] Add FusedPipeline#toPipeline
URL: https://github.com/apache/beam/pull/4777#discussion_r176252057
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/FusedPipelineTest.java
 ##
 @@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.core.construction.graph;
+
+import static com.google.common.base.Preconditions.checkState;
+import static org.hamcrest.Matchers.allOf;
+import static org.hamcrest.Matchers.equalTo;
+import static org.hamcrest.Matchers.hasItem;
+import static org.hamcrest.Matchers.hasItems;
+import static org.hamcrest.Matchers.hasKey;
+import static org.hamcrest.Matchers.not;
+import static org.hamcrest.Matchers.startsWith;
+import static org.junit.Assert.assertThat;
+
+import com.google.common.collect.ImmutableList;
+import java.io.Serializable;
+import java.util.HashSet;
+import java.util.Set;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
+import org.apache.beam.runners.core.construction.PipelineTranslation;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.transforms.GroupByKey;
+import org.apache.beam.sdk.transforms.Impulse;
+import org.apache.beam.sdk.transforms.MapElements;
+import org.apache.beam.sdk.transforms.Values;
+import org.apache.beam.sdk.transforms.WithKeys;
+import org.apache.beam.sdk.values.TypeDescriptors;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link FusedPipeline}. */
+@RunWith(JUnit4.class)
+public class FusedPipelineTest implements Serializable {
+  @Test
+  public void testToProto() {
+Pipeline p = Pipeline.create();
+p.apply("impulse", Impulse.create())
+.apply("map", MapElements.into(TypeDescriptors.integers()).via(bytes 
-> bytes.length))
+.apply("key", WithKeys.of("foo"))
+.apply("gbk", GroupByKey.create())
+.apply("values", Values.create());
+
+RunnerApi.Pipeline protoPipeline = PipelineTranslation.toProto(p);
+checkState(
+protoPipeline
+.getRootTransformIdsList()
+.containsAll(ImmutableList.of("impulse", "map", "key", "gbk", 
"values")),
+"Unexpected Root Transform IDs %s",
+protoPipeline.getRootTransformIdsList());
+
+FusedPipeline fused = GreedyPipelineFuser.fuse(protoPipeline);
+checkState(
+fused.getRunnerExecutedTransforms().size() == 2,
+"Unexpected number of runner transforms %s",
+fused.getRunnerExecutedTransforms());
+checkState(
+fused.getFusedStages().size() == 2,
+"Unexpected number of fused stages %s",
+fused.getFusedStages());
+RunnerApi.Pipeline fusedProto = 
fused.toPipeline(protoPipeline.getComponents());
+
+assertThat(
+"Root Transforms should all be present in the Pipeline Components",
+fusedProto.getComponents().getTransformsMap().keySet(),
+hasItems(fusedProto.getRootTransformIdsList().toArray(new String[0])));
+assertThat(
+"Should contain Impulse, GroupByKey, and two Environment Stages",
+fusedProto.getRootTransformIdsCount(),
+equalTo(4));
+assertThat(fusedProto.getRootTransformIdsList(), hasItems("impulse", 
"gbk"));
+assertRootsInTopologicalOrder(fusedProto);
+// Since MapElements, WithKeys, and Values are all composites of a ParDo, 
we do prefix matching
+// instead of looking at the inside of their expansions
+assertThat(
+"Fused transforms should be present in the components",
+

[jira] [Work logged] (BEAM-3042) Add tracking of bytes read / time spent when reading side inputs

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3042?focusedWorklogId=82952&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82952
 ]

ASF GitHub Bot logged work on BEAM-3042:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:03
Start Date: 21/Mar/18 22:03
Worklog Time Spent: 10m 
  Work Description: chamikaramj closed pull request #4912: [BEAM-3042] 
Fixing check for sideinput_io_metrics experiment flag.
URL: https://github.com/apache/beam/pull/4912
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/runners/worker/operations.py 
b/sdks/python/apache_beam/runners/worker/operations.py
index 11ff909f3e9..0fa32e3c997 100644
--- a/sdks/python/apache_beam/runners/worker/operations.py
+++ b/sdks/python/apache_beam/runners/worker/operations.py
@@ -289,8 +289,7 @@ def _read_side_inputs(self, tags_and_types):
 assert self.side_input_maps is None
 
 # Get experiments active in the worker to check for side input metrics exp.
-experiments = set(
-RuntimeValueProvider.get_value('experiments', str, '').split(','))
+experiments = RuntimeValueProvider.get_value('experiments', list, [])
 
 # We will read the side inputs in the order prescribed by the
 # tags_and_types argument because this is exactly the order needed to
diff --git a/sdks/python/apache_beam/runners/worker/sideinputs.py 
b/sdks/python/apache_beam/runners/worker/sideinputs.py
index cc405e0e477..77157857b05 100644
--- a/sdks/python/apache_beam/runners/worker/sideinputs.py
+++ b/sdks/python/apache_beam/runners/worker/sideinputs.py
@@ -105,8 +105,7 @@ def _start_reader_threads(self):
 
   def _reader_thread(self):
 # pylint: disable=too-many-nested-blocks
-experiments = set(
-RuntimeValueProvider.get_value('experiments', str, '').split(','))
+experiments = RuntimeValueProvider.get_value('experiments', list, [])
 try:
   while True:
 try:
diff --git a/sdks/python/apache_beam/runners/worker/sideinputs_test.py 
b/sdks/python/apache_beam/runners/worker/sideinputs_test.py
index 212dc19fde9..050ecdc5003 100644
--- a/sdks/python/apache_beam/runners/worker/sideinputs_test.py
+++ b/sdks/python/apache_beam/runners/worker/sideinputs_test.py
@@ -92,7 +92,7 @@ def test_bytes_read_behind_experiment(self):
 
   def test_bytes_read_are_reported(self):
 RuntimeValueProvider.set_runtime_options(
-{'experiments': 'sideinput_io_metrics,other'})
+{'experiments': ['sideinput_io_metrics', 'other']})
 mock_read_counter = mock.MagicMock()
 source_records = ['a', 'b', 'c', 'd']
 sources = [


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82952)
Time Spent: 40m  (was: 0.5h)
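The fix merged above is motivated by a plain-Python pitfall: with a string default, `''.split(',')` yields `['']`, so the "no experiments" case produces a set containing the empty string, while the `experiments` runtime option already arrives as a list. A standalone illustration (not Beam code):

```python
# Old approach: the default is a string, parsed with split(',').
old_experiments = set(''.split(','))   # {''} -- not the empty set!

# New approach: the default is already a list.
new_experiments = set([])              # set()

# With a real value both approaches behave as expected:
parsed = set('sideinput_io_metrics,other'.split(','))
```

Treating the option as a list avoids the spurious empty-string "experiment" and skips the redundant parse step.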

> Add tracking of bytes read / time spent when reading side inputs
> 
>
> Key: BEAM-3042
> URL: https://issues.apache.org/jira/browse/BEAM-3042
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> It is difficult for Dataflow users to understand how modifying a pipeline or 
> data set can affect how much inter-transform IO is used in their job. The 
> intent of this feature request is to help users understand how side inputs 
> behave when they are consumed.
> This will allow users to understand how much time and how much data their 
> pipeline uses to read/write to inter-transform IO. Users will also be able to 
> modify their pipelines and understand how their changes affect these IO 
> metrics.
> For further information, please review the internal Google doc 
> go/insights-transform-io-design-doc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (2626fb5 -> bb410e2)

2018-03-21 Thread chamikara
This is an automated email from the ASF dual-hosted git repository.

chamikara pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 2626fb5  [BEAM-3873] Current version of commons-compress is DOS 
vulnerable CVE-2018-1324
 add 4dd7f66  Fixing check for sideinput_io_metrics experiment flag.
 new bb410e2  Merge pull request #4912: [BEAM-3042] Fixing check for 
sideinput_io_metrics experiment flag

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/python/apache_beam/runners/worker/operations.py  | 3 +--
 sdks/python/apache_beam/runners/worker/sideinputs.py  | 3 +--
 sdks/python/apache_beam/runners/worker/sideinputs_test.py | 2 +-
 3 files changed, 3 insertions(+), 5 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
chamik...@apache.org.


[beam] 01/01: Merge pull request #4912: [BEAM-3042] Fixing check for sideinput_io_metrics experiment flag

2018-03-21 Thread chamikara
This is an automated email from the ASF dual-hosted git repository.

chamikara pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit bb410e2ddacc818e91d4f6273cc5d3f3bc360a34
Merge: 2626fb5 4dd7f66
Author: Chamikara Jayalath 
AuthorDate: Wed Mar 21 15:03:33 2018 -0700

Merge pull request #4912: [BEAM-3042] Fixing check for sideinput_io_metrics 
experiment flag

 sdks/python/apache_beam/runners/worker/operations.py  | 3 +--
 sdks/python/apache_beam/runners/worker/sideinputs.py  | 3 +--
 sdks/python/apache_beam/runners/worker/sideinputs_test.py | 2 +-
 3 files changed, 3 insertions(+), 5 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
chamik...@apache.org.


[jira] [Work logged] (BEAM-3042) Add tracking of bytes read / time spent when reading side inputs

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3042?focusedWorklogId=82950&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82950
 ]

ASF GitHub Bot logged work on BEAM-3042:


Author: ASF GitHub Bot
Created on: 21/Mar/18 22:02
Start Date: 21/Mar/18 22:02
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #4912: [BEAM-3042] 
Fixing check for sideinput_io_metrics experiment flag.
URL: https://github.com/apache/beam/pull/4912#issuecomment-375110583
 
 
   LGTM


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82950)
Time Spent: 0.5h  (was: 20m)

> Add tracking of bytes read / time spent when reading side inputs
> 
>
> Key: BEAM-3042
> URL: https://issues.apache.org/jira/browse/BEAM-3042
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It is difficult for Dataflow users to understand how modifying a pipeline or 
> data set can affect how much inter-transform IO is used in their job. The 
> intent of this feature request is to help users understand how side inputs 
> behave when they are consumed.
> This will allow users to understand how much time and how much data their 
> pipeline uses to read/write to inter-transform IO. Users will also be able to 
> modify their pipelines and understand how their changes affect these IO 
> metrics.
> For further information, please review the internal Google doc 
> go/insights-transform-io-design-doc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

