[jira] [Commented] (BEAM-8419) Add ability to populate Schema field descriptions from Java classes

2019-10-16 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953335#comment-16953335
 ] 

Brian Hulette commented on BEAM-8419:
-

FYI [~reuvenlax]. Request from 
[slack|https://the-asf.slack.com/archives/CBDJYM6DA/p1570189415030200]

> Add ability to populate Schema field descriptions from Java classes
> ---
>
> Key: BEAM-8419
> URL: https://issues.apache.org/jira/browse/BEAM-8419
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Priority: Major
>
> Maybe with an annotation? For example:
> {code:java}
> @SchemaDescription("this is my fun string field")
> public String foo;
> {code}
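A hypothetical Python analogue of the proposed annotation, using dataclass field metadata to carry a per-field description (the `field_description` helper and the `'description'` metadata key are illustrative, not a Beam API):

```python
import dataclasses

# Illustrative only: a dataclass field carrying a description in its
# metadata, analogous to the proposed @SchemaDescription Java annotation.
@dataclasses.dataclass
class MyRow:
    foo: str = dataclasses.field(
        default='', metadata={'description': 'this is my fun string field'})

def field_description(cls, name):
    # Look up the description recorded for the named field.
    (f,) = [f for f in dataclasses.fields(cls) if f.name == name]
    return f.metadata.get('description')
```

A schema-inference layer could read this metadata when building the Schema, the same way the Java annotation would be read via reflection.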



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8419) Add ability to populate Schema field descriptions from Java classes

2019-10-16 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-8419:
---

 Summary: Add ability to populate Schema field descriptions from 
Java classes
 Key: BEAM-8419
 URL: https://issues.apache.org/jira/browse/BEAM-8419
 Project: Beam
  Issue Type: New Feature
  Components: sdk-java-core
Reporter: Brian Hulette


Maybe with an annotation? For example:

{code:java}
@SchemaDescription("this is my fun string field")
public String foo;
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8418) Fix handling of Impulse transform in Dataflow runner.

2019-10-16 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8418:
--
Description: 
The following pipeline fails on the Dataflow runner unless the beam_fn_api
experiment is used.
{noformat}
class NoOpDoFn(beam.DoFn):
  def process(self, element):
    return element

p = beam.Pipeline(options=pipeline_options)
_ = p | beam.Impulse() | beam.ParDo(NoOpDoFn())
result = p.run()
{noformat}

The reason is that we encode the Impulse payload using URL escaping in [1],
while the Dataflow runner expects base64 encoding in non-FnAPI mode. In FnAPI
mode, the Dataflow runner expects URL escaping.

We should fix or reconcile the encoding in the non-FnAPI path, and add a
ValidatesRunner test that catches this error.

[1] 
https://github.com/apache/beam/blob/12d07745835e1b9c1e824b83beeeadf63ab4b234/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L633

  was:
Following pipeline fails on Dataflow runner unless we use beam_fn_api 
experiment.
{noformat}
class NoOpDoFn(beam.DoFn):
  def process(self, element):
    return element

p = beam.Pipeline(options=pipeline_options)
_ = p | beam.Impulse() | beam.ParDo(NoOpDoFn())
result = p.run()
{noformat}

The reason is that we encode Impluse payload using url-escaping in [1], while 
Dataflow runner expects base64 encoding in non-fnapi mode. In FnApi mode, DF 
runner expects URL escaping.

We should fix or otherwise the encoding in non-FnAPI path, and add a 
ValidatesRunner test that catches this error.   

[1] 
https://github.com/apache/beam/blob/12d07745835e1b9c1e824b83beeeadf63ab4b234/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L633


> Fix handling of Impulse transform in Dataflow runner. 
> --
>
> Key: BEAM-8418
> URL: https://issues.apache.org/jira/browse/BEAM-8418
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Major
>
> The following pipeline fails on the Dataflow runner unless the beam_fn_api
> experiment is used.
> {noformat}
> class NoOpDoFn(beam.DoFn):
>   def process(self, element):
>     return element
>
> p = beam.Pipeline(options=pipeline_options)
> _ = p | beam.Impulse() | beam.ParDo(NoOpDoFn())
> result = p.run()
> {noformat}
> The reason is that we encode the Impulse payload using URL escaping in [1],
> while the Dataflow runner expects base64 encoding in non-FnAPI mode. In FnAPI
> mode, the Dataflow runner expects URL escaping.
> We should fix or reconcile the encoding in the non-FnAPI path, and add a
> ValidatesRunner test that catches this error.
> [1] 
> https://github.com/apache/beam/blob/12d07745835e1b9c1e824b83beeeadf63ab4b234/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L633



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8418) Fix handling of Impulse transform in Dataflow runner.

2019-10-16 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev reassigned BEAM-8418:
-

Assignee: Robert Bradshaw

> Fix handling of Impulse transform in Dataflow runner. 
> --
>
> Key: BEAM-8418
> URL: https://issues.apache.org/jira/browse/BEAM-8418
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Major
>
> The following pipeline fails on the Dataflow runner unless the beam_fn_api
> experiment is used.
> {noformat}
> class NoOpDoFn(beam.DoFn):
>   def process(self, element):
>     return element
>
> p = beam.Pipeline(options=pipeline_options)
> _ = p | beam.Impulse() | beam.ParDo(NoOpDoFn())
> result = p.run()
> {noformat}
> The reason is that we encode the Impulse payload using URL escaping in [1],
> while the Dataflow runner expects base64 encoding in non-FnAPI mode. In FnAPI
> mode, the Dataflow runner expects URL escaping.
> We should fix or reconcile the encoding in the non-FnAPI path, and add a
> ValidatesRunner test that catches this error.
> [1] 
> https://github.com/apache/beam/blob/12d07745835e1b9c1e824b83beeeadf63ab4b234/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L633



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8418) Fix handling of Impulse transform in Dataflow runner.

2019-10-16 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8418:
--
Status: Open  (was: Triage Needed)

> Fix handling of Impulse transform in Dataflow runner. 
> --
>
> Key: BEAM-8418
> URL: https://issues.apache.org/jira/browse/BEAM-8418
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> The following pipeline fails on the Dataflow runner unless the beam_fn_api
> experiment is used.
> {noformat}
> class NoOpDoFn(beam.DoFn):
>   def process(self, element):
>     return element
>
> p = beam.Pipeline(options=pipeline_options)
> _ = p | beam.Impulse() | beam.ParDo(NoOpDoFn())
> result = p.run()
> {noformat}
> The reason is that we encode the Impulse payload using URL escaping in [1],
> while the Dataflow runner expects base64 encoding in non-FnAPI mode. In FnAPI
> mode, the Dataflow runner expects URL escaping.
> We should fix or reconcile the encoding in the non-FnAPI path, and add a
> ValidatesRunner test that catches this error.
> [1] 
> https://github.com/apache/beam/blob/12d07745835e1b9c1e824b83beeeadf63ab4b234/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L633



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8418) Fix handling of Impulse transform in Dataflow runner.

2019-10-16 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953319#comment-16953319
 ] 

Valentyn Tymofieiev edited comment on BEAM-8418 at 10/17/19 2:12 AM:
-

[~robertwb] do you have insight into why URL escaping was chosen to encode the
impulse payload in [1], and advice on how the encoding should be reconciled
here?

Thanks. 

[1] 
https://github.com/apache/beam/pull/6581/files#diff-28cee3232c08db90a95ec550f0b182ccR517


was (Author: tvalentyn):
[~robertwb] do you have an insight why URL-escaping was chosen, and advise on 
how the encoding should be reconciled here?

Thanks. 

> Fix handling of Impulse transform in Dataflow runner. 
> --
>
> Key: BEAM-8418
> URL: https://issues.apache.org/jira/browse/BEAM-8418
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> The following pipeline fails on the Dataflow runner unless the beam_fn_api
> experiment is used.
> {noformat}
> class NoOpDoFn(beam.DoFn):
>   def process(self, element):
>     return element
>
> p = beam.Pipeline(options=pipeline_options)
> _ = p | beam.Impulse() | beam.ParDo(NoOpDoFn())
> result = p.run()
> {noformat}
> The reason is that we encode the Impulse payload using URL escaping in [1],
> while the Dataflow runner expects base64 encoding in non-FnAPI mode. In FnAPI
> mode, the Dataflow runner expects URL escaping.
> We should fix or reconcile the encoding in the non-FnAPI path, and add a
> ValidatesRunner test that catches this error.
> [1] 
> https://github.com/apache/beam/blob/12d07745835e1b9c1e824b83beeeadf63ab4b234/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L633



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8418) Fix handling of Impulse transform in Dataflow runner.

2019-10-16 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953319#comment-16953319
 ] 

Valentyn Tymofieiev commented on BEAM-8418:
---

[~robertwb] do you have insight into why URL escaping was chosen, and advice on
how the encoding should be reconciled here?

Thanks. 

> Fix handling of Impulse transform in Dataflow runner. 
> --
>
> Key: BEAM-8418
> URL: https://issues.apache.org/jira/browse/BEAM-8418
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> The following pipeline fails on the Dataflow runner unless the beam_fn_api
> experiment is used.
> {noformat}
> class NoOpDoFn(beam.DoFn):
>   def process(self, element):
>     return element
>
> p = beam.Pipeline(options=pipeline_options)
> _ = p | beam.Impulse() | beam.ParDo(NoOpDoFn())
> result = p.run()
> {noformat}
> The reason is that we encode the Impulse payload using URL escaping in [1],
> while the Dataflow runner expects base64 encoding in non-FnAPI mode. In FnAPI
> mode, the Dataflow runner expects URL escaping.
> We should fix or reconcile the encoding in the non-FnAPI path, and add a
> ValidatesRunner test that catches this error.
> [1] 
> https://github.com/apache/beam/blob/12d07745835e1b9c1e824b83beeeadf63ab4b234/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L633



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8418) Fix handling of Impulse transform in Dataflow runner.

2019-10-16 Thread Valentyn Tymofieiev (Jira)
Valentyn Tymofieiev created BEAM-8418:
-

 Summary: Fix handling of Impulse transform in Dataflow runner. 
 Key: BEAM-8418
 URL: https://issues.apache.org/jira/browse/BEAM-8418
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Valentyn Tymofieiev


The following pipeline fails on the Dataflow runner unless the beam_fn_api
experiment is used.
{noformat}
class NoOpDoFn(beam.DoFn):
  def process(self, element):
    return element

p = beam.Pipeline(options=pipeline_options)
_ = p | beam.Impulse() | beam.ParDo(NoOpDoFn())
result = p.run()
{noformat}

The reason is that we encode the Impulse payload using URL escaping in [1],
while the Dataflow runner expects base64 encoding in non-FnAPI mode. In FnAPI
mode, the Dataflow runner expects URL escaping.

We should fix or reconcile the encoding in the non-FnAPI path, and add a
ValidatesRunner test that catches this error.

[1] 
https://github.com/apache/beam/blob/12d07745835e1b9c1e824b83beeeadf63ab4b234/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L633
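To illustrate the mismatch, here is a sketch of how the two encodings differ for the same bytes. The payload value is made up for illustration; the actual impulse payload is defined by the SDK.

```python
import base64
from urllib.parse import quote

# Hypothetical payload bytes; the real impulse payload is SDK-defined.
payload = b'\x7f\xff'

url_escaped = quote(payload)                      # what the SDK currently emits in [1]
b64_encoded = base64.b64encode(payload).decode()  # what non-FnAPI Dataflow expects

print(url_escaped)   # %7F%FF
print(b64_encoded)   # f/8=
```

Since the two encodings produce different strings for the same bytes, a payload written in one form cannot be decoded by a runner expecting the other, which is why the pipeline only works when the beam_fn_api experiment flips the runner into URL-escaping mode.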



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8402) Create a class hierarchy to represent environments

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8402?focusedWorklogId=329518&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329518
 ]

ASF GitHub Bot logged work on BEAM-8402:


Author: ASF GitHub Bot
Created on: 17/Oct/19 01:18
Start Date: 17/Oct/19 01:18
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9811: [BEAM-8402] 
Create a class hierarchy to represent environments
URL: https://github.com/apache/beam/pull/9811#discussion_r335774454
 
 

 ##
 File path: sdks/python/apache_beam/transforms/environments.py
 ##
 @@ -0,0 +1,388 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Environments concepts."""
+
+from __future__ import absolute_import
+
+import json
+
+from google.protobuf import message
+
+from apache_beam.portability import common_urns
+from apache_beam.portability import python_urns
+from apache_beam.portability.api import beam_runner_api_pb2
+from apache_beam.portability.api import endpoints_pb2
+from apache_beam.utils import proto_utils
+
+__all__ = ['Environment',
+   'DockerEnvironment', 'ProcessEnvironment', 'ExternalEnvironment',
+   'EmbeddedPythonEnvironment', 'EmbeddedPythonGrpcEnvironment',
+   'SubprocessSDKEnvironment', 'RunnerAPIEnvironmentHolder']
+
+
+class Environment(object):
+  """Abstract base class for environments.
+
+  Represents a type and configuration of environment.
+  Each type of Environment should have a unique urn.
+  """
+
+  _known_urns = {}
+  _urn_to_env_cls = {}
+
+  def to_runner_api_parameter(self, context):
+    raise NotImplementedError
+
+  @classmethod
+  def register_urn(cls, urn, parameter_type, constructor=None):
+
+    def register(constructor):
+      if isinstance(constructor, type):
+        constructor.from_runner_api_parameter = register(
+            constructor.from_runner_api_parameter)
+        # register environment urn to environment class
+        cls._urn_to_env_cls[urn] = constructor
+        return constructor
+
+      else:
+        cls._known_urns[urn] = parameter_type, constructor
+        return staticmethod(constructor)
+
+    if constructor:
+      # Used as a statement.
+      register(constructor)
+    else:
+      # Used as a decorator.
+      return register
+
+  @classmethod
+  def get_env_cls_from_urn(cls, urn):
+    return cls._urn_to_env_cls[urn]
+
+  def to_runner_api(self, context):
+    urn, typed_param = self.to_runner_api_parameter(context)
+    return beam_runner_api_pb2.Environment(
+        urn=urn,
+        payload=typed_param.SerializeToString()
+        if isinstance(typed_param, message.Message)
+        else typed_param if (isinstance(typed_param, bytes) or
+                             typed_param is None)
+        else typed_param.encode('utf-8')
+    )
+
+  @classmethod
+  def from_runner_api(cls, proto, context):
+    if proto is None or not proto.urn:
+      return None
+    parameter_type, constructor = cls._known_urns[proto.urn]
+
+    try:
+      return constructor(
+          proto_utils.parse_Bytes(proto.payload, parameter_type),
+          context)
+    except Exception:
+      if context.allow_proto_holders:
+        return RunnerAPIEnvironmentHolder(proto)
+      raise
+
+  @classmethod
+  def from_options(cls, options):
+    """Creates an Environment object from PipelineOptions.
+
+    Args:
+      options: The PipelineOptions object.
+    """
+    raise NotImplementedError
+
+
+@Environment.register_urn(common_urns.environments.DOCKER.urn,
+                          beam_runner_api_pb2.DockerPayload)
+class DockerEnvironment(Environment):
+
+  def __init__(self, container_image=None):
+    from apache_beam.runners.portability.portable_runner import PortableRunner
+
+    if container_image:
+      self.container_image = container_image
+    else:
+      self.container_image = PortableRunner.default_docker_image()
 
 Review comment:
   perhaps this should be allowed to be None, and choosing a default image if 
it is None can be handled inside the `PortableRunner`? 
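A minimal sketch of the suggestion, with illustrative names rather than Beam's actual code: the environment keeps `container_image=None`, and the runner resolves the default image lazily.

```python
# Illustrative sketch, not Beam's implementation: the environment stores
# container_image as given (possibly None), and the runner supplies its
# own default image when resolving the environment.
class DockerEnv:
    def __init__(self, container_image=None):
        self.container_image = container_image

def resolve_image(env, runner_default):
    # A runner would pass e.g. PortableRunner.default_docker_image() here.
    return env.container_image if env.container_image else runner_default

print(resolve_image(DockerEnv(), 'beam-python-default'))     # beam-python-default
print(resolve_image(DockerEnv('my-image'), 'beam-default'))  # my-image
```

This keeps the environment class free of a runner import and lets each runner choose its own default.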
 

This is an automated message 

[jira] [Assigned] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam

2019-10-16 Thread Ahmet Altay (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay reassigned BEAM-8368:
-

Assignee: Brian Hulette  (was: Ahmet Altay)

> [Python] libprotobuf-generated exception when importing apache_beam
> ---
>
> Key: BEAM-8368
> URL: https://issues.apache.org/jira/browse/BEAM-8368
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.17.0
>Reporter: Ubaier Bhat
>Assignee: Brian Hulette
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: error_log.txt
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Unable to import apache_beam after upgrading to macOS 10.15 (Catalina).
> Cleared all the pipenvs but can't get it working again.
> {code}
> import apache_beam as beam
> /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already 
> exists in database: 
> [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> libc++abi.dylib: terminating with uncaught exception of type 
> google::protobuf::FatalException: CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> {code}
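The failure mode can be sketched in plain Python. The class below is a stand-in dict, not protobuf's actual descriptor pool API: adding a file under an already-registered name fails, much like the C++ CHECK failure above when two copies of libprotobuf register the same generated descriptors.

```python
# Stand-in sketch of a generated descriptor database (not the real
# google.protobuf API): a second registration under the same name fails.
class DescriptorDB:
    def __init__(self):
        self._files = {}

    def add(self, name, serialized):
        if name in self._files:
            raise RuntimeError('File already exists in database: %s' % name)
        self._files[name] = serialized

db = DescriptorDB()
db.add('beam_runner_api.proto', b'...')
try:
    db.add('beam_runner_api.proto', b'...')  # duplicate registration
except RuntimeError as e:
    print(e)  # File already exists in database: beam_runner_api.proto
```

In the real crash, the duplicate comes from two native extensions each embedding the same generated descriptors, so the fix lies in the build/packaging rather than user code.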



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8417) Expose ExternalWorkerHandler hostname

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8417?focusedWorklogId=329506&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329506
 ]

ASF GitHub Bot logged work on BEAM-8417:


Author: ASF GitHub Bot
Created on: 17/Oct/19 00:44
Start Date: 17/Oct/19 00:44
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9814: [BEAM-8417] Expose 
ExternalWorkerHandler hostname
URL: https://github.com/apache/beam/pull/9814#issuecomment-542948055
 
 
   R: @robertwb 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329506)
Time Spent: 0.5h  (was: 20m)

> Expose ExternalWorkerHandler hostname
> -
>
> Key: BEAM-8417
> URL: https://issues.apache.org/jira/browse/BEAM-8417
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Wanqi Lyu
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently `fn_api_runner.ExternalWorkerHandler` endpoints have `localhost` as
> their hostname by default, which prevents external workers started on other
> machines from connecting to them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8417) Expose ExternalWorkerHandler hostname

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8417?focusedWorklogId=329507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329507
 ]

ASF GitHub Bot logged work on BEAM-8417:


Author: ASF GitHub Bot
Created on: 17/Oct/19 00:44
Start Date: 17/Oct/19 00:44
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9814: [BEAM-8417] Expose 
ExternalWorkerHandler hostname
URL: https://github.com/apache/beam/pull/9814#issuecomment-542948153
 
 
   R: @tweise 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329507)
Time Spent: 40m  (was: 0.5h)

> Expose ExternalWorkerHandler hostname
> -
>
> Key: BEAM-8417
> URL: https://issues.apache.org/jira/browse/BEAM-8417
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Wanqi Lyu
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently `fn_api_runner.ExternalWorkerHandler` endpoints have `localhost` as
> their hostname by default, which prevents external workers started on other
> machines from connecting to them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8417) Expose ExternalWorkerHandler hostname

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8417?focusedWorklogId=329503&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329503
 ]

ASF GitHub Bot logged work on BEAM-8417:


Author: ASF GitHub Bot
Created on: 17/Oct/19 00:42
Start Date: 17/Oct/19 00:42
Worklog Time Spent: 10m 
  Work Description: violalyu commented on pull request #9814: [BEAM-8417] 
Expose ExternalWorkerHandler hostname
URL: https://github.com/apache/beam/pull/9814
 
 
   Currently `fn_api_runner.ExternalWorkerHandler` endpoints have `localhost`
   as their hostname by default, which prevents external workers started on
   other machines from connecting to them.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   

[jira] [Work logged] (BEAM-8417) Expose ExternalWorkerHandler hostname

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8417?focusedWorklogId=329505&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329505
 ]

ASF GitHub Bot logged work on BEAM-8417:


Author: ASF GitHub Bot
Created on: 17/Oct/19 00:42
Start Date: 17/Oct/19 00:42
Worklog Time Spent: 10m 
  Work Description: violalyu commented on issue #9814: [BEAM-8417] Expose 
ExternalWorkerHandler hostname
URL: https://github.com/apache/beam/pull/9814#issuecomment-542947670
 
 
   R: @Hannah-Jiang 
   R: @mxm 
   R: @chadrik 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329505)
Time Spent: 20m  (was: 10m)

> Expose ExternalWorkerHandler hostname
> -
>
> Key: BEAM-8417
> URL: https://issues.apache.org/jira/browse/BEAM-8417
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Wanqi Lyu
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently `fn_api_runner.ExternalWorkerHandler` endpoints have `localhost` as
> their hostname by default, which prevents external workers started on other
> machines from connecting to them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8417) Expose ExternalWorkerHandler hostname

2019-10-16 Thread Wanqi Lyu (Jira)
Wanqi Lyu created BEAM-8417:
---

 Summary: Expose ExternalWorkerHandler hostname
 Key: BEAM-8417
 URL: https://issues.apache.org/jira/browse/BEAM-8417
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Wanqi Lyu


Currently `fn_api_runner.ExternalWorkerHandler` endpoints have `localhost` as
their hostname by default, which prevents external workers started on other
machines from connecting to them.
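A sketch of the proposed behavior, with a hypothetical helper name and flag: advertise a reachable hostname instead of hard-coding `localhost`.

```python
import socket

# Hypothetical helper: choose the host to advertise for a worker endpoint.
# 'localhost' only works for workers on the same machine; the machine's FQDN
# (or an explicitly configured hostname) lets remote workers connect.
def endpoint_host(use_localhost=True, override=None):
    if override:
        return override          # explicitly configured hostname wins
    if use_localhost:
        return 'localhost'       # current default: same-machine workers only
    return socket.getfqdn()      # externally resolvable name

print(endpoint_host())                     # localhost
print(endpoint_host(override='10.0.0.5'))  # 10.0.0.5
```

Exposing the hostname as a parameter, as the linked PR does, lets deployments where workers run on other machines pass a reachable address.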



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8415) Improve error message when adding a PTransform with a name that already exists in the pipeline

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8415?focusedWorklogId=329500&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329500
 ]

ASF GitHub Bot logged work on BEAM-8415:


Author: ASF GitHub Bot
Created on: 17/Oct/19 00:29
Start Date: 17/Oct/19 00:29
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on issue #9812: [BEAM-8415] 
Improving error message when applying PTransform with a n…
URL: https://github.com/apache/beam/pull/9812#issuecomment-542944977
 
 
   Done. Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329500)
Time Spent: 0.5h  (was: 20m)

> Improve error message when adding a PTransform with a name that already 
> exists in the pipeline
> --
>
> Key: BEAM-8415
> URL: https://issues.apache.org/jira/browse/BEAM-8415
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: David Yan
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, when trying to apply a PTransform with a name that already exists 
> in the pipeline, it returns a confusing error:
> Transform "XXX" does not have a stable unique label. This will prevent 
> updating of pipelines. To apply a transform with a specified label write 
> pvalue | "label" >> transform
> We'd like to improve this error message.
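A minimal sketch (not Beam's implementation) of the clearer check being proposed: detect the duplicate label directly and name both the problem and the `pvalue | "label" >> transform` workaround in the message.

```python
# Illustrative only: track applied transform labels and raise a message
# that states the duplicate explicitly and shows the fix.
class LabelRegistry:
    def __init__(self):
        self._labels = set()

    def register(self, label):
        if label in self._labels:
            raise RuntimeError(
                'A transform with label %r already exists in the pipeline. '
                'To apply it again, give it a unique name: '
                'pvalue | "unique_label" >> transform' % label)
        self._labels.add(label)

reg = LabelRegistry()
reg.register('XXX')      # first application is fine
try:
    reg.register('XXX')  # second application with the same label
except RuntimeError as e:
    print('duplicate detected')
```

The message states what went wrong (a duplicate label) rather than the side effect ("does not have a stable unique label"), which is the improvement being requested.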



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=329497&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329497
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 17/Oct/19 00:09
Start Date: 17/Oct/19 00:09
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9775: [BEAM-8372] Job server 
submitting UberJars directly to Flink Runner.
URL: https://github.com/apache/beam/pull/9775#issuecomment-542941178
 
 
   I opened a bug for flakiness in test_concurrent_requests. Perhaps this PR 
caused it?
   https://issues.apache.org/jira/browse/BEAM-8416
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329497)
Time Spent: 4h 20m  (was: 4h 10m)

> Allow submission of Flink UberJar directly to flink cluster.
> 
>
> Key: BEAM-8372
> URL: https://issues.apache.org/jira/browse/BEAM-8372
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8416) ZipFileArtifactServiceTest.test_concurrent_requests flaky

2019-10-16 Thread Udi Meiri (Jira)
Udi Meiri created BEAM-8416:
---

 Summary: ZipFileArtifactServiceTest.test_concurrent_requests flaky
 Key: BEAM-8416
 URL: https://issues.apache.org/jira/browse/BEAM-8416
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Udi Meiri


{code}
Traceback (most recent call last):
  File "/usr/lib/python3.7/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/lib/python3.7/unittest/case.py", line 615, in run
    testMethod()
  File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py", line 215, in test_concurrent_requests
    _ = list(pool.map(check, range(100)))
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py", line 208, in check
    self._service, tokens[session(index)], name(index)))
  File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py", line 73, in retrieve_artifact
    name=name)))
  File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py", line 70, in 
    return b''.join(chunk.data for chunk in retrieval_service.GetArtifact(
  File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service.py", line 133, in GetArtifact
    chunk = fin.read(self._chunk_size)
  File "/usr/lib/python3.7/zipfile.py", line 899, in read
    data = self._read1(n)
  File "/usr/lib/python3.7/zipfile.py", line 989, in _read1
    self._update_crc(data)
  File "/usr/lib/python3.7/zipfile.py", line 917, in _update_crc
raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file 
'/3b2b55eb92de23535010b7ac80d553ec2d4bae872ac5606bc3042ce9313dff87/e1d492628cc0c1d0c1b736184f689be54fa03a996de918268ad834560e77305f'
{code}
and:
{code}
Traceback (most recent call last):
  File "/usr/lib/python3.7/unittest/case.py", line 59, in testPartExecutor
yield
  File "/usr/lib/python3.7/unittest/case.py", line 615, in run
testMethod()
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py",
 line 215, in test_concurrent_requests
_ = list(pool.map(check, range(100)))
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 586, in 
result_iterator
yield fs.pop().result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 425, in result
return self.__get_result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in 
__get_result
raise self._exception
  File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py",
 line 208, in check
self._service, tokens[session(index)], name(index)))
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py",
 line 73, in retrieve_artifact
name=name)))
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py",
 line 70, in 
return b''.join(chunk.data for chunk in retrieval_service.GetArtifact(
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service.py",
 line 128, in GetArtifact
with self._open(artifact.uri, 'r') as fin:
  File 
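The {{Bad CRC-32}} failures above are consistent with multiple threads reading through a shared ZipFile member handle, whose internal read position and CRC state are not thread-safe. A minimal stdlib-only sketch (not the Beam artifact-service code) of the safe per-thread-handle pattern:

```python
import io
import zipfile
from concurrent.futures import ThreadPoolExecutor

# Build an in-memory zip with several members.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    for i in range(10):
        zf.writestr('member-%d' % i, ('payload-%d' % i).encode('utf-8'))

def read_member(index):
    # Open a fresh ZipFile per task: ZipExtFile objects track a shared
    # read position and running CRC, so handing one handle to many
    # threads can interleave reads and raise zipfile.BadZipFile.
    with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
        with zf.open('member-%d' % index) as fin:
            return fin.read()

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(read_member, range(10)))
```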

[jira] [Work logged] (BEAM-8415) Improve error message when adding a PTransform with a name that already exists in the pipeline

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8415?focusedWorklogId=329496=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329496
 ]

ASF GitHub Bot logged work on BEAM-8415:


Author: ASF GitHub Bot
Created on: 17/Oct/19 00:04
Start Date: 17/Oct/19 00:04
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9812: [BEAM-8415] 
Improving error message when applying PTransform with a n…
URL: https://github.com/apache/beam/pull/9812#issuecomment-542940234
 
 
   Thanks! I think it'd be good to keep it actionable by keeping the part about 
how a transform can be renamed. 
 



Issue Time Tracking
---

Worklog Id: (was: 329496)
Time Spent: 20m  (was: 10m)

> Improve error message when adding a PTransform with a name that already 
> exists in the pipeline
> --
>
> Key: BEAM-8415
> URL: https://issues.apache.org/jira/browse/BEAM-8415
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: David Yan
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, when trying to apply a PTransform with a name that already exists 
> in the pipeline, it returns a confusing error:
> Transform "XXX" does not have a stable unique label. This will prevent 
> updating of pipelines. To apply a transform with a specified label write 
> pvalue | "label" >> transform
> We'd like to improve this error message.





[jira] [Work logged] (BEAM-8415) Improve error message when adding a PTransform with a name that already exists in the pipeline

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8415?focusedWorklogId=329491=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329491
 ]

ASF GitHub Bot logged work on BEAM-8415:


Author: ASF GitHub Bot
Created on: 16/Oct/19 23:52
Start Date: 16/Oct/19 23:52
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #9812: [BEAM-8415] 
Improving error message when applying PTransform with a n…
URL: https://github.com/apache/beam/pull/9812
 
 
   Improving the error message when applying a PTransform with a name that 
already exists in the pipeline.
   
   R: @robertwb 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Work logged] (BEAM-8402) Create a class hierarchy to represent environments

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8402?focusedWorklogId=329487=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329487
 ]

ASF GitHub Bot logged work on BEAM-8402:


Author: ASF GitHub Bot
Created on: 16/Oct/19 23:46
Start Date: 16/Oct/19 23:46
Worklog Time Spent: 10m 
  Work Description: violalyu commented on issue #9811: [BEAM-8402] Create a 
class hierarchy to represent environments
URL: https://github.com/apache/beam/pull/9811#issuecomment-542936034
 
 
   R: @robertwb 
   R: @mxm 
   R: @chadrik 
 



Issue Time Tracking
---

Worklog Id: (was: 329487)
Time Spent: 20m  (was: 10m)

> Create a class hierarchy to represent environments
> --
>
> Key: BEAM-8402
> URL: https://issues.apache.org/jira/browse/BEAM-8402
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As a first step towards making it possible to assign different environments 
> to sections of a pipeline, we first need to expose environment classes to the 
> pipeline API. Unlike PTransforms, PCollections, Coders, and Windowings, 
> environments exist solely in the portability framework as protobuf objects. 
> By creating a hierarchy of "native" classes that represent the various 
> environment types -- external, docker, process, etc. -- users will be able to 
> instantiate these and assign them to parts of the pipeline. The assignment 
> portion will be covered in a follow-up issue/PR.
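The hierarchy described could look roughly like the following sketch (class names are illustrative and the URNs are taken from the portability protos; this is not the final API):

```python
# Illustrative class hierarchy mirroring the portability environment protos.
class Environment:
    def to_runner_api(self):
        raise NotImplementedError

class DockerEnvironment(Environment):
    def __init__(self, container_image):
        self.container_image = container_image
    def to_runner_api(self):
        return {'urn': 'beam:env:docker:v1',
                'payload': {'container_image': self.container_image}}

class ProcessEnvironment(Environment):
    def __init__(self, command, env=None):
        self.command = command
        self.env = dict(env or {})
    def to_runner_api(self):
        return {'urn': 'beam:env:process:v1',
                'payload': {'command': self.command, 'env': self.env}}

class ExternalEnvironment(Environment):
    def __init__(self, url):
        self.url = url
    def to_runner_api(self):
        return {'urn': 'beam:env:external:v1', 'payload': {'url': self.url}}

# Users would instantiate these and (in the follow-up work) attach them
# to parts of a pipeline.
env = DockerEnvironment('apache/beam_python3.7_sdk:2.16.0')
```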





[jira] [Created] (BEAM-8415) Improve error message when adding a PTransform with a name that already exists in the pipeline

2019-10-16 Thread David Yan (Jira)
David Yan created BEAM-8415:
---

 Summary: Improve error message when adding a PTransform with a 
name that already exists in the pipeline
 Key: BEAM-8415
 URL: https://issues.apache.org/jira/browse/BEAM-8415
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: David Yan


Currently, when trying to apply a PTransform with a name that already exists in 
the pipeline, it returns a confusing error:

{code}
Transform "XXX" does not have a stable unique label. This will prevent updating 
of pipelines. To apply a transform with a specified label write pvalue | 
"label" >> transform
{code}

We'd like to improve this error message.
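For context, the usual user-side fix is to give the second application an explicit, unique label via {{"label" >> transform}}. A toy model (plain Python, not apache_beam) of that naming pattern and of the collision the improved message should describe:

```python
# Toy model of how "label" >> transform assigns a unique name, and why
# applying the same transform twice without a label collides.
class Transform:
    def __init__(self, name):
        self.label = name
    def __rrshift__(self, label):
        # "label" >> transform returns a relabeled copy.
        clone = Transform(self.label)
        clone.label = label
        return clone

class Pipeline:
    def __init__(self):
        self.applied = {}
    def apply(self, transform):
        if transform.label in self.applied:
            raise ValueError(
                'A transform with label %r already exists in the pipeline'
                % transform.label)
        self.applied[transform.label] = transform

p = Pipeline()
square = Transform('Map')
p.apply(square)
try:
    p.apply(square)            # same label -> the error this issue improves
except ValueError as e:
    message = str(e)
p.apply('MapAgain' >> square)  # explicit unique label succeeds
```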





[jira] [Work logged] (BEAM-8402) Create a class hierarchy to represent environments

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8402?focusedWorklogId=329485=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329485
 ]

ASF GitHub Bot logged work on BEAM-8402:


Author: ASF GitHub Bot
Created on: 16/Oct/19 23:43
Start Date: 16/Oct/19 23:43
Worklog Time Spent: 10m 
  Work Description: violalyu commented on pull request #9811: [BEAM-8402] 
Create a class hierarchy to represent environments
URL: https://github.com/apache/beam/pull/9811
 
 
   **Please** add a meaningful description for your change here
   
   
   

[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=329480=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329480
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 16/Oct/19 23:18
Start Date: 16/Oct/19 23:18
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on issue #9790: [BEAM-7389] Show 
code snippet outputs as stdout
URL: https://github.com/apache/beam/pull/9790#issuecomment-542929916
 
 
   Added a wrapper function to correctly handle UTF-8 strings as well as Python 
objects/dictionaries from stdout in both Python 2 and 3.
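A wrapper along those lines might look like this sketch (hypothetical, not the PR's actual code): capture stdout and normalize bytes versus text so UTF-8 output displays the same way:

```python
import io
import sys
from contextlib import contextmanager

@contextmanager
def captured_stdout():
    # Temporarily redirect stdout so a snippet's output can be inspected.
    old, sys.stdout = sys.stdout, io.StringIO()
    try:
        yield sys.stdout
    finally:
        sys.stdout = old

def normalize(value):
    # Bytes decode to text; other objects (dicts, lists, ...) stringify.
    if isinstance(value, bytes):
        return value.decode('utf-8')
    return str(value)

with captured_stdout() as out:
    print(normalize(u'caf\u00e9'))
    print(normalize(b'caf\xc3\xa9'))
    print(normalize({'k': 1}))

lines = out.getvalue().splitlines()
```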
 



Issue Time Tracking
---

Worklog Id: (was: 329480)
Time Spent: 69.5h  (was: 69h 20m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 69.5h
>  Remaining Estimate: 0h
>






[jira] [Comment Edited] (BEAM-646) Get runners out of the apply()

2019-10-16 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953250#comment-16953250
 ] 

Ning Kang edited comment on BEAM-646 at 10/16/19 10:51 PM:
---

Hi,

This is Ning. I'm working on a project that supports InteractiveRunner 
(Python 3+) when users use Beam in a notebook environment.

Would apply() interception in the runner still be useful as a hook for 
non-Beam-related but interactivity-related features, such as collecting 
IPython/notebook cell information when a PTransform is applied?

Of course, we could also have all pipelines support interactivity by moving the 
interception inside the pipelines themselves. But it's unlikely that all 
runners would or could take a pipeline with interactivity logic at this moment, 
and those IPython/notebook dependencies probably shouldn't be introduced into 
the pipeline itself.

Would there be any APIs that support invoking arbitrary external logic at 
different stages of building a pipeline once the pipeline is completely 
decoupled from the runner?

Thanks!



> Get runners out of the apply()
> --
>
> Key: BEAM-646
> URL: https://issues.apache.org/jira/browse/BEAM-646
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
>Priority: Major
>  Labels: backwards-incompatible
> Fix For: 0.6.0
>
>
> Right now, the runner intercepts calls to apply() and replaces transforms as 
> we go. This means that there is no "original" user graph. For portability and 
> misc architectural benefits, we would like to build the original graph first, 
> and have the runner override later.
> Some runners already work in this manner, but we could integrate it more 
> smoothly, with more validation, via some handy APIs on e.g. the Pipeline 
> object.





[jira] [Commented] (BEAM-646) Get runners out of the apply()

2019-10-16 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953250#comment-16953250
 ] 

Ning Kang commented on BEAM-646:


Hi,

This is Ning. I'm working on a project that supports InteractiveRunner 
(Python 3+) when users use Beam in a notebook environment.

Would apply() interception in the runner still be useful as a hook for 
non-Beam-related but interactivity-related features, such as collecting 
IPython/notebook cell information when a PTransform is applied?

Of course, we could also have all pipelines support interactivity by moving the 
interception inside the pipelines themselves. But it's unlikely that all 
runners would or could take a pipeline with interactivity logic at this moment, 
and those IPython/notebook dependencies probably shouldn't be introduced into 
the pipeline itself.

Would there be any APIs that support invoking arbitrary external logic at 
different stages of building a pipeline once the pipeline is completely 
decoupled from the runner?

Thanks!

> Get runners out of the apply()
> --
>
> Key: BEAM-646
> URL: https://issues.apache.org/jira/browse/BEAM-646
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
>Priority: Major
>  Labels: backwards-incompatible
> Fix For: 0.6.0
>
>
> Right now, the runner intercepts calls to apply() and replaces transforms as 
> we go. This means that there is no "original" user graph. For portability and 
> misc architectural benefits, we would like to build the original graph first, 
> and have the runner override later.
> Some runners already work in this manner, but we could integrate it more 
> smoothly, with more validation, via some handy APIs on e.g. the Pipeline 
> object.





[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=329462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329462
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 16/Oct/19 22:32
Start Date: 16/Oct/19 22:32
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9741: [BEAM-7926] Visualize 
PCollection
URL: https://github.com/apache/beam/pull/9741#issuecomment-542918457
 
 
   retest this please.
   
 



Issue Time Tracking
---

Worklog Id: (was: 329462)
Time Spent: 2h 10m  (was: 2h)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
> {code:python}
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> {code}
> The user can call a single function and get auto-magical charting of the 
> materialized data of the pcoll, e.g., visualize(pcoll).
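A stdlib-only sketch of what such a helper could do with already-materialized data (the real feature would render plots; this one draws a text histogram, and the {{visualize}} signature is an assumption):

```python
from collections import Counter

def visualize(materialized, width=20):
    # Render a text bar chart of value frequencies from materialized
    # PCollection data (a plain Python list in this sketch).
    counts = Counter(materialized)
    top = counts.most_common()
    scale = max(n for _, n in top)
    return ['%-10s | %s (%d)' % (value, '#' * (width * n // scale), n)
            for value, n in top]

chart = visualize(['a', 'b', 'a', 'c', 'a', 'b'])
```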





[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=329461=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329461
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 16/Oct/19 22:32
Start Date: 16/Oct/19 22:32
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335738488
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
 ##
 @@ -0,0 +1,248 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rule;
+
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor.FieldDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.utils.SelectHelpers;
+import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelRecordType;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilder;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+
+public class BeamIOPushDownRule extends RelOptRule {
+  // ~ Static fields/initializers -
+
+  public static final BeamIOPushDownRule INSTANCE =
+  new BeamIOPushDownRule(RelFactories.LOGICAL_BUILDER);
+
+  // ~ Constructors ---
+
+  public BeamIOPushDownRule(RelBuilderFactory relBuilderFactory) {
+    super(operand(Calc.class, operand(BeamIOSourceRel.class, any())), relBuilderFactory, null);
+  }
+
+  // ~ Methods 
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+final BeamIOSourceRel ioSourceRel = call.rel(1);
+final BeamSqlTable beamSqlTable = ioSourceRel.getBeamSqlTable();
+
+if (!beamSqlTable.supportsProjects()) {
+  return;
+}
+
+// Nested rows are not supported at the moment
+for (RelDataTypeField field : ioSourceRel.getRowType().getFieldList()) {
+  if 
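Conceptually, the rule above rewrites a Calc-over-IO pair so that the source itself drops unused fields. A small Python sketch of project push-down (hypothetical in-memory table, not the Java rule):

```python
ROWS = [
    {'id': 1, 'name': 'a', 'payload': 'x' * 100},
    {'id': 2, 'name': 'b', 'payload': 'y' * 100},
]

def scan(fields=None):
    # A table that "supportsProjects": given a field list, it emits
    # narrowed rows instead of full rows that get projected downstream.
    for row in ROWS:
        yield {k: row[k] for k in (fields if fields else row)}

# The push-down rewrite: the query only needs id and name, so ask the
# source for just those fields and skip the wide payload column.
projected = list(scan(fields=['id', 'name']))
```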

[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=329456=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329456
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 16/Oct/19 22:15
Start Date: 16/Oct/19 22:15
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335733407
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
 ##
 @@ -0,0 +1,248 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rule;
+
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor.FieldDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.utils.SelectHelpers;
+import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelRecordType;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilder;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+
+public class BeamIOPushDownRule extends RelOptRule {
+  // ~ Static fields/initializers -
+
+  public static final BeamIOPushDownRule INSTANCE =
+  new BeamIOPushDownRule(RelFactories.LOGICAL_BUILDER);
+
+  // ~ Constructors ---
+
+  public BeamIOPushDownRule(RelBuilderFactory relBuilderFactory) {
+super(operand(Calc.class, operand(BeamIOSourceRel.class, any())), relBuilderFactory, null);
+  }
+
+  // ~ Methods 
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+final BeamIOSourceRel ioSourceRel = call.rel(1);
+final BeamSqlTable beamSqlTable = ioSourceRel.getBeamSqlTable();
+
+if (!beamSqlTable.supportsProjects()) {
+  return;
+}
+
+// Nested rows are not supported at the moment
+for (RelDataTypeField field : ioSourceRel.getRowType().getFieldList()) {
+  if 

[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=329453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329453
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 16/Oct/19 22:11
Start Date: 16/Oct/19 22:11
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #9764: [BEAM-8365] Project 
push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#issuecomment-542912906
 
 
   cc: @amaliujia 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329453)
Time Spent: 7.5h  (was: 7h 20m)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection<Row>` with the fields specified in the 
> `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a 
> condition) down into TestTable IO.
>  * Updating that same Calc  (from previous step) to have a proper input and 
> output schemes, remove unused fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
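The `buildIOReader(..., fieldNames)` contract described above amounts to projecting each row down to the requested fields before it ever leaves the source. A minimal Python sketch of that projection, independent of Beam's actual `Row`/`Schema` types (`project_rows` is a hypothetical helper, not the Beam API):

```python
def project_rows(rows, field_names):
    """Keep only the requested fields of each row, in the requested order."""
    return [{name: row[name] for name in field_names} for row in rows]

rows = [
    {"id": 1, "name": "a", "unused": 0.5},
    {"id": 2, "name": "b", "unused": 0.7},
]

# The source emits only the projected fields; "unused" is never read.
projected = project_rows(rows, ["id", "name"])
```

In the real rule, the Calc's input and output schemas are then rewritten so downstream operators only see the projected fields.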


[jira] [Updated] (BEAM-8072) Allow non ColumnRef nodes in aggregation functions

2019-10-16 Thread Gleb Kanterov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gleb Kanterov updated BEAM-8072:

Summary: Allow non ColumnRef nodes in aggregation functions  (was: Allow 
non ColumnRef nodes in aggreation functions)

> Allow non ColumnRef nodes in aggregation functions
> --
>
> Key: BEAM-8072
> URL: https://issues.apache.org/jira/browse/BEAM-8072
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Priority: Critical
>
> Currently we throw an error if a node is not a ColumnRef or CAST(ColumnRef)





[jira] [Commented] (BEAM-8042) Parsing of aggregate query fails

2019-10-16 Thread Gleb Kanterov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953197#comment-16953197
 ] 

Gleb Kanterov commented on BEAM-8042:
-

The exception happens during query planning, so it is reproducible even when the 
table is non-empty, because the query planner doesn't know whether the table is 
empty.

> Parsing of aggregate query fails
> 
>
> Key: BEAM-8042
> URL: https://issues.apache.org/jira/browse/BEAM-8042
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Critical
>
> {code}
>   @Rule
>   public TestPipeline pipeline = 
> TestPipeline.fromOptions(createPipelineOptions());
>   private static PipelineOptions createPipelineOptions() {
> BeamSqlPipelineOptions opts = 
> PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class);
> opts.setPlannerName(ZetaSQLQueryPlanner.class.getName());
> return opts;
>   }
>   @Test
>   public void testAggregate() {
> Schema inputSchema = Schema.builder()
> .addByteArrayField("id")
> .addInt64Field("has_f1")
> .addInt64Field("has_f2")
> .addInt64Field("has_f3")
> .addInt64Field("has_f4")
> .addInt64Field("has_f5")
> .addInt64Field("has_f6")
> .build();
> String sql = "SELECT \n" +
> "  id, \n" +
> "  COUNT(*) as count, \n" +
> "  SUM(has_f1) as f1_count, \n" +
> "  SUM(has_f2) as f2_count, \n" +
> "  SUM(has_f3) as f3_count, \n" +
> "  SUM(has_f4) as f4_count, \n" +
> "  SUM(has_f5) as f5_count, \n" +
> "  SUM(has_f6) as f6_count  \n" +
> "FROM PCOLLECTION \n" +
> "GROUP BY id";
> pipeline
> .apply(Create.empty(inputSchema))
> .apply(SqlTransform.query(sql));
> pipeline.run();
>   }
> {code}
> {code}
> Caused by: java.lang.RuntimeException: Error while applying rule 
> AggregateProjectMergeRule, args 
> [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)),
>  
> rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)]
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104)
>   at 
>   ... 39 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>   at 
> org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205)
>   ... 48 more
> {code}





[jira] [Commented] (BEAM-8081) CalciteQueryPlanner breaks ZetaSQLQueryPlanner with setting RelMetadataQuery.THREAD_PROVIDERS

2019-10-16 Thread Gleb Kanterov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953196#comment-16953196
 ] 

Gleb Kanterov commented on BEAM-8081:
-

Lowering priority because it's possible to create a new Thread as a workaround.
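The workaround mentioned above (create a new Thread so the planner does not observe stale thread-local state) can be sketched generically; the actual code is Java/Calcite, and `run_in_fresh_thread` / `planner_step` are hypothetical stand-ins:

```python
import threading

_tls = threading.local()

def planner_step():
    """Stand-in for a planner that caches a provider in a thread-local."""
    if not hasattr(_tls, "provider"):
        _tls.provider = "fresh"
    return _tls.provider

def run_in_fresh_thread(fn, *args):
    """Run fn on a brand-new thread so it sees clean thread-local state,
    propagating the result (or exception) back to the caller."""
    result = {}

    def target():
        try:
            result["value"] = fn(*args)
        except BaseException as e:
            result["error"] = e

    t = threading.Thread(target=target)
    t.start()
    t.join()
    if "error" in result:
        raise result["error"]
    return result["value"]

_tls.provider = "stale"                           # this thread holds stale state
same_thread = planner_step()                      # observes the stale value
fresh_thread = run_in_fresh_thread(planner_step)  # a new thread starts clean
```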

> CalciteQueryPlanner breaks ZetaSQLQueryPlanner with setting 
> RelMetadataQuery.THREAD_PROVIDERS
> -
>
> Key: BEAM-8081
> URL: https://issues.apache.org/jira/browse/BEAM-8081
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Gleb Kanterov
>Priority: Major
>
> If two independent tests share the same thread, and the first one uses 
> CalciteQueryPlanner while the second one uses ZetaSQLPlanner, a 
> NullPointerException is thrown.
> The root cause is using RelMetadataQuery.THREAD_PROVIDERS in 
> CalciteQueryPlanner.
> {code}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.getNodeStats(BeamSqlRelUtils.java:96)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel.beamComputeSelfCost(BeamIOSourceRel.java:111)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner$NonCumulativeCostImpl.getNonCumulativeCost(CalciteQueryPlanner.java:217)
>   at 
> GeneratedMetadataHandler_NonCumulativeCost.getNonCumulativeCost_$(Unknown 
> Source)
>   at 
> GeneratedMetadataHandler_NonCumulativeCost.getNonCumulativeCost(Unknown 
> Source)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.metadata.RelMetadataQuery.getNonCumulativeCost(RelMetadataQuery.java:301)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.getCost(VolcanoPlanner.java:929)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements0(RelSubset.java:347)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements(RelSubset.java:330)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(VolcanoPlanner.java:1816)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1752)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:846)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:868)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:90)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:329)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1656)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:846)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:868)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:90)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:329)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1656)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:846)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:868)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:90)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:329)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1656)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:846)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:868)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.changeTraits(VolcanoPlanner.java:529)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:325)
>   at 
> 

[jira] [Updated] (BEAM-8081) CalciteQueryPlanner breaks ZetaSQLQueryPlanner with setting RelMetadataQuery.THREAD_PROVIDERS

2019-10-16 Thread Gleb Kanterov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gleb Kanterov updated BEAM-8081:

Priority: Major  (was: Critical)

> CalciteQueryPlanner breaks ZetaSQLQueryPlanner with setting 
> RelMetadataQuery.THREAD_PROVIDERS
> -
>
> Key: BEAM-8081
> URL: https://issues.apache.org/jira/browse/BEAM-8081
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Gleb Kanterov
>Priority: Major
>
> If two independent tests share the same thread, and the first one uses 
> CalciteQueryPlanner while the second one uses ZetaSQLPlanner, a 
> NullPointerException is thrown.
> The root cause is using RelMetadataQuery.THREAD_PROVIDERS in 
> CalciteQueryPlanner.
> {code}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.getNodeStats(BeamSqlRelUtils.java:96)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel.beamComputeSelfCost(BeamIOSourceRel.java:111)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner$NonCumulativeCostImpl.getNonCumulativeCost(CalciteQueryPlanner.java:217)
>   at 
> GeneratedMetadataHandler_NonCumulativeCost.getNonCumulativeCost_$(Unknown 
> Source)
>   at 
> GeneratedMetadataHandler_NonCumulativeCost.getNonCumulativeCost(Unknown 
> Source)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.metadata.RelMetadataQuery.getNonCumulativeCost(RelMetadataQuery.java:301)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.getCost(VolcanoPlanner.java:929)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements0(RelSubset.java:347)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements(RelSubset.java:330)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(VolcanoPlanner.java:1816)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1752)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:846)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:868)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:90)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:329)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1656)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:846)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:868)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:90)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:329)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1656)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:846)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:868)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:90)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:329)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1656)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:846)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:868)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.changeTraits(VolcanoPlanner.java:529)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:325)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168)
>   at 
> 

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=329367&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329367
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 16/Oct/19 20:32
Start Date: 16/Oct/19 20:32
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r335694371
 
 

 ##
 File path: sdks/python/apache_beam/testing/interactive_stream.py
 ##
 @@ -0,0 +1,123 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import grpc
+import time
+
+from apache_beam.portability.api import beam_interactive_api_pb2
+from apache_beam.portability.api import beam_interactive_api_pb2_grpc
+from apache_beam.portability.api.beam_interactive_api_pb2_grpc import 
InteractiveServiceServicer
+from concurrent.futures import ThreadPoolExecutor
+
+
+def to_api_state(state):
+  if state == 'STOPPED':
 
 Review comment:
   done
 



Issue Time Tracking
---

Worklog Id: (was: 329367)
Time Spent: 4h  (was: 3h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=329365&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329365
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 16/Oct/19 20:25
Start Date: 16/Oct/19 20:25
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9741: [BEAM-7926] Visualize 
PCollection
URL: https://github.com/apache/beam/pull/9741#issuecomment-542876876
 
 
   retest this please.
 



Issue Time Tracking
---

Worklog Id: (was: 329365)
Time Spent: 2h  (was: 1h 50m)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> as materialized pcoll.
> e.g., visualize(pcoll)





[jira] [Work logged] (BEAM-8367) Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8367?focusedWorklogId=329357&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329357
 ]

ASF GitHub Bot logged work on BEAM-8367:


Author: ASF GitHub Bot
Created on: 16/Oct/19 19:58
Start Date: 16/Oct/19 19:58
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9797: [BEAM-8367] 
Using insertId for BQ streaming inserts
URL: https://github.com/apache/beam/pull/9797
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 329357)
Time Spent: 2h  (was: 1h 50m)

> Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS
> -
>
> Key: BEAM-8367
> URL: https://issues.apache.org/jira/browse/BEAM-8367
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.17.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Unique IDs ensure (best effort) that writes to BigQuery are idempotent; for 
> example, we don't write the same record twice after a VM failure.
>  
> Currently the Python BQ sink inserts BQ IDs here, but they'll be re-generated 
> after a VM failure, resulting in data duplication.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L766]
>  
> Correct fix is to do a Reshuffle to checkpoint unique IDs once they are 
> generated, similar to how Java BQ sink operates.
> [https://github.com/apache/beam/blob/dcf6ad301069e4d2cfaec5db6b178acb7bb67f49/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L225]
>  
> Pablo, can you do an initial assessment here ?
> I think this is a relatively small fix but I might be wrong.



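The fix described in the issue — generate a unique insert ID once, checkpoint it (the Reshuffle in Beam), and let the sink deduplicate retries — can be illustrated outside Beam as follows. `with_insert_ids` and `dedupe_by_id` are hypothetical helpers, not the actual BigQuery sink API:

```python
import uuid

def with_insert_ids(records):
    """Tag each record with a unique ID exactly once; a checkpoint would
    preserve these IDs so retries re-send the same (id, record) pairs."""
    return [(str(uuid.uuid4()), r) for r in records]

def dedupe_by_id(tagged, seen=None):
    """Sink-side best-effort dedup: drop records whose insert ID was seen."""
    seen = set() if seen is None else seen
    out = []
    for insert_id, record in tagged:
        if insert_id not in seen:
            seen.add(insert_id)
            out.append(record)
    return out

tagged = with_insert_ids(["row-a", "row-b"])
# A retry re-sends the same tagged records; the sink keeps only one copy.
delivered = dedupe_by_id(tagged + tagged)
```

Without the checkpoint, a retry would call `with_insert_ids` again, producing new IDs that defeat the deduplication — which is the bug described above.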


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=329340&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329340
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 16/Oct/19 19:02
Start Date: 16/Oct/19 19:02
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-542845583
 
 
   Filed https://issues.apache.org/jira/browse/BEAM-8414 to reenable missing 
checks, I don't think we need another issue.
 



Issue Time Tracking
---

Worklog Id: (was: 329340)
Time Spent: 8h 50m  (was: 8h 40m)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.





[jira] [Commented] (BEAM-8414) Cleanup Python codebase to enable some of the excluded Python lint checks.

2019-10-16 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953112#comment-16953112
 ] 

Valentyn Tymofieiev commented on BEAM-8414:
---

Additionally,

we can also re-enable the unnecessary-comprehension check. In most cases the 
cleanup requires changing [x for x in some_iterable] into list(some_iterable).
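For example, the unnecessary-comprehension cleanup is mechanical:

```python
some_iterable = range(3)

# Flagged by pylint's unnecessary-comprehension check: the comprehension
# only copies the iterable element by element.
copied = [x for x in some_iterable]

# Preferred equivalent form:
cleaned = list(some_iterable)
```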

> Cleanup Python  codebase to enable some of the excluded Python lint checks.
> ---
>
> Key: BEAM-8414
> URL: https://issues.apache.org/jira/browse/BEAM-8414
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Minor
>  Labels: beginner, easy, easy-fix, easyfix, newbie, starter
>
> https://github.com/apache/beam/pull/9725 upgraded lint checker, however  Beam 
> codebase is not fully compliant with some of the checks new linter supports, 
> so we excluded such checks. We would like to have some checks permanently 
> excluded (see discussion on the PR), however we would like to re-enable the 
> following checks:
> consider-using-set-comprehension
> chained-comparison
> consider-using-sys-exit
> To reenable these checks, we should:
> 1) remove them from disabled checks in .pylintrc [1] 
> https://github.com/apache/beam/blob/master/sdks/python/.pylintrc and 
> 2) cleanup the codebase to make it compliant.
> [1] 
> https://github.com/apache/beam/blob/3330069291d8168c56c77acfef84c2566af05ec6/sdks/python/.pylintrc#L81





[jira] [Updated] (BEAM-8414) Cleanup Python codebase to enable some of the excluded Python lint checks.

2019-10-16 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8414:
--
Status: Open  (was: Triage Needed)

> Cleanup Python  codebase to enable some of the excluded Python lint checks.
> ---
>
> Key: BEAM-8414
> URL: https://issues.apache.org/jira/browse/BEAM-8414
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Minor
>  Labels: beginner, easy, easy-fix, easyfix, newbie, starter
>
> https://github.com/apache/beam/pull/9725 upgraded the lint checker; however, 
> the Beam codebase is not fully compliant with some of the checks the new 
> linter supports, so we excluded such checks. We would like some checks to 
> remain permanently excluded (see discussion on the PR), but we would like to 
> re-enable the following checks:
> consider-using-set-comprehension
> chained-comparison
> consider-using-sys-exit
> To re-enable these checks, we should:
> 1) remove them from the disabled checks in .pylintrc [1] 
> https://github.com/apache/beam/blob/master/sdks/python/.pylintrc and 
> 2) clean up the codebase to make it compliant.
> [1] 
> https://github.com/apache/beam/blob/3330069291d8168c56c77acfef84c2566af05ec6/sdks/python/.pylintrc#L81



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8414) Cleanup Python codebase to enable some of the excluded Python lint checks.

2019-10-16 Thread Valentyn Tymofieiev (Jira)
Valentyn Tymofieiev created BEAM-8414:
-

 Summary: Cleanup Python codebase to enable some of the excluded 
Python lint checks.
 Key: BEAM-8414
 URL: https://issues.apache.org/jira/browse/BEAM-8414
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Valentyn Tymofieiev


https://github.com/apache/beam/pull/9725 upgraded the lint checker; however, the 
Beam codebase is not fully compliant with some of the checks the new linter 
supports, so we excluded such checks. We would like some checks to remain 
permanently excluded (see discussion on the PR), but we would like to re-enable 
the following checks:

consider-using-set-comprehension
chained-comparison
consider-using-sys-exit

To re-enable these checks, we should:
1) remove them from the disabled checks in .pylintrc [1] 
https://github.com/apache/beam/blob/master/sdks/python/.pylintrc and 
2) clean up the codebase to make it compliant.

[1] 
https://github.com/apache/beam/blob/3330069291d8168c56c77acfef84c2566af05ec6/sdks/python/.pylintrc#L81




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
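
For reference, the three checks being re-enabled flag patterns like the ones below. This is a minimal illustrative sketch (not Beam code) showing a compliant form for each, with the flagged form in comments:

```python
import sys

values = [1, 2, 3, 4, 5]

# consider-using-set-comprehension: build the set directly
# instead of wrapping a list comprehension in set(...).
evens = {v for v in values if v % 2 == 0}          # ok
# evens = set([v for v in values if v % 2 == 0])   # flagged

# chained-comparison: collapse `lo <= x and x < hi` into one chain.
def in_range(x, lo=0, hi=10):
  return lo <= x < hi                              # ok
  # return lo <= x and x < hi                      # flagged

# consider-using-sys-exit: prefer sys.exit() over the exit()/quit()
# builtins, which are intended only for interactive sessions.
def main():
  if not values:
    sys.exit(1)                                    # ok; bare `exit(1)` is flagged
  return sum(values)
```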


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=329333&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329333
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 16/Oct/19 18:45
Start Date: 16/Oct/19 18:45
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9741: [BEAM-7926] Visualize 
PCollection
URL: https://github.com/apache/beam/pull/9741#issuecomment-542839019
 
 
   > Any updates @rohdesamuel @KevinGG ?
   
   Just trying to get unblocked by tinkering with the setup.py.
   Thanks to Valentyn for helping me investigate the issue. We filed a bug: 
BEAM-8397.
   There is something weird going on in the code base that sometimes leads the 
tox suite or nose tests into a stack overflow.
   And a workaround of lowering the upper bound of ipython in the setup seems 
to help.
   
   Once the pre-commit passes, I'll notify Sam to do a first iteration and ping 
you after. Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329333)
Time Spent: 1h 50m  (was: 1h 40m)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> materialized as pcoll.
> e.g., visualize(pcoll)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
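
As an illustration of the idea only (purely hypothetical; `visualize` below is a stand-in, not the Interactive Beam API), auto-charting materialized PCollection data could be sketched as a text histogram over the materialized elements:

```python
from collections import Counter

def visualize(materialized, width=20):
  """Hypothetical sketch: render materialized PCollection data as a
  text histogram (a stand-in for the proposed auto-charting)."""
  counts = Counter(materialized)
  scale = max(counts.values())
  lines = []
  for value, count in sorted(counts.items()):
    # Scale each bar relative to the most frequent element.
    bar = '#' * max(1, count * width // scale)
    lines.append('%-10s | %s (%d)' % (value, bar, count))
  return '\n'.join(lines)

# Pretend these are the materialized contents of pcoll.
chart = visualize(['a', 'b', 'a', 'c', 'a', 'b'])
```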


[jira] [Work logged] (BEAM-4132) Element type inference doesn't work for multi-output DoFns

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4132?focusedWorklogId=329321&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329321
 ]

ASF GitHub Bot logged work on BEAM-4132:


Author: ASF GitHub Bot
Created on: 16/Oct/19 18:29
Start Date: 16/Oct/19 18:29
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9810: [BEAM-4132] Support 
multi-output type inference
URL: https://github.com/apache/beam/pull/9810#issuecomment-542832764
 
 
   R: @robertwb, @aaltay
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329321)
Time Spent: 20m  (was: 10m)

> Element type inference doesn't work for multi-output DoFns
> --
>
> Key: BEAM-4132
> URL: https://issues.apache.org/jira/browse/BEAM-4132
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.4.0
>Reporter: Chuan Yu Foo
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> TLDR: if you have a multi-output DoFn, then the non-main PCollections will 
> incorrectly have their element types set to None. This affects type checking 
> for pipelines involving these PCollections.
> Minimal example:
> {code}
> import apache_beam as beam
> class TripleDoFn(beam.DoFn):
>   def process(self, elem):
> yield elem
> if elem % 2 == 0:
>   yield beam.pvalue.TaggedOutput('ten_times', elem * 10)
> if elem % 3 == 0:
>   yield beam.pvalue.TaggedOutput('hundred_times', elem * 100)
>   
> @beam.typehints.with_input_types(int)
> @beam.typehints.with_output_types(int)
> class MultiplyBy(beam.DoFn):
>   def __init__(self, multiplier):
> self._multiplier = multiplier
>   def process(self, elem):
> return elem * self._multiplier
>   
> def main():
>   with beam.Pipeline() as p:
> x, a, b = (
>   p
>   | 'Create' >> beam.Create([1, 2, 3])
>   | 'TripleDo' >> beam.ParDo(TripleDoFn()).with_outputs(
> 'ten_times', 'hundred_times', main='main_output'))
> _ = a | 'MultiplyBy2' >> beam.ParDo(MultiplyBy(2))
> if __name__ == '__main__':
>   main()
> {code}
> Running this yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 
> 'MultiplyBy2': requires <type 'int'> but got None for elem
> {noformat}
> Replacing {{a}} with {{b}} yields the same error. Replacing {{a}} with {{x}} 
> instead yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 
> 'MultiplyBy2': requires <type 'int'> but got Union[TaggedOutput, int] for elem
> {noformat}
> I would expect Beam to correctly infer that {{a}} and {{b}} have element 
> types of {{int}} rather than {{None}}, and I would also expect Beam to 
> correctly figure out that the element types of {{x}} are compatible with 
> {{int}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
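
Conceptually, the inference the reporter expects would assign each tagged output its own element type, instead of a single Union over all yields (or None). A minimal, Beam-free sketch of that idea, where `TaggedOutput` below is a simplified stand-in for `beam.pvalue.TaggedOutput`:

```python
from collections import namedtuple

# Hypothetical stand-in for beam.pvalue.TaggedOutput.
TaggedOutput = namedtuple('TaggedOutput', ['tag', 'value'])

def triple_do_fn(elem):
  """Mirrors TripleDoFn from the report: one main output, two tagged ones."""
  yield elem
  if elem % 2 == 0:
    yield TaggedOutput('ten_times', elem * 10)
  if elem % 3 == 0:
    yield TaggedOutput('hundred_times', elem * 100)

def infer_per_tag_types(fn, sample_inputs, main='main_output'):
  """Group observed output types by tag, as per-tag inference would."""
  types_by_tag = {}
  for elem in sample_inputs:
    for out in fn(elem):
      if isinstance(out, TaggedOutput):
        # Tagged outputs contribute the type of their payload, not
        # the TaggedOutput wrapper itself.
        types_by_tag.setdefault(out.tag, set()).add(type(out.value))
      else:
        types_by_tag.setdefault(main, set()).add(type(out))
  return types_by_tag

types_by_tag = infer_per_tag_types(triple_do_fn, [1, 2, 3])
# Each tag resolves to {int} rather than None or Union[TaggedOutput, int].
```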


[jira] [Work logged] (BEAM-4132) Element type inference doesn't work for multi-output DoFns

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4132?focusedWorklogId=329318&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329318
 ]

ASF GitHub Bot logged work on BEAM-4132:


Author: ASF GitHub Bot
Created on: 16/Oct/19 18:27
Start Date: 16/Oct/19 18:27
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9810: [BEAM-4132] 
Support multi-output type inference
URL: https://github.com/apache/beam/pull/9810
 
 
   For tagged multi-output results, the contained PCollections' `element_type` 
members are left as `None`. This fix recursively runs type inference on these 
PCollections.
   See bug for examples.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=329319&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329319
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 16/Oct/19 18:27
Start Date: 16/Oct/19 18:27
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-542831863
 
 
   Thanks, @chadrik !
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329319)
Time Spent: 8h 40m  (was: 8.5h)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running under Python 2, so to make 
> the upgrade we have to move our lint jobs to Python 3.  Doing so will put 
> pylint into "python3 mode", and there is no option to run in 
> Python 2-compatible mode.  That said, the Beam code is intended to be Python 3 
> compatible, so in practice, performing a Python 3 lint on the Beam codebase 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a Python 3-only change that breaks Python 2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=329317&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329317
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 16/Oct/19 18:26
Start Date: 16/Oct/19 18:26
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9725: [BEAM-8350] 
Upgrade to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329317)
Time Spent: 8.5h  (was: 8h 20m)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running under Python 2, so to make 
> the upgrade we have to move our lint jobs to Python 3.  Doing so will put 
> pylint into "python3 mode", and there is no option to run in 
> Python 2-compatible mode.  That said, the Beam code is intended to be Python 3 
> compatible, so in practice, performing a Python 3 lint on the Beam codebase 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a Python 3-only change that breaks Python 2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=329316&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329316
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 16/Oct/19 18:26
Start Date: 16/Oct/19 18:26
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r335639621
 
 

 ##
 File path: sdks/python/apache_beam/testing/interactive_stream.py
 ##
 @@ -0,0 +1,123 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import grpc
+import time
+
+from apache_beam.portability.api import beam_interactive_api_pb2
+from apache_beam.portability.api import beam_interactive_api_pb2_grpc
+from apache_beam.portability.api.beam_interactive_api_pb2_grpc import 
InteractiveServiceServicer
+from concurrent.futures import ThreadPoolExecutor
+
+
+def to_api_state(state):
+  if state == 'STOPPED':
+return beam_interactive_api_pb2.StatusResponse.STOPPED
+  if state == 'PAUSED':
+return beam_interactive_api_pb2.StatusResponse.PAUSED
+  return beam_interactive_api_pb2.StatusResponse.RUNNING
+
+class InteractiveStreamController(InteractiveServiceServicer):
+  def __init__(self, endpoint, streaming_cache):
+self._endpoint = endpoint
+self._server = grpc.server(ThreadPoolExecutor(max_workers=10))
+beam_interactive_api_pb2_grpc.add_InteractiveServiceServicer_to_server(
+self, self._server)
+self._server.add_insecure_port(self._endpoint)
+self._server.start()
+
+self._streaming_cache = streaming_cache
+self._state = 'STOPPED'
+self._playback_speed = 1.0
+
+  def Start(self, request, context):
+"""Requests that the Service starts emitting elements.
+"""
+
+self._next_state('RUNNING')
+self._playback_speed = request.playback_speed or 1.0
+self._playback_speed = max(min(self._playback_speed, 100.0), 0.001)
+return beam_interactive_api_pb2.StartResponse()
+
+  def Stop(self, request, context):
+"""Requests that the Service stop emitting elements.
+"""
+self._next_state('STOPPED')
+return beam_interactive_api_pb2.StartResponse()
+
+  def Pause(self, request, context):
+"""Requests that the Service pause emitting elements.
+"""
+self._next_state('PAUSED')
+return beam_interactive_api_pb2.PauseResponse()
+
+  def Step(self, request, context):
+"""Requests that the Service emit a single element from each cached source.
+"""
+self._next_state('STEP')
+return beam_interactive_api_pb2.StepResponse()
+
+  def Status(self, request, context):
+"""Returns the status of the service.
+"""
+resp = beam_interactive_api_pb2.StatusResponse()
+resp.stream_time.GetCurrentTime()
+resp.state = to_api_state(self._state)
+return resp
+
+  def _reset_state(self):
+self._reader = None
+self._playback_speed = 1.0
+self._state = 'STOPPED'
+
+  def _next_state(self, state):
+if not self._state or self._state == 'STOPPED':
 
 Review comment:
   deleted "not self._state"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329316)
Time Spent: 3h 50m  (was: 3h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience.

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=329315&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329315
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 16/Oct/19 18:26
Start Date: 16/Oct/19 18:26
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r335639367
 
 

 ##
 File path: sdks/python/apache_beam/testing/interactive_stream.py
 ##
 @@ -0,0 +1,123 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import grpc
+import time
+
+from apache_beam.portability.api import beam_interactive_api_pb2
+from apache_beam.portability.api import beam_interactive_api_pb2_grpc
+from apache_beam.portability.api.beam_interactive_api_pb2_grpc import 
InteractiveServiceServicer
+from concurrent.futures import ThreadPoolExecutor
+
+
+def to_api_state(state):
+  if state == 'STOPPED':
+return beam_interactive_api_pb2.StatusResponse.STOPPED
+  if state == 'PAUSED':
+return beam_interactive_api_pb2.StatusResponse.PAUSED
+  return beam_interactive_api_pb2.StatusResponse.RUNNING
+
+class InteractiveStreamController(InteractiveServiceServicer):
+  def __init__(self, endpoint, streaming_cache):
+self._endpoint = endpoint
+self._server = grpc.server(ThreadPoolExecutor(max_workers=10))
 
 Review comment:
   Probably not. I dropped it down to 2. One thread to handle the TestStream 
persistent connection and another to handle any user requests.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329315)
Time Spent: 3h 40m  (was: 3.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
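
The controller in the diff discussed above is essentially a small state machine over STOPPED/RUNNING/PAUSED plus a clamped playback speed. A gRPC-free sketch of just that state logic (names chosen to mirror the diff; this is an illustrative model, not the actual service):

```python
class StreamStateMachine(object):
  """Minimal model of the controller's state and playback-speed handling."""

  VALID_STATES = ('STOPPED', 'RUNNING', 'PAUSED', 'STEP')

  def __init__(self):
    self._state = 'STOPPED'
    self._playback_speed = 1.0

  @property
  def state(self):
    return self._state

  def start(self, playback_speed=None):
    # Mirror the diff: default to 1.0, then clamp into [0.001, 100.0].
    self._playback_speed = playback_speed or 1.0
    self._playback_speed = max(min(self._playback_speed, 100.0), 0.001)
    self._state = 'RUNNING'

  def pause(self):
    self._state = 'PAUSED'

  def stop(self):
    # Stopping resets playback state, as _reset_state does in the diff.
    self._state = 'STOPPED'
    self._playback_speed = 1.0

sm = StreamStateMachine()
sm.start(playback_speed=500.0)   # out-of-range speed is clamped to 100.0
```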


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=329314&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329314
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 16/Oct/19 18:25
Start Date: 16/Oct/19 18:25
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-542831016
 
 
   Ready to go!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329314)
Time Spent: 8h 20m  (was: 8h 10m)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running under Python 2, so to make 
> the upgrade we have to move our lint jobs to Python 3.  Doing so will put 
> pylint into "python3 mode", and there is no option to run in 
> Python 2-compatible mode.  That said, the Beam code is intended to be Python 3 
> compatible, so in practice, performing a Python 3 lint on the Beam codebase 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a Python 3-only change that breaks Python 2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=329313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329313
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 16/Oct/19 18:24
Start Date: 16/Oct/19 18:24
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r335638458
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -0,0 +1,123 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
InteractiveStreamRecord
+from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.utils import timestamp
+
+from google.protobuf import timestamp_pb2
+
+
+def to_timestamp(timestamp_secs):
+  """Converts seconds since epoch to an apache_beam.util.Timestamp.
+  """
+  return timestamp.Timestamp.of(timestamp_secs)
+
+def from_timestamp_proto(timestamp_proto):
+  return timestamp.Timestamp(seconds=timestamp_proto.seconds,
+ micros=timestamp_proto.nanos // 1000)
+
+def to_timestamp_usecs(ts):
+  """Converts a google.protobuf.Timestamp and
+ apache_beam.util.timestamp.Timestamp to seconds since epoch.
+  """
+  if isinstance(ts, timestamp_pb2.Timestamp):
+return (ts.seconds * 10**6) + (ts.nanos * 10**-3)
+  if isinstance(ts, timestamp.Timestamp):
+return ts.micros
+
+class StreamingCache(object):
+  """Abstraction that holds the logic for reading and writing to cache.
+  """
+  def __init__(self, readers):
+self._readers = readers
+
+  class Reader(object):
+"""Abstraction that reads from PCollection readers.
+
+This class is an Abstraction layer over multiple PCollection readers to be
+used for supplying the Interactive Service with TestStream events.
+
+This class is also responsible for holding the state of the clock, 
injecting
+clock advancement events, and watermark advancement events.
+"""
+def __init__(self, readers):
+  self._readers = [reader.read() for reader in readers]
+  self._watermark = timestamp.MIN_TIMESTAMP
+  self._timestamp = timestamp.MIN_TIMESTAMP
+
+def read(self):
+  """Reads records from PCollection readers.
+  """
+  records = []
+  for r in self._readers:
+try:
+  record = InteractiveStreamRecord()
+  record.ParseFromString(next(r))
+  records.append(record)
+except StopIteration:
+  pass
+
+  events = []
+  if not records:
+self.advance_watermark(timestamp.MAX_TIMESTAMP, events)
+
+  records.sort(key=lambda x: x.processing_time)
+  for r in records:
+self.advance_processing_time(
+from_timestamp_proto(r.processing_time), events)
+self.advance_watermark(from_timestamp_proto(r.watermark), events)
+
+events.append(TestStreamPayload.Event(
+element_event=TestStreamPayload.Event.AddElements(
+elements=[r.element])))
+  return events
+
+def advance_processing_time(self, processing_time, events):
+  """Advances the internal clock state and injects an AdvanceProcessingTime
+ event.
+  """
+  if self._timestamp != processing_time:
+duration = timestamp.Duration(
+micros=processing_time.micros - self._timestamp.micros)
+if self._timestamp == timestamp.MIN_TIMESTAMP:
+  duration = timestamp.Duration(micros=processing_time.micros)
 
 Review comment:
   Good catch, I simplified the implementation to assume the clock and 
self._timestamp starts at t=0.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking

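
The unit conversions in the diff discussed above are easy to get inverted (protobuf Timestamps carry seconds plus nanoseconds; Beam Timestamps carry microseconds). As a sanity check, a self-contained version of that arithmetic, with plain ints standing in for the proto and Beam timestamp types:

```python
# Plain-int sketch of the conversions in streaming_cache.py:
# protobuf Timestamps carry (seconds, nanos); Beam Timestamps carry micros.

def proto_to_usecs(seconds, nanos):
  """google.protobuf.Timestamp fields -> microseconds since epoch."""
  return seconds * 10**6 + nanos // 1000

def usecs_to_proto(usecs):
  """Microseconds since epoch -> (seconds, nanos) protobuf fields."""
  return usecs // 10**6, (usecs % 10**6) * 1000

usecs = proto_to_usecs(seconds=12, nanos=345678000)   # 12.345678 s
```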
[jira] [Assigned] (BEAM-4132) Element type inference doesn't work for multi-output DoFns

2019-10-16 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri reassigned BEAM-4132:
---

Assignee: Udi Meiri

> Element type inference doesn't work for multi-output DoFns
> --
>
> Key: BEAM-4132
> URL: https://issues.apache.org/jira/browse/BEAM-4132
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.4.0
>Reporter: Chuan Yu Foo
>Assignee: Udi Meiri
>Priority: Major
>
> TLDR: if you have a multi-output DoFn, then the non-main PCollections will 
> incorrectly have their element types set to None. This affects type checking 
> for pipelines involving these PCollections.
> Minimal example:
> {code}
> import apache_beam as beam
> class TripleDoFn(beam.DoFn):
>   def process(self, elem):
> yield elem
> if elem % 2 == 0:
>   yield beam.pvalue.TaggedOutput('ten_times', elem * 10)
> if elem % 3 == 0:
>   yield beam.pvalue.TaggedOutput('hundred_times', elem * 100)
>   
> @beam.typehints.with_input_types(int)
> @beam.typehints.with_output_types(int)
> class MultiplyBy(beam.DoFn):
>   def __init__(self, multiplier):
> self._multiplier = multiplier
>   def process(self, elem):
> return elem * self._multiplier
>   
> def main():
>   with beam.Pipeline() as p:
> x, a, b = (
>   p
>   | 'Create' >> beam.Create([1, 2, 3])
>   | 'TripleDo' >> beam.ParDo(TripleDoFn()).with_outputs(
> 'ten_times', 'hundred_times', main='main_output'))
> _ = a | 'MultiplyBy2' >> beam.ParDo(MultiplyBy(2))
> if __name__ == '__main__':
>   main()
> {code}
> Running this yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 
> 'MultiplyBy2': requires <type 'int'> but got None for elem
> {noformat}
> Replacing {{a}} with {{b}} yields the same error. Replacing {{a}} with {{x}} 
> instead yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 
> 'MultiplyBy2': requires <type 'int'> but got Union[TaggedOutput, int] for elem
> {noformat}
> I would expect Beam to correctly infer that {{a}} and {{b}} have element 
> types of {{int}} rather than {{None}}, and I would also expect Beam to 
> correctly figure out that the element types of {{x}} are compatible with 
> {{int}}.
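A framework-free sketch of the multi-output pattern in the report may help show why inference struggles here. The `Tagged` class below stands in for `beam.pvalue.TaggedOutput` and is illustrative only: the generator yields plain elements for the main output and tagged wrappers for the side outputs, so naive inference of a single output type sees a mix of `int` and wrapper objects rather than a per-tag `int`.

```python
# Illustrative stand-in for beam.pvalue.TaggedOutput (not the Beam class).
class Tagged:
    def __init__(self, tag, value):
        self.tag = tag
        self.value = value

def triple_do(elem):
    # Main output: the element itself.
    yield elem
    # Side outputs: tagged wrappers, which obscure the per-tag element type.
    if elem % 2 == 0:
        yield Tagged('ten_times', elem * 10)
    if elem % 3 == 0:
        yield Tagged('hundred_times', elem * 100)

outs = list(triple_do(6))
```

Because every yield flows through one generator, a single inferred return type covers all tags at once, which is consistent with the `Union[TaggedOutput, int]` and `None` results reported above.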



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=329304=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329304
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 16/Oct/19 18:12
Start Date: 16/Oct/19 18:12
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r335632805
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -0,0 +1,123 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
InteractiveStreamRecord
+from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.utils import timestamp
+
+from google.protobuf import timestamp_pb2
+
+
+def to_timestamp(timestamp_secs):
+  """Converts seconds since epoch to an apache_beam.util.Timestamp.
+  """
+  return timestamp.Timestamp.of(timestamp_secs)
+
+def from_timestamp_proto(timestamp_proto):
+  return timestamp.Timestamp(seconds=timestamp_proto.seconds,
+ micros=timestamp_proto.nanos * 1000)
+
+def to_timestamp_usecs(ts):
+  """Converts a google.protobuf.Timestamp and
+ apache_beam.util.timestamp.Timestamp to seconds since epoch.
+  """
+  if isinstance(ts, timestamp_pb2.Timestamp):
+return (ts.seconds * 10**6) + (ts.nanos * 10**-3)
+  if isinstance(ts, timestamp.Timestamp):
+return ts.micros
+
+class StreamingCache(object):
+  """Abstraction that holds the logic for reading and writing to cache.
+  """
+  def __init__(self, readers):
+self._readers = readers
+
+  class Reader(object):
+"""Abstraction that reads from PCollection readers.
+
+This class is an Abstraction layer over multiple PCollection readers to be
+used for supplying the Interactive Service with TestStream events.
+
+This class is also responsible for holding the state of the clock, 
injecting
+clock advancement events, and watermark advancement events.
+"""
+def __init__(self, readers):
+  self._readers = [reader.read() for reader in readers]
+  self._watermark = timestamp.MIN_TIMESTAMP
 
 Review comment:
   Good point, I forgot to update this to be per PCollection tag.
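On the timestamp conversions in this diff: `ts.nanos * 10**-3` produces a float, so integer floor division keeps microseconds exact. A minimal sketch of the intended conversion, assuming a protobuf-style `(seconds, nanos)` pair; the function name is illustrative, not part of the Beam API:

```python
# Convert a (seconds, nanos) timestamp to integer microseconds since epoch.
# Floor division avoids the float rounding that nanos * 10**-3 would introduce.
def to_usecs(seconds, nanos):
    return seconds * 10**6 + nanos // 1000
```

For example, `to_usecs(1, 1500)` gives `1000001`, staying in exact integer arithmetic throughout.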
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329304)
Time Spent: 3h 20m  (was: 3h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8367) Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8367?focusedWorklogId=329303=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329303
 ]

ASF GitHub Bot logged work on BEAM-8367:


Author: ASF GitHub Bot
Created on: 16/Oct/19 18:10
Start Date: 16/Oct/19 18:10
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9797: [BEAM-8367] Using 
insertId for BQ streaming inserts
URL: https://github.com/apache/beam/pull/9797#issuecomment-542825297
 
 
   Run Python2_PVR_Flink PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329303)
Time Spent: 1h 50m  (was: 1h 40m)

> Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS
> -
>
> Key: BEAM-8367
> URL: https://issues.apache.org/jira/browse/BEAM-8367
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.17.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Unique IDs ensure (best effort) that writes to BigQuery are idempotent: for 
> example, we don't write the same record twice after a VM failure.
>  
> Currently the Python BQ sink inserts BQ IDs here, but they'll be re-generated 
> after a VM failure, resulting in data duplication.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L766]
>  
> The correct fix is to do a Reshuffle to checkpoint unique IDs once they are 
> generated, similar to how the Java BQ sink operates.
> [https://github.com/apache/beam/blob/dcf6ad301069e4d2cfaec5db6b178acb7bb67f49/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L225]
>  
> Pablo, can you do an initial assessment here?
> I think this is a relatively small fix but I might be wrong.
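A toy sketch of why stable insert IDs make retried streaming writes idempotent. This is not the actual Beam or BigQuery implementation: the dict stands in for BigQuery's best-effort server-side dedup on insertId, and the point is that IDs generated once (and checkpointed) survive a retry, whereas re-generated IDs would not.

```python
import uuid

# A dict keyed by insert ID stands in for BigQuery's best-effort dedup.
def write_batch(table, rows_with_ids):
    for insert_id, row in rows_with_ids:
        table.setdefault(insert_id, row)  # duplicate IDs are silently dropped

rows = ["a", "b"]
# IDs are generated once, then (in the real fix) checkpointed via Reshuffle.
batch = [(str(uuid.uuid4()), r) for r in rows]

table = {}
write_batch(table, batch)
write_batch(table, batch)  # retry after a simulated VM failure replays same IDs
```

Because the retry replays the same `(insert_id, row)` pairs, the second write adds nothing; if the IDs were re-generated on retry, both copies of each row would land.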



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8413) test_streaming_pipeline_returns_expected_user_metrics_fnapi_it failed on latest PostCommit Py36

2019-10-16 Thread Boyuan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953061#comment-16953061
 ] 

Boyuan Zhang commented on BEAM-8413:


There are more test failures:
https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/4807/
https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/4808/
https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/4810/
https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/4811/
https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/4814/
https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/4815/
https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/4818/


> test_streaming_pipeline_returns_expected_user_metrics_fnapi_it  failed on 
> latest PostCommit Py36 
> -
>
> Key: BEAM-8413
> URL: https://issues.apache.org/jira/browse/BEAM-8413
> Project: Beam
>  Issue Type: New Feature
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Mikhail Gryzykhin
>Priority: Major
>
> https://builds.apache.org/job/beam_PostCommit_Python36/731/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8413) test_streaming_pipeline_returns_expected_user_metrics_fnapi_it failed on latest PostCommit Py36

2019-10-16 Thread Boyuan Zhang (Jira)
Boyuan Zhang created BEAM-8413:
--

 Summary: 
test_streaming_pipeline_returns_expected_user_metrics_fnapi_it  failed on 
latest PostCommit Py36 
 Key: BEAM-8413
 URL: https://issues.apache.org/jira/browse/BEAM-8413
 Project: Beam
  Issue Type: New Feature
  Components: test-failures
Reporter: Boyuan Zhang
Assignee: Mikhail Gryzykhin


https://builds.apache.org/job/beam_PostCommit_Python36/731/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8406) TextTable support JSON format

2019-10-16 Thread Omshi Samal (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953039#comment-16953039
 ] 

Omshi Samal commented on BEAM-8406:
---

Hi, thanks for the info - that's helpful, I'll keep you updated. 

> TextTable support JSON format
> -
>
> Key: BEAM-8406
> URL: https://issues.apache.org/jira/browse/BEAM-8406
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Priority: Major
>
> Have a JSON table implementation similar to [1].
> [1]: 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/text/TextTable.java



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=329284=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329284
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 16/Oct/19 17:29
Start Date: 16/Oct/19 17:29
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-542808340
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329284)
Time Spent: 8h 10m  (was: 8h)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=329279=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329279
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 16/Oct/19 17:25
Start Date: 16/Oct/19 17:25
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-542806630
 
 
   Run CommunityMetrics PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329279)
Time Spent: 8h  (was: 7h 50m)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=329273=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329273
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 16/Oct/19 17:18
Start Date: 16/Oct/19 17:18
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-542803906
 
 
   LGTM, thank you.
   
   We can merge after tests pass.
   
   I can file remaining Jiras.
   
   > I think we should exclude this, because mypy is much better at determining 
structural problems, 
   and it will catch this kind of thing with far fewer false positives.
   
   Sounds good. We can certainly use mypy if that's the case. I would be 
conservative with excluding the rules unless:
   - We have frequent false-positives for a rule.
   - We have checked that mypy catches the errors of this kind.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329273)
Time Spent: 7h 40m  (was: 7.5h)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=329275=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329275
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 16/Oct/19 17:18
Start Date: 16/Oct/19 17:18
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #9790: 
[BEAM-7389] Show code snippet outputs as stdout
URL: https://github.com/apache/beam/pull/9790#discussion_r335605333
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter_test.py
 ##
 @@ -31,31 +31,26 @@
 
 
 def check_perennials(actual):
-  # [START perennials]
-  perennials = [
-  {'icon': '', 'name': 'Strawberry', 'duration': 'perennial'},
-  {'icon': '', 'name': 'Eggplant', 'duration': 'perennial'},
-  {'icon': '凜', 'name': 'Potato', 'duration': 'perennial'},
-  ]
-  # [END perennials]
-  assert_that(actual, equal_to(perennials))
+  expected = '''[START perennials]
+{'icon': '', 'name': 'Strawberry', 'duration': 'perennial'}
+{'icon': '', 'name': 'Eggplant', 'duration': 'perennial'}
+{'icon': '凜', 'name': 'Potato', 'duration': 'perennial'}
+[END perennials]'''.splitlines()[1:-1]
+  assert_that(actual, equal_to(expected))
 
 
 def check_valid_plants(actual):
-  # [START valid_plants]
-  valid_plants = [
-  {'icon': '', 'name': 'Strawberry', 'duration': 'perennial'},
-  {'icon': '凌', 'name': 'Carrot', 'duration': 'biennial'},
-  {'icon': '', 'name': 'Eggplant', 'duration': 'perennial'},
-  {'icon': '', 'name': 'Tomato', 'duration': 'annual'},
-  ]
-  # [END valid_plants]
-  assert_that(actual, equal_to(valid_plants))
+  expected = '''[START valid_plants]
+{'icon': '', 'name': 'Strawberry', 'duration': 'perennial'}
+{'icon': '凌', 'name': 'Carrot', 'duration': 'biennial'}
+{'icon': '', 'name': 'Eggplant', 'duration': 'perennial'}
+{'icon': '', 'name': 'Tomato', 'duration': 'annual'}
+[END valid_plants]'''.splitlines()[1:-1]
+  assert_that(actual, equal_to(expected))
 
 
 @mock.patch('apache_beam.Pipeline', TestPipeline)
-# pylint: disable=line-too-long
-@mock.patch('apache_beam.examples.snippets.transforms.elementwise.filter.print',
 lambda elem: elem)
+@mock.patch('apache_beam.examples.snippets.transforms.elementwise.filter.print',
 str)
 
 Review comment:
   Alright, I can make that
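A minimal sketch of the pattern discussed in this diff: the expected stdout is embedded between [START]/[END] markers inside a triple-quoted string, and `splitlines()[1:-1]` trims the marker lines, leaving only the output lines to compare against.

```python
# Embed expected output between snippet markers, then slice the markers off.
expected = '''[START perennials]
line one
line two
[END perennials]'''.splitlines()[1:-1]
```

The slice `[1:-1]` drops exactly the first and last lines, so `expected` holds only the lines between the markers.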
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329275)
Time Spent: 69h 20m  (was: 69h 10m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 69h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=329274=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329274
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 16/Oct/19 17:18
Start Date: 16/Oct/19 17:18
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-542803906
 
 
   LGTM, thank you.
   
   We can merge after tests pass.
   
   I can file remaining Jiras.
   
   > I think we should exclude this, because mypy is much better at determining 
structural problems, 
   and it will catch this kind of thing with far fewer false positives.
   
   Sounds good. We can certainly use mypy if that's the case. I would be 
conservative with excluding the rules unless:
   - We have frequent false-positives for a rule.
   - We have checked that mypy catches the errors of this kind.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329274)
Time Spent: 7h 50m  (was: 7h 40m)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-4132) Element type inference doesn't work for multi-output DoFns

2019-10-16 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953019#comment-16953019
 ] 

Udi Meiri commented on BEAM-4132:
-

Similarly for Partitions:
{code}
  def test_partition(self):
p = TestPipeline()
even, odd = (p
 | beam.Create([1, 2, 3])
 | 'even_odd' >> beam.Partition(lambda e, _: e % 2, 2))
# Test that the element type of even and odd is int.
res_even = (even
| 'id_even' >> beam.ParDo(lambda e: [e]).with_input_types(int))
res_odd = (odd
   | 'id_odd' >> beam.ParDo(lambda e: [e]).with_input_types(int))
assert_that(res_even, equal_to([2]), label='even_check')
assert_that(res_odd, equal_to([1, 3]), label='odd_check')
p.run()
{code}
Gives the error:
{code}
E   apache_beam.typehints.decorators.TypeCheckError: Type hint 
violation for 'id_even': requires <type 'int'> but got None for e
{code}
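A framework-free sketch of the partition semantics exercised by the test above (the `partition` helper is illustrative, not the Beam API): each element is routed to the output whose index the partition function returns, so `lambda e: e % 2` sends even numbers to output 0 and odd numbers to output 1.

```python
# Route each element to one of n outputs by the index the function returns,
# mirroring what beam.Partition does across n PCollections.
def partition(elements, fn, n):
    outputs = [[] for _ in range(n)]
    for e in elements:
        outputs[fn(e)].append(e)
    return outputs

even, odd = partition([1, 2, 3], lambda e: e % 2, 2)
```

The bug report's point is that in Beam, each of these outputs should inherit the input element type (`int`), rather than `None`.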


> Element type inference doesn't work for multi-output DoFns
> --
>
> Key: BEAM-4132
> URL: https://issues.apache.org/jira/browse/BEAM-4132
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.4.0
>Reporter: Chuan Yu Foo
>Priority: Major
>
> TLDR: if you have a multi-output DoFn, then the non-main PCollections will 
> incorrectly have their element types set to None. This affects type checking 
> for pipelines involving these PCollections.
> Minimal example:
> {code}
> import apache_beam as beam
> class TripleDoFn(beam.DoFn):
>   def process(self, elem):
> yield elem
> if elem % 2 == 0:
>   yield beam.pvalue.TaggedOutput('ten_times', elem * 10)
> if elem % 3 == 0:
>   yield beam.pvalue.TaggedOutput('hundred_times', elem * 100)
>   
> @beam.typehints.with_input_types(int)
> @beam.typehints.with_output_types(int)
> class MultiplyBy(beam.DoFn):
>   def __init__(self, multiplier):
> self._multiplier = multiplier
>   def process(self, elem):
> return elem * self._multiplier
>   
> def main():
>   with beam.Pipeline() as p:
> x, a, b = (
>   p
>   | 'Create' >> beam.Create([1, 2, 3])
>   | 'TripleDo' >> beam.ParDo(TripleDoFn()).with_outputs(
> 'ten_times', 'hundred_times', main='main_output'))
> _ = a | 'MultiplyBy2' >> beam.ParDo(MultiplyBy(2))
> if __name__ == '__main__':
>   main()
> {code}
> Running this yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 
> 'MultiplyBy2': requires <type 'int'> but got None for elem
> {noformat}
> Replacing {{a}} with {{b}} yields the same error. Replacing {{a}} with {{x}} 
> instead yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 
> 'MultiplyBy2': requires <type 'int'> but got Union[TaggedOutput, int] for elem
> {noformat}
> I would expect Beam to correctly infer that {{a}} and {{b}} have element 
> types of {{int}} rather than {{None}}, and I would also expect Beam to 
> correctly figure out that the element types of {{x}} are compatible with 
> {{int}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8367) Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8367?focusedWorklogId=329268=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329268
 ]

ASF GitHub Bot logged work on BEAM-8367:


Author: ASF GitHub Bot
Created on: 16/Oct/19 17:11
Start Date: 16/Oct/19 17:11
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9797: [BEAM-8367] Using 
insertId for BQ streaming inserts
URL: https://github.com/apache/beam/pull/9797#issuecomment-542801037
 
 
   LGTM. Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329268)
Time Spent: 1h 40m  (was: 1.5h)

> Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS
> -
>
> Key: BEAM-8367
> URL: https://issues.apache.org/jira/browse/BEAM-8367
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.17.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Unique IDs ensure (best effort) that writes to BigQuery are idempotent: for 
> example, we don't write the same record twice after a VM failure.
>  
> Currently the Python BQ sink inserts BQ IDs here, but they'll be re-generated 
> after a VM failure, resulting in data duplication.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L766]
>  
> The correct fix is to do a Reshuffle to checkpoint unique IDs once they are 
> generated, similar to how the Java BQ sink operates.
> [https://github.com/apache/beam/blob/dcf6ad301069e4d2cfaec5db6b178acb7bb67f49/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L225]
>  
> Pablo, can you do an initial assessment here?
> I think this is a relatively small fix but I might be wrong.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8385) Add an option to run KafkaIOIT with 10GB dataset

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8385?focusedWorklogId=329266=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329266
 ]

ASF GitHub Bot logged work on BEAM-8385:


Author: ASF GitHub Bot
Created on: 16/Oct/19 17:07
Start Date: 16/Oct/19 17:07
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9770: [BEAM-8385] 
Add a KafkaIOIT hash for 100M records
URL: https://github.com/apache/beam/pull/9770
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329266)
Time Spent: 1h 10m  (was: 1h)

> Add an option to run KafkaIOIT with 10GB dataset
> 
>
> Key: BEAM-8385
> URL: https://issues.apache.org/jira/browse/BEAM-8385
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=329258=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329258
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 16/Oct/19 16:59
Start Date: 16/Oct/19 16:59
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9790: [BEAM-7389] 
Show code snippet outputs as stdout
URL: https://github.com/apache/beam/pull/9790#discussion_r335595722
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter_test.py
 ##
 @@ -31,31 +31,26 @@
 
 
 def check_perennials(actual):
-  # [START perennials]
-  perennials = [
-  {'icon': '', 'name': 'Strawberry', 'duration': 'perennial'},
-  {'icon': '', 'name': 'Eggplant', 'duration': 'perennial'},
-  {'icon': '凜', 'name': 'Potato', 'duration': 'perennial'},
-  ]
-  # [END perennials]
-  assert_that(actual, equal_to(perennials))
+  expected = '''[START perennials]
+{'icon': '', 'name': 'Strawberry', 'duration': 'perennial'}
+{'icon': '', 'name': 'Eggplant', 'duration': 'perennial'}
+{'icon': '凜', 'name': 'Potato', 'duration': 'perennial'}
+[END perennials]'''.splitlines()[1:-1]
 
 Review comment:
   This makes sense.
 



Issue Time Tracking
---

Worklog Id: (was: 329258)
Time Spent: 69h 10m  (was: 69h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 69h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=329257=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329257
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 16/Oct/19 16:59
Start Date: 16/Oct/19 16:59
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9790: [BEAM-7389] 
Show code snippet outputs as stdout
URL: https://github.com/apache/beam/pull/9790#discussion_r335595662
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter_test.py
 ##
 @@ -31,31 +31,26 @@
 
 
 def check_perennials(actual):
-  # [START perennials]
-  perennials = [
-  {'icon': '', 'name': 'Strawberry', 'duration': 'perennial'},
-  {'icon': '', 'name': 'Eggplant', 'duration': 'perennial'},
-  {'icon': '凜', 'name': 'Potato', 'duration': 'perennial'},
-  ]
-  # [END perennials]
-  assert_that(actual, equal_to(perennials))
+  expected = '''[START perennials]
+{'icon': '', 'name': 'Strawberry', 'duration': 'perennial'}
+{'icon': '', 'name': 'Eggplant', 'duration': 'perennial'}
+{'icon': '凜', 'name': 'Potato', 'duration': 'perennial'}
+[END perennials]'''.splitlines()[1:-1]
+  assert_that(actual, equal_to(expected))
 
 
 def check_valid_plants(actual):
-  # [START valid_plants]
-  valid_plants = [
-  {'icon': '', 'name': 'Strawberry', 'duration': 'perennial'},
-  {'icon': '凌', 'name': 'Carrot', 'duration': 'biennial'},
-  {'icon': '', 'name': 'Eggplant', 'duration': 'perennial'},
-  {'icon': '', 'name': 'Tomato', 'duration': 'annual'},
-  ]
-  # [END valid_plants]
-  assert_that(actual, equal_to(valid_plants))
+  expected = '''[START valid_plants]
+{'icon': '', 'name': 'Strawberry', 'duration': 'perennial'}
+{'icon': '凌', 'name': 'Carrot', 'duration': 'biennial'}
+{'icon': '', 'name': 'Eggplant', 'duration': 'perennial'}
+{'icon': '', 'name': 'Tomato', 'duration': 'annual'}
+[END valid_plants]'''.splitlines()[1:-1]
+  assert_that(actual, equal_to(expected))
 
 
 @mock.patch('apache_beam.Pipeline', TestPipeline)
-# pylint: disable=line-too-long
-@mock.patch('apache_beam.examples.snippets.transforms.elementwise.filter.print',
 lambda elem: elem)
+@mock.patch('apache_beam.examples.snippets.transforms.elementwise.filter.print',
 str)
 
 Review comment:
   I would prefer something like
   
   ```
   @mock.patch(
 'apache_beam.examples.snippets.transforms.elementwise.filter.print',
 lambda elem: elem)
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 329257)
Time Spent: 69h  (was: 68h 50m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 69h
>  Remaining Estimate: 0h
>






[jira] [Comment Edited] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in

2019-10-16 Thread Daniel Robert (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953004#comment-16953004
 ] 

Daniel Robert edited comment on BEAM-8347 at 10/16/19 4:58 PM:
---

Alright, I've been playing around with this a lot locally (even rewriting the 
rabbitio unbounded source/reader) and I either don't understand the expected 
use case or the design seems incorrect.

 

Notable concerns:

Splits:

In `Source.split()`, the design of the reader binds to a single queue. As such, 
there's no concurrency or other advantage in parallelizing reads; it might as 
well return `Collections.singletonList(this)` regardless of inputs. 
Additionally, using a single reader decreases the likelihood of processing or 
acking messages out of queue order.

Watermarks:

Most of the issues are here. Looking at the documentation of 
{{UnboundedSource.getWatermark}}:

 
{quote}[watermark] can be approximate. If records are read that violate this 
guarantee, they will be considered late, which will affect how they will be 
processed. ...

However, this value should be _as late as possible_. Downstream windows may not 
be able to close until this watermark passes their end.

For example, a source may know that the records it reads will be in timestamp 
order. In this case, the watermark can be the timestamp of the last record 
read. For a source that does not have natural timestamps, timestamps can be set 
to the time of reading, in which case the watermark is the current clock time.
{quote}
The current implementation retains the *oldest* timestamp. Further, the 
timestamp comes from delivery properties, which, if we run single threaded 
consuming above, should be monotonically increasing. A switch to purely using 
the most recent timestamp rather than the oldest would go a long way to 
ensuring windows advance.

There is precedent for this in some of the other unbounded sources, such as the 
Kafka ProcessingTimePolicy 
([https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/TimestampPolicyFactory.java#L112)]

Watermarks with no new input:

In the event where there are no new messages, it would be ok to advance the 
watermark as well. The approach taken in the kafka io when the stream is 
'caught up': 
[https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/TimestampPolicyFactory.java#L169]
 .

It might be useful to set the watermark to {{NOW - 2 seconds}} in the event 
there are no messages available (i.e. here: 
[https://github.com/apache/beam/blob/master/sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java#L467]
 ) if 'two seconds ago' is more recent than the current watermark.

 

(There's an unrelated bug I've discovered as well where start() 
[https://github.com/apache/beam/blob/master/sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java#L432]
 creates and uses a new ConnectionHandler even though one is already created at 
construction time, and thus the one constructed in start() is never properly 
stopped)

—

Curious if I'm way off here, or if it'd be worth refactoring the rabbitio 
reader to support these changes. Alternatively, if it would make sense to 
produce a separate rabbit reader for 'single queue, processing time 
watermarking' use cases.


was (Author: drobert):
Alright, I've been playing around with this a lot locally (even rewriting the 
rabbitio unbounded source/reader) and I either don't understand the expected 
use case or the design seems incorrect.

 

Notable concerns:

Splits:

In {{Source.split()}}, the design of the reader binds to a single queue. As 
such, there's no concurrency or other advantage in parallelizing reads; it 
might as well return `Collections.singletonList(this)` regardless of inputs. 
Additionally, using a single reader decreases the likelihood of processing or 
acking messages out of queue order.

Watermarks:

Most of the issues are here. Looking at the documentation of 
{{UnboundedSource.getWatermark}}:

 
{quote}[watermark] can be approximate. If records are read that violate this 
guarantee, they will be considered late, which will affect how they will be 
processed. ...

However, this value should be _as late as possible_. Downstream windows may not 
be able to close until this watermark passes their end.

For example, a source may know that the records it reads will be in timestamp 
order. In this case, the watermark can be the timestamp of the last record 
read. For a source that does not have natural timestamps, timestamps can be set 
to the time of reading, in which case the watermark is the current clock time.
{quote}
The current implementation retains the *oldest* timestamp. Further, the 
timestamp comes from delivery properties, which, if we run single threaded 
consuming above, should be 

[jira] [Commented] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in

2019-10-16 Thread Daniel Robert (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953004#comment-16953004
 ] 

Daniel Robert commented on BEAM-8347:
-

Alright, I've been playing around with this a lot locally (even rewriting the 
rabbitio unbounded source/reader) and I either don't understand the expected 
use case or the design seems incorrect.

 

Notable concerns:

Splits:

In {{Source.split()}}, the design of the reader binds to a single queue. As 
such, there's no concurrency or other advantage in parallelizing reads; it 
might as well return `Collections.singletonList(this)` regardless of inputs. 
Additionally, using a single reader decreases the likelihood of processing or 
acking messages out of queue order.

Watermarks:

Most of the issues are here. Looking at the documentation of 
{{UnboundedSource.getWatermark}}:

 
{quote}[watermark] can be approximate. If records are read that violate this 
guarantee, they will be considered late, which will affect how they will be 
processed. ...

However, this value should be _as late as possible_. Downstream windows may not 
be able to close until this watermark passes their end.

For example, a source may know that the records it reads will be in timestamp 
order. In this case, the watermark can be the timestamp of the last record 
read. For a source that does not have natural timestamps, timestamps can be set 
to the time of reading, in which case the watermark is the current clock time.
{quote}
The current implementation retains the *oldest* timestamp. Further, the 
timestamp comes from delivery properties, which, if we run single threaded 
consuming above, should be monotonically increasing. A switch to purely using 
the most recent timestamp rather than the oldest would go a long way to 
ensuring windows advance.

There is precedent for this in some of the other unbounded sources, such as the 
Kafka ProcessingTimePolicy 
([https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/TimestampPolicyFactory.java#L112)]

Watermarks with no new input:

In the event where there are no new messages, it would be ok to advance the 
watermark as well. The approach taken in the kafka io when the stream is 
'caught up': 
[https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/TimestampPolicyFactory.java#L169]
 .

It might be useful to set the watermark to {{NOW - 2 seconds}} in the event 
there are no messages available (i.e. here: 
[https://github.com/apache/beam/blob/master/sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java#L467]
 ) if 'two seconds ago' is more recent than the current watermark.

 

(There's an unrelated bug I've discovered as well where start() 
[https://github.com/apache/beam/blob/master/sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java#L432]
 creates and uses a new ConnectionHandler even though one is already created at 
construction time, and thus the one constructed in start() is never properly 
stopped)

---

Curious if I'm way off here, or if it'd be worth refactoring the rabbitio 
reader to support these changes. Alternatively, if it would make sense to 
produce a separate rabbit reader for 'single queue, processing time 
watermarking' use cases.
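A minimal sketch of the watermark policy proposed above, in plain Java with hypothetical names (`onMessage`, `onIdle`, `IDLE_SLACK` are illustrative, not the actual RabbitMqIO code): keep the *newest* delivery timestamp as the watermark, and when `advance()` finds no message available, allow the watermark to move up to now minus a small slack.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the proposed policy: keep the newest delivery timestamp as the
// watermark, and when advance() finds no message, allow the watermark to move
// up to (now - slack) so downstream windows can still close.
public class Main {
    static final Duration IDLE_SLACK = Duration.ofSeconds(2);
    static Instant watermark = Instant.EPOCH;

    // Called when advance() reads a message carrying this delivery timestamp.
    static void onMessage(Instant deliveryTimestamp) {
        if (deliveryTimestamp.isAfter(watermark)) {
            watermark = deliveryTimestamp;  // most recent, not oldest
        }
    }

    // Called when advance() returns false (no message available).
    static void onIdle(Instant now) {
        Instant idle = now.minus(IDLE_SLACK);
        if (idle.isAfter(watermark)) {
            watermark = idle;
        }
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2019-10-16T12:00:00Z");
        onMessage(t0);
        onIdle(t0.plusSeconds(10));  // no new data, watermark still advances
        if (!watermark.equals(t0.plusSeconds(8))) {
            throw new AssertionError("watermark should advance while idle");
        }
        System.out.println("watermark=" + watermark);
    }
}
```

Since the delivery timestamps are monotonically increasing under a single-threaded consumer, taking the maximum never marks in-order records late, while the idle branch keeps windows closing during bursty lulls.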

> UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
> -
>
> Key: BEAM-8347
> URL: https://issues.apache.org/jira/browse/BEAM-8347
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-rabbitmq
>Affects Versions: 2.15.0
> Environment: testing has been done using the DirectRunner. I also 
> have DataflowRunner available
>Reporter: Daniel Robert
>Priority: Major
>
> I stumbled upon this and then saw a similar StackOverflow post: 
> [https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance]
> When calling `advance()` if there are no messages, no state changes, 
> including no changes to the CheckpointMark or Watermark.  If there is a 
> relatively constant rate of new messages coming in, this is not a problem. If 
> data is bursty, and there are periods of no new messages coming in, the 
> watermark will never advance.
> Contrast this with some of the logic in PubsubIO which will make provisions 
> for periods of inactivity to advance the watermark (although it, too, is 
> imperfect: https://issues.apache.org/jira/browse/BEAM-7322 )
> The example given in the StackOverflow post is something like this:
>  
> {code:java}
> pipeline
>   .apply(RabbitMqIO.read()
>   .withUri("amqp://guest:guest@localhost:5672")
>   .withQueue("test")
>   .apply("Windowing", 
> Window.into(
>   

[jira] [Commented] (BEAM-8409) docker-credential-gcloud not installed or not available in PATH

2019-10-16 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953003#comment-16953003
 ] 

Yifan Zou commented on BEAM-8409:
-

The docker-credential-gcloud was installed in 
https://issues.apache.org/jira/browse/BEAM-7381

I guess we are hitting the https://issues.apache.org/jira/browse/BEAM-7405 
again.

> docker-credential-gcloud not installed or not available in PATH
> ---
>
> Key: BEAM-8409
> URL: https://issues.apache.org/jira/browse/BEAM-8409
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kamil Wasilewski
>Assignee: Yifan Zou
>Priority: Major
>  Labels: currently-failing
>
> _Use this form to file an issue for test failure:_
>  * 
> [beam_PreCommit_CommunityMetrics_Commit|https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PreCommit_CommunityMetrics_Commit/1355/]
>  * 
> [beam_PostCommit_Python2_PR|https://builds.apache.org/job/beam_PostCommit_Python2_PR]
> Initial investigation:
> Jenkins job fails when executing docker-compose script.
> It seems the only Jenkins worker affected is *apache-beam-jenkins-15.*
>  
> Relevant logs:
> 1)
>  
> {code:java}
> 11:56:24 Execution failed for task ':beam-test-infra-metrics:composeUp'.
> 11:56:24 > Exit-code 255 when calling docker-compose, stdout: postgresql uses 
> an image, skipping
> 11:56:24   prometheus uses an image, skipping
> 11:56:24   pushgateway uses an image, skipping
> 11:56:24   alertmanager uses an image, skipping
> 11:56:24   Building grafana
> 11:56:24   [17038] Failed to execute script docker-compose
> 11:56:24   Traceback (most recent call last):
> 11:56:24 File "bin/docker-compose", line 6, in 
> 11:56:24 File "compose/cli/main.py", line 71, in main
> 11:56:24 File "compose/cli/main.py", line 127, in perform_command
> 11:56:24 File "compose/cli/main.py", line 287, in build
> 11:56:24 File "compose/project.py", line 386, in build
> 11:56:24 File "compose/project.py", line 368, in build_service
> 11:56:24 File "compose/service.py", line 1084, in build
> 11:56:24 File "site-packages/docker/api/build.py", line 260, in build
> 11:56:24 File "site-packages/docker/api/build.py", line 307, in 
> _set_auth_headers
> 11:56:24 File "site-packages/docker/auth.py", line 310, in 
> get_all_credentials
> 11:56:24 File "site-packages/docker/auth.py", line 262, in 
> _resolve_authconfig_credstore
> 11:56:24 File "site-packages/docker/auth.py", line 287, in 
> _get_store_instance
> 11:56:24 File "site-packages/dockerpycreds/store.py", line 25, in __init__
> 11:56:24   dockerpycreds.errors.InitializationError: docker-credential-gcloud 
> not installed or not available in PATH
> {code}
> 2)
> {code:java}
> 16:26:08 [9316] Failed to execute script docker-compose
> 16:26:08 Traceback (most recent call last):
> 16:26:08   File "bin/docker-compose", line 6, in 
> 16:26:08   File "compose/cli/main.py", line 71, in main
> 16:26:08   File "compose/cli/main.py", line 127, in perform_command
> 16:26:08   File "compose/cli/main.py", line 287, in build
> 16:26:08   File "compose/project.py", line 386, in build
> 16:26:08   File "compose/project.py", line 368, in build_service
> 16:26:08   File "compose/service.py", line 1084, in build
> 16:26:08   File "site-packages/docker/api/build.py", line 260, in build
> 16:26:08   File "site-packages/docker/api/build.py", line 307, in 
> _set_auth_headers
> 16:26:08   File "site-packages/docker/auth.py", line 310, in 
> get_all_credentials
> 16:26:08   File "site-packages/docker/auth.py", line 262, in 
> _resolve_authconfig_credstore
> 16:26:08   File "site-packages/docker/auth.py", line 287, in 
> _get_store_instance
> 16:26:08   File "site-packages/dockerpycreds/store.py", line 25, in __init__
> 16:26:08 dockerpycreds.errors.InitializationError: docker-credential-gcloud 
> not installed or not available in PATH
> {code}
>  
>  
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._





[jira] [Work logged] (BEAM-8355) Make BooleanCoder a standard coder

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8355?focusedWorklogId=329255=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329255
 ]

ASF GitHub Bot logged work on BEAM-8355:


Author: ASF GitHub Bot
Created on: 16/Oct/19 16:53
Start Date: 16/Oct/19 16:53
Worklog Time Spent: 10m 
  Work Description: lostluck commented on issue #9809: [BEAM-8355] Add 
standard bool coder to Go SDK
URL: https://github.com/apache/beam/pull/9809#issuecomment-542794290
 
 
   R: @mxm 
   R: @chadrik 
   R: @TheNeuralBit 
   R: @robertwb 
 



Issue Time Tracking
---

Worklog Id: (was: 329255)
Time Spent: 3h 50m  (was: 3h 40m)

> Make BooleanCoder a standard coder
> --
>
> Key: BEAM-8355
> URL: https://issues.apache.org/jira/browse/BEAM-8355
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> This involves making the current java BooleanCoder a standard coder, and 
> implementing an equivalent coder in python
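The standard boolean coder's wire format is a single byte (0x01 for true, 0x00 for false). A self-contained round-trip sketch of that encoding (illustrative only, not the Beam implementation in any SDK):

```java
// Illustrative round trip for a single-byte boolean encoding
// (true -> 0x01, false -> 0x00); not the actual Beam source.
public class Main {
    static byte encode(boolean value) {
        return value ? (byte) 1 : (byte) 0;
    }

    static boolean decode(byte b) {
        if (b == 0) {
            return false;
        }
        if (b == 1) {
            return true;
        }
        throw new IllegalArgumentException("invalid boolean byte: " + b);
    }

    public static void main(String[] args) {
        if (!decode(encode(true)) || decode(encode(false))) {
            throw new AssertionError("round trip failed");
        }
        System.out.println("boolean round trip ok");
    }
}
```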





[jira] [Work logged] (BEAM-8355) Make BooleanCoder a standard coder

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8355?focusedWorklogId=329254=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329254
 ]

ASF GitHub Bot logged work on BEAM-8355:


Author: ASF GitHub Bot
Created on: 16/Oct/19 16:51
Start Date: 16/Oct/19 16:51
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #9809: [BEAM-8355] 
Add standard bool coder to Go SDK
URL: https://github.com/apache/beam/pull/9809
 
 
   Adds encoding and decoding for the standard bool coder to the Go SDK.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-8367) Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8367?focusedWorklogId=329253=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329253
 ]

ASF GitHub Bot logged work on BEAM-8367:


Author: ASF GitHub Bot
Created on: 16/Oct/19 16:49
Start Date: 16/Oct/19 16:49
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9797: [BEAM-8367] 
Using insertId for BQ streaming inserts
URL: https://github.com/apache/beam/pull/9797#discussion_r335590577
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_test.py
 ##
 @@ -425,8 +425,8 @@ def test_dofn_client_process_flush_called(self):
 test_client=client)
 
 fn.start_bundle()
-fn.process(('project_id:dataset_id.table_id', {'month': 1}))
-fn.process(('project_id:dataset_id.table_id', {'month': 2}))
+fn.process(('project_id:dataset_id.table_id', ({'month': 1}, 'insertid1')))
 
 Review comment:
   Added :D
 



Issue Time Tracking
---

Worklog Id: (was: 329253)
Time Spent: 1.5h  (was: 1h 20m)

> Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS
> -
>
> Key: BEAM-8367
> URL: https://issues.apache.org/jira/browse/BEAM-8367
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.17.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Unique IDs ensure (best effort) that writes to BigQuery are idempotent, for 
> example, we don't write the same record twice in a VM failure.
>  
> Currently Python BQ sink insert BQ IDs here but they'll be re-generated in a 
> VM failure resulting in data duplication.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L766]
>  
> Correct fix is to do a Reshuffle to checkpoint unique IDs once they are 
> generated, similar to how Java BQ sink operates.
> [https://github.com/apache/beam/blob/dcf6ad301069e4d2cfaec5db6b178acb7bb67f49/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L225]
>  
> Pablo, can you do an initial assessment here ?
> I think this is a relatively small fix but I might be wrong.
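The idempotency argument in the quoted description can be sketched with a toy dedup table (hypothetical names; in reality the deduplication is best-effort and performed by the BigQuery service, keyed on each row's insertId):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of why stable unique insert IDs make retried streaming inserts
// idempotent: a retried bundle re-sends rows with the SAME ids, so the second
// attempt is a no-op. (Hypothetical names; real dedup is best-effort and done
// by the BigQuery service using each row's insertId.)
public class Main {
    static final Map<String, String> table = new LinkedHashMap<>();

    static void insert(String insertId, String row) {
        table.putIfAbsent(insertId, row);  // duplicate id -> ignored
    }

    public static void main(String[] args) {
        insert("insertid1", "{'month': 1}");
        insert("insertid2", "{'month': 2}");
        // Simulated VM failure: the bundle is retried with the same ids,
        // because the ids were checkpointed (e.g. behind a Reshuffle).
        insert("insertid1", "{'month': 1}");
        if (table.size() != 2) {
            throw new AssertionError("duplicate row written");
        }
        System.out.println("rows=" + table.size());
    }
}
```

This is exactly why the ids must be generated *before* a checkpoint (the Reshuffle): ids regenerated after a failure would differ, and the retry would no longer deduplicate.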





[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=329252=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329252
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 16/Oct/19 16:48
Start Date: 16/Oct/19 16:48
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-542792382
 
 
   Run CommunityMetrics PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 329252)
Time Spent: 7.5h  (was: 7h 20m)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.





[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=329249=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329249
 ]

ASF GitHub Bot logged work on BEAM-8382:


Author: ASF GitHub Bot
Created on: 16/Oct/19 16:46
Start Date: 16/Oct/19 16:46
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #9765: [BEAM-8382] 
Add polling interval to KinesisIO.Read
URL: https://github.com/apache/beam/pull/9765#issuecomment-542790436
 
 
   @jfarr I agree that there is no perfect solution for that. At the same time, 
I don't see much difference between a hardcoded value and a configured one, 
taking into account that Kinesis is an unbounded source and a user pipeline has 
to run for a long time without interruption. So, what happens if we set the 
initial polling interval too small or too big? Would we need to stop the 
pipeline, set a new value and restart it again?
   
   Unfortunately, there is no guarantee that this value will be efficient and 
work fine all the way, since, as you said above, another client can start 
consuming from the same shard in parallel. So, in this case it would be better 
to provide an adaptive solution. 
   
   Even if we should expose some knobs to the user, what about using 
`FluentBackoff` [1] instead? It can be configured with a retry configuration 
object, like we do in `SolrIO` and some other IOs with `RetryConfiguration`. 
   
   Another option could be to allow providing a UDF object that manages the 
polling delay dynamically (for example, a `SerializableFunction`).
   
   Wdyt?
   
   [1] 
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/FluentBackoff.java
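The adaptive-delay idea discussed above can be sketched as a capped exponential backoff in plain, self-contained Java (illustrative only; the constants and the `next` helper are assumptions, and the real `FluentBackoff` in `org.apache.beam.sdk.util` is configured fluently rather than like this): double the polling delay after each empty `getRecords()` call, cap it at a maximum, and reset to the initial value after a successful read.

```java
import java.time.Duration;

// Sketch of a capped exponential polling backoff: double the delay after each
// empty poll, cap at MAX, reset to INITIAL after a successful read.
// (Hypothetical names; not Beam's FluentBackoff implementation.)
public class Main {
    static final Duration INITIAL = Duration.ofMillis(200);
    static final Duration MAX = Duration.ofSeconds(5);

    // Next delay after an empty poll.
    static Duration next(Duration current) {
        Duration doubled = current.multipliedBy(2);
        return doubled.compareTo(MAX) > 0 ? MAX : doubled;
    }

    public static void main(String[] args) {
        Duration delay = INITIAL;
        for (int emptyPolls = 0; emptyPolls < 6; emptyPolls++) {
            delay = next(delay);
        }
        if (!delay.equals(MAX)) {
            throw new AssertionError("backoff should cap at MAX");
        }
        System.out.println("capped at " + delay);
    }
}
```

The cap keeps the reader responsive once data returns, while the doubling quickly gets an idle shard under the five-reads-per-second limit without any user-tuned constant.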
 



Issue Time Tracking
---

Worklog Id: (was: 329249)
Time Spent: 1h 40m  (was: 1.5h)

> Add polling interval to KinesisIO.Read
> --
>
> Key: BEAM-8382
> URL: https://issues.apache.org/jira/browse/BEAM-8382
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kinesis
>Affects Versions: 2.13.0, 2.14.0, 2.15.0
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> With the current implementation we are observing Kinesis throttling due to 
> ReadProvisionedThroughputExceeded on the order of hundreds of times per 
> second, regardless of the actual Kinesis throughput. This is because the 
> ShardReadersPool readLoop() method is polling getRecords() as fast as 
> possible.
> From the KDS documentation:
> {quote}Each shard can support up to five read transactions per second.
> {quote}
> and
> {quote}For best results, sleep for at least 1 second (1,000 milliseconds) 
> between calls to getRecords to avoid exceeding the limit on getRecords 
> frequency.
> {quote}
> [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html]
> [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html]
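
The fix described in this issue amounts to enforcing a minimum interval between 
`getRecords` calls. The following is a hedged Python sketch of such a 
rate-limited read loop, not the actual Java `ShardReadersPool.readLoop()` code; 
`get_records`, `clock`, and `sleep` are stand-in parameters so the behavior can 
be simulated.

```python
import time

def read_loop(get_records, poll_interval_s=1.0,
              clock=time.monotonic, sleep=time.sleep, max_polls=None):
    """Poll get_records() no more often than once per poll_interval_s."""
    polls = 0
    while max_polls is None or polls < max_polls:
        start = clock()
        records = get_records()
        polls += 1
        yield records
        # Sleep only for the remainder of the interval, so a slow
        # get_records() call is not penalized twice.
        elapsed = clock() - start
        if elapsed < poll_interval_s:
            sleep(poll_interval_s - elapsed)
```

With `poll_interval_s=1.0` this honors the "at least 1 second between calls" 
guidance quoted from the KDS documentation above.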



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8410) JdbcIO should support setConnectionInitSqls in its DataSource

2019-10-16 Thread Cam Mach (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cam Mach updated BEAM-8410:
---
Description: This property, connectionInitSqls, is very handy for anyone who 
uses MySQL and MariaDB to set init SQL statements to be executed at connection 
time. Note: it's not applicable across all databases.  (was: This property is 
very handy for anyone who use MySql and Mariadb, to set any init sql statements 
to be executed at connection time. Note: but it's not applicable across 
databases)

> JdbcIO should support setConnectionInitSqls in its DataSource
> -
>
> Key: BEAM-8410
> URL: https://issues.apache.org/jira/browse/BEAM-8410
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Cam Mach
>Assignee: Cam Mach
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This property, connectionInitSqls, is very handy for anyone who uses MySQL 
> and MariaDB to set init SQL statements to be executed at connection time. 
> Note: it's not applicable across all databases.
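
To make the idea concrete: connection init SQL means statements that run once 
on every freshly opened connection before it is handed to the caller. The 
sketch below illustrates the pattern generically with Python's sqlite3; the 
`make_connection_factory` helper is a hypothetical illustration, not DBCP's 
`setConnectionInitSqls` or the proposed JdbcIO API.

```python
import sqlite3

def make_connection_factory(database, init_sqls):
    """Return a factory that applies init_sqls to every new connection."""
    def connect():
        conn = sqlite3.connect(database)
        for stmt in init_sqls:
            # Each init statement runs before the connection is used.
            conn.execute(stmt)
        return conn
    return connect

# Example: force foreign-key enforcement on (off by default in sqlite).
connect = make_connection_factory(":memory:", ["PRAGMA foreign_keys = ON"])
conn = connect()
```

The same shape applies to MySQL/MariaDB session settings such as 
`SET NAMES` or `SET sql_mode = ...`.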



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=329247=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329247
 ]

ASF GitHub Bot logged work on BEAM-8382:


Author: ASF GitHub Bot
Created on: 16/Oct/19 16:44
Start Date: 16/Oct/19 16:44
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #9765: [BEAM-8382] 
Add polling interval to KinesisIO.Read
URL: https://github.com/apache/beam/pull/9765#issuecomment-542790436
 
 
   @jfarr I agree that there is no perfect solution for that. At the same time, 
I don't see much difference between a hardcoded value and a configured one, 
taking into account that Kinesis is an unbounded source and a user pipeline has 
to run for a long time without interruption. So, what happens if we set the 
initial polling interval too small or too big? Would we need to stop the 
pipeline, set a new value and restart it again?

   Unfortunately, there is no guarantee that this value will be efficient and 
work fine all the way since, as you said above, another client can start 
consuming from the same shard in parallel. So, in this case it would be better 
to provide an adaptive solution. 
   
   Even if we should expose some knobs to the user, what about using 
`FluentBackoff` [1] instead? It can be configured with a retry configuration 
object, like we do in `SolrIO` and some other IOs with `RetryConfiguration`. 
   
   Another option could be to allow the user to provide a UDF object which will 
manage the polling delay dynamically.
   
   Wdyt?
   
   [1] 
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/FluentBackoff.java
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329247)
Time Spent: 1.5h  (was: 1h 20m)

> Add polling interval to KinesisIO.Read
> --
>
> Key: BEAM-8382
> URL: https://issues.apache.org/jira/browse/BEAM-8382
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kinesis
>Affects Versions: 2.13.0, 2.14.0, 2.15.0
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> With the current implementation we are observing Kinesis throttling due to 
> ReadProvisionedThroughputExceeded on the order of hundreds of times per 
> second, regardless of the actual Kinesis throughput. This is because the 
> ShardReadersPool readLoop() method is polling getRecords() as fast as 
> possible.
> From the KDS documentation:
> {quote}Each shard can support up to five read transactions per second.
> {quote}
> and
> {quote}For best results, sleep for at least 1 second (1,000 milliseconds) 
> between calls to getRecords to avoid exceeding the limit on getRecords 
> frequency.
> {quote}
> [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html]
> [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8410) JdbcIO should support setConnectionInitSqls in its DataSource

2019-10-16 Thread Cam Mach (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cam Mach updated BEAM-8410:
---
Description: This property is very handy for anyone who uses MySQL and 
MariaDB to set init SQL statements to be executed at connection time. Note: 
it's not applicable across all databases.

> JdbcIO should support setConnectionInitSqls in its DataSource
> -
>
> Key: BEAM-8410
> URL: https://issues.apache.org/jira/browse/BEAM-8410
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Cam Mach
>Assignee: Cam Mach
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This property is very handy for anyone who uses MySQL and MariaDB to set 
> init SQL statements to be executed at connection time. Note: it's not 
> applicable across all databases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8410) JdbcIO should support setConnectionInitSqls in its DataSource

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8410?focusedWorklogId=329248=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329248
 ]

ASF GitHub Bot logged work on BEAM-8410:


Author: ASF GitHub Bot
Created on: 16/Oct/19 16:44
Start Date: 16/Oct/19 16:44
Worklog Time Spent: 10m 
  Work Description: cmachgodaddy commented on pull request #9808: 
[BEAM-8410] JdbcIO should support setConnectionInitSqls in its DataSource
URL: https://github.com/apache/beam/pull/9808
 
 
   This property is very handy for anyone who uses MySQL and MariaDB to set 
init SQL statements to be executed at connection time. Note: it's not 
applicable across all databases.
   
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=329246=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329246
 ]

ASF GitHub Bot logged work on BEAM-8382:


Author: ASF GitHub Bot
Created on: 16/Oct/19 16:43
Start Date: 16/Oct/19 16:43
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #9765: [BEAM-8382] 
Add polling interval to KinesisIO.Read
URL: https://github.com/apache/beam/pull/9765#issuecomment-542790436
 
 
   @jfarr I agree that there is no perfect solution for that. At the same time, 
I don't see much difference between a hardcoded value and a configured one, 
taking into account that Kinesis is an unbounded source and a user pipeline has 
to run for a long time without interruption. So, what happens if we set the 
initial polling interval too small or too big? Would we need to stop the 
pipeline, set a new value and restart it again?

   Unfortunately, there is no guarantee that this value will be efficient and 
work fine all the way since, as you said above, another client can start 
consuming from the same shard in parallel. So, in this case it would be better 
to provide an adaptive solution. Even if we should expose some knobs to the 
user, what about using `FluentBackoff` [1] instead? It can be configured with a 
retry configuration object, like we do in `SolrIO` and some other IOs with 
`RetryConfiguration`. 
   
   Another option could be to allow the user to provide a UDF object which will 
manage the polling delay dynamically.
   
   Wdyt?
   
   [1] 
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/FluentBackoff.java
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329246)
Time Spent: 1h 20m  (was: 1h 10m)

> Add polling interval to KinesisIO.Read
> --
>
> Key: BEAM-8382
> URL: https://issues.apache.org/jira/browse/BEAM-8382
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kinesis
>Affects Versions: 2.13.0, 2.14.0, 2.15.0
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> With the current implementation we are observing Kinesis throttling due to 
> ReadProvisionedThroughputExceeded on the order of hundreds of times per 
> second, regardless of the actual Kinesis throughput. This is because the 
> ShardReadersPool readLoop() method is polling getRecords() as fast as 
> possible.
> From the KDS documentation:
> {quote}Each shard can support up to five read transactions per second.
> {quote}
> and
> {quote}For best results, sleep for at least 1 second (1,000 milliseconds) 
> between calls to getRecords to avoid exceeding the limit on getRecords 
> frequency.
> {quote}
> [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html]
> [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=329243=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329243
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 16/Oct/19 16:40
Start Date: 16/Oct/19 16:40
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-542789129
 
 
   Ok, the last set of notes has been addressed. 
   
   A note on this:
   
   > invalid-overridden-method - I think this is worth fixing, we can file a 
Jira + a make separate PR to be safe. I think AI here is to replace deprecated 
decorator @abc.abstractproperty in filesystemio.py.
   
   I think we should exclude this, because mypy is much better at determining 
structural problems, and it will catch this kind of thing with far fewer false 
positives.  In my mypy/typing branch, which is just about ready, I've had to 
turn off a few more pylint warnings along these lines, where they overlap into 
mypy's territory, but with less accuracy.
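
For context on the `@abc.abstractproperty` note above: in Python 3 the 
deprecated decorator is replaced by stacking `@property` on 
`@abc.abstractmethod`, which is what fixing `invalid-overridden-method` would 
likely amount to. A sketch with made-up class names, not the actual 
filesystemio.py code:

```python
import abc

class Stream(abc.ABC):
    # Deprecated Python 2-era spelling:
    #   @abc.abstractproperty
    #   def mode(self): ...
    # Modern equivalent:
    @property
    @abc.abstractmethod
    def mode(self):
        """Return the I/O mode of this stream."""

class ReadableStream(Stream):
    @property
    def mode(self):
        return "r"
```

Subclasses that fail to override `mode` still cannot be instantiated, exactly 
as with the old decorator.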
   
   Assuming you agree with that, we just need the 2 separate followup Jira 
issues:
   
   1) 
   
   > Exclude but add a newbie task 1 (may help somebody learn bits of python):
   >
   > - consider-using-set-comprehension
   > - chained-comparison
   > - consider-using-sys-exit
   
   2) 
   
   > Exclude but add a newbie task 2:
   >
   > - unnecessary-comprehension. We should mention that in some places we 
   > should use `list(some_iter)`, and in others just remove the comprehension. 
   > It may be better to do this in a separate task to avoid a blind approval 
   > during review, as some places need to be fixed differently.
   
   Would you like to create issue 2 yourself so you can get the wording how you 
want it, or would you like me to create it?
   
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329243)
Time Spent: 7h 20m  (was: 7h 10m)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8412) Py2 post-commit test crossLanguagePortableWordCount failing

2019-10-16 Thread Ahmet Altay (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-8412:
--
Parent: BEAM-8193
Issue Type: Sub-task  (was: Bug)

> Py2 post-commit test crossLanguagePortableWordCount  failing
> 
>
> Key: BEAM-8412
> URL: https://issues.apache.org/jira/browse/BEAM-8412
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness, test-failures
>Reporter: Ahmet Altay
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>
> Error is: 13:09:02 RuntimeError: IOError: [Errno 2] No such file or 
> directory: 
> '/tmp/beam-temp-py-wordcount-portable-67ce50e0eebe11e9892842010a80009c/fe77e28c-2e44-4914-8460-76ab4dbb8579.py-wordcount-portable'
>  [while running 'write/Write/WriteImpl/WriteBundles']
> Log:  
> https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Python2/706/consoleFull
> Last passing run was 705. Between 705 and 706 the change is 
> (https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Python2/706/)
>  : https://github.com/apache/beam/pull/9785 -- Although this PR looks 
> unrelated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8412) Py2 post-commit test crossLanguagePortableWordCount failing

2019-10-16 Thread Ahmet Altay (Jira)
Ahmet Altay created BEAM-8412:
-

 Summary: Py2 post-commit test crossLanguagePortableWordCount  
failing
 Key: BEAM-8412
 URL: https://issues.apache.org/jira/browse/BEAM-8412
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-harness, test-failures
Reporter: Ahmet Altay
Assignee: Chamikara Madhusanka Jayalath


Error is: 13:09:02 RuntimeError: IOError: [Errno 2] No such file or directory: 
'/tmp/beam-temp-py-wordcount-portable-67ce50e0eebe11e9892842010a80009c/fe77e28c-2e44-4914-8460-76ab4dbb8579.py-wordcount-portable'
 [while running 'write/Write/WriteImpl/WriteBundles']

Log:  
https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Python2/706/consoleFull

Last passing run was 705. Between 705 and 706 the change is 
(https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Python2/706/)
 : https://github.com/apache/beam/pull/9785 -- Although this PR looks unrelated.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8411) Re-enable pylint checks disabled due to pylint bug

2019-10-16 Thread Chad Dombrova (Jira)
Chad Dombrova created BEAM-8411:
---

 Summary: Re-enable pylint checks disabled due to pylint bug
 Key: BEAM-8411
 URL: https://issues.apache.org/jira/browse/BEAM-8411
 Project: Beam
  Issue Type: Task
  Components: sdk-py-core
Reporter: Chad Dombrova


The pylint github issue is here:  https://github.com/PyCQA/pylint/issues/3152

Once that's fixed, remove the exclusions in setup.py and 
apache_beam/examples/complete/juliaset/setup.py




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=329218=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329218
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 16/Oct/19 15:41
Start Date: 16/Oct/19 15:41
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #9803: [BEAM-8372] Follow-up to 
Flink UberJar submission.
URL: https://github.com/apache/beam/pull/9803#issuecomment-542763924
 
 
   I think there are some issues related to the ZIP tests.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329218)
Time Spent: 4h 10m  (was: 4h)

> Allow submission of Flink UberJar directly to flink cluster.
> 
>
> Key: BEAM-8372
> URL: https://issues.apache.org/jira/browse/BEAM-8372
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7738) Support PubSubIO to be configured externally for use with other SDKs

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7738?focusedWorklogId=329217=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329217
 ]

ASF GitHub Bot logged work on BEAM-7738:


Author: ASF GitHub Bot
Created on: 16/Oct/19 15:40
Start Date: 16/Oct/19 15:40
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9268: [BEAM-7738] Add 
external transform support to PubsubIO
URL: https://github.com/apache/beam/pull/9268#issuecomment-542763520
 
 
   We are doing some final testing in production today and then I’ll hopefully
   give you the thumbs up.
   
   -chad
   
   
   
   On Wed, Oct 16, 2019 at 3:13 AM Maximilian Michels wrote:
   
   > Any update here?
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub, or unsubscribe.
   >
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329217)
Time Spent: 5.5h  (was: 5h 20m)

> Support PubSubIO to be configured externally for use with other SDKs
> 
>
> Key: BEAM-7738
> URL: https://issues.apache.org/jira/browse/BEAM-7738
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp, runner-flink, sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Labels: portability
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Now that KafkaIO is supported via the external transform API (BEAM-7029) we 
> should add support for PubSub.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8403) Race condition in request id generation of GrpcStateRequestHandler

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8403?focusedWorklogId=329213=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329213
 ]

ASF GitHub Bot logged work on BEAM-8403:


Author: ASF GitHub Bot
Created on: 16/Oct/19 15:21
Start Date: 16/Oct/19 15:21
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #9800: [BEAM-8403] Guard 
request id generation to prevent concurrent worker access
URL: https://github.com/apache/beam/pull/9800#issuecomment-542754461
 
 
   You are right that due to the GIL there is no real parallelism. That's why 
there is the option to use multiple SDK worker processes (configurable via the 
pipeline option `sdk_worker_parallelism`).
   
   As part of the verification for this change I found that using 2 worker 
processes instead of one leads to a throughput increase of ~50%. It won't be 
100% because Python still switches threads while waiting for IO, etc.
   
   It is still desirable to have a proper abstraction for something as basic as 
an atomic counter. It results in code that is more readable and easier to 
understand, and helps avoid issues such as this in the first place. Perhaps 
that's something the Python SDK experts can consider?
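
The atomic-counter abstraction suggested here could be as small as the 
following sketch (a hypothetical helper, not an existing Beam class). An 
explicit lock makes the thread-safety intent unambiguous: a bare 
`self._value += 1` is a read-modify-write that can interleave across threads, 
which is exactly the request-id race this issue describes.

```python
import threading

class AtomicCounter:
    """Thread-safe monotonically increasing counter, e.g. for request ids."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def next(self):
        # The lock makes increment-and-read a single atomic step.
        with self._lock:
            self._value += 1
            return self._value
```

Every caller then observes a unique, strictly increasing id regardless of how 
many threads share the counter.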
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329213)
Time Spent: 2h  (was: 1h 50m)

> Race condition in request id generation of GrpcStateRequestHandler
> --
>
> Key: BEAM-8403
> URL: https://issues.apache.org/jira/browse/BEAM-8403
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> There is a race condition in {{GrpcStateRequestHandler}} which surfaced after 
> the recent changes to process append/clear state request asynchronously. The 
> race condition can occur if multiple Runner workers process a transform with 
> state requests with the same SDK Harness. For example, this setup occurs with 
> Flink when a TaskManager has multiple task slots and two or more of those 
> slots process the same stateful stage against an SDK Harness.
> CC [~robertwb]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8390) cannot set useCorrelationId on RabbitMqIO Read

2019-10-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-8390:
--

Assignee: Daniel Robert

> cannot set useCorrelationId on RabbitMqIO Read
> --
>
> Key: BEAM-8390
> URL: https://issues.apache.org/jira/browse/BEAM-8390
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-rabbitmq
>Affects Versions: 2.15.0
>Reporter: Daniel Robert
>Assignee: Daniel Robert
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {{RabbitMqIO.Read hard-codes setUseCorrelationId(false) and there does not 
> appear to be any means for a caller to override this.}}
> A Builder cannot be obtained (it's package-private) and there does not seem 
> to be a corresponding {{withUseCorrelationId}} on the resulting {{Read}} 
> class like there is for other properties.
> This looks like it was just an omission.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8390) cannot set useCorrelationId on RabbitMqIO Read

2019-10-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-8390.

Fix Version/s: 2.17.0
   Resolution: Fixed

> cannot set useCorrelationId on RabbitMqIO Read
> --
>
> Key: BEAM-8390
> URL: https://issues.apache.org/jira/browse/BEAM-8390
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-rabbitmq
>Affects Versions: 2.15.0
>Reporter: Daniel Robert
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {{RabbitMqIO.Read hard-codes setUseCorrelationId(false) and there does not 
> appear to be any means for a caller to override this.}}
> A Builder cannot be obtained (it's package-private) and there does not seem 
> to be a corresponding {{withUseCorrelationId}} on the resulting {{Read}} 
> class like there is for other properties.
> This looks like it was just an omission.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8390) cannot set useCorrelationId on RabbitMqIO Read

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8390?focusedWorklogId=329203=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329203
 ]

ASF GitHub Bot logged work on BEAM-8390:


Author: ASF GitHub Bot
Created on: 16/Oct/19 15:04
Start Date: 16/Oct/19 15:04
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #9782: [BEAM-8390] 
Allow specifying useCorrelationId for RabbitMqIO.Read
URL: https://github.com/apache/beam/pull/9782
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329203)
Time Spent: 40m  (was: 0.5h)

> cannot set useCorrelationId on RabbitMqIO Read
> --
>
> Key: BEAM-8390
> URL: https://issues.apache.org/jira/browse/BEAM-8390
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-rabbitmq
>Affects Versions: 2.15.0
>Reporter: Daniel Robert
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {{RabbitMqIO.Read hard-codes setUseCorrelationId(false) and there does not 
> appear to be any means for a caller to override this.}}
> A Builder cannot be obtained (it's package-private) and there does not seem 
> to be a corresponding {{withUseCorrelationId}} on the resulting {{Read}} 
> class like there is for other properties.
> This looks like it was just an omission.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8390) cannot set useCorrelationId on RabbitMqIO Read

2019-10-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8390:
---
Status: Open  (was: Triage Needed)

> cannot set useCorrelationId on RabbitMqIO Read
> --
>
> Key: BEAM-8390
> URL: https://issues.apache.org/jira/browse/BEAM-8390
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-rabbitmq
>Affects Versions: 2.15.0
>Reporter: Daniel Robert
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {{RabbitMqIO.Read hard-codes setUseCorrelationId(false) and there does not 
> appear to be any means for a caller to override this.}}
> A Builder cannot be obtained (it's package-private) and there does not seem 
> to be a corresponding {{withUseCorrelationId}} on the resulting {{Read}} 
> class like there is for other properties.
> This looks like it was just an omission.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8410) JdbcIO should support setConnectionInitSqls in its DataSource

2019-10-16 Thread Cam Mach (Jira)
Cam Mach created BEAM-8410:
--

 Summary: JdbcIO should support setConnectionInitSqls in its 
DataSource
 Key: BEAM-8410
 URL: https://issues.apache.org/jira/browse/BEAM-8410
 Project: Beam
  Issue Type: Improvement
  Components: io-java-jdbc
Reporter: Cam Mach
Assignee: Cam Mach






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1440?focusedWorklogId=329141=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329141
 ]

ASF GitHub Bot logged work on BEAM-1440:


Author: ASF GitHub Bot
Created on: 16/Oct/19 12:25
Start Date: 16/Oct/19 12:25
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #9772: [BEAM-1440] Create a 
BigQuery source that implements iobase.BoundedSource for Python
URL: https://github.com/apache/beam/pull/9772#issuecomment-542674489
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329141)
Time Spent: 3h 40m  (was: 3.5h)

> Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
> --
>
> Key: BEAM-1440
> URL: https://issues.apache.org/jira/browse/BEAM-1440
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Currently we have a BigQuery native source for Python SDK [1].
> This can only be used by Dataflow runner.
> We should  implement a Beam BigQuery source that implements 
> iobase.BoundedSource [2] interface so that other runners that try to use 
> Python SDK can read from BigQuery as well. Java SDK already has a Beam 
> BigQuery source [3].
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py
> [2] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70
> [3] 
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=329136=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329136
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 16/Oct/19 12:10
Start Date: 16/Oct/19 12:10
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9803: [BEAM-8372] Follow-up 
to Flink UberJar submission.
URL: https://github.com/apache/beam/pull/9803#issuecomment-542669157
 
 
   Run PortableJar_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329136)
Time Spent: 4h  (was: 3h 50m)

> Allow submission of Flink UberJar directly to flink cluster.
> 
>
> Key: BEAM-8372
> URL: https://issues.apache.org/jira/browse/BEAM-8372
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8385) Add an option to run KafkaIOIT with 10GB dataset

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8385?focusedWorklogId=329123=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329123
 ]

ASF GitHub Bot logged work on BEAM-8385:


Author: ASF GitHub Bot
Created on: 16/Oct/19 11:36
Start Date: 16/Oct/19 11:36
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #9770: [BEAM-8385] Add a 
KafkaIOIT hash for 100M records
URL: https://github.com/apache/beam/pull/9770#issuecomment-542658245
 
 
   @pabloem Can you take a look at this, please?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329123)
Time Spent: 50m  (was: 40m)

> Add an option to run KafkaIOIT with 10GB dataset
> 
>
> Key: BEAM-8385
> URL: https://issues.apache.org/jira/browse/BEAM-8385
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1440?focusedWorklogId=329101=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329101
 ]

ASF GitHub Bot logged work on BEAM-1440:


Author: ASF GitHub Bot
Created on: 16/Oct/19 10:37
Start Date: 16/Oct/19 10:37
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #9772: [BEAM-1440] Create a 
BigQuery source that implements iobase.BoundedSource for Python
URL: https://github.com/apache/beam/pull/9772#issuecomment-542639254
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329101)
Time Spent: 3.5h  (was: 3h 20m)

> Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
> --
>
> Key: BEAM-1440
> URL: https://issues.apache.org/jira/browse/BEAM-1440
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Currently we have a BigQuery native source for Python SDK [1].
> This can only be used by Dataflow runner.
> We should  implement a Beam BigQuery source that implements 
> iobase.BoundedSource [2] interface so that other runners that try to use 
> Python SDK can read from BigQuery as well. Java SDK already has a Beam 
> BigQuery source [3].
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py
> [2] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70
> [3] 
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7738) Support PubSubIO to be configured externally for use with other SDKs

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7738?focusedWorklogId=329092=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329092
 ]

ASF GitHub Bot logged work on BEAM-7738:


Author: ASF GitHub Bot
Created on: 16/Oct/19 10:13
Start Date: 16/Oct/19 10:13
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #9268: [BEAM-7738] Add external 
transform support to PubsubIO
URL: https://github.com/apache/beam/pull/9268#issuecomment-542631173
 
 
   Any update here?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329092)
Time Spent: 5h 20m  (was: 5h 10m)

> Support PubSubIO to be configured externally for use with other SDKs
> 
>
> Key: BEAM-7738
> URL: https://issues.apache.org/jira/browse/BEAM-7738
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp, runner-flink, sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Labels: portability
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Now that KafkaIO is supported via the external transform API (BEAM-7029) we 
> should add support for PubSub.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1440?focusedWorklogId=329081=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329081
 ]

ASF GitHub Bot logged work on BEAM-1440:


Author: ASF GitHub Bot
Created on: 16/Oct/19 09:39
Start Date: 16/Oct/19 09:39
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #9772: [BEAM-1440] Create a 
BigQuery source that implements iobase.BoundedSource for Python
URL: https://github.com/apache/beam/pull/9772#issuecomment-542618530
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329081)
Time Spent: 3h 10m  (was: 3h)

> Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
> --
>
> Key: BEAM-1440
> URL: https://issues.apache.org/jira/browse/BEAM-1440
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Currently we have a BigQuery native source for Python SDK [1].
> This can only be used by Dataflow runner.
> We should  implement a Beam BigQuery source that implements 
> iobase.BoundedSource [2] interface so that other runners that try to use 
> Python SDK can read from BigQuery as well. Java SDK already has a Beam 
> BigQuery source [3].
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py
> [2] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70
> [3] 
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1440?focusedWorklogId=329082=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329082
 ]

ASF GitHub Bot logged work on BEAM-1440:


Author: ASF GitHub Bot
Created on: 16/Oct/19 09:39
Start Date: 16/Oct/19 09:39
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #9772: [BEAM-1440] Create a 
BigQuery source that implements iobase.BoundedSource for Python
URL: https://github.com/apache/beam/pull/9772#issuecomment-542618576
 
 
   Run Python 2 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329082)
Time Spent: 3h 20m  (was: 3h 10m)

> Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
> --
>
> Key: BEAM-1440
> URL: https://issues.apache.org/jira/browse/BEAM-1440
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Currently we have a BigQuery native source for Python SDK [1].
> This can only be used by Dataflow runner.
> We should  implement a Beam BigQuery source that implements 
> iobase.BoundedSource [2] interface so that other runners that try to use 
> Python SDK can read from BigQuery as well. Java SDK already has a Beam 
> BigQuery source [3].
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py
> [2] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70
> [3] 
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8403) Race condition in request id generation of GrpcStateRequestHandler

2019-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8403?focusedWorklogId=329077=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329077
 ]

ASF GitHub Bot logged work on BEAM-8403:


Author: ASF GitHub Bot
Created on: 16/Oct/19 09:35
Start Date: 16/Oct/19 09:35
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #9800: [BEAM-8403] Guard 
request id generation to prevent concurrent worker access
URL: https://github.com/apache/beam/pull/9800
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 329077)
Time Spent: 1h 50m  (was: 1h 40m)

> Race condition in request id generation of GrpcStateRequestHandler
> --
>
> Key: BEAM-8403
> URL: https://issues.apache.org/jira/browse/BEAM-8403
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> There is a race condition in {{GrpcStateRequestHandler}} which surfaced after 
> the recent changes to process append/clear state request asynchronously. The 
> race condition can occur if multiple Runner workers process a transform with 
> state requests with the same SDK Harness. For example, this setup occurs with 
> Flink when a TaskManager has multiple task slots and two or more of those 
> slots process the same stateful stage against an SDK Harness.
> CC [~robertwb]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8403) Race condition in request id generation of GrpcStateRequestHandler

2019-10-16 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved BEAM-8403.
--
Resolution: Fixed

> Race condition in request id generation of GrpcStateRequestHandler
> --
>
> Key: BEAM-8403
> URL: https://issues.apache.org/jira/browse/BEAM-8403
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> There is a race condition in {{GrpcStateRequestHandler}} which surfaced after 
> the recent changes to process append/clear state request asynchronously. The 
> race condition can occur if multiple Runner workers process a transform with 
> state requests with the same SDK Harness. For example, this setup occurs with 
> Flink when a TaskManager has multiple task slots and two or more of those 
> slots process the same stateful stage against an SDK Harness.
> CC [~robertwb]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   >