[jira] [Updated] (BEAM-7585) Ipython usage raises AttributeError

2019-11-11 Thread Rustam Khalmurzaev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rustam Khalmurzaev updated BEAM-7585:
-
Status: Triage Needed  (was: Open)

> Ipython usage raises AttributeError
> ---
>
> Key: BEAM-7585
> URL: https://issues.apache.org/jira/browse/BEAM-7585
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct, sdk-py-core
>Reporter: SBlackwell
>Priority: Major
>  Labels: easyfix
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/display/display_manager.py#L36]
> import IPython
> _display_progress = IPython.display.display
>  
> This doesn't work:
> In [1]: import IPython; IPython.display.display('test')
> ---
> AttributeError Traceback (most recent call last)
>  in ()
> > 1 import IPython; IPython.display.display('test')
> AttributeError: 'module' object has no attribute 'display'
> In [2]: from IPython import display; display.display('test')
> 'test'
> In [3]: import IPython; IPython.display.display('test')
> 'test'
>  
>  
> Should be:
> import IPython
> from IPython import display
> _display_progress = display.display



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8605) Function display_graph() in example do not exist

2019-11-11 Thread Rustam Khalmurzaev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rustam Khalmurzaev reassigned BEAM-8605:


Assignee: Rustam Khalmurzaev

> Function display_graph() in example do not exist
> 
>
> Key: BEAM-8605
> URL: https://issues.apache.org/jira/browse/BEAM-8605
> Project: Beam
>  Issue Type: Bug
>  Components: runner-py-interactive, sdk-py-core
>Reporter: Rustam Khalmurzaev
>Assignee: Rustam Khalmurzaev
>Priority: Trivial
>  Labels: easyfix
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The code linked below shows an example of using the PipelineGraph class. The 
> example in the comments calls a function display_graph() which does not exist.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/display/pipeline_graph.py#L47]
>  
> Examples:
>  graph = pipeline_graph.PipelineGraph(pipeline_proto)
>  graph.display_graph()
>  or
>  graph = pipeline_graph.PipelineGraph(pipeline)
>  graph.display_graph()
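> This class of documentation bug can be caught mechanically by checking that
> every method a docstring example calls actually exists on the class. A
> minimal sketch using a hypothetical stand-in, not Beam's real PipelineGraph:

```python
import re

class PipelineGraph:
    """Example:
      graph = PipelineGraph(pipeline_proto)
      graph.display_graph()
    """
    def __init__(self, proto):
        self._proto = proto

    def get_dot(self):  # hypothetical rendering method
        return "digraph {}"

# Extract method names called on `graph` in the docstring example and
# verify each one exists on the class.
calls = re.findall(r"graph\.(\w+)\(", PipelineGraph.__doc__)
missing = [name for name in calls if not hasattr(PipelineGraph, name)]
assert missing == ["display_graph"]  # the documented method is absent
```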



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-8605) Function display_graph() in example do not exist

2019-11-11 Thread Rustam Khalmurzaev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rustam Khalmurzaev closed BEAM-8605.

Fix Version/s: Not applicable
   Resolution: Fixed

> Function display_graph() in example do not exist
> 
>
> Key: BEAM-8605
> URL: https://issues.apache.org/jira/browse/BEAM-8605
> Project: Beam
>  Issue Type: Bug
>  Components: runner-py-interactive, sdk-py-core
>Reporter: Rustam Khalmurzaev
>Priority: Trivial
>  Labels: easyfix
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The code linked below shows an example of using the PipelineGraph class. The 
> example in the comments calls a function display_graph() which does not exist.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/display/pipeline_graph.py#L47]
>  
> Examples:
>  graph = pipeline_graph.PipelineGraph(pipeline_proto)
>  graph.display_graph()
>  or
>  graph = pipeline_graph.PipelineGraph(pipeline)
>  graph.display_graph()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=341660&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341660
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 12/Nov/19 04:42
Start Date: 12/Nov/19 04:42
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10071: [BEAM-8575] 
Windows idempotency: Applying the same window fn (or wind…
URL: https://github.com/apache/beam/pull/10071#discussion_r345011714
 
 

 ##
 File path: sdks/python/apache_beam/transforms/window_test.py
 ##
 @@ -281,6 +283,35 @@ def test_timestamped_with_combiners(self):
   assert_that(mean_per_window, equal_to([(0, 2.0), (1, 7.0)]),
   label='assert:mean')
 
+  @attr('ValidatesRunner')
+  def test_windows_idempotency(self):
+with TestPipeline() as p:
+  pcoll = self.timestamped_key_values(p, 'key', 0, 1, 2, 3, 4)
+  result = (pcoll
+| 'window' >> WindowInto(FixedWindows(2))
+| 'same window' >> WindowInto(FixedWindows(2))
+| 'same window again' >> WindowInto(FixedWindows(2))
+| GroupByKey())
+
+  assert_that(result, equal_to([('key', [0, 1]),
+('key', [2, 3]),
+('key', [4])]))
+
+  @attr('ValidatesRunner')
+  def test_windows_gbk_idempotency(self):
 
 Review comment:
   ```suggestion
 def test_window_assignment_through_multiple_gbk_idempotency(self):
   ```
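 The property these tests exercise can be shown with plain arithmetic: fixed
 windows of size 2 assign each timestamp to [start, start + 2), and re-applying
 the same assignment cannot move an element to a different window. A minimal
 sketch, independent of Beam:

```python
def fixed_window(ts, size=2):
    """Return the [start, end) fixed window containing timestamp ts."""
    start = ts - (ts % size)
    return (start, start + size)

timestamps = [0, 1, 2, 3, 4]
once = [fixed_window(t) for t in timestamps]

# Applying the window fn again (to any timestamp inside the window, e.g.
# its start) yields the same assignment: the operation is idempotent.
twice = [fixed_window(start) for (start, _end) in once]
assert once == twice
assert once == [(0, 2), (0, 2), (2, 4), (2, 4), (4, 6)]
```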
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341660)
Time Spent: 3h 20m  (was: 3h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8442) Unify bundle register in Python SDK harness

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8442?focusedWorklogId=341659&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341659
 ]

ASF GitHub Bot logged work on BEAM-8442:


Author: ASF GitHub Bot
Created on: 12/Nov/19 04:42
Start Date: 12/Nov/19 04:42
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10004: [BEAM-8442] Unify 
bundle register in Python SDK harness
URL: https://github.com/apache/beam/pull/10004#issuecomment-552731641
 
 
   Actually, looking at the code, if SdkWorker.register (which is now
   asynchronous) does not complete before SdkWorker.process_bundle
   starts, process_bundle fails. We could be getting lucky as register is
   fast, but there's a race here now that it's not synchronous.
   
   
https://github.com/sunjincheng121/beam/blob/79d21b11ca31963bd1be84a26502e5577bbefbdf/sdks/python/apache_beam/runners/worker/sdk_worker.py#L371
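   The race being described can be reproduced with plain threading, along with
   the usual fix of blocking on registration before processing. A sketch only:
   the names below mirror the discussion, not the real SdkWorker API.

```python
import threading

class Worker:
    def __init__(self):
        self._registered = threading.Event()
        self._descriptors = {}

    def register(self, descriptor_id):
        # Asynchronous registration: may complete after process_bundle starts.
        self._descriptors[descriptor_id] = object()
        self._registered.set()

    def process_bundle(self, descriptor_id, timeout=5):
        # Fix sketch: wait for registration instead of failing immediately
        # with a KeyError when the descriptor is not yet present.
        if not self._registered.wait(timeout):
            raise KeyError(descriptor_id)
        return self._descriptors[descriptor_id]

worker = Worker()
t = threading.Thread(target=worker.register, args=('bundle-1',))
t.start()
result = worker.process_bundle('bundle-1')  # blocks until register completes
t.join()
assert result is not None
```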
   
   On Mon, Nov 11, 2019 at 7:24 PM Jincheng Sun 
   wrote:
   
   > @robertwb 
   > Thanks a lot for the comments. After this change, if the registration
   > fails, the data channel between the runner and the SDK harness will not be
   > established, so the data transmission from the runner to the SDK harness
   > will be timed out. At that time, the registration failure could be
   > reported. This is also the same case for the process bundle instruction
   > before and after this change. Are you concerned that we should detect the
   > registration failure earlier? If so, how do we deal with the process
   > bundle instruction; should we keep the same behavior for both the
   > registration and process bundle instruction?
   > What do you think?
   > Best, Jincheng
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub, or unsubscribe.
   >
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341659)
Time Spent: 5h  (was: 4h 50m)

> Unify bundle register in Python SDK harness
> ---
>
> Key: BEAM-8442
> URL: https://issues.apache.org/jira/browse/BEAM-8442
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> There are two methods for bundle registration in the Python SDK harness:
> `SdkHarness._request_register` and `SdkWorker.register`. They should be unified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=341661&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341661
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 12/Nov/19 04:42
Start Date: 12/Nov/19 04:42
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10071: [BEAM-8575] 
Windows idempotency: Applying the same window fn (or wind…
URL: https://github.com/apache/beam/pull/10071#discussion_r345011570
 
 

 ##
 File path: sdks/python/apache_beam/transforms/window_test.py
 ##
 @@ -281,6 +283,35 @@ def test_timestamped_with_combiners(self):
   assert_that(mean_per_window, equal_to([(0, 2.0), (1, 7.0)]),
   label='assert:mean')
 
+  @attr('ValidatesRunner')
+  def test_windows_idempotency(self):
 
 Review comment:
   ```suggestion
 def test_window_assignment_idempotency(self):
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341661)
Time Spent: 3.5h  (was: 3h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8597) Allow TestStream trigger tests to run on other runners.

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8597?focusedWorklogId=341657&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341657
 ]

ASF GitHub Bot logged work on BEAM-8597:


Author: ASF GitHub Bot
Created on: 12/Nov/19 04:26
Start Date: 12/Nov/19 04:26
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10043: [BEAM-8597] 
Allow TestStream trigger tests to run on other runners.
URL: https://github.com/apache/beam/pull/10043#discussion_r345012835
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream.py
 ##
 @@ -66,6 +68,28 @@ def __ne__(self, other):
 # TODO(BEAM-5949): Needed for Python 2 compatibility.
 return not self == other
 
+  @abstractmethod
+  def to_runner_api(self, element_coder):
+raise NotImplementedError
+
+  @staticmethod
+  def from_runner_api(proto, element_coder):
 
 Review comment:
   Yes, exactly. This is not a user-extensible set. (Also, the logic is 
trivial, so not worth the noise of delegating.)
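   The pattern being defended -- a static factory on the base class dispatching
   over a closed set of subclasses -- looks roughly like this (a sketch with an
   invented proto shape, not Beam's actual TestStream handling):

```python
class Event:
    @staticmethod
    def from_runner_api(proto):
        # Closed set: dispatch directly on which variant the proto carries,
        # rather than delegating to per-subclass factories.
        kind, payload = proto
        if kind == 'element':
            return ElementEvent(payload)
        elif kind == 'watermark':
            return WatermarkEvent(payload)
        raise ValueError('unknown event kind: %r' % (kind,))

class ElementEvent(Event):
    def __init__(self, elements):
        self.elements = elements

class WatermarkEvent(Event):
    def __init__(self, watermark):
        self.watermark = watermark

event = Event.from_runner_api(('watermark', 100))
assert isinstance(event, WatermarkEvent) and event.watermark == 100
```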
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341657)
Time Spent: 1h  (was: 50m)

> Allow TestStream trigger tests to run on other runners.
> ---
>
> Key: BEAM-8597
> URL: https://issues.apache.org/jira/browse/BEAM-8597
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2939) Fn API SDF support

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=341655&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341655
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 12/Nov/19 04:13
Start Date: 12/Nov/19 04:13
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10065: [BEAM-2939, 
BEAM-5600, BEAM-6868, BEAM-7233] Expose residual roots and bundle finalization 
callbacks to portable runners.
URL: https://github.com/apache/beam/pull/10065
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341655)
Time Spent: 20h 10m  (was: 20h)

> Fn API SDF support
> --
>
> Key: BEAM-2939
> URL: https://issues.apache.org/jira/browse/BEAM-2939
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 20h 10m
>  Remaining Estimate: 0h
>
> The Fn API should support streaming SDF. Detailed design TBD.
> Once design is ready, expand subtasks similarly to BEAM-2822.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2939) Fn API SDF support

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=341654&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341654
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 12/Nov/19 04:12
Start Date: 12/Nov/19 04:12
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10065: [BEAM-2939, 
BEAM-5600, BEAM-6868, BEAM-7233] Expose residual roots and bundle finalization 
callbacks to portable runners.
URL: https://github.com/apache/beam/pull/10065#discussion_r345010682
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/BundleFinalizationHandler.java
 ##
 @@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.control;
+
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.ProcessBundleResponse;
+
+/**
+ * A handler for the runner when a finalization request has been received.
+ *
+ * The runner is responsible for finalizing the bundle when all output from 
the bundle has been
 
 Review comment:
   Yes
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341654)
Time Spent: 20h  (was: 19h 50m)

> Fn API SDF support
> --
>
> Key: BEAM-2939
> URL: https://issues.apache.org/jira/browse/BEAM-2939
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
> The Fn API should support streaming SDF. Detailed design TBD.
> Once design is ready, expand subtasks similarly to BEAM-2822.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=341652&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341652
 ]

ASF GitHub Bot logged work on BEAM-8151:


Author: ASF GitHub Bot
Created on: 12/Nov/19 04:12
Start Date: 12/Nov/19 04:12
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #9477: [BEAM-8151, 
BEAM-7848] Up the max number of threads inside the SDK harness to a default of 
10k
URL: https://github.com/apache/beam/pull/9477#issuecomment-552726126
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341652)
Time Spent: 8h 50m  (was: 8h 40m)

> Allow the Python SDK to use many many threads
> -
>
> Key: BEAM-8151
> URL: https://issues.apache.org/jira/browse/BEAM-8151
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> We need to use a thread pool which shrinks the number of active threads when 
> they are not being used.
>  
> This is to prevent any stuckness issues related to a runner scheduling more 
> work items than there are "work" threads inside the SDK harness.
>  
> By default the control plane should have all "requests" being processed in 
> parallel and the runner is responsible for not overloading the SDK with too 
> much work.
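> One way to get a pool whose idle threads exit is sketched below with the
> standard library. This illustrates the requirement only; it is not Beam's
> implementation, and the 10k default merely echoes the PR title.

```python
import queue
import threading

class ShrinkingPool:
    """Pool whose idle threads exit after `idle_timeout` seconds without
    work, so the thread count shrinks back down when load drops."""

    def __init__(self, max_workers=10000, idle_timeout=5.0):
        self._tasks = queue.Queue()
        self._lock = threading.Lock()
        self._workers = 0
        self._max_workers = max_workers
        self._idle_timeout = idle_timeout

    def submit(self, fn, *args):
        self._tasks.put((fn, args))
        with self._lock:
            # Simplification: spawn a new worker per submit, up to the cap.
            if self._workers < self._max_workers:
                self._workers += 1
                threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            try:
                fn, args = self._tasks.get(timeout=self._idle_timeout)
            except queue.Empty:
                with self._lock:
                    self._workers -= 1  # idle for too long: shrink the pool
                return
            fn(*args)

pool = ShrinkingPool(idle_timeout=0.1)
results = []
done = threading.Event()
pool.submit(lambda: (results.append(42), done.set()))
done.wait(2)
assert results == [42]
```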



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=341647&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341647
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 12/Nov/19 03:24
Start Date: 12/Nov/19 03:24
Worklog Time Spent: 10m 
  Work Description: liumomo315 commented on issue #10071: [BEAM-8575] 
Windows idempotency: Applying the same window fn (or wind…
URL: https://github.com/apache/beam/pull/10071#issuecomment-552717402
 
 
   R: @lcwik
   
   Hi Luke, I expect to spend more time on the custom WindowFn test, so I have 
separated the other two more trivial window idempotency tests (from 
https://github.com/apache/beam/pull/9957) into this PR to get them in faster. 
Thanks!
   
   I will update the original PR with the custom WindowFn test later.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341647)
Time Spent: 3h 10m  (was: 3h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8442) Unify bundle register in Python SDK harness

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8442?focusedWorklogId=341646&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341646
 ]

ASF GitHub Bot logged work on BEAM-8442:


Author: ASF GitHub Bot
Created on: 12/Nov/19 03:24
Start Date: 12/Nov/19 03:24
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #10004: [BEAM-8442] 
Unify bundle register in Python SDK harness
URL: https://github.com/apache/beam/pull/10004#issuecomment-552717309
 
 
   @robertwb 
   Thanks a lot for the comments. After this change, if the registration fails, 
the data channel between the runner and the SDK harness will not be established, 
so the data transmission from the runner to the SDK harness will be 
[timed out](https://github.com/apache/beam/blob/74ea7496ff486640affbf2e6ab4a45d3ea25efc8/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/data/GrpcDataService.java#L170).
 At that time, the registration failure could be reported. This is also the 
same case for the process bundle instruction before and after this change. Are 
you concerned that we should detect the registration failure earlier? If so, 
how do we deal with the process bundle instruction; should we keep the same 
behavior for both the registration and process bundle instruction?
   What do you think?
   Best, Jincheng
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341646)
Time Spent: 4h 50m  (was: 4h 40m)

> Unify bundle register in Python SDK harness
> ---
>
> Key: BEAM-8442
> URL: https://issues.apache.org/jira/browse/BEAM-8442
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> There are two methods for bundle registration in the Python SDK harness:
> `SdkHarness._request_register` and `SdkWorker.register`. They should be unified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=341644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341644
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 12/Nov/19 03:20
Start Date: 12/Nov/19 03:20
Worklog Time Spent: 10m 
  Work Description: liumomo315 commented on issue #10071: [BEAM-8575] 
Windows idempotency: Applying the same window fn (or wind…
URL: https://github.com/apache/beam/pull/10071#issuecomment-552716522
 
 
   beam_PostCommit_Py_VR_Dataflow
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341644)
Time Spent: 3h  (was: 2h 50m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2939) Fn API SDF support

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=341645&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341645
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 12/Nov/19 03:20
Start Date: 12/Nov/19 03:20
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #10065: [BEAM-2939, 
BEAM-5600, BEAM-6868, BEAM-7233] Expose residual roots and bundle finalization 
callbacks to portable runners.
URL: https://github.com/apache/beam/pull/10065#discussion_r345002381
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/BundleFinalizationHandler.java
 ##
 @@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.control;
+
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.ProcessBundleResponse;
+
+/**
+ * A handler for the runner when a finalization request has been received.
+ *
+ * The runner is responsible for finalizing the bundle when all output from 
the bundle has been
 
 Review comment:
   Just to clarify (based on the discussion in the doc): Flink (as an example) 
performs checkpointing periodically, independent of bundle completion. The 
Flink runner would therefore need to defer bundle finalization until after the 
next checkpoint is complete and then finalize all bundles that were completed 
since the previous checkpoint?
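   The deferral described above can be sketched as a buffer of pending
   finalization callbacks that is flushed only when a checkpoint completes
   (names are illustrative, not the Flink runner's API):

```python
class DeferredFinalizer:
    def __init__(self):
        self._pending = []   # bundles completed since the last checkpoint

    def on_bundle_complete(self, bundle_id, callback):
        # Do not finalize yet: the checkpoint covering this bundle's output
        # may still fail, in which case the bundle would be replayed.
        self._pending.append((bundle_id, callback))

    def on_checkpoint_complete(self):
        # Output up to this checkpoint is durable: finalize every bundle
        # completed since the previous checkpoint.
        for _bundle_id, callback in self._pending:
            callback()
        self._pending = []

finalizer = DeferredFinalizer()
log = []
finalizer.on_bundle_complete('b1', lambda: log.append('b1 finalized'))
finalizer.on_bundle_complete('b2', lambda: log.append('b2 finalized'))
assert log == []                  # nothing finalized before the checkpoint
finalizer.on_checkpoint_complete()
assert log == ['b1 finalized', 'b2 finalized']
```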
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341645)
Time Spent: 19h 50m  (was: 19h 40m)

> Fn API SDF support
> --
>
> Key: BEAM-2939
> URL: https://issues.apache.org/jira/browse/BEAM-2939
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 19h 50m
>  Remaining Estimate: 0h
>
> The Fn API should support streaming SDF. Detailed design TBD.
> Once design is ready, expand subtasks similarly to BEAM-2822.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=341643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341643
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 12/Nov/19 03:19
Start Date: 12/Nov/19 03:19
Worklog Time Spent: 10m 
  Work Description: liumomo315 commented on pull request #10071: 
[BEAM-8575] Windows idempotency: Applying the same window fn (or wind…
URL: https://github.com/apache/beam/pull/10071
 
 
   …ow fn + GBK) to the input multiple times will have the same effect as 
applying it once
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=341635&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341635
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 12/Nov/19 03:09
Start Date: 12/Nov/19 03:09
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9606: [BEAM-7246] Add 
Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r344989491
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/spannerio.py
 ##
 @@ -0,0 +1,531 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Google Cloud Spanner IO
+
+This is an experimental module for reading and writing data from Google Cloud
+Spanner. Visit: https://cloud.google.com/spanner for more details.
+
+To read from Cloud Spanner apply ReadFromSpanner transformation. It will
+return a list, where each element represents an individual row returned from
+the read operation. Both Query and Read APIs are supported.
+
+ReadFromSpanner relies on the ReadOperation objects which is exposed by the
+SpannerIO API. ReadOperation holds the immutable data which is responsible to
+execute batch and naive reads on spanner cloud. This is done for more
+convenient programming.
+
+ReadFromSpanner read from cloud spanner by providing the either 'sql' param
+in the constructor or 'table' name with 'columns' as list. For example:::
+
+  records = (pipeline
+| ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+sql='Select * from users'))
+
+  records = (pipeline
+| ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+table='users', columns=['id', 'name', 'email']))
+
+You can also performs the multiple reads by provide the list of ReadOperation
+to the ReadFromSpanner transform constructor. ReadOperation exposes two static
+methods. Use 'query' to perform sql based reads, 'table' to perform read from
+table name. For example:::
+
+  read_operations = [
+  ReadOperation.table('customers', ['name', 'email']),
+  ReadOperation.table('vendors', ['name', 'email']),
+]
+  all_users = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+read_operations=read_operations)
+
+  ...OR...
+
+  read_operations = [
+  ReadOperation.sql('Select name, email from customers'),
 
 Review comment:
   ```suggestion
   ReadOperation.query('Select name, email from customers'),
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341635)

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
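
The ReadOperation pattern described in the quoted docstring (an immutable object built through `query` and `table` factory methods) can be sketched in plain Python. This is an illustration of the design under discussion, not the actual `spannerio` implementation; field names here are assumptions:

```python
from collections import namedtuple

# Plain-Python stand-in for the ReadOperation object the docstring describes:
# an immutable value holding the parameters of one Spanner read, created via
# two factory methods. Field names are illustrative, not the real Beam API.
class ReadOperation(namedtuple('ReadOperation', ['is_sql', 'is_table', 'kwargs'])):

    @classmethod
    def query(cls, sql):
        # SQL-based read (the review suggests the name 'query' over 'sql').
        return cls(is_sql=True, is_table=False, kwargs={'sql': sql})

    @classmethod
    def table(cls, table, columns):
        # Table-based read over the named columns.
        return cls(is_sql=False, is_table=True, kwargs={'table': table, 'columns': columns})

# Mirroring the mixed read_operations list from the docstring excerpt:
read_operations = [
    ReadOperation.query('Select name, email from customers'),
    ReadOperation.table('vendors', ['name', 'email']),
]
```

Because each operation is an immutable value, a list of them can be handed to the transform constructor and fanned out to workers without shared mutable state.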


[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=341630&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341630
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 12/Nov/19 03:09
Start Date: 12/Nov/19 03:09
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9606: [BEAM-7246] Add 
Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r343405175
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/spannerio.py
 ##
 @@ -0,0 +1,531 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Google Cloud Spanner IO
+
+This is an experimental module for reading and writing data from Google Cloud
+Spanner. Visit: https://cloud.google.com/spanner for more details.
+
+To read from Cloud Spanner apply ReadFromSpanner transformation. It will
+return a list, where each element represents an individual row returned from
 
 Review comment:
   ```suggestion
   return a PCollection, where each element represents an individual row returned from
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 341630)
Time Spent: 0.5h  (was: 20m)

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.





[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=341627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341627
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 12/Nov/19 03:09
Start Date: 12/Nov/19 03:09
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9606: [BEAM-7246] Add 
Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r341775367
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -111,6 +111,7 @@ def get_version():
 'future>=0.16.0,<1.0.0',
 'futures>=3.2.0,<4.0.0; python_version < "3.0"',
 'grpcio>=1.12.1,<2',
+'grpcio-gcp>=0.2.2,<1',
 
 Review comment:
   Could this be moved to GCP_REQUIREMENTS?
 



Issue Time Tracking
---

Worklog Id: (was: 341627)
Remaining Estimate: 0h
Time Spent: 10m

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.



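
The reviewer's suggestion above (moving `grpcio-gcp` out of the core requirements and into `GCP_REQUIREMENTS`) would look roughly like the following illustrative `setup.py` fragment. The variable names follow Beam's `setup.py` conventions, but the exact surrounding contents are assumptions:

```python
# Hypothetical sketch of the review suggestion: list grpcio-gcp under the
# GCP-only extra instead of the core install requirements, so only users
# installing apache-beam[gcp] pull it in.
REQUIRED_PACKAGES = [
    'future>=0.16.0,<1.0.0',
    'grpcio>=1.12.1,<2',
    # grpcio-gcp removed from the core list...
]

GCP_REQUIREMENTS = [
    'grpcio-gcp>=0.2.2,<1',  # ...and added to the GCP extra instead
]

# In the setup() call this would be wired as, e.g.:
#   install_requires=REQUIRED_PACKAGES,
#   extras_require={'gcp': GCP_REQUIREMENTS},
```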


[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=341640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341640
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 12/Nov/19 03:09
Start Date: 12/Nov/19 03:09
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9606: [BEAM-7246] Add 
Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r345000244
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/spannerio.py
 ##
 @@ -0,0 +1,531 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Google Cloud Spanner IO
+
+This is an experimental module for reading and writing data from Google Cloud
+Spanner. Visit: https://cloud.google.com/spanner for more details.
+
+To read from Cloud Spanner apply ReadFromSpanner transformation. It will
+return a list, where each element represents an individual row returned from
+the read operation. Both Query and Read APIs are supported.
+
+ReadFromSpanner relies on the ReadOperation objects which is exposed by the
+SpannerIO API. ReadOperation holds the immutable data which is responsible to
+execute batch and naive reads on spanner cloud. This is done for more
+convenient programming.
+
+ReadFromSpanner read from cloud spanner by providing the either 'sql' param
+in the constructor or 'table' name with 'columns' as list. For example:::
+
+  records = (pipeline
+| ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+sql='Select * from users'))
+
+  records = (pipeline
+| ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+table='users', columns=['id', 'name', 'email']))
+
+You can also performs the multiple reads by provide the list of ReadOperation
+to the ReadFromSpanner transform constructor. ReadOperation exposes two static
+methods. Use 'query' to perform sql based reads, 'table' to perform read from
+table name. For example:::
+
+  read_operations = [
+  ReadOperation.table('customers', ['name', 'email']),
+  ReadOperation.table('vendors', ['name', 'email']),
+]
+  all_users = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+read_operations=read_operations)
+
+  ...OR...
+
+  read_operations = [
+  ReadOperation.sql('Select name, email from customers'),
+  ReadOperation.table('vendors', ['name', 'email']),
+]
+  all_users = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+read_operations=read_operations)
+
+For more information, please review the docs on class ReadOperation.
+
+User can also able to provide the ReadOperation in form of PCollection via
+pipeline. For example:::
+
+  users = (pipeline
+   | beam.Create([ReadOperation...])
+   | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME))
+
+User may also create cloud spanner transaction from the transform called
+`create_transaction` which is available in the SpannerIO API.
+
+The transform is guaranteed to be executed on a consistent snapshot of data,
+utilizing the power of read only transactions. Staleness of data can be
+controlled by providing the `read_timestamp` or `exact_staleness` param values
+in the constructor.
+
+This transform requires root of the pipeline (PBegin) and returns the dict
+containing 'session_id' and 'transaction_id'. This `create_transaction`
+PTransform later passed to the constructor of ReadFromSpanner. For example:::
+
+  transaction = (pipeline | create_transaction(TEST_PROJECT_ID,
+  TEST_INSTANCE_ID,
+  DB_NAME))
+
+  users = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+sql='Select * from users', transaction=transaction)
+
+  tweets = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+sql='Select * from tweets', transaction=transaction)
+
+For further details of this transform, please review the docs on the
+`create_transaction` method available in the 

[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=341631&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341631
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 12/Nov/19 03:09
Start Date: 12/Nov/19 03:09
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9606: [BEAM-7246] Add 
Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r343286065
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/spannerio.py
 ##
 @@ -0,0 +1,531 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Google Cloud Spanner IO
+
+This is an experimental module for reading and writing data from Google Cloud
+Spanner. Visit: https://cloud.google.com/spanner for more details.
+
+To read from Cloud Spanner apply ReadFromSpanner transformation. It will
+return a list, where each element represents an individual row returned from
+the read operation. Both Query and Read APIs are supported.
+
+ReadFromSpanner relies on the ReadOperation objects which is exposed by the
+SpannerIO API. ReadOperation holds the immutable data which is responsible to
+execute batch and naive reads on spanner cloud. This is done for more
+convenient programming.
+
+ReadFromSpanner read from cloud spanner by providing the either 'sql' param
+in the constructor or 'table' name with 'columns' as list. For example:::
+
+  records = (pipeline
+| ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+sql='Select * from users'))
+
+  records = (pipeline
+| ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+table='users', columns=['id', 'name', 'email']))
+
+You can also performs the multiple reads by provide the list of ReadOperation
+to the ReadFromSpanner transform constructor. ReadOperation exposes two static
+methods. Use 'query' to perform sql based reads, 'table' to perform read from
+table name. For example:::
+
+  read_operations = [
+  ReadOperation.table('customers', ['name', 'email']),
+  ReadOperation.table('vendors', ['name', 'email']),
+]
+  all_users = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+read_operations=read_operations)
+
+  ...OR...
+
+  read_operations = [
+  ReadOperation.sql('Select name, email from customers'),
+  ReadOperation.table('vendors', ['name', 'email']),
+]
+  all_users = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+read_operations=read_operations)
+
+For more information, please review the docs on class ReadOperation.
+
+User can also able to provide the ReadOperation in form of PCollection via
+pipeline. For example:::
+
+  users = (pipeline
+   | beam.Create([ReadOperation...])
+   | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME))
+
+User may also create cloud spanner transaction from the transform called
+`create_transaction` which is available in the SpannerIO API.
+
+The transform is guaranteed to be executed on a consistent snapshot of data,
+utilizing the power of read only transactions. Staleness of data can be
+controlled by providing the `read_timestamp` or `exact_staleness` param values
+in the constructor.
+
+This transform requires root of the pipeline (PBegin) and returns the dict
+containing 'session_id' and 'transaction_id'. This `create_transaction`
+PTransform later passed to the constructor of ReadFromSpanner. For example:::
+
+  transaction = (pipeline | create_transaction(TEST_PROJECT_ID,
+  TEST_INSTANCE_ID,
+  DB_NAME))
+
+  users = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+sql='Select * from users', transaction=transaction)
+
+  tweets = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+sql='Select * from tweets', transaction=transaction)
+
+For further details of this transform, please review the docs on the
+`create_transaction` method available in the 

[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=341632&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341632
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 12/Nov/19 03:09
Start Date: 12/Nov/19 03:09
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9606: [BEAM-7246] Add 
Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r343294655
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/spannerio.py
 ##
 @@ -0,0 +1,531 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Google Cloud Spanner IO
+
+This is an experimental module for reading and writing data from Google Cloud
+Spanner. Visit: https://cloud.google.com/spanner for more details.
+
+To read from Cloud Spanner apply ReadFromSpanner transformation. It will
+return a list, where each element represents an individual row returned from
+the read operation. Both Query and Read APIs are supported.
+
+ReadFromSpanner relies on the ReadOperation objects which is exposed by the
+SpannerIO API. ReadOperation holds the immutable data which is responsible to
+execute batch and naive reads on spanner cloud. This is done for more
+convenient programming.
+
+ReadFromSpanner read from cloud spanner by providing the either 'sql' param
+in the constructor or 'table' name with 'columns' as list. For example:::
+
+  records = (pipeline
+| ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+sql='Select * from users'))
+
+  records = (pipeline
+| ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+table='users', columns=['id', 'name', 'email']))
+
+You can also performs the multiple reads by provide the list of ReadOperation
+to the ReadFromSpanner transform constructor. ReadOperation exposes two static
+methods. Use 'query' to perform sql based reads, 'table' to perform read from
+table name. For example:::
+
+  read_operations = [
+  ReadOperation.table('customers', ['name', 'email']),
+  ReadOperation.table('vendors', ['name', 'email']),
+]
+  all_users = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+read_operations=read_operations)
+
+  ...OR...
+
+  read_operations = [
+  ReadOperation.sql('Select name, email from customers'),
+  ReadOperation.table('vendors', ['name', 'email']),
+]
+  all_users = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+read_operations=read_operations)
+
+For more information, please review the docs on class ReadOperation.
+
+User can also able to provide the ReadOperation in form of PCollection via
+pipeline. For example:::
+
+  users = (pipeline
+   | beam.Create([ReadOperation...])
+   | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME))
+
+User may also create cloud spanner transaction from the transform called
+`create_transaction` which is available in the SpannerIO API.
+
+The transform is guaranteed to be executed on a consistent snapshot of data,
+utilizing the power of read only transactions. Staleness of data can be
+controlled by providing the `read_timestamp` or `exact_staleness` param values
+in the constructor.
+
+This transform requires root of the pipeline (PBegin) and returns the dict
+containing 'session_id' and 'transaction_id'. This `create_transaction`
+PTransform later passed to the constructor of ReadFromSpanner. For example:::
+
+  transaction = (pipeline | create_transaction(TEST_PROJECT_ID,
+  TEST_INSTANCE_ID,
+  DB_NAME))
+
+  users = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+sql='Select * from users', transaction=transaction)
+
+  tweets = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+sql='Select * from tweets', transaction=transaction)
+
+For further details of this transform, please review the docs on the
+`create_transaction` method available in the 

[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=341628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341628
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 12/Nov/19 03:09
Start Date: 12/Nov/19 03:09
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9606: [BEAM-7246] Add 
Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r341775827
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -111,6 +111,7 @@ def get_version():
 'future>=0.16.0,<1.0.0',
 'futures>=3.2.0,<4.0.0; python_version < "3.0"',
 'grpcio>=1.12.1,<2',
+'grpcio-gcp>=0.2.2,<1',
 
 Review comment:
   Also, what is this dependency used for?
   
 



Issue Time Tracking
---

Worklog Id: (was: 341628)
Time Spent: 20m  (was: 10m)

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.





[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=341636&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341636
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 12/Nov/19 03:09
Start Date: 12/Nov/19 03:09
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9606: [BEAM-7246] Add 
Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r343405841
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/spannerio.py
 ##
 @@ -0,0 +1,531 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Google Cloud Spanner IO
+
+This is an experimental module for reading and writing data from Google Cloud
+Spanner. Visit: https://cloud.google.com/spanner for more details.
+
+To read from Cloud Spanner apply ReadFromSpanner transformation. It will
+return a list, where each element represents an individual row returned from
+the read operation. Both Query and Read APIs are supported.
+
+ReadFromSpanner relies on the ReadOperation objects which is exposed by the
+SpannerIO API. ReadOperation holds the immutable data which is responsible to
+execute batch and naive reads on spanner cloud. This is done for more
 
 Review comment:
   What are naive reads? Could you provide a more descriptive term?
 



Issue Time Tracking
---

Worklog Id: (was: 341636)

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.





[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=341634&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341634
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 12/Nov/19 03:09
Start Date: 12/Nov/19 03:09
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9606: [BEAM-7246] Add 
Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r343406252
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/spannerio.py
 ##
 @@ -0,0 +1,531 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Google Cloud Spanner IO
+
+This is an experimental module for reading and writing data from Google Cloud
+Spanner. Visit: https://cloud.google.com/spanner for more details.
+
+To read from Cloud Spanner apply ReadFromSpanner transformation. It will
+return a list, where each element represents an individual row returned from
+the read operation. Both Query and Read APIs are supported.
+
+ReadFromSpanner relies on the ReadOperation objects which is exposed by the
+SpannerIO API. ReadOperation holds the immutable data which is responsible to
+execute batch and naive reads on spanner cloud. This is done for more
+convenient programming.
+
+ReadFromSpanner read from cloud spanner by providing the either 'sql' param
 
 Review comment:
   ```suggestion
   ReadFromSpanner reads from Cloud Spanner by providing either an 'sql' param
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 341634)

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=341637&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341637
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 12/Nov/19 03:09
Start Date: 12/Nov/19 03:09
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9606: [BEAM-7246] Add 
Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r344989473
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/spannerio.py
 ##
 @@ -0,0 +1,531 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Google Cloud Spanner IO
+
+This is an experimental module for reading and writing data from Google Cloud
+Spanner. Visit: https://cloud.google.com/spanner for more details.
+
+To read from Cloud Spanner apply ReadFromSpanner transformation. It will
+return a list, where each element represents an individual row returned from
+the read operation. Both Query and Read APIs are supported.
+
+ReadFromSpanner relies on the ReadOperation objects which is exposed by the
+SpannerIO API. ReadOperation holds the immutable data which is responsible to
+execute batch and naive reads on spanner cloud. This is done for more
+convenient programming.
+
+ReadFromSpanner read from cloud spanner by providing the either 'sql' param
+in the constructor or 'table' name with 'columns' as list. For example:::
+
+  records = (pipeline
+| ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+sql='Select * from users'))
+
+  records = (pipeline
+| ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+table='users', columns=['id', 'name', 'email']))
+
+You can also performs the multiple reads by provide the list of ReadOperation
+to the ReadFromSpanner transform constructor. ReadOperation exposes two static
+methods. Use 'query' to perform sql based reads, 'table' to perform read from
+table name. For example:::
+
+  read_operations = [
+  ReadOperation.table('customers', ['name', 'email']),
+  ReadOperation.table('vendors', ['name', 'email']),
+]
+  all_users = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+read_operations=read_operations)
+
+  ...OR...
+
+  read_operations = [
+  ReadOperation.sql('Select name, email from customers'),
+  ReadOperation.table('vendors', ['name', 'email']),
 
 Review comment:
   Should this be a second query example? Perhaps make this an example that 
uses `params`?
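The reviewer asks for an example that uses `params`. A hedged sketch of what a parameterized query spec could look like (the `build_query` helper is a hypothetical stand-in; only the `@name` placeholder style follows Cloud Spanner SQL syntax):

```python
# Illustrative only: shows the shape a parameterized read might take.
# Keeping the SQL and its bound parameters together, rather than
# interpolating values into the string, keeps the query injection-safe.
def build_query(sql, params):
    return {"sql": sql, "params": dict(params)}


q = build_query(
    "SELECT name, email FROM customers WHERE name = @name",
    {"name": "Ada"},
)
assert q["params"]["name"] == "Ada"
assert "@name" in q["sql"]
```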
 



Issue Time Tracking
---

Worklog Id: (was: 341637)
Time Spent: 1h  (was: 50m)

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.





[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=341638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341638
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 12/Nov/19 03:09
Start Date: 12/Nov/19 03:09
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9606: [BEAM-7246] Add 
Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r343287948
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/spannerio.py
 ##
 @@ -0,0 +1,531 @@
+For more information, please review the docs on class ReadOperation.
+
+User can also able to provide the ReadOperation in form of PCollection via
+pipeline. For example:::
+
+  users = (pipeline
+   | beam.Create([ReadOperation...])
+   | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME))
+
+User may also create cloud spanner transaction from the transform called
+`create_transaction` which is available in the SpannerIO API.
+
+The transform is guaranteed to be executed on a consistent snapshot of data,
+utilizing the power of read only transactions. Staleness of data can be
+controlled by providing the `read_timestamp` or `exact_staleness` param values
+in the constructor.
+
+This transform requires root of the pipeline (PBegin) and returns the dict
+containing 'session_id' and 'transaction_id'. This `create_transaction`
+PTransform later passed to the constructor of ReadFromSpanner. For example:::
+
+  transaction = (pipeline | create_transaction(TEST_PROJECT_ID,
+  TEST_INSTANCE_ID,
+  DB_NAME))
+
+  users = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+sql='Select * from users', transaction=transaction)
+
+  tweets = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+sql='Select * from tweets', transaction=transaction)
+
+For further details of this transform, please review the docs on the
+`create_transaction` method available in the 

[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=341629&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341629
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 12/Nov/19 03:09
Start Date: 12/Nov/19 03:09
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9606: [BEAM-7246] Add 
Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r343277819
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/spannerio.py
 ##
 @@ -0,0 +1,531 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Google Cloud Spanner IO
+
+This is an experimental module for reading and writing data from Google Cloud
+Spanner. Visit: https://cloud.google.com/spanner for more details.
+
+To read from Cloud Spanner apply ReadFromSpanner transformation. It will
+return a list, where each element represents an individual row returned from
+the read operation. Both Query and Read APIs are supported.
+
+ReadFromSpanner relies on the ReadOperation objects which is exposed by the
+SpannerIO API. ReadOperation holds the immutable data which is responsible to
+execute batch and naive reads on spanner cloud. This is done for more
+convenient programming.
+
+ReadFromSpanner read from cloud spanner by providing the either 'sql' param
+in the constructor or 'table' name with 'columns' as list. For example:::
+
+  records = (pipeline
+| ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+sql='Select * from users'))
+
+  records = (pipeline
+| ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+table='users', columns=['id', 'name', 'email']))
+
+You can also performs the multiple reads by provide the list of ReadOperation
+to the ReadFromSpanner transform constructor. ReadOperation exposes two static
+methods. Use 'query' to perform sql based reads, 'table' to perform read from
+table name. For example:::
+
+  read_operations = [
+  ReadOperation.table('customers', ['name', 'email']),
+  ReadOperation.table('vendors', ['name', 'email']),
+]
+  all_users = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+read_operations=read_operations)
+
+  ...OR...
+
+  read_operations = [
+  ReadOperation.sql('Select name, email from customers'),
+  ReadOperation.table('vendors', ['name', 'email']),
+]
+  all_users = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+read_operations=read_operations)
+
+For more information, please review the docs on class ReadOperation.
+
+User can also able to provide the ReadOperation in form of PCollection via
+pipeline. For example:::
+
+  users = (pipeline
+   | beam.Create([ReadOperation...])
+   | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME))
+
+User may also create cloud spanner transaction from the transform called
+`create_transaction` which is available in the SpannerIO API.
+
+The transform is guaranteed to be executed on a consistent snapshot of data,
+utilizing the power of read only transactions. Staleness of data can be
+controlled by providing the `read_timestamp` or `exact_staleness` param values
+in the constructor.
+
+This transform requires root of the pipeline (PBegin) and returns the dict
+containing 'session_id' and 'transaction_id'. This `create_transaction`
+PTransform later passed to the constructor of ReadFromSpanner. For example:::
+
+  transaction = (pipeline | create_transaction(TEST_PROJECT_ID,
+  TEST_INSTANCE_ID,
+  DB_NAME))
+
+  users = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+sql='Select * from users', transaction=transaction)
+
+  tweets = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+sql='Select * from tweets', transaction=transaction)
+
+For further details of this transform, please review the docs on the
+`create_transaction` method available in the 

[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=341641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341641
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 12/Nov/19 03:09
Start Date: 12/Nov/19 03:09
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9606: [BEAM-7246] Add 
Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r344987186
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/spannerio.py
 ##
 @@ -0,0 +1,531 @@
+
+You can also performs the multiple reads by provide the list of ReadOperation
 
 Review comment:
   ```suggestion
   You can also perform multiple reads by providing a list of ReadOperations
   ```
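The corrected sentence describes performing multiple reads by providing a list of ReadOperations. A small, Beam-free simulation of that dispatch pattern (the operation dicts, the in-memory `TABLES` store, and `run_reads` are hypothetical stand-ins for the real transform):

```python
# Hypothetical in-memory tables standing in for a Cloud Spanner database.
TABLES = {
    "customers": [{"name": "Ada", "email": "ada@example.com"}],
    "vendors": [{"name": "Bob", "email": "bob@example.com"}],
}


def run_reads(operations):
    """Execute each table-read operation and concatenate the results."""
    rows = []
    for op in operations:
        if op["kind"] == "table":
            # Project only the requested columns from every row.
            for row in TABLES[op["table"]]:
                rows.append({c: row[c] for c in op["columns"]})
    return rows


ops = [
    {"kind": "table", "table": "customers", "columns": ["name", "email"]},
    {"kind": "table", "table": "vendors", "columns": ["name"]},
]
result = run_reads(ops)
assert len(result) == 2
assert result[1] == {"name": "Bob"}
```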
 



Issue Time Tracking
---

Worklog Id: (was: 341641)

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.





[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=341639&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341639
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 12/Nov/19 03:09
Start Date: 12/Nov/19 03:09
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9606: [BEAM-7246] Add 
Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r343287548
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/spannerio.py
 ##
 @@ -0,0 +1,531 @@

[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=341633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341633
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 12/Nov/19 03:09
Start Date: 12/Nov/19 03:09
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9606: [BEAM-7246] Add 
Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r343405585
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/spannerio.py
 ##
 @@ -0,0 +1,531 @@
+ReadFromSpanner relies on the ReadOperation objects which is exposed by the
+SpannerIO API. ReadOperation holds the immutable data which is responsible to
+execute batch and naive reads on spanner cloud. This is done for more
 
 Review comment:
   ```suggestion
   execute batch and naive reads on Cloud Spanner. This is done for more
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 341633)
Time Spent: 50m  (was: 40m)

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.





[jira] [Work logged] (BEAM-2939) Fn API SDF support

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=341621&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341621
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 12/Nov/19 02:28
Start Date: 12/Nov/19 02:28
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10065: [BEAM-2939, 
BEAM-5600, BEAM-6868, BEAM-7233] Expose residual roots and bundle finalization 
callbacks to portable runners.
URL: https://github.com/apache/beam/pull/10065#issuecomment-552705761
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 341621)
Time Spent: 19h 40m  (was: 19.5h)

> Fn API SDF support
> --
>
> Key: BEAM-2939
> URL: https://issues.apache.org/jira/browse/BEAM-2939
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 19h 40m
>  Remaining Estimate: 0h
>
> The Fn API should support streaming SDF. Detailed design TBD.
> Once design is ready, expand subtasks similarly to BEAM-2822.





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=341618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341618
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 12/Nov/19 02:18
Start Date: 12/Nov/19 02:18
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on issue #10070: [BEAM-8575] 
Added a unit test for Reshuffle to test that Reshuffle pr…
URL: https://github.com/apache/beam/pull/10070#issuecomment-552703511
 
 
   R: @robertwb
   
   Added a unit test for Reshuffle to test that Reshuffle preserves timestamps.
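The property this unit test checks can be sketched without Beam: model each element as a (timestamp, value) pair and verify that a reshuffle may permute elements but must not detach a timestamp from its value (this is an illustration of the test's intent, not the actual Beam test code):

```python
import random


def reshuffle(elements, seed=0):
    """Beam-free stand-in for Reshuffle: returns a permutation of the input."""
    shuffled = list(elements)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Each element carries its own timestamp.
before = [(1.0, "a"), (2.0, "b"), (3.0, "c")]
after = reshuffle(before)

# Order may change, but the timestamp attached to each value does not.
assert sorted(after) == sorted(before)
assert dict(after) == {1.0: "a", 2.0: "b", 3.0: "c"}
```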
 



Issue Time Tracking
---

Worklog Id: (was: 341618)
Time Spent: 2h 40m  (was: 2.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=341617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341617
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 12/Nov/19 02:16
Start Date: 12/Nov/19 02:16
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10070: 
[BEAM-8575] Added a unit test for Reshuffle to test that Reshuffle pr…
URL: https://github.com/apache/beam/pull/10070
 
 
   …eserves timestamps.
   
   Added a unit test for Reshuffle to test that Reshuffle preserves timestamps.
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
 

[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=341612&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341612
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 12/Nov/19 02:02
Start Date: 12/Nov/19 02:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10033: [BEAM-8575] 
Add a trigger test to test Discarding accumulation mode w…
URL: https://github.com/apache/beam/pull/10033
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341612)
Time Spent: 2h 20m  (was: 2h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=341607&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341607
 ]

ASF GitHub Bot logged work on BEAM-8539:


Author: ASF GitHub Bot
Created on: 12/Nov/19 01:59
Start Date: 12/Nov/19 01:59
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9965: [BEAM-8539] 
Make job state transitions in python-based runners consistent with java-based 
runners
URL: https://github.com/apache/beam/pull/9965
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341607)
Time Spent: 6h 10m  (was: 6h)

> Clearly define the valid job state transitions
> --
>
> Key: BEAM-8539
> URL: https://issues.apache.org/jira/browse/BEAM-8539
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, runner-core, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> The Beam job state transitions are ill-defined, which is big problem for 
> anything that relies on the values coming from JobAPI.GetStateStream.
> I was hoping to find something like a state transition diagram in the docs so 
> that I could determine the start state, the terminal states, and the valid 
> transitions, but I could not find this. The code reveals that the SDKs differ 
> on the fundamentals:
> Java InMemoryJobService:
>  * start state: *STOPPED*
>  * run - about to submit to executor:  STARTING
>  * run - actually running on executor:  RUNNING
>  * terminal states: DONE, FAILED, CANCELLED, DRAINED
> Python AbstractJobServiceServicer / LocalJobServicer:
>  * start state: STARTING
>  * terminal states: DONE, FAILED, CANCELLED, *STOPPED*
> I think it would be good to make python work like Java, so that there is a 
> difference in state between a job that has been prepared and one that has 
> additionally been run.
> It's hard to tell how far this problem has spread within the various runners. 
>  I think a simple thing that can be done to help standardize behavior is to 
> implement the terminal states as an enum in the beam_job_api.proto, or create 
> a utility function in each language for checking if a state is terminal, so 
> that it's not left up to each runner to reimplement this logic.
>  
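The terminal-state utility suggested above can be sketched as follows. This is a hedged illustration only: the state names mirror the ones listed in the description, not an actual Beam API.

```python
# Hypothetical helper: one shared definition of terminal job states, so that
# each runner does not reimplement the check. The state names follow the
# description above; they are illustrative strings, not the real
# beam_job_api.proto enum values.
TERMINAL_STATES = frozenset({"DONE", "FAILED", "CANCELLED", "DRAINED"})

def is_terminal(state):
    """Return True if a job in this state will never transition again."""
    return state in TERMINAL_STATES
```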



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8603) Add Python SqlTransform example script

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8603?focusedWorklogId=341591&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341591
 ]

ASF GitHub Bot logged work on BEAM-8603:


Author: ASF GitHub Bot
Created on: 12/Nov/19 01:48
Start Date: 12/Nov/19 01:48
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #10055: 
[BEAM-8603] Add Python SqlTransform example script
URL: https://github.com/apache/beam/pull/10055#discussion_r344985062
 
 

 ##
 File path: sdks/java/extensions/sql/build.gradle
 ##
 @@ -24,6 +24,7 @@ plugins {
 }
 applyJavaNature(
   automaticModuleName: 'org.apache.beam.sdk.extensions.sql',
+  shadowClosure: {},
 
 Review comment:
   @lukecwik I added this so I can build a single jar to pass to Java workers 
for the xlang transform (see the 
`--experiment='jar_packages=./sdks/java/extensions/sql/build/libs/beam-sdks-java-extensions-sql-2.18.0-SNAPSHOT.jar'
 \` line in my "how to run" in the description), but it looks like it's causing 
`:sdks:java:extensions:sql:validateShadedJarDoesntLeakNonProjectClasses` to 
fail spectacularly. Is there some other way I can do this?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341591)
Time Spent: 0.5h  (was: 20m)

> Add Python SqlTransform example script
> --
>
> Key: BEAM-8603
> URL: https://issues.apache.org/jira/browse/BEAM-8603
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-8425) Notifying Interactive Beam user about Beam related cell deletion or re-execution

2019-11-11 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang closed BEAM-8425.
---
Fix Version/s: Not applicable
   Resolution: Abandoned

We decided not to notify users, but to proactively steer them away from the bad 
situation.

By implementing show() that executes an implicitly built fragment, we mitigate 
the re-execution/deleted cell problems.

> Notifying Interactive Beam user about Beam related cell deletion or 
> re-execution
> 
>
> Key: BEAM-8425
> URL: https://issues.apache.org/jira/browse/BEAM-8425
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: Not applicable
>
>
> There is a general problem about Interactive Notebooks that when an end user 
> deletes a cell that has been executed or re-executes a cell, those previous 
> executions are hidden from the end user.
> However, hidden states will still have side effects in the notebook.
> This kind of problem bothers Beam even more because Beam's pipeline 
> construction statements are not idempotent and pipeline execution is 
> decoupled and deferred from pipeline construction.
> Re-executing a cell with Beam statements that build a pipeline would cause 
> unexpected pipeline state and the user wouldn't notice it due to the problem 
> of notebooks.
> We'll intercept each transform application invocation from the 
> InteractiveRunner and record the ipython/notebook prompt number. Then each 
> time a user executes a cell that applies PTransform, we'll compare the 
> recorded list of prompt numbers with current notebook file's content and 
> figure out if there is any missing number. If so, we know for sure that a 
> re-execution happens and use display manager to notify the end user of 
> potential side effects caused by hidden states of the notebook/ipython.
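The prompt-number comparison described above can be sketched as a toy helper (an illustration of the idea, not the Interactive Beam implementation):

```python
# Toy sketch of the detection idea: prompt numbers recorded when PTransforms
# were applied are compared against the prompt numbers still present in the
# current notebook content; any recorded number that is now missing implies a
# deleted or re-executed cell, i.e. hidden state with possible side effects.
def has_hidden_state(recorded_prompts, current_prompts):
    current = set(current_prompts)
    return any(p not in current for p in recorded_prompts)
```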



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-7760) Interactive Beam Caching PCollections bound to user defined vars in notebook

2019-11-11 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang resolved BEAM-7760.
-
Fix Version/s: 2.18.0
   Resolution: Fixed

> Interactive Beam Caching PCollections bound to user defined vars in notebook
> 
>
> Key: BEAM-7760
> URL: https://issues.apache.org/jira/browse/BEAM-7760
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> Cache only PCollections bound to user defined variables in a pipeline when 
> running pipeline with interactive runner in jupyter notebooks.
> [Interactive 
> Beam|https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]
>  has been caching and using caches of "leaf" PCollections for interactive 
> execution in jupyter notebooks.
> The interactive execution is currently supported so that when appending new 
> transforms to existing pipeline for a new run, executed part of the pipeline 
> doesn't need to be re-executed. 
> A PCollection is "leaf" when it is never used as input in any PTransform in 
> the pipeline.
> The problem with building caches and pipeline to execute around "leaf" is 
> that when a PCollection is consumed by a sink with no output, the pipeline to 
> execute built will miss the subgraph generating and consuming that 
> PCollection.
> An example, "ReadFromPubSub --> WriteToPubSub" will result in an empty 
> pipeline.
> Caching around PCollections bound to user defined variables and replacing 
> transforms with source and sink of caches could resolve the pipeline to 
> execute properly under the interactive execution scenario. Also, cached 
> PCollection now can trace back to user code and can be used for user data 
> visualization if user wants to do it.
> E.g.,
> {code:java}
> // ...
> p = beam.Pipeline(interactive_runner.InteractiveRunner(),
>   options=pipeline_options)
> messages = p | "Read" >> beam.io.ReadFromPubSub(subscription='...')
> messages | "Write" >> beam.io.WriteToPubSub(topic_path)
> result = p.run()
> // ...
> visualize(messages){code}
>  The interactive runner automatically figures out that PCollection
> {code:java}
> messages{code}
> created by
> {code:java}
> p | "Read" >> beam.io.ReadFromPubSub(subscription='...'){code}
> should be cached and reused if the notebook user appends more transforms.
>  And once the pipeline gets executed, the user could use any 
> visualize(PCollection) module to visualize the data statically (batch) or 
> dynamically (stream)
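The "leaf" definition quoted above can be illustrated with a toy helper (hypothetical code, not part of Beam):

```python
# A PCollection is a "leaf" when it is never used as input to any PTransform.
# `produced` lists all PCollections in the pipeline; `consumed` lists those
# that appear as an input to some transform.
def leaf_pcollections(produced, consumed):
    consumed = set(consumed)
    return [p for p in produced if p not in consumed]
```

On the "ReadFromPubSub --> WriteToPubSub" example from the description, the output of the read is consumed by the sink and the sink itself produces nothing, which is why a leaf-based cut yields an empty pipeline.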



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-7926) Show PCollection with Interactive Beam

2019-11-11 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-7926:

Description: 
Support auto plotting / charting of materialized data of a given PCollection 
with Interactive Beam.

Say an Interactive Beam pipeline is defined as

 
{code:java}
p = beam.Pipeline(InteractiveRunner())
pcoll = p | 'Transform' >> transform()
pcoll2 = ...
pcoll3 = ...{code}
The user can call a single function and get auto-magical charting of the data.

e.g., show(pcoll, pcoll2)

Throughout the process, a pipeline fragment is built to include only transforms 
necessary to produce the desired pcolls (pcoll and pcoll2) and execute that 
fragment.

  was:
Support auto plotting / charting of materialized data of a given PCollection 
with Interactive Beam.

Say an Interactive Beam pipeline is defined as

p = create_pipeline()

pcoll = p | 'Transform' >> transform()

The user can call a single function and get auto-magical charting of the data as 
the materialized pcoll.

e.g., show(pcoll)


> Show PCollection with Interactive Beam
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 19h 50m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The user can call a single function and get auto-magical charting of the data.
> e.g., show(pcoll, pcoll2)
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=341577&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341577
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 12/Nov/19 00:49
Start Date: 12/Nov/19 00:49
Worklog Time Spent: 10m 
  Work Description: dpmills commented on pull request #10035: [BEAM-8581] 
and [BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#discussion_r344973332
 
 

 ##
 File path: sdks/python/apache_beam/testing/util.py
 ##
 @@ -87,27 +88,56 @@ def __repr__(self):
 
 
 def equal_to_per_window(expected_window_to_elements):
-  """Matcher used by assert_that to check on values for specific windows.
+  """Matcher used by assert_that to check to assert expected windows..
+
+  The 'assert_that' statement must have reify_windows=True. This assertion 
works
 
 Review comment:
   Given that the matcher has to know if it expects windows to be reified, can 
we make that happen automatically and remove this setting?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341577)
Time Spent: 40m  (was: 0.5h)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.
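The proposed fix can be sketched as illustrative logic (a toy model in Python, not Beam's actual trigger driver):

```python
# If a trigger guarantees an ON_TIME pane, hold the output watermark just
# before the end of the window so the on-time firing cannot be classified as
# late. Timestamps are plain integers here for illustration.
def watermark_hold(window_end, trigger_has_ontime_pane):
    """Return the hold timestamp, or None when no hold is needed."""
    return window_end - 1 if trigger_has_ontime_pane else None
```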



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=341576&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341576
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 12/Nov/19 00:49
Start Date: 12/Nov/19 00:49
Worklog Time Spent: 10m 
  Work Description: dpmills commented on pull request #10035: [BEAM-8581] 
and [BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#discussion_r344969372
 
 

 ##
 File path: sdks/python/apache_beam/transforms/trigger.py
 ##
 @@ -211,6 +211,16 @@ def should_fire(self, time_domain, timestamp, window, 
context):
 """
 pass
 
+  @abstractmethod
+  def has_ontime_pane(self):
+"""Whether this trigger creates an empty pane if there are no elements.
 
 Review comment:
   This isn't quite right.  Triggers with this property guarantee that there 
will always be an ON_TIME pane, regardless of whether there are elements in 
that pane
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341576)
Time Spent: 40m  (was: 0.5h)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=341578&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341578
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 12/Nov/19 00:49
Start Date: 12/Nov/19 00:49
Worklog Time Spent: 10m 
  Work Description: dpmills commented on pull request #10035: [BEAM-8581] 
and [BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#discussion_r344969158
 
 

 ##
 File path: sdks/python/apache_beam/transforms/trigger.py
 ##
 @@ -1153,11 +1196,10 @@ def merge(_, to_be_merged, merge_result):  # pylint: 
disable=no-self-argument
 self.trigger_fn.on_element(value, window, context)
 
   # Maybe fire this window.
-  watermark = MIN_TIMESTAMP
-  if self.trigger_fn.should_fire(TimeDomain.WATERMARK, watermark,
+  if self.trigger_fn.should_fire(TimeDomain.WATERMARK, input_watermark,
  window, context):
-finished = self.trigger_fn.on_fire(watermark, window, context)
-yield self._output(window, finished, state, output_watermark, False)
+finished = self.trigger_fn.on_fire(input_watermark, window, context)
+yield self._output(window, finished, state, input_watermark, False)
 
 Review comment:
   Changing this from output_watermark to input_watermark looks like it will 
change which panes get marked EARLY vs ON_TIME.  It's a bit worrying that this 
didn't cause any tests to fail
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341578)
Time Spent: 50m  (was: 40m)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8612) Convert []beam.T to the underlying type []T when passed to a DoFn with universal typed (beam.X) input

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8612?focusedWorklogId=341574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341574
 ]

ASF GitHub Bot logged work on BEAM-8612:


Author: ASF GitHub Bot
Created on: 12/Nov/19 00:45
Start Date: 12/Nov/19 00:45
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #10066: [BEAM-8612] 
Convert []beam.T to the underlying type []T when passed to a universal type.
URL: https://github.com/apache/beam/pull/10066#discussion_r344972612
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/exec/fullvalue.go
 ##
 @@ -151,6 +156,12 @@ func ConvertFn(from, to reflect.Type) func(interface{}) 
interface{} {
}
return ret.Interface()
}
+
+   case typex.IsList(from) && typex.IsUniversal(from.Elem()) && 
typex.IsUniversal(to):
+   return func(v interface{}) interface{} {
+   return Convert(v, to)
 
 Review comment:
   Never call Convert from ConvertFn. The point of ConvertFn is to avoid 
re-computing the type comparisons, and the typex.IsUniversals and similar per 
element.  
   
   Instead, write it the other way around. Convert can call ConvertFn, since 
it's already paying the heavier call cost anyway.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341574)
Time Spent: 20m  (was: 10m)

> Convert []beam.T to the underlying type []T when passed to a DoFn with 
> universal typed (beam.X) input
> -
>
> Key: BEAM-8612
> URL: https://issues.apache.org/jira/browse/BEAM-8612
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Tianyang Hu
>Assignee: Tianyang Hu
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Say there are two DoFn: f1, f2.
> - f1 declares the output type as []beam.T, and each element has the 
> underlying type int.
> - f2 declares the input type as []int
> Passing f1 output to f2 works well. The conversion from []beam.T to []int 
> happens at: 
> https://github.com/apache/beam/blob/c7be0643934a87d73483cf1fd3199a425508b03c/sdks/go/pkg/beam/core/runtime/exec/fullvalue.go#L108
> But it doesn't work if f2 declares the input type as beam.X and type casts it 
> to []int. This is because there's no type conversion when passing []beam.T to 
> beam.X.
> We may consider supporting the above case by converting []beam.T to the 
> underlying type []T when it's passed to a universal type.
> An issue is that if []beam.T is nil or empty, we don't know its underlying 
> element type (unless we know which concrete type beam.T or beam.X is bound 
> to, but this mapping doesn't seem to be kept at runtime?). In such case, we 
> have to pass []beam.T to beam.X as is.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8591) Exception is thrown while running Beam Pipeline on Kubernetes Flink Cluster.

2019-11-11 Thread Mingliang Gong (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971930#comment-16971930
 ] 

Mingliang Gong commented on BEAM-8591:
--

Hi, [~kenn] Could you please help share some information about this issue for 
Apache Beam on Kubernetes Flink? Thanks very much :)

> Exception is thrown while running Beam Pipeline on Kubernetes Flink Cluster.
> 
>
> Key: BEAM-8591
> URL: https://issues.apache.org/jira/browse/BEAM-8591
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Mingliang Gong
>Priority: Major
>
> h2. Setup Clusters
>  * Setup Local Flink Cluster: 
> [https://ci.apache.org/projects/flink/flink-docs-release-1.8/tutorials/local_setup.html]
>  * Setup Kubernetes Flink Cluster using Minikube: 
> [https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html]
> h2. Verify Clusters
> Execute command “./bin/flink run examples/streaming/WordCount.jar”. Both 
> Local and K8S Flink Cluster work fine.
> h2. Using Apache Beam Flink Runner
> Instruction: [https://beam.apache.org/documentation/runners/flink/]
> Sample Pipeline Code:
> {code:java}
> import apache_beam as beam 
> from apache_beam.options.pipeline_options import PipelineOptions
> options = PipelineOptions([
> "--runner=PortableRunner",
> "--job_endpoint=localhost:8099",
> "--environment_type=LOOPBACK"
> ])
> with beam.Pipeline(options=options) as pipeline:
> data = ["Sample data",
> "Sample data - 0",
> "Sample data - 1"]
> raw_data = (pipeline
> | 'CreateHardCodeData' >> beam.Create(data)
> | 'Map' >> beam.Map(lambda line : line + '.')
> | 'Print' >> beam.Map(print)){code}
> Verify different environment_type in Python SDK Harness Configuration
>  *environment_type=LOOPBACK*
>  # Run pipeline on local cluster: Works Fine
>  # Run pipeline on K8S cluster, Exceptions are thrown:
>  java.lang.Exception: The user defined 'open()' method caused an exception: 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception Caused by: 
> org.apache.beam.vendor.grpc.v1p21p0.io.netty.channel.AbstractChannel$AnnotatedConnectException:
>  Connection refused: localhost/127.0.0.1:51017
> *environment_type=DOCKER*
>  # Run pipeline on local cluster: Work fine
>  # Run pipeline on K8S cluster, Exception are thrown:
>  Caused by: java.io.IOException: Cannot run program "docker": error=2, No 
> such file or directory.
>   
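A hedged note on the LOOPBACK failure above: LOOPBACK runs the SDK harness inside the submitting process and advertises a localhost address, which TaskManager pods inside the Kubernetes cluster cannot reach (hence "Connection refused: localhost/127.0.0.1:..."). One commonly documented alternative is an EXTERNAL environment pointing at a worker-pool address reachable from the pods. The endpoint names below are placeholders, not verified values for this cluster:

```python
# Sketch of alternative pipeline options under the assumptions above; these
# would be passed to PipelineOptions. "flink-jobmanager" and
# "beam-worker-pool" are hypothetical in-cluster service names.
portable_options = [
    "--runner=PortableRunner",
    "--job_endpoint=flink-jobmanager:8099",          # assumed service name
    "--environment_type=EXTERNAL",
    "--environment_config=beam-worker-pool:50000",   # assumed worker pool addr
]
```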



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=341567&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341567
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 12/Nov/19 00:30
Start Date: 12/Nov/19 00:30
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r344967383
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -0,0 +1,179 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
InteractiveStreamHeader
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
InteractiveStreamRecord
+from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.utils import timestamp
+from apache_beam.utils.timestamp import Timestamp
+
+
+class StreamingCache(object):
+  """Abstraction that holds the logic for reading and writing to cache.
+  """
+  def __init__(self, readers):
+self._readers = readers
+
+  class Reader(object):
+"""Abstraction that reads from PCollection readers.
+
+This class is an Abstraction layer over multiple PCollection readers to be
+used for supplying the Interactive Service with TestStream events.
+
+This class is also responsible for holding the state of the clock, 
injecting
+clock advancement events, and watermark advancement events.
+"""
+def __init__(self, readers):
+  # This timestamp is used as the monotonic clock to order events in the
+  # replay.
+  self._monotonic_clock = timestamp.Timestamp.of(0)
+
+  # The maximum timestamp read.
+  self._target_timestamp = timestamp.Timestamp.of(0)
+
+  # The PCollection cache readers.
+  self._readers = {}
+
+  # The file headers that are metadata for that particular PCollection.
+  self._headers = {}
+
+  # The header allows for metadata about an entire stream, so that the data
+  # isn't copied per record.
+  readers = [r.read() for r in readers]
+  for r in readers:
+header = InteractiveStreamHeader()
+header.ParseFromString(next(r))
+
+# Main PCollections in Beam have a tag as None. Deserializing a Proto
+# with an empty tag becomes an empty string. Here we normalize to what
+# Beam expects.
+self._headers[header.tag if header.tag else None] = header
+self._readers[header.tag if header.tag else None] = r
+
+  # The watermarks per tag. Useful for introspection in the stream.
+  self._watermarks = {tag: timestamp.MIN_TIMESTAMP for tag in 
self._headers}
+
+  # The most recently read timestamp per tag.
+  self._stream_times = {tag: timestamp.MIN_TIMESTAMP
+for tag in self._headers}
+
+def _read_next(self):
+  """Reads the next iteration of elements from each stream.
+  """
+  records = []
+  for tag, r in self._readers.items():
+# The target_timestamp is the maximum timestamp that was read from the
+# stream. Some readers may have elements that are less than this. Thus,
+# we skip all readers that already have elements that are at this
+# timestamp so that we don't read everything into memory.
+if self._stream_times[tag] >= self._target_timestamp:
+  continue
+try:
+  record = InteractiveStreamRecord()
+  record.ParseFromString(next(r))
+  records.append((tag, record))
+  self._stream_times[tag] = Timestamp.from_proto(
+  record.processing_time)
+except StopIteration:
+  pass
+  return records
+
+def read(self):
+  """Reads records from PCollection readers.
+  """
+  # We use a generator here because the underlying readers may have to much
+  # data to read into memory.
+
+  events = []
+  while True:
+# Read the 

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=341568&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341568
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 12/Nov/19 00:30
Start Date: 12/Nov/19 00:30
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r344969049
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
InteractiveStreamHeader
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
InteractiveStreamRecord
+from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.utils import timestamp
+from apache_beam.utils.timestamp import Timestamp
+
+
+class StreamingCache(object):
+  """Abstraction that holds the logic for reading and writing to cache.
+  """
+  def __init__(self, readers):
+    self._readers = readers
+
+  class Reader(object):
+    """Abstraction that reads from PCollection readers.
+
+    This class is an Abstraction layer over multiple PCollection readers to
+    be used for supplying the Interactive Service with TestStream events.
+
+    This class is also responsible for holding the state of the clock,
+    injecting clock advancement events, and watermark advancement events.
+    """
+    def __init__(self, readers):
+      self._timestamp = Timestamp.of(0)
+      self._readers = {}
+      self._headers = {}
+      readers = [r.read() for r in readers]
+
+      # The header allows for metadata about an entire stream, so that the
+      # data isn't copied per record.
+      for r in readers:
+        header = InteractiveStreamHeader()
+        header.ParseFromString(next(r))
+
+        # Main PCollections in Beam have a tag as None. Deserializing a Proto
+        # with an empty tag becomes an empty string. Here we normalize to what
+        # Beam expects.
+        self._headers[header.tag if header.tag else None] = header
+        self._readers[header.tag if header.tag else None] = r
+
+      self._watermarks = {tag: timestamp.MIN_TIMESTAMP
+                          for tag in self._headers}
+
+    def read(self):
+      """Reads records from PCollection readers.
+      """
+      records = {}
+      for tag, r in self._readers.items():
 
 Review comment:
   I'm still having a hard time convincing myself that the logic between 
_read_next() and read() is correct. 
   
   What you're really trying to do is perform a merge sort of the readers on 
timestamp. For readability and maintainability, it would probably be best 
expressed that way. Whether these are Events that already have the tag embedded 
(I'm assuming the backing file format will be compressed, so little to no 
overhead) or the input is a dict of {tag: iterator} with the tags applied when 
merging depends on which is easier (e.g. when writing). 
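For illustration, a minimal stdlib sketch of the merge-sort-on-timestamp approach suggested in this comment, using plain `(timestamp, element)` tuples as a stand-in for the cached proto records (the function names here are invented, not Beam API):

```python
import heapq


def tag_stream(tag, reader):
  # Wrap one reader's (timestamp, element) pairs with its tag. Using a
  # function (rather than a nested generator expression) binds `tag`
  # correctly per stream.
  for ts, element in reader:
    yield (ts, tag, element)


def merge_readers(readers):
  """Merge {tag: iterator} readers into one timestamp-ordered stream.

  Each iterator is assumed to already be sorted by timestamp; heapq.merge
  then lazily interleaves them without reading everything into memory.
  """
  return heapq.merge(*(tag_stream(tag, r) for tag, r in readers.items()))


merged = list(merge_readers({
    'main': iter([(0, 'a'), (2, 'c')]),
    'side': iter([(1, 'b'), (3, 'd')]),
}))
# merged == [(0, 'main', 'a'), (1, 'side', 'b'), (2, 'main', 'c'), (3, 'side', 'd')]
```

Because `heapq.merge` is lazy, this keeps the "don't read everything into memory" property while replacing the `_read_next()`/`read()` bookkeeping with a single well-understood merge.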
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341568)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will 

[jira] [Work logged] (BEAM-8612) Convert []beam.T to the underlying type []T when passed to a DoFn with universal typed (beam.X) input

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8612?focusedWorklogId=341562&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341562
 ]

ASF GitHub Bot logged work on BEAM-8612:


Author: ASF GitHub Bot
Created on: 12/Nov/19 00:22
Start Date: 12/Nov/19 00:22
Worklog Time Spent: 10m 
  Work Description: htyleo commented on pull request #10066: [BEAM-8612] 
Convert []beam.T to the underlying type []T when passed to a universal type.
URL: https://github.com/apache/beam/pull/10066
 
 
   Did some minor refactoring as well, but unsure if there's an efficiency 
concern.
   
   R: @lostluck 
   
   
   

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=341556&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341556
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 12/Nov/19 00:16
Start Date: 12/Nov/19 00:16
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r344964485
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -0,0 +1,179 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
InteractiveStreamHeader
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
InteractiveStreamRecord
+from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.utils import timestamp
+from apache_beam.utils.timestamp import Timestamp
+
+
+class StreamingCache(object):
+  """Abstraction that holds the logic for reading and writing to cache.
+  """
+  def __init__(self, readers):
+    self._readers = readers
+
+  class Reader(object):
+    """Abstraction that reads from PCollection readers.
+
+    This class is an Abstraction layer over multiple PCollection readers to
+    be used for supplying the Interactive Service with TestStream events.
+
+    This class is also responsible for holding the state of the clock,
+    injecting clock advancement events, and watermark advancement events.
+    """
+    def __init__(self, readers):
+      # This timestamp is used as the monotonic clock to order events in the
+      # replay.
+      self._monotonic_clock = timestamp.Timestamp.of(0)
+
+      # The maximum timestamp read.
+      self._target_timestamp = timestamp.Timestamp.of(0)
+
+      # The PCollection cache readers.
+      self._readers = {}
+
+      # The file headers that are metadata for that particular PCollection.
+      self._headers = {}
+
+      # The header allows for metadata about an entire stream, so that the
+      # data isn't copied per record.
+      readers = [r.read() for r in readers]
+      for r in readers:
+        header = InteractiveStreamHeader()
+        header.ParseFromString(next(r))
+
+        # Main PCollections in Beam have a tag as None. Deserializing a Proto
+        # with an empty tag becomes an empty string. Here we normalize to what
+        # Beam expects.
+        self._headers[header.tag if header.tag else None] = header
+        self._readers[header.tag if header.tag else None] = r
+
+      # The watermarks per tag. Useful for introspection in the stream.
+      self._watermarks = {tag: timestamp.MIN_TIMESTAMP
+                          for tag in self._headers}
+
+      # The most recently read timestamp per tag.
+      self._stream_times = {tag: timestamp.MIN_TIMESTAMP
+                            for tag in self._headers}
+
+    def _read_next(self):
+      """Reads the next iteration of elements from each stream.
+      """
+      records = []
+      for tag, r in self._readers.items():
+        # The target_timestamp is the maximum timestamp that was read from the
+        # stream. Some readers may have elements that are less than this. Thus,
+        # we skip all readers that already have elements that are at this
+        # timestamp so that we don't read everything into memory.
+        if self._stream_times[tag] >= self._target_timestamp:
+          continue
+        try:
+          record = InteractiveStreamRecord()
+          record.ParseFromString(next(r))
+          records.append((tag, record))
+          self._stream_times[tag] = Timestamp.from_proto(
+              record.processing_time)
+        except StopIteration:
+          pass
+      return records
+
+    def read(self):
+      """Reads records from PCollection readers.
+      """
+      # We use a generator here because the underlying readers may have too
+      # much data to read into memory.
+
+      events = []
+      while True:
+        # Read the 

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=341558&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341558
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 12/Nov/19 00:16
Start Date: 12/Nov/19 00:16
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r344962364
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+
+
+service InteractiveService {
+  // A TestStream will request events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+}
+
+message EventsRequest { }
+message EventsResponse {
+  // The TestStreamPayloads that will be sent to the TestStream.
+  repeated org.apache.beam.model.pipeline.v1.TestStreamPayload.Event events = 
1;
+
+  // Is true when there are no more events to read.
+  bool end_of_stream = 2;
+}
+
+// The first record to be read in an interactive stream. This contains metadata
+// about the stream and how to properly process it.
+message InteractiveStreamHeader {
+  // The PCollection tag this stream is associated with.
+  string tag = 1;
+}
+
+// A record is a recorded element that some source produced. Its function is
+// to give enough information to the InteractiveService to create a faithful
+// recreation of the original source of data.
+message InteractiveStreamRecord {
 
 Review comment:
   No, I'm saying that "repeated InteractiveStreamRecord" is isomorphic to 
"repeated TestStreamPayload.Events." So let your file be a sequence of 
TestStreamPayload.Events rather than make a new proto. This will save having to 
do conversions below as well. 
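The "file as a sequence of already-serialized `TestStreamPayload.Event` messages" idea only needs a framing convention on disk. A minimal sketch, under the assumption of fixed-width length prefixes (delimited proto files often use varints instead; the helper names here are invented):

```python
import io
import struct


def write_messages(f, messages):
  # Length-prefix each serialized message; a flat file of framed event bytes
  # needs no extra wrapper proto around the repeated messages.
  for msg in messages:
    f.write(struct.pack('<I', len(msg)))
    f.write(msg)


def read_messages(f):
  # Stream the messages back lazily, without loading the whole file.
  while True:
    prefix = f.read(4)
    if not prefix:
      return
    (size,) = struct.unpack('<I', prefix)
    yield f.read(size)


buf = io.BytesIO()
write_messages(buf, [b'event-1', b'event-2'])
buf.seek(0)
round_tripped = list(read_messages(buf))
# round_tripped == [b'event-1', b'event-2']
```

With this shape, the reader can yield each frame's bytes straight into `TestStreamPayload.Event.ParseFromString` with no intermediate conversion.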
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341558)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 20.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8614) Expose SDK harness status to Runner through FnApi

2019-11-11 Thread Yichi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yichi Zhang reassigned BEAM-8614:
-

Assignee: Yichi Zhang

> Expose SDK harness status to Runner through FnApi
> -
>
> Key: BEAM-8614
> URL: https://issues.apache.org/jira/browse/BEAM-8614
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness, sdk-py-harness
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>
> Expose SDK harness debug information to runner for better debuggability of SDK 
> harness running with beam fn api.
>  
> doc: 
> [https://docs.google.com/document/d/1W77buQtdSEIPUKd9zemAM38fb-x3CvOoaTF4P2mSxmI/edit#heading=h.mersh3vo53ar]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=341560&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341560
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 12/Nov/19 00:16
Start Date: 12/Nov/19 00:16
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r344966778
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+
+
+service InteractiveService {
+  // A TestStream will request events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
 
 Review comment:
   Whether this is streaming or not has nothing to do with quiescence. By 
spreading the call across several RPC calls the service has become stateful, 
more complicated, and non-idempotent. Much better for the RPC to be "give me 
the events of this test stream" than have a (one-time-use) multi-RPC protocol 
to fetch them. 
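A hypothetical, gRPC-free sketch of the single server-streaming shape argued for here (class and method names invented): one idempotent call yields every event, so the service keeps no per-client session state or tokens.

```python
class EventStreamServicer(object):
  """Sketch of a stateless streaming servicer: one call returns the whole
  recorded event stream, so replays are trivially repeatable."""

  def __init__(self, cached_events):
    self._cached_events = cached_events  # the recorded events to replay

  def Events(self, request):
    # A server-streaming handler maps naturally onto a generator; calling it
    # twice replays the same stream from the start, with no session to track.
    for event in self._cached_events:
      yield event


servicer = EventStreamServicer(['e1', 'e2', 'e3'])
first = list(servicer.Events(None))
second = list(servicer.Events(None))
# first == second == ['e1', 'e2', 'e3']
```

In real gRPC Python, the handler signature would also take a `context` argument, but the generator-per-call structure is the same.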
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341560)
Time Spent: 20h 40m  (was: 20.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 20h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=341559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341559
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 12/Nov/19 00:16
Start Date: 12/Nov/19 00:16
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r344964732
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache_test.py
 ##
 @@ -0,0 +1,167 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import unittest
+
+from google.protobuf import timestamp_pb2
+
+from apache_beam import coders
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
InteractiveStreamHeader
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
InteractiveStreamRecord
+from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.runners.interactive.caching.streaming_cache import 
StreamingCache
+from apache_beam.utils import timestamp
+
+
+def to_timestamp_proto(timestamp_secs):
+  """Converts seconds since epoch to a google.protobuf.Timestamp.
+  """
+  seconds = int(timestamp_secs)
+  nanos = int((timestamp_secs - seconds) * 10**9)
+  return timestamp_pb2.Timestamp(seconds=seconds, nanos=nanos)
+
+
+class InMemoryReader(object):
+  def __init__(self, tag=None):
+    self._records = [InteractiveStreamHeader(tag=tag).SerializeToString()]
 
 Review comment:
   The point stands whether or not we'll merge with PCollectionCache later. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341559)
Time Spent: 20h 40m  (was: 20.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 20h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=341555&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341555
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 12/Nov/19 00:16
Start Date: 12/Nov/19 00:16
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r344961488
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+
+
+service InteractiveService {
 
 Review comment:
   I disagree, there are a number of reasons one may want to externalize the 
data stream for tests (e.g. size). It's also cleaner for dependencies (code and 
concepts) to go only one direction (e.g. interactive clearly depends on test 
stream, so don't have TestStream have part of its definition reference 
Interactive). 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341555)
Time Spent: 20h 20m  (was: 20h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 20h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8614) Expose SDK harness status to Runner through FnApi

2019-11-11 Thread Yichi Zhang (Jira)
Yichi Zhang created BEAM-8614:
-

 Summary: Expose SDK harness status to Runner through FnApi
 Key: BEAM-8614
 URL: https://issues.apache.org/jira/browse/BEAM-8614
 Project: Beam
  Issue Type: New Feature
  Components: sdk-java-harness, sdk-py-harness
Reporter: Yichi Zhang


Expose SDK harness debug information to runner for better debuggability of SDK 
harness running with beam fn api.

 

doc: 

[https://docs.google.com/document/d/1W77buQtdSEIPUKd9zemAM38fb-x3CvOoaTF4P2mSxmI/edit#heading=h.mersh3vo53ar]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=341561&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341561
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 12/Nov/19 00:16
Start Date: 12/Nov/19 00:16
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r344965522
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/interactive_stream.py
 ##
 @@ -0,0 +1,84 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+from concurrent.futures import ThreadPoolExecutor
+
+import grpc
+
+from apache_beam.portability.api import beam_interactive_api_pb2
+from apache_beam.portability.api import beam_interactive_api_pb2_grpc
+from apache_beam.portability.api.beam_interactive_api_pb2_grpc import InteractiveServiceServicer
+
+
+class InteractiveStreamController(InteractiveServiceServicer):
+  def __init__(self, streaming_cache, endpoint=None):
+    self._server = grpc.server(ThreadPoolExecutor(max_workers=10))
+
+    if endpoint:
+      self.endpoint = endpoint
+      self._server.add_insecure_port(self.endpoint)
+    else:
+      port = self._server.add_insecure_port('[::]:0')
+      self.endpoint = '[::]:{}'.format(port)
+
+    beam_interactive_api_pb2_grpc.add_InteractiveServiceServicer_to_server(
+        self, self._server)
+    self._streaming_cache = streaming_cache
+
+    self._sessions = {}
+    self._session_id = 0
+
+  def start(self):
+    self._server.start()
+    self._reader = self._streaming_cache.reader()
+
+  def stop(self):
+    self._server.stop(0)
+    self._server.wait_for_termination()
+
+  def Connect(self, request, context):
+    """Starts a session.
+
+    Callers should use the returned session id in all future requests.
+    """
+    session_id = str(self._session_id)
+    self._session_id += 1
+    self._sessions[session_id] = self._streaming_cache.reader().read()
+    return beam_interactive_api_pb2.ConnectResponse(session_id=session_id)
+
+  def Events(self, request, context):
+    """Returns the next event from the streaming cache.
+
+    Token behavior: the first request should have a token of "None". Each
+    subsequent request should use the previously received token from the
+    response. The stream ends when the returned token is the empty string.
+    """
+    assert request.session_id in self._sessions, (
+        'Session "{}" was not found. Did you forget to call Connect ' +
+        'first?').format(request.session_id)
+
+    reader = self._sessions[request.session_id]
+    token = (int(request.token) if request.token else 0) + 1
 
 Review comment:
   There is no checking done here whether the token is actually sequential, nor 
any support for multiple callers of this API. 
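
The reviewer's point about sequential tokens could be addressed with a small guard along these lines (a hypothetical helper, not part of the PR; names are illustrative):

```python
def next_token(request_token, expected):
    """Validate that a caller's token follows the session's sequence.

    request_token: the token string from the request ('' on the first call).
    expected: the server-side counter kept for this session.
    Returns the incremented counter, or raises if the token is out of order.
    """
    claimed = int(request_token) if request_token else 0
    if claimed != expected:
        raise ValueError(
            'Out-of-order token: got {}, expected {}'.format(claimed, expected))
    return claimed + 1
```

The servicer would store `expected` per session and reject replayed or skipped requests instead of silently accepting any token.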
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341561)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 20h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=341557&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341557
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 12/Nov/19 00:16
Start Date: 12/Nov/19 00:16
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r344964336
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+from apache_beam.portability.api.beam_interactive_api_pb2 import InteractiveStreamHeader
+from apache_beam.portability.api.beam_interactive_api_pb2 import InteractiveStreamRecord
+from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.utils import timestamp
+from apache_beam.utils.timestamp import Timestamp
+
+
+class StreamingCache(object):
+  """Abstraction that holds the logic for reading and writing to cache.
+  """
+  def __init__(self, readers):
+    self._readers = readers
+
+  class Reader(object):
+    """Abstraction that reads from PCollection readers.
+
+    This class is an Abstraction layer over multiple PCollection readers to be
+    used for supplying the Interactive Service with TestStream events.
+
+    This class is also responsible for holding the state of the clock,
+    injecting clock advancement events, and watermark advancement events.
+    """
+    def __init__(self, readers):
+      self._timestamp = Timestamp.of(0)
+      self._readers = {}
+      self._headers = {}
+      readers = [r.read() for r in readers]
+
+      # The header allows for metadata about an entire stream, so that the data
+      # isn't copied per record.
+      for r in readers:
+        header = InteractiveStreamHeader()
+        header.ParseFromString(next(r))
+
+        # Main PCollections in Beam have a tag as None. Deserializing a Proto
+        # with an empty tag becomes an empty string. Here we normalize to what
+        # Beam expects.
+        self._headers[header.tag if header.tag else None] = header
+        self._readers[header.tag if header.tag else None] = r
+
+      self._watermarks = {tag: timestamp.MIN_TIMESTAMP
+                          for tag in self._headers}
+
+    def read(self):
+      """Reads records from PCollection readers.
+      """
+      records = {}
+      for tag, r in self._readers.items():
+        try:
+          record = InteractiveStreamRecord()
+          record.ParseFromString(next(r))
+          records[tag] = record
+        except StopIteration:
+          pass
+
+      events = []
+      if not records:
+        self.advance_watermark(timestamp.MAX_TIMESTAMP, events)
+
+      records = sorted(records.items(), key=lambda x: x[1].processing_time)
+      for tag, r in records:
+        # We always send the processing time event first so that the TestStream
+        # can sleep so as to emulate the original stream.
+        self.advance_processing_time(
+            Timestamp.from_proto(r.processing_time), events)
+        self.advance_watermark(Timestamp.from_proto(r.watermark), events,
+                               tag=tag)
+
+        events.append(TestStreamPayload.Event(
+            element_event=TestStreamPayload.Event.AddElements(
+                elements=[r.element], tag=tag)))
+      return events
+
+    def advance_processing_time(self, processing_time, events):
+      """Advances the internal clock state and injects an AdvanceProcessingTime
+         event.
+      """
+      if self._timestamp != processing_time:
+        duration = timestamp.Duration(
+            micros=processing_time.micros - self._timestamp.micros)
+
+        self._timestamp = processing_time
+        processing_time_event = TestStreamPayload.Event.AdvanceProcessingTime(
+            advance_duration=duration.micros)
+        events.append(TestStreamPayload.Event(
+

[jira] [Work logged] (BEAM-2939) Fn API SDF support

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=341553&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341553
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 12/Nov/19 00:12
Start Date: 12/Nov/19 00:12
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10065: [BEAM-2939, 
BEAM-6868, BEAM-7233] Expose residual roots and bundle finalization callbacks 
to portable runners.
URL: https://github.com/apache/beam/pull/10065#issuecomment-552675381
 
 
   R: @boyuanzz 
   CC: @mxm @tweise @iemejia 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341553)
Time Spent: 19.5h  (was: 19h 20m)

> Fn API SDF support
> --
>
> Key: BEAM-2939
> URL: https://issues.apache.org/jira/browse/BEAM-2939
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 19.5h
>  Remaining Estimate: 0h
>
> The Fn API should support streaming SDF. Detailed design TBD.
> Once design is ready, expand subtasks similarly to BEAM-2822.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5600) Splitting for SplittableDoFn should be exposed within runner shared libraries

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5600?focusedWorklogId=341554&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341554
 ]

ASF GitHub Bot logged work on BEAM-5600:


Author: ASF GitHub Bot
Created on: 12/Nov/19 00:12
Start Date: 12/Nov/19 00:12
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10045: [BEAM-5600, 
BEAM-2939] Add SplittableParDo expansion logic to runner's core.
URL: https://github.com/apache/beam/pull/10045
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341554)
Time Spent: 1h 10m  (was: 1h)

> Splitting for SplittableDoFn should be exposed within runner shared libraries
> -
>
> Key: BEAM-5600
> URL: https://issues.apache.org/jira/browse/BEAM-5600
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Priority: Major
>  Labels: portability
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2939) Fn API SDF support

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=341552&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341552
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 12/Nov/19 00:11
Start Date: 12/Nov/19 00:11
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10065: [BEAM-2939, 
BEAM-6868, BEAM-7233] Expose residual roots and bundle finalization callbacks 
to portable runners.
URL: https://github.com/apache/beam/pull/10065
 
 
   This is towards supporting bundle finalization and SDK-initiated 
self-checkpointing of bundles for portable runners.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=341548&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341548
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 11/Nov/19 23:55
Start Date: 11/Nov/19 23:55
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9892: [BEAM-8427] [SQL] 
buildIOWrite for MongoDb Table
URL: https://github.com/apache/beam/pull/9892#issuecomment-552671228
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341548)
Time Spent: 7h 50m  (was: 7h 40m)

> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> * Create a MongoDB table and table provider.
>  * Implement buildIOReader
>  * Support primitive types
>  * Implement buildIOWrite
>  * improve getTableStatistics



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=341547&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341547
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 11/Nov/19 23:55
Start Date: 11/Nov/19 23:55
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9892: [BEAM-8427] [SQL] 
buildIOWrite for MongoDb Table
URL: https://github.com/apache/beam/pull/9892#issuecomment-552671228
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341547)
Time Spent: 7h 40m  (was: 7.5h)

> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> * Create a MongoDB table and table provider.
>  * Implement buildIOReader
>  * Support primitive types
>  * Implement buildIOWrite
>  * improve getTableStatistics



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8613) Add environment variable support to Docker environment

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8613?focusedWorklogId=341545&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341545
 ]

ASF GitHub Bot logged work on BEAM-8613:


Author: ASF GitHub Bot
Created on: 11/Nov/19 23:51
Start Date: 11/Nov/19 23:51
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #10064: [BEAM-8613] Add 
environment variable support to Docker environment
URL: https://github.com/apache/beam/pull/10064#issuecomment-552670434
 
 
   R: @robertwb 
   R: @lukecwik 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341545)
Time Spent: 50m  (was: 40m)

> Add environment variable support to Docker environment
> --
>
> Key: BEAM-8613
> URL: https://issues.apache.org/jira/browse/BEAM-8613
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution, runner-core, runner-direct
>Reporter: Nathan Rusch
>Priority: Trivial
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The Process environment allows specifying environment variables via a map 
> field on its payload message. The Docker environment should support this same 
> pattern, and forward the contents of the map through to the container runtime.
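
For reference, forwarding such a map to a container runtime typically means emitting `-e KEY=VALUE` flags on `docker run`. A minimal sketch of turning the payload's map into container arguments (illustrative only, not the actual Beam implementation):

```python
def docker_env_args(env):
    """Turn an environment-variable map into `docker run` -e arguments."""
    args = []
    for key, value in sorted(env.items()):  # sorted for deterministic output
        args.extend(['-e', '{}={}'.format(key, value)])
    return args
```

For example, `docker_env_args({'FOO': '1'})` yields `['-e', 'FOO=1']`, which can be spliced into the `docker run` command line ahead of the image name.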



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8613) Add environment variable support to Docker environment

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8613?focusedWorklogId=341544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341544
 ]

ASF GitHub Bot logged work on BEAM-8613:


Author: ASF GitHub Bot
Created on: 11/Nov/19 23:48
Start Date: 11/Nov/19 23:48
Worklog Time Spent: 10m 
  Work Description: nrusch commented on pull request #10064: [BEAM-8613] 
Add environment variable support to Docker environment
URL: https://github.com/apache/beam/pull/10064#discussion_r344960620
 
 

 ##
 File path: sdks/python/apache_beam/transforms/environments.py
 ##
 @@ -37,6 +37,11 @@
'SubprocessSDKEnvironment', 'RunnerAPIEnvironmentHolder']
 
 
+def _looks_like_json(config_string):
+  import re
+  return re.match(r'\s*\{.*\}\s*$', config_string)
 
 Review comment:
   Ah, good call. Updated.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341544)
Time Spent: 40m  (was: 0.5h)

> Add environment variable support to Docker environment
> --
>
> Key: BEAM-8613
> URL: https://issues.apache.org/jira/browse/BEAM-8613
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution, runner-core, runner-direct
>Reporter: Nathan Rusch
>Priority: Trivial
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The Process environment allows specifying environment variables via a map 
> field on its payload message. The Docker environment should support this same 
> pattern, and forward the contents of the map through to the container runtime.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8557) Clean up useless null check.

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8557?focusedWorklogId=341542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341542
 ]

ASF GitHub Bot logged work on BEAM-8557:


Author: ASF GitHub Bot
Created on: 11/Nov/19 23:45
Start Date: 11/Nov/19 23:45
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9991: 
[BEAM-8557]Add log for the dropped unknown response
URL: https://github.com/apache/beam/pull/9991#discussion_r344959866
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java
 ##
 @@ -158,6 +158,8 @@ public void onNext(BeamFnApi.InstructionResponse response) 
{
   "Error received from SDK harness for instruction %s: %s",
  response.getInstructionId(), response.getError())));
 }
+  } else {
+LOG.warn("Dropped unknown InstructionResponse {}", response);
 
 Review comment:
   Tiny style suggestion: move the error case to the top. Reduce one level of 
curly braces.
   
   ```java
   if (responseFuture == null) {
 LOG.warn(...)
 return;
   }
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341542)
Time Spent: 2h  (was: 1h 50m)

> Clean up useless null check.
> 
>
> Key: BEAM-8557
> URL: https://issues.apache.org/jira/browse/BEAM-8557
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core, sdk-java-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> I think we do not need null check here:
> [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151]
> Because before the `onNext` call, the `Future` was already put into the queue 
> in the `handle` method.
>  
> I found the test as follows:
> {code:java}
>  @Test
>  public void testUnknownResponseIgnored() throws Exception
> {code}
> I do not know why we need to test this case. I think it would be better if we 
> threw an exception for an unknown response; otherwise, this may hide a 
> potential bug.
> Please correct me if there is anything I misunderstand, @kennknowles.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8579) Strip UTF-8 BOM bytes (if present) in TextSource.

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8579?focusedWorklogId=341540&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341540
 ]

ASF GitHub Bot logged work on BEAM-8579:


Author: ASF GitHub Bot
Created on: 11/Nov/19 23:44
Start Date: 11/Nov/19 23:44
Worklog Time Spent: 10m 
  Work Description: cmm08 commented on issue #10046: [BEAM-8579] Strip 
UTF-8 BOM from TextSource output.
URL: https://github.com/apache/beam/pull/10046#issuecomment-552668608
 
 
   Opened https://issues.apache.org/jira/browse/BEAM-8611 to move the test 
cases.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341540)
Time Spent: 1h 20m  (was: 1h 10m)

> Strip UTF-8 BOM bytes (if present) in TextSource.
> -
>
> Key: BEAM-8579
> URL: https://issues.apache.org/jira/browse/BEAM-8579
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-text
>Affects Versions: 2.15.0
>Reporter: Changming Ma
>Assignee: Changming Ma
>Priority: Minor
> Fix For: 2.18.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> TextSource in the org.apache.beam.sdk.io package can handle UTF-8 encoded 
> files, and when a file contains a byte order mark (BOM), it will preserve it 
> in the output. According to the Unicode standard 
> ([http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf]): "Use of a BOM is 
> neither required nor recommended for UTF-8". UTF-8 with a BOM is also a 
> potential problem for some Java implementations (e.g., 
> [https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4508058]). As a 
> general practice, it is suggested to use UTF-8 without a BOM.
> Proposal: remove BOM bytes in the output from TextSource.
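
The proposed stripping amounts to checking for the three BOM bytes at the start of the data. A minimal sketch, assuming a bytes input (this is not Beam's actual TextSource code):

```python
# The UTF-8 encoding of U+FEFF, the byte order mark.
UTF8_BOM = b'\xef\xbb\xbf'

def strip_utf8_bom(data):
    """Remove a leading UTF-8 byte order mark, if present."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data
```

In TextSource itself the check would only apply to the first bytes of the file, not to every record.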



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8613) Add environment variable support to Docker environment

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8613?focusedWorklogId=341541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341541
 ]

ASF GitHub Bot logged work on BEAM-8613:


Author: ASF GitHub Bot
Created on: 11/Nov/19 23:44
Start Date: 11/Nov/19 23:44
Worklog Time Spent: 10m 
  Work Description: nrusch commented on pull request #10064: [BEAM-8613] 
Add environment variable support to Docker environment
URL: https://github.com/apache/beam/pull/10064#discussion_r344959562
 
 

 ##
 File path: sdks/python/apache_beam/transforms/environments.py
 ##
 @@ -138,20 +145,33 @@ def __hash__(self):
 return hash((self.__class__, self.container_image))
 
   def __repr__(self):
-return 'DockerEnvironment(container_image=%s)' % self.container_image
+return 'DockerEnvironment(container_image=%s,env=%s)' % (
 
 Review comment:
   I was matching the prevailing style of `ProcessEnvironment` and 
`ExternalEnvironment`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341541)
Time Spent: 0.5h  (was: 20m)

> Add environment variable support to Docker environment
> --
>
> Key: BEAM-8613
> URL: https://issues.apache.org/jira/browse/BEAM-8613
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution, runner-core, runner-direct
>Reporter: Nathan Rusch
>Priority: Trivial
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Process environment allows specifying environment variables via a map 
> field on its payload message. The Docker environment should support this same 
> pattern, and forward the contents of the map through to the container runtime.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-7824) Set a default environment for Python SDK jobs for Dataflow runner

2019-11-11 Thread Chamikara Madhusanka Jayalath (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Madhusanka Jayalath resolved BEAM-7824.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> Set a default environment for Python SDK jobs for Dataflow runner
> -
>
> Key: BEAM-7824
> URL: https://issues.apache.org/jira/browse/BEAM-7824
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently default environment is set to empty. We should change the default 
> environment to urn: beam:env:docker:v1 and payload to a DockerPayload where 
> container_image is set to container image used by Dataflow.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8613) Add environment variable support to Docker environment

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8613?focusedWorklogId=341538&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341538
 ]

ASF GitHub Bot logged work on BEAM-8613:


Author: ASF GitHub Bot
Created on: 11/Nov/19 23:40
Start Date: 11/Nov/19 23:40
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10064: [BEAM-8613] 
Add environment variable support to Docker environment
URL: https://github.com/apache/beam/pull/10064#discussion_r344958406
 
 

 ##
 File path: sdks/python/apache_beam/transforms/environments.py
 ##
 @@ -37,6 +37,11 @@
'SubprocessSDKEnvironment', 'RunnerAPIEnvironmentHolder']
 
 
+def _looks_like_json(config_string):
+  import re
+  return re.match(r'\s*\{.*\}\s*$', config_string)
 
 Review comment:
   I know you're not responsible for this function, but I think it should be
   
   ```python
   return re.match(r'\s*\{.*\}\s*$', config_string) is not None
   ```
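
The distinction matters because `re.match` returns a `Match` object or `None`; the suggested `is not None` makes the helper return a real boolean instead of leaking the match object to callers. A standalone sketch of the fixed helper:

```python
import re

def looks_like_json(config_string):
    # `is not None` converts the Match-or-None result into a plain bool,
    # so callers get True/False rather than an implementation detail.
    return re.match(r'\s*\{.*\}\s*$', config_string) is not None
```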
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341538)
Time Spent: 20m  (was: 10m)

> Add environment variable support to Docker environment
> --
>
> Key: BEAM-8613
> URL: https://issues.apache.org/jira/browse/BEAM-8613
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution, runner-core, runner-direct
>Reporter: Nathan Rusch
>Priority: Trivial
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Process environment allows specifying environment variables via a map 
> field on its payload message. The Docker environment should support this same 
> pattern, and forward the contents of the map through to the container runtime.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8613) Add environment variable support to Docker environment

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8613?focusedWorklogId=341537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341537
 ]

ASF GitHub Bot logged work on BEAM-8613:


Author: ASF GitHub Bot
Created on: 11/Nov/19 23:40
Start Date: 11/Nov/19 23:40
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10064: [BEAM-8613] 
Add environment variable support to Docker environment
URL: https://github.com/apache/beam/pull/10064#discussion_r344958730
 
 

 ##
 File path: sdks/python/apache_beam/transforms/environments.py
 ##
 @@ -203,7 +223,7 @@ def from_runner_api_parameter(payload, context):
   def from_options(cls, options):
 config = json.loads(options.environment_config)
 return cls(config.get('command'), os=config.get('os', ''),
-   arch=config.get('arch', ''), env=config.get('env', ''))
+   arch=config.get('arch', ''), env=config.get('env'))
 
 Review comment:
   yeah, I have a fix for this in my typing branch.   fine to fix it here as 
long as the reviewers agree. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341537)
Time Spent: 20m  (was: 10m)

> Add environment variable support to Docker environment
> --
>
> Key: BEAM-8613
> URL: https://issues.apache.org/jira/browse/BEAM-8613
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution, runner-core, runner-direct
>Reporter: Nathan Rusch
>Priority: Trivial
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Process environment allows specifying environment variables via a map 
> field on its payload message. The Docker environment should support this same 
> pattern, and forward the contents of the map through to the container runtime.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8613) Add environment variable support to Docker environment

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8613?focusedWorklogId=341539&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341539
 ]

ASF GitHub Bot logged work on BEAM-8613:


Author: ASF GitHub Bot
Created on: 11/Nov/19 23:40
Start Date: 11/Nov/19 23:40
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10064: [BEAM-8613] 
Add environment variable support to Docker environment
URL: https://github.com/apache/beam/pull/10064#discussion_r344958092
 
 

 ##
 File path: sdks/python/apache_beam/transforms/environments.py
 ##
 @@ -138,20 +145,33 @@ def __hash__(self):
 return hash((self.__class__, self.container_image))
 
   def __repr__(self):
-return 'DockerEnvironment(container_image=%s)' % self.container_image
+return 'DockerEnvironment(container_image=%s,env=%s)' % (
 
 Review comment:
   space after the comma, I think
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341539)
Time Spent: 20m  (was: 10m)

> Add environment variable support to Docker environment
> --
>
> Key: BEAM-8613
> URL: https://issues.apache.org/jira/browse/BEAM-8613
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution, runner-core, runner-direct
>Reporter: Nathan Rusch
>Priority: Trivial
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Process environment allows specifying environment variables via a map 
> field on its payload message. The Docker environment should support this same 
> pattern, and forward the contents of the map through to the container runtime.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8613) Add environment variable support to Docker environment

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8613?focusedWorklogId=341535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341535
 ]

ASF GitHub Bot logged work on BEAM-8613:


Author: ASF GitHub Bot
Created on: 11/Nov/19 23:33
Start Date: 11/Nov/19 23:33
Worklog Time Spent: 10m 
  Work Description: nrusch commented on pull request #10064: [BEAM-8613] 
Add environment variable support to Docker environment
URL: https://github.com/apache/beam/pull/10064
 
 
   This extends the Docker environment to support passing a JSON configuration 
string, similar to the Process and External environments, so that environment 
variables can be passed through to the Docker runtime. As part of this change, 
the `DockerPayload` message gains a new `env` map field.
   
   If a JSON configuration string is used to construct a Docker environment, it 
must include a `docker_image` key, and may optionally include an `env` 
sub-object representing an environment variable map. These variables are 
forwarded to the `docker` command via `--env` flags.
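   As a rough illustration of that scheme (the `docker_env_args` helper below is hypothetical, not part of the PR; only the `docker_image`/`env` config shape comes from the description above), the env map would translate to `--env` flags roughly like this:
   
   ```python
   import json
   
   
   def docker_env_args(environment_config):
     """Translate a JSON environment config into `docker run` arguments.
   
     Illustrative only: the helper name is made up; the `docker_image` and
     `env` keys follow the scheme described in the PR description.
     """
     config = json.loads(environment_config)
     args = ['docker', 'run']
     # Each env map entry becomes a `--env NAME=VALUE` flag pair.
     for name, value in sorted(config.get('env', {}).items()):
       args += ['--env', '%s=%s' % (name, value)]
     args.append(config['docker_image'])
     return args
   ```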
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [X] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [X] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [X] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Created] (BEAM-8613) Add environment variable support to Docker environment

2019-11-11 Thread Nathan Rusch (Jira)
Nathan Rusch created BEAM-8613:
--

 Summary: Add environment variable support to Docker environment
 Key: BEAM-8613
 URL: https://issues.apache.org/jira/browse/BEAM-8613
 Project: Beam
  Issue Type: Improvement
  Components: java-fn-execution, runner-core, runner-direct
Reporter: Nathan Rusch


The Process environment allows specifying environment variables via a map field 
on its payload message. The Docker environment should support this same 
pattern, and forward the contents of the map through to the container runtime.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=341533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341533
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 11/Nov/19 23:20
Start Date: 11/Nov/19 23:20
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on issue #9923: [BEAM-7389] Add 
code snippets for Count
URL: https://github.com/apache/beam/pull/9923#issuecomment-552661940
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341533)
Time Spent: 76h  (was: 75h 50m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 76h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=341534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341534
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 11/Nov/19 23:20
Start Date: 11/Nov/19 23:20
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on issue #9923: [BEAM-7389] Add 
code snippets for Count
URL: https://github.com/apache/beam/pull/9923#issuecomment-552661940
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341534)
Time Spent: 76h 10m  (was: 76h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 76h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=341532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341532
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 11/Nov/19 23:19
Start Date: 11/Nov/19 23:19
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #10035: [BEAM-8581] and 
[BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#issuecomment-552661625
 
 
   This is blocking https://github.com/apache/beam/pull/9953
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341532)
Time Spent: 0.5h  (was: 20m)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=341531&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341531
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 11/Nov/19 23:16
Start Date: 11/Nov/19 23:16
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9953: 
[BEAM-8335] Adds support for multi-output TestStream
URL: https://github.com/apache/beam/pull/9953#discussion_r344952821
 
 

 ##
 File path: sdks/python/apache_beam/pvalue.py
 ##
 @@ -201,6 +201,43 @@ class PDone(PValue):
   pass
 
 
+class PTuple(object):
+  """An object grouping multiple PCollections.
+
+  This class is useful for returning a named tuple of PCollections from a
+  composite.
+  """
+
+  def __init__(self, pcoll_dict):
+"""Initializes this named tuple with a dictionary of tagged PCollections.
+"""
+self._pcolls = pcoll_dict
+
+  def __str__(self):
+return '<%s>' % self._str_internal()
 
 Review comment:
   Removed this file and changed output to dict.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341531)
Time Spent: 20h 10m  (was: 20h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 20h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8612) Convert []beam.T to the underlying type []T when passed to a DoFn with universal typed (beam.X) input

2019-11-11 Thread Tianyang Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianyang Hu updated BEAM-8612:
--
Description: 
Say there are two DoFn: f1, f2.
- f1 declares the output type as []beam.T, and each element has the underlying 
type int.
- f2 declares the input type as []int

Passing f1 output to f2 works well. The conversion from []beam.T to []int 
happens at: 
https://github.com/apache/beam/blob/c7be0643934a87d73483cf1fd3199a425508b03c/sdks/go/pkg/beam/core/runtime/exec/fullvalue.go#L108

But it doesn't work if f2 declares the input type as beam.X and type casts it 
to []int. This is because there's no type conversion when passing []beam.T to 
beam.X.

We may consider supporting the above case by converting []beam.T to the 
underlying type []T when it's passed to a universal type.

An issue is that if []beam.T is nil or empty, we don't know its underlying 
element type (unless we know which concrete type beam.T or beam.X is bound to, 
but this mapping doesn't seem to be kept at runtime?). In such case, we have to 
pass []beam.T to beam.X as is.

  was:
Say there are two DoFn: f1, f2.
- f1 declares the output type as []beam.T, and each element has the underlying 
type int.
- f2 declares the input type as []int

Passing f1 output to f2 works well. The conversion from []beam.T to []int 
happens at: 
https://github.com/apache/beam/blob/c7be0643934a87d73483cf1fd3199a425508b03c/sdks/go/pkg/beam/core/runtime/exec/fullvalue.go#L108

But it doesn't work if f2 declares the input type as beam.X and type casts it 
to []int. This is because there's no type conversion when passing []beam.T to 
beam.X.

We may consider supporting the above case by converting []beam.T to the 
underlying type []T when it's passed to a universal type.

An issue is that if []beam.T is nil or empty, we don't know its underlying 
element type (unless we know which concrete type beam.T or beam.X is bound to, 
which doesn't seem to be kept at runtime?). In such case, we have to pass 
[]beam.T to beam.X as is.


> Convert []beam.T to the underlying type []T when passed to a DoFn with 
> universal typed (beam.X) input
> -
>
> Key: BEAM-8612
> URL: https://issues.apache.org/jira/browse/BEAM-8612
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Tianyang Hu
>Assignee: Tianyang Hu
>Priority: Minor
>
> Say there are two DoFn: f1, f2.
> - f1 declares the output type as []beam.T, and each element has the 
> underlying type int.
> - f2 declares the input type as []int
> Passing f1 output to f2 works well. The conversion from []beam.T to []int 
> happens at: 
> https://github.com/apache/beam/blob/c7be0643934a87d73483cf1fd3199a425508b03c/sdks/go/pkg/beam/core/runtime/exec/fullvalue.go#L108
> But it doesn't work if f2 declares the input type as beam.X and type casts it 
> to []int. This is because there's no type conversion when passing []beam.T to 
> beam.X.
> We may consider supporting the above case by converting []beam.T to the 
> underlying type []T when it's passed to a universal type.
> An issue is that if []beam.T is nil or empty, we don't know its underlying 
> element type (unless we know which concrete type beam.T or beam.X is bound 
> to, but this mapping doesn't seem to be kept at runtime?). In such case, we 
> have to pass []beam.T to beam.X as is.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8612) Convert []beam.T to the underlying type []T when passed to a DoFn with universal typed (beam.X) input

2019-11-11 Thread Tianyang Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianyang Hu updated BEAM-8612:
--
Description: 
Say there are two DoFn: f1, f2.
- f1 declares the output type as []beam.T, and each element has the underlying 
type int.
- f2 declares the input type as []int

Passing f1 output to f2 works well. The conversion from []beam.T to []int 
happens at: 
https://github.com/apache/beam/blob/c7be0643934a87d73483cf1fd3199a425508b03c/sdks/go/pkg/beam/core/runtime/exec/fullvalue.go#L108

But it doesn't work if f2 declares the input type as beam.X and type casts it 
to []int. This is because there's no type conversion when passing []beam.T to 
beam.X.

We may consider supporting the above case by converting []beam.T to the 
underlying type []T when it's passed to a universal type.

An issue is that if []beam.T is nil or empty, we don't know its underlying 
element type (unless we know which concrete type beam.T or beam.X is bound to, 
which doesn't seem to be kept at runtime?). In such case, we have to pass 
[]beam.T to beam.X as is.

  was:
Say there are two DoFn: f1, f2.
- f1 declares the output type as []beam.T, and each element has the underlying 
type int.
- f2 declares the input type as []int

Passing f1 output to f2 works well. The conversion from []beam.T to []int 
happens at: 
https://github.com/apache/beam/blob/c7be0643934a87d73483cf1fd3199a425508b03c/sdks/go/pkg/beam/core/runtime/exec/fullvalue.go#L108

But it doesn't work if f2 declares the input type as beam.X, and type cast it 
to []int. This is because there's no type conversion when passing []beam.T to 
beam.X.

We may consider support the above case by converting []beam.T to the underlying 
type []T when passed to a universal type.

An issue is that if []beam.T is nil or empty, we don't know its underlying 
element type (unless we know which concrete type beam.T or beam.X is bound to, 
which doesn't seem to be kept at runtime?). In such case, we have to pass 
[]beam.T to beam.X as is.


> Convert []beam.T to the underlying type []T when passed to a DoFn with 
> universal typed (beam.X) input
> -
>
> Key: BEAM-8612
> URL: https://issues.apache.org/jira/browse/BEAM-8612
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Tianyang Hu
>Assignee: Tianyang Hu
>Priority: Minor
>
> Say there are two DoFn: f1, f2.
> - f1 declares the output type as []beam.T, and each element has the 
> underlying type int.
> - f2 declares the input type as []int
> Passing f1 output to f2 works well. The conversion from []beam.T to []int 
> happens at: 
> https://github.com/apache/beam/blob/c7be0643934a87d73483cf1fd3199a425508b03c/sdks/go/pkg/beam/core/runtime/exec/fullvalue.go#L108
> But it doesn't work if f2 declares the input type as beam.X and type casts it 
> to []int. This is because there's no type conversion when passing []beam.T to 
> beam.X.
> We may consider supporting the above case by converting []beam.T to the 
> underlying type []T when it's passed to a universal type.
> An issue is that if []beam.T is nil or empty, we don't know its underlying 
> element type (unless we know which concrete type beam.T or beam.X is bound 
> to, which doesn't seem to be kept at runtime?). In such case, we have to pass 
> []beam.T to beam.X as is.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8612) Convert []beam.T to the underlying type []T when passed to a DoFn with universal typed (beam.X) input

2019-11-11 Thread Tianyang Hu (Jira)
Tianyang Hu created BEAM-8612:
-

 Summary: Convert []beam.T to the underlying type []T when passed 
to a DoFn with universal typed (beam.X) input
 Key: BEAM-8612
 URL: https://issues.apache.org/jira/browse/BEAM-8612
 Project: Beam
  Issue Type: New Feature
  Components: sdk-go
Reporter: Tianyang Hu
Assignee: Tianyang Hu


Say there are two DoFn: f1, f2.
- f1 declares the output type as []beam.T, and each element has the underlying 
type int.
- f2 declares the input type as []int

Passing f1 output to f2 works well. The conversion from []beam.T to []int 
happens at: 
https://github.com/apache/beam/blob/c7be0643934a87d73483cf1fd3199a425508b03c/sdks/go/pkg/beam/core/runtime/exec/fullvalue.go#L108

But it doesn't work if f2 declares the input type as beam.X, and type cast it 
to []int. This is because there's no type conversion when passing []beam.T to 
beam.X.

We may consider support the above case by converting []beam.T to the underlying 
type []T when passed to a universal type.

An issue is that if []beam.T is nil or empty, we don't know its underlying 
element type (unless we know which concrete type beam.T or beam.X is bound to, 
which doesn't seem to be kept at runtime?). In such case, we have to pass 
[]beam.T to beam.X as is.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8611) Move TextSourceTest into TextIOReadTest

2019-11-11 Thread Changming Ma (Jira)
Changming Ma created BEAM-8611:
--

 Summary: Move TextSourceTest into TextIOReadTest
 Key: BEAM-8611
 URL: https://issues.apache.org/jira/browse/BEAM-8611
 Project: Beam
  Issue Type: Bug
  Components: io-java-text
Reporter: Changming Ma


PR 10046 ([https://github.com/apache/beam/pull/10046], fixing BEAM-8579) has been 
merged, but the test added in that PR (TextSourceTest) still needs to be moved 
into TextIOReadTest. Filing this bug for tracking purposes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8597) Allow TestStream trigger tests to run on other runners.

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8597?focusedWorklogId=341527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341527
 ]

ASF GitHub Bot logged work on BEAM-8597:


Author: ASF GitHub Bot
Created on: 11/Nov/19 22:52
Start Date: 11/Nov/19 22:52
Worklog Time Spent: 10m 
  Work Description: liumomo315 commented on pull request #10043: 
[BEAM-8597] Allow TestStream trigger tests to run on other runners.
URL: https://github.com/apache/beam/pull/10043#discussion_r344946461
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream.py
 ##
 @@ -66,6 +68,28 @@ def __ne__(self, other):
 # TODO(BEAM-5949): Needed for Python 2 compatibility.
 return not self == other
 
+  @abstractmethod
+  def to_runner_api(self, element_coder):
+raise NotImplementedError
+
+  @staticmethod
+  def from_runner_api(proto, element_coder):
 
 Review comment:
   For my understanding: is the reason for grouping the from_runner_api 
implementation into the abstract class (while triggers/windows leave 
from_runner_api to the concrete classes) that we expect TestStream to only ever 
have a limited set of element types?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341527)
Time Spent: 50m  (was: 40m)

> Allow TestStream trigger tests to run on other runners.
> ---
>
> Key: BEAM-8597
> URL: https://issues.apache.org/jira/browse/BEAM-8597
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=341526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341526
 ]

ASF GitHub Bot logged work on BEAM-8151:


Author: ASF GitHub Bot
Created on: 11/Nov/19 22:48
Start Date: 11/Nov/19 22:48
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #9477: [BEAM-8151, 
BEAM-7848] Up the max number of threads inside the SDK harness to a default of 
10k
URL: https://github.com/apache/beam/pull/9477#issuecomment-552652356
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341526)
Time Spent: 8h 40m  (was: 8.5h)

> Allow the Python SDK to use many many threads
> -
>
> Key: BEAM-8151
> URL: https://issues.apache.org/jira/browse/BEAM-8151
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> We need to use a thread pool which shrinks the number of active threads when 
> they are not being used.
>  
> This is to prevent any stuckness issues related to a runner scheduling more 
> work items then there are "work" threads inside the SDK harness.
>  
> By default the control plane should have all "requests" being processed in 
> parallel and the runner is responsible for not overloading the SDK with too 
> much work.
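One way to get "many threads that shrink when not in use" in Python (a sketch under assumptions, not the SDK harness's actual implementation; the class name is made up) is a worker pool whose threads exit after sitting idle on the task queue for a timeout:

```python
import queue
import threading


class ShrinkingWorkerPool(object):
  """Illustrative sketch: workers exit after `idle_timeout` seconds
  without work, so the pool shrinks back down when the runner is idle."""

  def __init__(self, max_workers=10000, idle_timeout=30):
    self._tasks = queue.Queue()
    self._max_workers = max_workers
    self._idle_timeout = idle_timeout
    self._lock = threading.Lock()
    self._workers = 0

  def submit(self, fn, *args):
    self._tasks.put((fn, args))
    # Spawn a new worker thread if we are still under the cap.
    with self._lock:
      if self._workers < self._max_workers:
        self._workers += 1
        threading.Thread(target=self._run, daemon=True).start()

  def _run(self):
    while True:
      try:
        fn, args = self._tasks.get(timeout=self._idle_timeout)
      except queue.Empty:
        # No work arrived within the idle timeout: let this thread die.
        with self._lock:
          self._workers -= 1
        return
      fn(*args)
```

A cap like 10k threads then only bounds the worst case; the steady-state thread count tracks how much work the runner actually schedules.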



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8586) [SQL] Add a server for MongoDb Integration Test

2019-11-11 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8586:

Summary: [SQL] Add a server for MongoDb Integration Test  (was: Add a 
server for MongoDb Integration Test)

> [SQL] Add a server for MongoDb Integration Test
> ---
>
> Key: BEAM-8586
> URL: https://issues.apache.org/jira/browse/BEAM-8586
> Project: Beam
>  Issue Type: Test
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need to pass pipeline options with server information to the 
> MongoDbReadWriteIT.
> For now that test is ignored and excluded from the build.gradle file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8586) [SQL] Add a server for MongoDb Integration Test

2019-11-11 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8586:

Status: Open  (was: Triage Needed)

> [SQL] Add a server for MongoDb Integration Test
> ---
>
> Key: BEAM-8586
> URL: https://issues.apache.org/jira/browse/BEAM-8586
> Project: Beam
>  Issue Type: Test
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need to pass pipeline options with server information to the 
> MongoDbReadWriteIT.
> For now that test is ignored and excluded from the build.gradle file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8427) [SQL] Add support for MongoDB source

2019-11-11 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8427:

Description: 
* Create a MongoDB table and table provider.
 * Implement buildIOReader
 * Support primitive types
 * Implement buildIOWrite
 * improve getTableStatistics

  was:
In progress:
 * Create a MongoDB table and table provider.
 * Implement buildIOReader
 * Support primitive types

Still needs to be done:
 * Implement buildIOWrite
 * improve getTableStatistics


> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> * Create a MongoDB table and table provider.
>  * Implement buildIOReader
>  * Support primitive types
>  * Implement buildIOWrite
>  * improve getTableStatistics





[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=341522&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341522
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 11/Nov/19 22:29
Start Date: 11/Nov/19 22:29
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9892: [BEAM-8427] [SQL] 
buildIOWrite for MongoDb Table
URL: https://github.com/apache/beam/pull/9892#issuecomment-552646139
 
 
   Run sql postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341522)
Time Spent: 7.5h  (was: 7h 20m)

> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> In progress:
>  * Create a MongoDB table and table provider.
>  * Implement buildIOReader
>  * Support primitive types
> Still needs to be done:
>  * Implement buildIOWrite
>  * improve getTableStatistics





[jira] [Closed] (BEAM-8589) Add instrumentation to portable runner to print pipeline proto and options when logging level is set to Debug.

2019-11-11 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev closed BEAM-8589.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> Add instrumentation to portable runner to print pipeline proto and options 
> when logging level is set to Debug.
> --
>
> Key: BEAM-8589
> URL: https://issues.apache.org/jira/browse/BEAM-8589
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Similar capability in Dataflow runner: 
> https://github.com/apache/beam/blob/90d587843172143c15ed392513e396b74569a98c/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py#L567.





[jira] [Work logged] (BEAM-8589) Add instrumentation to portable runner to print pipeline proto and options when logging level is set to Debug.

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8589?focusedWorklogId=341521&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341521
 ]

ASF GitHub Bot logged work on BEAM-8589:


Author: ASF GitHub Bot
Created on: 11/Nov/19 22:25
Start Date: 11/Nov/19 22:25
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10036: [BEAM-8589] 
Print pipeline proto and pipeline options in DEBUG loglevel.
URL: https://github.com/apache/beam/pull/10036
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 341521)
Time Spent: 20m  (was: 10m)

> Add instrumentation to portable runner to print pipeline proto and options 
> when logging level is set to Debug.
> --
>
> Key: BEAM-8589
> URL: https://issues.apache.org/jira/browse/BEAM-8589
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Similar capability in Dataflow runner: 
> https://github.com/apache/beam/blob/90d587843172143c15ed392513e396b74569a98c/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py#L567.





[jira] [Resolved] (BEAM-8579) Strip UTF-8 BOM bytes (if present) in TextSource.

2019-11-11 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-8579.
-
Fix Version/s: 2.18.0
   Resolution: Fixed

> Strip UTF-8 BOM bytes (if present) in TextSource.
> -
>
> Key: BEAM-8579
> URL: https://issues.apache.org/jira/browse/BEAM-8579
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-text
>Affects Versions: 2.15.0
>Reporter: Changming Ma
>Assignee: Changming Ma
>Priority: Minor
> Fix For: 2.18.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> TextSource in the org.apache.beam.sdk.io package can handle UTF-8 encoded 
> files, and when a file contains a byte order mark (BOM), it preserves the 
> BOM in the output. According to the Unicode standard 
> ([http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf]): "Use of a BOM 
> is neither required nor recommended for UTF-8". UTF-8 with a BOM is also a 
> potential problem for some Java implementations (e.g., 
> [https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4508058]). As a 
> general practice, it is suggested to use UTF-8 without a BOM.
> Proposal: remove BOM bytes from the output of TextSource.
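As a language-neutral sketch of the proposal (Beam's TextSource is Java; this standalone Python helper and its name are hypothetical), stripping the BOM means dropping the 3-byte sequence EF BB BF when it leads the data:

```python
import codecs

def strip_utf8_bom(raw: bytes) -> bytes:
    # codecs.BOM_UTF8 is b'\xef\xbb\xbf'; remove it only when it is
    # actually present at the start of the byte string.
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw

print(strip_utf8_bom(b'\xef\xbb\xbfhello').decode('utf-8'))  # hello
```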





[jira] [Work logged] (BEAM-8579) Strip UTF-8 BOM bytes (if present) in TextSource.

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8579?focusedWorklogId=341519&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341519
 ]

ASF GitHub Bot logged work on BEAM-8579:


Author: ASF GitHub Bot
Created on: 11/Nov/19 22:18
Start Date: 11/Nov/19 22:18
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10046: [BEAM-8579] Strip 
UTF-8 BOM from TextSource output.
URL: https://github.com/apache/beam/pull/10046#issuecomment-552642565
 
 
   I'll merge this as is and please open up a follow-up PR with the test move.
 



Issue Time Tracking
---

Worklog Id: (was: 341519)
Time Spent: 1h  (was: 50m)

> Strip UTF-8 BOM bytes (if present) in TextSource.
> -
>
> Key: BEAM-8579
> URL: https://issues.apache.org/jira/browse/BEAM-8579
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-text
>Affects Versions: 2.15.0
>Reporter: Changming Ma
>Assignee: Changming Ma
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> TextSource in the org.apache.beam.sdk.io package can handle UTF-8 encoded 
> files, and when a file contains a byte order mark (BOM), it preserves the 
> BOM in the output. According to the Unicode standard 
> ([http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf]): "Use of a BOM 
> is neither required nor recommended for UTF-8". UTF-8 with a BOM is also a 
> potential problem for some Java implementations (e.g., 
> [https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4508058]). As a 
> general practice, it is suggested to use UTF-8 without a BOM.
> Proposal: remove BOM bytes from the output of TextSource.





[jira] [Work logged] (BEAM-8579) Strip UTF-8 BOM bytes (if present) in TextSource.

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8579?focusedWorklogId=341520&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341520
 ]

ASF GitHub Bot logged work on BEAM-8579:


Author: ASF GitHub Bot
Created on: 11/Nov/19 22:18
Start Date: 11/Nov/19 22:18
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10046: [BEAM-8579] 
Strip UTF-8 BOM from TextSource output.
URL: https://github.com/apache/beam/pull/10046
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 341520)
Time Spent: 1h 10m  (was: 1h)

> Strip UTF-8 BOM bytes (if present) in TextSource.
> -
>
> Key: BEAM-8579
> URL: https://issues.apache.org/jira/browse/BEAM-8579
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-text
>Affects Versions: 2.15.0
>Reporter: Changming Ma
>Assignee: Changming Ma
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> TextSource in the org.apache.beam.sdk.io package can handle UTF-8 encoded 
> files, and when a file contains a byte order mark (BOM), it preserves the 
> BOM in the output. According to the Unicode standard 
> ([http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf]): "Use of a BOM 
> is neither required nor recommended for UTF-8". UTF-8 with a BOM is also a 
> potential problem for some Java implementations (e.g., 
> [https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4508058]). As a 
> general practice, it is suggested to use UTF-8 without a BOM.
> Proposal: remove BOM bytes from the output of TextSource.





[jira] [Updated] (BEAM-8587) Add TestStream support for Dataflow runner

2019-11-11 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-8587:

Status: Open  (was: Triage Needed)

> Add TestStream support for Dataflow runner
> --
>
> Key: BEAM-8587
> URL: https://issues.apache.org/jira/browse/BEAM-8587
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, testing
>Reporter: Andrew Crites
>Assignee: Andrew Crites
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> TestStream support needed to test features like late data and processing time 
> triggers on local Dataflow runner.
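To make the event model concrete, here is a toy, dependency-free stand-in for the script of events such a source replays (class and method names mirror, but are not, the real apache_beam.testing.test_stream.TestStream API):

```python
class EventScript:
    """Toy stand-in for a TestStream: an ordered script of test events."""

    def __init__(self):
        self.events = []

    def add_elements(self, elements):
        # Emit the given elements into the pipeline under test.
        self.events.append(('elements', list(elements)))
        return self

    def advance_watermark_to(self, timestamp):
        # Move the event-time watermark; elements added afterwards whose
        # window ends before this timestamp arrive late.
        self.events.append(('watermark', timestamp))
        return self

    def advance_processing_time(self, seconds):
        # Advance processing time, firing processing-time triggers.
        self.events.append(('processing_time', seconds))
        return self

script = (EventScript()
          .add_elements(['on-time'])
          .advance_watermark_to(10)
          .add_elements(['late'])
          .advance_processing_time(60))
```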





[jira] [Work logged] (BEAM-8379) Cache Eviction for Interactive Beam

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8379?focusedWorklogId=341516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341516
 ]

ASF GitHub Bot logged work on BEAM-8379:


Author: ASF GitHub Bot
Created on: 11/Nov/19 22:14
Start Date: 11/Nov/19 22:14
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10062: [BEAM-8379] 
Cache Eviction
URL: https://github.com/apache/beam/pull/10062
 
 
   1. Implemented cache eviction for Interactive Beam whenever the Python
   interpreter exits.
   2. Cache for PCollections is grouped by PCollection, as the Interactive
   Beam user flow is now data-centric. The cache, including its eviction,
   is managed by a global interactive environment instance that is
   created/retrieved/reset implicitly by runners in the same main
   thread/loop.
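The exit-time eviction described in point 1 can be sketched with atexit and a temp directory (the class and method names here are hypothetical illustrations, not the Interactive Beam implementation):

```python
import atexit
import shutil
import tempfile

class PCollectionCache:
    """Toy cache keyed by PCollection id, evicted when Python exits."""

    def __init__(self):
        self._dir = tempfile.mkdtemp(prefix='ib-cache-')
        self._entries = {}
        # Registering here gives the "evict on interpreter exit" behavior.
        atexit.register(self.evict_all)

    def put(self, pcoll_id, value):
        self._entries[pcoll_id] = value

    def get(self, pcoll_id):
        return self._entries.get(pcoll_id)

    def evict_all(self):
        # Drop in-memory entries and remove any on-disk cache files.
        self._entries.clear()
        shutil.rmtree(self._dir, ignore_errors=True)
```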
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Commented] (BEAM-8587) Add TestStream support for Dataflow runner

2019-11-11 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971879#comment-16971879
 ] 

Luke Cwik commented on BEAM-8587:
-

Were you planning on adding support to Java/Go?

> Add TestStream support for Dataflow runner
> --
>
> Key: BEAM-8587
> URL: https://issues.apache.org/jira/browse/BEAM-8587
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, testing
>Reporter: Andrew Crites
>Assignee: Andrew Crites
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> TestStream support needed to test features like late data and processing time 
> triggers on local Dataflow runner.





[jira] [Work logged] (BEAM-8587) Add TestStream support for Dataflow runner

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8587?focusedWorklogId=341515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341515
 ]

ASF GitHub Bot logged work on BEAM-8587:


Author: ASF GitHub Bot
Created on: 11/Nov/19 22:13
Start Date: 11/Nov/19 22:13
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10041: [BEAM-8587] 
TestStream for Dataflow runner
URL: https://github.com/apache/beam/pull/10041
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 341515)
Time Spent: 1.5h  (was: 1h 20m)

> Add TestStream support for Dataflow runner
> --
>
> Key: BEAM-8587
> URL: https://issues.apache.org/jira/browse/BEAM-8587
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, testing
>Reporter: Andrew Crites
>Assignee: Andrew Crites
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> TestStream support needed to test features like late data and processing time 
> triggers on local Dataflow runner.





[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=341511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341511
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 11/Nov/19 22:11
Start Date: 11/Nov/19 22:11
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #9923: 
[BEAM-7389] Add code snippets for Count
URL: https://github.com/apache/beam/pull/9923#discussion_r344933236
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/snippets/transforms/aggregation/count.py
 ##
 @@ -0,0 +1,89 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+from __future__ import print_function
+
+
+def count_globally(test=None):
+  # [START count_globally]
+  import apache_beam as beam
+
+  with beam.Pipeline() as pipeline:
+    total_elements = (
+        pipeline
+        | 'Create produce counts' >> beam.Create([
+            ('凌', 3),
 
 Review comment:
   That's a good point, changed!
 



Issue Time Tracking
---

Worklog Id: (was: 341511)
Time Spent: 75h 50m  (was: 75h 40m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 75h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8587) Add TestStream support for Dataflow runner

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8587?focusedWorklogId=341514&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341514
 ]

ASF GitHub Bot logged work on BEAM-8587:


Author: ASF GitHub Bot
Created on: 11/Nov/19 22:11
Start Date: 11/Nov/19 22:11
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10041: [BEAM-8587] 
TestStream for Dataflow runner
URL: https://github.com/apache/beam/pull/10041#discussion_r344933349
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -1195,6 +1196,49 @@ def run__NativeWrite(self, transform_node, options):
  PropertyNames.STEP_NAME: input_step.proto.name,
  PropertyNames.OUTPUT_NAME: input_step.get_output(input_tag)})
 
+  @unittest.skip("This is not a test, despite the name.")
+  def run_TestStream(self, transform_node, options):
+from apache_beam.portability.api import beam_runner_api_pb2
+from apache_beam.testing.test_stream import ElementEvent
+from apache_beam.testing.test_stream import ProcessingTimeEvent
+from apache_beam.testing.test_stream import WatermarkEvent
+standard_options = options.view_as(StandardOptions)
+if not standard_options.streaming:
+  raise ValueError('TestStream is currently available for use '
+   'only in streaming pipelines.')
+
+transform = transform_node.transform
+step = self._add_step(TransformNames.READ, transform_node.full_label,
+  transform_node)
+step.add_property(PropertyNames.FORMAT, 'test_stream')
+test_stream_payload = beam_runner_api_pb2.TestStreamPayload()
+# TestStream source doesn't do any decoding of elements,
+# so we won't set test_stream_payload.coder_id.
+output_coder = transform._infer_output_coder()  # pylint: disable=protected-access
+for event in transform.events:
+  new_event = test_stream_payload.events.add()
+  if isinstance(event, ElementEvent):
+for tv in event.timestamped_values:
+  element = new_event.element_event.elements.add()
+  element.encoded_element = output_coder.encode(tv.value)
+  element.timestamp = tv.timestamp.micros
 
 Review comment:
   Details in 
https://lists.apache.org/thread.html/27fe9aa5b33dbee97db1bc924ee410048137e4fe97d9c79d131c010c@%3Cdev.beam.apache.org%3E
 



Issue Time Tracking
---

Worklog Id: (was: 341514)
Time Spent: 1h 20m  (was: 1h 10m)

> Add TestStream support for Dataflow runner
> --
>
> Key: BEAM-8587
> URL: https://issues.apache.org/jira/browse/BEAM-8587
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, testing
>Reporter: Andrew Crites
>Assignee: Andrew Crites
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> TestStream support needed to test features like late data and processing time 
> triggers on local Dataflow runner.





[jira] [Work logged] (BEAM-8587) Add TestStream support for Dataflow runner

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8587?focusedWorklogId=341509&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341509
 ]

ASF GitHub Bot logged work on BEAM-8587:


Author: ASF GitHub Bot
Created on: 11/Nov/19 22:05
Start Date: 11/Nov/19 22:05
Worklog Time Spent: 10m 
  Work Description: acrites commented on pull request #10041: [BEAM-8587] 
TestStream for Dataflow runner
URL: https://github.com/apache/beam/pull/10041#discussion_r344931199
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -1195,6 +1196,49 @@ def run__NativeWrite(self, transform_node, options):
  PropertyNames.STEP_NAME: input_step.proto.name,
  PropertyNames.OUTPUT_NAME: input_step.get_output(input_tag)})
 
+  @unittest.skip("This is not a test, despite the name.")
+  def run_TestStream(self, transform_node, options):
+from apache_beam.portability.api import beam_runner_api_pb2
+from apache_beam.testing.test_stream import ElementEvent
+from apache_beam.testing.test_stream import ProcessingTimeEvent
+from apache_beam.testing.test_stream import WatermarkEvent
+standard_options = options.view_as(StandardOptions)
+if not standard_options.streaming:
+  raise ValueError('TestStream is currently available for use '
+   'only in streaming pipelines.')
+
+transform = transform_node.transform
+step = self._add_step(TransformNames.READ, transform_node.full_label,
+  transform_node)
+step.add_property(PropertyNames.FORMAT, 'test_stream')
+test_stream_payload = beam_runner_api_pb2.TestStreamPayload()
+# TestStream source doesn't do any decoding of elements,
+# so we won't set test_stream_payload.coder_id.
+output_coder = transform._infer_output_coder()  # pylint: disable=protected-access
+for event in transform.events:
+  new_event = test_stream_payload.events.add()
+  if isinstance(event, ElementEvent):
+for tv in event.timestamped_values:
+  element = new_event.element_event.elements.add()
+  element.encoded_element = output_coder.encode(tv.value)
+  element.timestamp = tv.timestamp.micros
 
 Review comment:
   We're punting on this for now due to differences in what ranges of times 
protobuf Timestamp and Duration allow vs. what Beam allows.
 



Issue Time Tracking
---

Worklog Id: (was: 341509)
Time Spent: 1h 10m  (was: 1h)

> Add TestStream support for Dataflow runner
> --
>
> Key: BEAM-8587
> URL: https://issues.apache.org/jira/browse/BEAM-8587
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, testing
>Reporter: Andrew Crites
>Assignee: Andrew Crites
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> TestStream support needed to test features like late data and processing time 
> triggers on local Dataflow runner.





[jira] [Work logged] (BEAM-5600) Splitting for SplittableDoFn should be exposed within runner shared libraries

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5600?focusedWorklogId=341508&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341508
 ]

ASF GitHub Bot logged work on BEAM-5600:


Author: ASF GitHub Bot
Created on: 11/Nov/19 22:04
Start Date: 11/Nov/19 22:04
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10045: [BEAM-5600, 
BEAM-2939] Add SplittableParDo expansion logic to runner's core.
URL: https://github.com/apache/beam/pull/10045#discussion_r344930720
 
 

 ##
 File path: 
runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaPipelineRunner.java
 ##
 @@ -36,8 +39,16 @@
 
   @Override
   public PortablePipelineResult run(final Pipeline pipeline, JobInfo jobInfo) {
+// Expand any splittable DoFns within the graph to enable sizing and 
splitting of bundles.
 
 Review comment:
   I was thinking about that but the preparation will likely diverge once more 
forms of splitting are added since the various graph expansions will change.
 



Issue Time Tracking
---

Worklog Id: (was: 341508)
Time Spent: 1h  (was: 50m)

> Splitting for SplittableDoFn should be exposed within runner shared libraries
> -
>
> Key: BEAM-5600
> URL: https://issues.apache.org/jira/browse/BEAM-5600
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Priority: Major
>  Labels: portability
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8587) Add TestStream support for Dataflow runner

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8587?focusedWorklogId=341507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341507
 ]

ASF GitHub Bot logged work on BEAM-8587:


Author: ASF GitHub Bot
Created on: 11/Nov/19 22:04
Start Date: 11/Nov/19 22:04
Worklog Time Spent: 10m 
  Work Description: acrites commented on pull request #10041: [BEAM-8587] 
TestStream for Dataflow runner
URL: https://github.com/apache/beam/pull/10041#discussion_r344930626
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -1195,6 +1196,49 @@ def run__NativeWrite(self, transform_node, options):
  PropertyNames.STEP_NAME: input_step.proto.name,
  PropertyNames.OUTPUT_NAME: input_step.get_output(input_tag)})
 
+  @unittest.skip("This is not a test, despite the name.")
 
 Review comment:
   We opted to set __test__ = False instead of using a unittest annotation, 
since the annotation caused a different lint test to fail.
 



Issue Time Tracking
---

Worklog Id: (was: 341507)
Time Spent: 1h  (was: 50m)

> Add TestStream support for Dataflow runner
> --
>
> Key: BEAM-8587
> URL: https://issues.apache.org/jira/browse/BEAM-8587
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, testing
>Reporter: Andrew Crites
>Assignee: Andrew Crites
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> TestStream support needed to test features like late data and processing time 
> triggers on local Dataflow runner.




