[jira] [Work logged] (BEAM-7274) Protobuf Beam Schema support

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7274?focusedWorklogId=345831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345831
 ]

ASF GitHub Bot logged work on BEAM-7274:


Author: ASF GitHub Bot
Created on: 19/Nov/19 06:48
Start Date: 19/Nov/19 06:48
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #8690: [BEAM-7274] 
Implement the Protobuf schema provider
URL: https://github.com/apache/beam/pull/8690#issuecomment-555359927
 
 
   Another comment: this delegates everything to Proto's reflection API, which 
can be quite inefficient. Compare with AvroSchema where we delegate straight to 
generated classes when they exist. Reflection might be necessary for the case 
of DynamicMessage (though in that case I think we should use RowWithStorage 
instead of RowWithGetters), but shouldn't be necessary when we have generated 
classes.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345831)
Time Spent: 9h 40m  (was: 9.5h)

> Protobuf Beam Schema support
> 
>
> Key: BEAM-7274
> URL: https://issues.apache.org/jira/browse/BEAM-7274
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Minor
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> Add support for the new Beam Schema to the Protobuf extension.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7274) Protobuf Beam Schema support

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7274?focusedWorklogId=345812&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345812
 ]

ASF GitHub Bot logged work on BEAM-7274:


Author: ASF GitHub Bot
Created on: 19/Nov/19 06:02
Start Date: 19/Nov/19 06:02
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #8690: [BEAM-7274] 
Implement the Protobuf schema provider
URL: https://github.com/apache/beam/pull/8690#discussion_r347741250
 
 

 ##
 File path: 
sdks/java/extensions/protobuf/src/main/java/org/apache/beam/sdk/extensions/protobuf/ProtoSchemaProvider.java
 ##
 @@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.protobuf;
+
+import com.google.protobuf.DynamicMessage;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.SchemaProvider;
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.sdk.values.TypeDescriptor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Schema provider for Protobuf messages. The provider is able to handle pre 
compiled Message file
+ * without external help. For Dynamic Messages a Descriptor needs to be 
registered up front on a
+ * specific URN.
+ *
+ * It's possible to inherit this class for a specific implementation that 
communicates with an
+ * external registry that maps those URN's with Descriptors.
+ */
+@Experimental(Experimental.Kind.SCHEMAS)
+public class ProtoSchemaProvider implements SchemaProvider {
 
 Review comment:
   I'm not sure I understand this comment.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345812)
Time Spent: 9h 20m  (was: 9h 10m)

> Protobuf Beam Schema support
> 
>
> Key: BEAM-7274
> URL: https://issues.apache.org/jira/browse/BEAM-7274
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Minor
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> Add support for the new Beam Schema to the Protobuf extension.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7116) Remove KV from Schema transforms

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7116?focusedWorklogId=345814&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345814
 ]

ASF GitHub Bot logged work on BEAM-7116:


Author: ASF GitHub Bot
Created on: 19/Nov/19 06:03
Start Date: 19/Nov/19 06:03
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #10151: [BEAM-7116] Remove 
use of KV in Schema transforms
URL: https://github.com/apache/beam/pull/10151#issuecomment-555347866
 
 
   run sql postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345814)
Time Spent: 0.5h  (was: 20m)

> Remove KV from Schema transforms
> 
>
> Key: BEAM-7116
> URL: https://issues.apache.org/jira/browse/BEAM-7116
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Instead of returning KV objects, we should return a Schema with two fields. 
> The Convert transform should be able to convert these to KV objects.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7274) Protobuf Beam Schema support

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7274?focusedWorklogId=345813&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345813
 ]

ASF GitHub Bot logged work on BEAM-7274:


Author: ASF GitHub Bot
Created on: 19/Nov/19 06:02
Start Date: 19/Nov/19 06:02
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #8690: [BEAM-7274] 
Implement the Protobuf schema provider
URL: https://github.com/apache/beam/pull/8690#discussion_r347743622
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/values/Row.java
 ##
 @@ -554,6 +555,12 @@ public Builder withFieldValueGetters(
   return this;
 }
 
+/** The FieldValueGetters will handle the conversion for Arrays, Maps and 
Rows. */
+public Builder withFieldValueGettersHandleCollections(boolean 
collectionHandledByGetter) {
+  this.collectionHandledByGetter = collectionHandledByGetter;
+  return this;
+}
 
 Review comment:
   Can you help me understand this a bit more? Why does it not work to cache 
lists for protocol buffers? We saw repeated array conversion to be a big 
problem (which is why we cache them). I'm wondering if we could instead cache a 
lazy array like we do with iterables.
   
   I'll take a closer look at this code to figure it out.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345813)
Time Spent: 9.5h  (was: 9h 20m)

> Protobuf Beam Schema support
> 
>
> Key: BEAM-7274
> URL: https://issues.apache.org/jira/browse/BEAM-7274
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Minor
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> Add support for the new Beam Schema to the Protobuf extension.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8654) [Java] beam_Dependency_Check's not getting correct report from Gradle dependencyUpdates

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8654?focusedWorklogId=345801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345801
 ]

ASF GitHub Bot logged work on BEAM-8654:


Author: ASF GitHub Bot
Created on: 19/Nov/19 04:58
Start Date: 19/Nov/19 04:58
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10127: [BEAM-8654] Fixes 
resolutionStrategy's interference with dependency check
URL: https://github.com/apache/beam/pull/10127#issuecomment-555333591
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345801)
Time Spent: 2h 20m  (was: 2h 10m)

> [Java] beam_Dependency_Check's not getting correct report from Gradle 
> dependencyUpdates
> ---
>
> Key: BEAM-8654
> URL: https://issues.apache.org/jira/browse/BEAM-8654
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Cont. of https://issues.apache.org/jira/browse/BEAM-8621
> https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_Dependency_Check/234/consoleFull
>  says
> {noformat}
> 18:20:07 > Task :dependencyUpdates
> ...
> 18:23:12 The following dependencies are using the latest release version:
> ...
> 18:23:12  - com.google.cloud.bigdataoss:util:1.9.16
> 18:23:12  - com.google.cloud.bigtable:bigtable-client-core:1.8.0
> {noformat}
> But they are not the latest release.
> * 
> https://search.maven.org/artifact/com.google.cloud.bigdataoss/util/2.0.0/jar 
> * 
> https://search.maven.org/artifact/com.google.cloud.bigtable/bigtable-client-core/1.12.1/jar
> Why does Gradle think they're the latest release?
> It seems that " -Drevision=release" flag plays some role here. Without the 
> flag, Gradle reports these artifacts are not the latest.
> https://gist.github.com/suztomo/1460f2be48025c8ea764e86a2c6e39a8
> Even with the flag, it should report the following
> {noformat}
> The following dependencies have later release versions:
>  - com.google.cloud.bigtable:bigtable-client-core [1.8.0 -> 1.12.1]
>  https://cloud.google.com/bigtable/
> {noformat}
> https://gist.github.com/suztomo/13473e6b9765c0e96c22aeffab18ef66



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7116) Remove KV from Schema transforms

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7116?focusedWorklogId=345792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345792
 ]

ASF GitHub Bot logged work on BEAM-7116:


Author: ASF GitHub Bot
Created on: 19/Nov/19 04:35
Start Date: 19/Nov/19 04:35
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #10151: [BEAM-7116] Remove 
use of KV in Schema transforms
URL: https://github.com/apache/beam/pull/10151#issuecomment-555328927
 
 
   R: @TheNeuralBit 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345792)
Time Spent: 20m  (was: 10m)

> Remove KV from Schema transforms
> 
>
> Key: BEAM-7116
> URL: https://issues.apache.org/jira/browse/BEAM-7116
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Instead of returning KV objects, we should return a Schema with two fields. 
> The Convert transform should be able to convert these to KV objects.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7116) Remove KV from Schema transforms

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7116?focusedWorklogId=345790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345790
 ]

ASF GitHub Bot logged work on BEAM-7116:


Author: ASF GitHub Bot
Created on: 19/Nov/19 04:32
Start Date: 19/Nov/19 04:32
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #10151: [BEAM-7116] 
Remove use of KV in Schema transforms
URL: https://github.com/apache/beam/pull/10151
 
 
   Beam's KV type has no schema and due to special casing of KvCoder in Beam it 
is difficult to give it one. Here we modify the Beam schema transforms that 
return PCollection to instead return PCollection where the Row 
contains key and value fields. This is possible now that we support large 
iterables in schemas.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345790)
Remaining Estimate: 0h
Time Spent: 10m

> Remove KV from Schema transforms
> 
>
> Key: BEAM-7116
> URL: https://issues.apache.org/jira/browse/BEAM-7116
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Instead of returning KV objects, we should return a Schema with two fields. 
> The Convert transform should be able to convert these to KV objects.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6756) Support lazy iterables in schemas

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6756?focusedWorklogId=345789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345789
 ]

ASF GitHub Bot logged work on BEAM-6756:


Author: ASF GitHub Bot
Created on: 19/Nov/19 04:15
Start Date: 19/Nov/19 04:15
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #10003: [BEAM-6756] 
Create Iterable type for Schema
URL: https://github.com/apache/beam/pull/10003
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345789)
Time Spent: 2.5h  (was: 2h 20m)

> Support lazy iterables in schemas
> -
>
> Key: BEAM-6756
> URL: https://issues.apache.org/jira/browse/BEAM-6756
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The iterables returned by GroupByKey and CoGroupByKey are lazy; this allows a 
> runner to page data into memory if the full iterable is too large. We 
> currently don't support this in Schemas, so the Schema Group and CoGroup 
> transforms materialize all data into memory. We should add support for this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8251) Add worker_region and worker_zone options

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8251?focusedWorklogId=345786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345786
 ]

ASF GitHub Bot logged work on BEAM-8251:


Author: ASF GitHub Bot
Created on: 19/Nov/19 04:05
Start Date: 19/Nov/19 04:05
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #10150: [BEAM-8251] 
plumb worker_(region|zone) to Environment proto
URL: https://github.com/apache/beam/pull/10150
 
 
   Added these pipeline options a while back, but they need to be present in 
the `Environment` to actually take effect.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_P

[jira] [Work logged] (BEAM-7951) Allow runner to configure customization WindowedValue coder such as ValueOnlyWindowedValueCoder

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7951?focusedWorklogId=345774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345774
 ]

ASF GitHub Bot logged work on BEAM-7951:


Author: ASF GitHub Bot
Created on: 19/Nov/19 02:43
Start Date: 19/Nov/19 02:43
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #9979: [BEAM-7951] 
Allow runner to configure customization WindowedValue coder.
URL: https://github.com/apache/beam/pull/9979#issuecomment-555306699
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345774)
Time Spent: 2h 10m  (was: 2h)

> Allow runner to configure customization WindowedValue coder such as 
> ValueOnlyWindowedValueCoder
> ---
>
> Key: BEAM-7951
> URL: https://issues.apache.org/jira/browse/BEAM-7951
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The coder of WindowedValue cannot be configured and it’s always 
> FullWindowedValueCoder. We don't need to serialize the timestamp, window and 
> pane properties in Flink and so it will be better to make the coder 
> configurable (i.e. allowing to use ValueOnlyWindowedValueCoder)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8629) WithTypeHints._get_or_create_type_hints may return a mutable copy of the class type hints.

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8629?focusedWorklogId=345769&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345769
 ]

ASF GitHub Bot logged work on BEAM-8629:


Author: ASF GitHub Bot
Created on: 19/Nov/19 02:29
Start Date: 19/Nov/19 02:29
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10080: [BEAM-8629] 
Don't return mutable class type hints.
URL: https://github.com/apache/beam/pull/10080#discussion_r347705356
 
 

 ##
 File path: sdks/python/apache_beam/typehints/decorators.py
 ##
 @@ -340,6 +340,22 @@ def __repr__(self):
 return 'IOTypeHints[inputs=%s, outputs=%s]' % (
 self.input_types, self.output_types)
 
+  def __eq__(self, other):
 
 Review comment:
   Why were these 3 methods needed?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345769)
Time Spent: 0.5h  (was: 20m)

> WithTypeHints._get_or_create_type_hints may return a mutable copy of the 
> class type hints.
> --
>
> Key: BEAM-8629
> URL: https://issues.apache.org/jira/browse/BEAM-8629
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8646) PR #9814 appears to cause failures in fnapi_runner tests on Windows

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8646?focusedWorklogId=345763&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345763
 ]

ASF GitHub Bot logged work on BEAM-8646:


Author: ASF GitHub Bot
Created on: 19/Nov/19 02:18
Start Date: 19/Nov/19 02:18
Worklog Time Spent: 10m 
  Work Description: violalyu commented on issue #10110: [BEAM-8646] Restore 
original behavior of evaluating worker host on Windows until a better solution 
is available.
URL: https://github.com/apache/beam/pull/10110#issuecomment-555300788
 
 
   Thanks @tvalentyn !
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345763)
Time Spent: 0.5h  (was: 20m)

> PR #9814 appears to cause failures in fnapi_runner tests on Windows
> ---
>
> Key: BEAM-8646
> URL: https://issues.apache.org/jira/browse/BEAM-8646
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Wanqi Lyu
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It appears that changes in 
>  
> https://github.com/apache/beam/commit/d6bcb03f586b5430c30f6ca4a1af9e42711e529c
>  cause test failures in Beam test suite on Windows, for example:
> python setup.py nosetests --tests 
> apache_beam/runners/portability/portable_runner_test.py:PortableRunnerTestWithExternalEnv.test_callbacks_with_exception
>  
> does not finish on a Windows VM machine within at least 60 seconds but passes 
> within a second if  we change host_from_worker to return 'localhost'  in [1].
>  [~violalyu] , do you think you could take a look? Thanks! 
> cc: [~chadrik] [~thw]
> [1] 
> https://github.com/apache/beam/blob/808cb35018cd228a59b152234b655948da2455fa/sdks/python/apache_beam/runners/portability/fn_api_runner.py#L1377.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345758&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345758
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 19/Nov/19 02:12
Start Date: 19/Nov/19 02:12
Worklog Time Spent: 10m 
  Work Description: liumomo315 commented on issue #10145: [BEAM-8575] Add a 
Python test to test windowing in DoFn finish_bundle()
URL: https://github.com/apache/beam/pull/10145#issuecomment-555299364
 
 
   Thanks Yichi for the review:)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345758)
Time Spent: 11h 10m  (was: 11h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6756) Support lazy iterables in schemas

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6756?focusedWorklogId=345754&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345754
 ]

ASF GitHub Bot logged work on BEAM-6756:


Author: ASF GitHub Bot
Created on: 19/Nov/19 02:07
Start Date: 19/Nov/19 02:07
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #10003: [BEAM-6756] Create 
Iterable type for Schema
URL: https://github.com/apache/beam/pull/10003#issuecomment-555298098
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345754)
Time Spent: 2h 20m  (was: 2h 10m)

> Support lazy iterables in schemas
> -
>
> Key: BEAM-6756
> URL: https://issues.apache.org/jira/browse/BEAM-6756
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The iterables returned by GroupByKey and CoGroupByKey are lazy; this allows a 
> runner to page data into memory if the full iterable is too large. We 
> currently don't support this in Schemas, so the Schema Group and CoGroup 
> transforms materialize all data into memory. We should add support for this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345751&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345751
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 19/Nov/19 01:57
Start Date: 19/Nov/19 01:57
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #10132: [BEAM-8016] Pipeline 
Graph
URL: https://github.com/apache/beam/pull/10132#issuecomment-555295902
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345751)
Time Spent: 3h 50m  (was: 3h 40m)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345748&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345748
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 19/Nov/19 01:48
Start Date: 19/Nov/19 01:48
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10145: [BEAM-8575] Add a 
Python test to test windowing in DoFn finish_bundle()
URL: https://github.com/apache/beam/pull/10145#issuecomment-555293777
 
 
   @tvalentyn could you help to merge?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345748)
Time Spent: 11h  (was: 10h 50m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8586) [SQL] Add a server for MongoDb Integration Test

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8586?focusedWorklogId=345747&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345747
 ]

ASF GitHub Bot logged work on BEAM-8586:


Author: ASF GitHub Bot
Created on: 19/Nov/19 01:39
Start Date: 19/Nov/19 01:39
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10061: [BEAM-8586] [SQL] 
Fix MongoDb integration tests
URL: https://github.com/apache/beam/pull/10061#issuecomment-553659742
 
 
   Run sql postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345747)
Time Spent: 50m  (was: 40m)

> [SQL] Add a server for MongoDb Integration Test
> ---
>
> Key: BEAM-8586
> URL: https://issues.apache.org/jira/browse/BEAM-8586
> Project: Beam
>  Issue Type: Test
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We need to pass pipeline options with server information to the 
> MongoDbReadWriteIT.
> For now that test is ignored and excluded from the build.gradle file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8586) [SQL] Add a server for MongoDb Integration Test

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8586?focusedWorklogId=345746&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345746
 ]

ASF GitHub Bot logged work on BEAM-8586:


Author: ASF GitHub Bot
Created on: 19/Nov/19 01:39
Start Date: 19/Nov/19 01:39
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10061: [BEAM-8586] [SQL] 
Fix MongoDb integration tests
URL: https://github.com/apache/beam/pull/10061#issuecomment-555291639
 
 
   Run sql postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345746)
Time Spent: 40m  (was: 0.5h)

> [SQL] Add a server for MongoDb Integration Test
> ---
>
> Key: BEAM-8586
> URL: https://issues.apache.org/jira/browse/BEAM-8586
> Project: Beam
>  Issue Type: Test
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We need to pass pipeline options with server information to the 
> MongoDbReadWriteIT.
> For now that test is ignored and excluded from the build.gradle file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345745&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345745
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 19/Nov/19 01:34
Start Date: 19/Nov/19 01:34
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #10132: [BEAM-8016] Pipeline 
Graph
URL: https://github.com/apache/beam/pull/10132#issuecomment-555276401
 
 
   Thanks for the comments!
   > * Are we getting rid of the tooltips displaying the intermediate results? 
Do they not fit in the new model?
   
   We are offering a `show` API to users so that they can visualize a larger 
set of their data dynamically instead of peeking through a random static sample 
(which we still offer if the user calls `show` in an ipython terminal not a 
notebook web frontend).
   And with new Beam pipeline graph proposal and some pipeline graph library 
WIP, the tooltip in the future might show other metadata such as 
elapse/throughput to provide a consistent user experience that is similar to 
what users have on Dataflow.
   
   > * What do the PCollections look like if the user did not specify the 
PCollection name as a variable?
   
   Those PCollections will not be cached. The idea is when building a pipeline, 
if the user does not assign a PCollection to a variable, they would not be able 
to build the pipeline further from it and they cannot invoke `show(pcoll)` 
because they don't have access to `pcoll` in their code.
   Before, we have had the `leaf pcollection` concept for PCollections who have 
never been used as inputs. It doesn't work for PCollections consumed by sinks 
(with input but no output) even if the user has assigned them to a variable and 
they look like hanging PCollections. It also doesn't work in a notebook 
environment where new transforms can be added at different locations and 
PCollections can be re-evaluated due to cell-re-execution.
   
   
   
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345745)
Time Spent: 3h 40m  (was: 3.5h)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8613) Add environment variable support to Docker environment

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8613?focusedWorklogId=345742&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345742
 ]

ASF GitHub Bot logged work on BEAM-8613:


Author: ASF GitHub Bot
Created on: 19/Nov/19 01:20
Start Date: 19/Nov/19 01:20
Worklog Time Spent: 10m 
  Work Description: nrusch commented on issue #10064: [BEAM-8613] Add 
environment variable support to Docker environment
URL: https://github.com/apache/beam/pull/10064#issuecomment-555287194
 
 
   Rebased to current master.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345742)
Time Spent: 1h  (was: 50m)

> Add environment variable support to Docker environment
> --
>
> Key: BEAM-8613
> URL: https://issues.apache.org/jira/browse/BEAM-8613
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution, runner-core, runner-direct
>Reporter: Nathan Rusch
>Priority: Trivial
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The Process environment allows specifying environment variables via a map 
> field on its payload message. The Docker environment should support this same 
> pattern, and forward the contents of the map through to the container runtime.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345741&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345741
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 19/Nov/19 01:18
Start Date: 19/Nov/19 01:18
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10132: [BEAM-8016] 
Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r347682578
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/interactive_runner.py
 ##
 @@ -117,123 +119,43 @@ def apply(self, transform, pvalueish, options):
 return self._underlying_runner.apply(transform, pvalueish, options)
 
   def run_pipeline(self, pipeline, options):
-if not hasattr(self, '_desired_cache_labels'):
-  self._desired_cache_labels = set()
-
-# Invoke a round trip through the runner API. This makes sure the Pipeline
-# proto is stable.
-pipeline = beam.pipeline.Pipeline.from_runner_api(
-pipeline.to_runner_api(use_fake_coders=True),
-pipeline.runner,
-options)
-
-# Snapshot the pipeline in a portable proto before mutating it.
-pipeline_proto, original_context = pipeline.to_runner_api(
-return_context=True, use_fake_coders=True)
-pcolls_to_pcoll_id = self._pcolls_to_pcoll_id(pipeline, original_context)
-
-analyzer = pipeline_analyzer.PipelineAnalyzer(self._cache_manager,
-  pipeline_proto,
-  self._underlying_runner,
-  options,
-  self._desired_cache_labels)
-# Should be only accessed for debugging purpose.
-self._analyzer = analyzer
+pin = inst.pin(pipeline, options)
 
 pipeline_to_execute = beam.pipeline.Pipeline.from_runner_api(
-analyzer.pipeline_proto_to_execute(),
+pin.instrumented_pipeline_proto(),
 self._underlying_runner,
 options)
 
 if not self._skip_display:
-  display = display_manager.DisplayManager(
-  pipeline_proto=pipeline_proto,
-  pipeline_analyzer=analyzer,
-  cache_manager=self._cache_manager,
-  pipeline_graph_renderer=self._renderer)
-  display.start_periodic_update()
+  pg = pipeline_graph.PipelineGraph(pin.original_pipeline,
 
 Review comment:
   Sure! I'll use the full name `a_pipeline_graph`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345741)
Time Spent: 3.5h  (was: 3h 20m)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345740&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345740
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 19/Nov/19 01:16
Start Date: 19/Nov/19 01:16
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10132: [BEAM-8016] 
Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r347031607
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pipeline_graph.py
 ##
 @@ -136,14 +159,50 @@ def _generate_graph_dicts(self):
   vertex_dict[invisible_leaf] = {'style': 'invis'}
   self._edge_to_vertex_pairs[pcoll_id].append(
   (transform.unique_name, invisible_leaf))
-  edge_dict[(transform.unique_name, invisible_leaf)] = {}
+  if self._pin:
+edge_label = {'label':
+self._pin.cacheable_var_by_pcoll_id(pcoll_id)}
+edge_dict[(transform.unique_name, invisible_leaf)] = edge_label
+  else:
+edge_dict[(transform.unique_name, invisible_leaf)] = {}
+# For PCollections with more than one consuming PTransform, we also add
+# an invisible dummy node to diverge the edge in the middle as the
+# single output is used by multiple down stream PTransforms as inputs
+# instead of emitting multiple edges.
+elif len(self._consumers[pcoll_id]) > 1:
+  intermediate_dummy = 'diverge{}'.format(
+  hash(pcoll_id) % 1)
+  vertex_dict[intermediate_dummy] = {'shape': 'point',
+ 'width': '0'}
+  for consumer in self._consumers[pcoll_id]:
+producer_name = transform.unique_name
+consumer_name = transforms[consumer].unique_name
+self._edge_to_vertex_pairs[pcoll_id].append(
+(producer_name, intermediate_dummy))
+if self._pin:
+  edge_dict[(producer_name, intermediate_dummy)] = {
+  'arrowhead': 'none',
+  'label':
+self._pin.cacheable_var_by_pcoll_id(pcoll_id)}
+else:
+  edge_dict[(producer_name, intermediate_dummy)] = {
+  'arrowhead': 'none'}
+self._edge_to_vertex_pairs[pcoll_id].append(
+(intermediate_dummy, consumer_name))
+edge_dict[(intermediate_dummy, consumer_name)] = {}
 else:
   for consumer in self._consumers[pcoll_id]:
 producer_name = transform.unique_name
 consumer_name = transforms[consumer].unique_name
 self._edge_to_vertex_pairs[pcoll_id].append(
 (producer_name, consumer_name))
-edge_dict[(producer_name, consumer_name)] = {}
+if self._pin:
+  edge_dict[(producer_name, consumer_name)] = {
+  'label':
+self._pin.cacheable_var_by_pcoll_id(pcoll_id)
+  }
+else:
+  edge_dict[(producer_name, consumer_name)] = {}
 
 Review comment:
   The difference introduced:
   Before:
   
![1](https://user-images.githubusercontent.com/4423149/68979012-364f6480-07b1-11ea-9ec3-8022154a8499.png)
   After (edge diverge for single output, user variable names labelled 
PCollections):
   
![2](https://user-images.githubusercontent.com/4423149/69100500-2bdfd580-0a12-11ea-9ef3-cb735fe69775.png)
   
   In the future:
   With data-centric user flow, notebook cell execution metadata and new Beam 
pipeline graph proposal:
   
![3](https://user-images.githubusercontent.com/4423149/68979139-93e3b100-07b1-11ea-8931-a6825b39b7a4.png)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345740)
Time Spent: 3h 20m  (was: 3h 10m)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, 

[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345739&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345739
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 19/Nov/19 01:15
Start Date: 19/Nov/19 01:15
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10132: [BEAM-8016] 
Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r347689574
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pipeline_graph.py
 ##
 @@ -92,9 +104,20 @@ def __init__(self,
   default_vertex_attrs,
   default_edge_attrs)
 
+self._renderer = pipeline_graph_renderer.get_renderer(render_option)
+
   def get_dot(self):
 return self._get_graph().to_string()
 
+  def display_graph(self):
+rendered_graph = self._renderer.render_pipeline_graph(self)
+if ie.current_env().is_in_notebook:
+  try:
+from IPython.core import display
+display.display(display.HTML(rendered_graph))
+  except ImportError:  # Unlikely to happen when is_in_notebook.
+pass
 
 Review comment:
   Added a warning level logging here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345739)
Time Spent: 3h 10m  (was: 3h)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345738
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 19/Nov/19 01:11
Start Date: 19/Nov/19 01:11
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10132: [BEAM-8016] 
Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r347688641
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/interactive_runner.py
 ##
 @@ -117,123 +119,43 @@ def apply(self, transform, pvalueish, options):
 return self._underlying_runner.apply(transform, pvalueish, options)
 
   def run_pipeline(self, pipeline, options):
-if not hasattr(self, '_desired_cache_labels'):
-  self._desired_cache_labels = set()
-
-# Invoke a round trip through the runner API. This makes sure the Pipeline
-# proto is stable.
-pipeline = beam.pipeline.Pipeline.from_runner_api(
-pipeline.to_runner_api(use_fake_coders=True),
-pipeline.runner,
-options)
-
-# Snapshot the pipeline in a portable proto before mutating it.
-pipeline_proto, original_context = pipeline.to_runner_api(
-return_context=True, use_fake_coders=True)
-pcolls_to_pcoll_id = self._pcolls_to_pcoll_id(pipeline, original_context)
-
-analyzer = pipeline_analyzer.PipelineAnalyzer(self._cache_manager,
-  pipeline_proto,
-  self._underlying_runner,
-  options,
-  self._desired_cache_labels)
-# Should be only accessed for debugging purpose.
-self._analyzer = analyzer
+pin = inst.pin(pipeline, options)
 
 pipeline_to_execute = beam.pipeline.Pipeline.from_runner_api(
-analyzer.pipeline_proto_to_execute(),
+pin.instrumented_pipeline_proto(),
 self._underlying_runner,
 options)
 
 if not self._skip_display:
-  display = display_manager.DisplayManager(
-  pipeline_proto=pipeline_proto,
-  pipeline_analyzer=analyzer,
-  cache_manager=self._cache_manager,
-  pipeline_graph_renderer=self._renderer)
-  display.start_periodic_update()
+  pg = pipeline_graph.PipelineGraph(pin.original_pipeline,
+render_option=self._render_option)
+  pg.display_graph()
 
 result = pipeline_to_execute.run()
 result.wait_until_finish()
 
-if not self._skip_display:
-  display.stop_periodic_update()
-
-return PipelineResult(result, self, self._analyzer.pipeline_info(),
-  self._cache_manager, pcolls_to_pcoll_id)
-
-  def _pcolls_to_pcoll_id(self, pipeline, original_context):
-"""Returns a dict mapping PCollections string to PCollection IDs.
-
-Using a PipelineVisitor to iterate over every node in the pipeline,
-records the mapping from PCollections to PCollections IDs. This mapping
-will be used to query cached PCollections.
-
-Args:
-  pipeline: (pipeline.Pipeline)
-  original_context: (pipeline_context.PipelineContext)
-
-Returns:
-  (dict from str to str) a dict mapping str(pcoll) to pcoll_id.
-"""
-pcolls_to_pcoll_id = {}
-
-from apache_beam.pipeline import PipelineVisitor  # pylint: 
disable=import-error
-
-class PCollVisitor(PipelineVisitor):  # pylint: 
disable=used-before-assignment
-  A visitor that records input and output values to be replaced.
-
-  Input and output values that should be updated are recorded in maps
-  input_replacements and output_replacements respectively.
-
-  We cannot update input and output values while visiting since that
-  results in validation errors.
-  """
-
-  def enter_composite_transform(self, transform_node):
-self.visit_transform(transform_node)
-
-  def visit_transform(self, transform_node):
-for pcoll in transform_node.outputs.values():
-  pcolls_to_pcoll_id[str(pcoll)] = 
original_context.pcollections.get_id(
-  pcoll)
-
-pipeline.visit(PCollVisitor())
-return pcolls_to_pcoll_id
+return PipelineResult(result, pin)
 
 
 class PipelineResult(beam.runners.runner.PipelineResult):
   """Provides access to information about a pipeline."""
 
-  def __init__(self, underlying_result, runner, pipeline_info, cache_manager,
-   pcolls_to_pcoll_id):
+  def __init__(self, underlying_result, pin):
 super(PipelineResult, self).__init__(underlying_result.state)
-self._runner = runner
-self._pipeline_info = pipeline_info
-self._cache_manager = cache_manager
-self._pcolls_to_pcoll_id = pcolls_to_pcoll_id
-
-  def _cache_label(self, pcoll):
-

[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345737&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345737
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 19/Nov/19 01:07
Start Date: 19/Nov/19 01:07
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10132: [BEAM-8016] 
Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r347687605
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/pipeline_instrument.py
 ##
 @@ -337,6 +358,7 @@ def cacheables(pcolls_to_pcoll_id):
   """
   pcoll_version_map = {}
   cacheables = {}
+  cacheable_var_by_pcoll_id = {}
 
 Review comment:
   Yes, it is actually PCollections that need to be cached. I thought it was a 
little bit wordy and chose the shorter form. I'll rephrase it to "PCollections 
that need to be cached" instead for clarity.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345737)
Time Spent: 2h 50m  (was: 2h 40m)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8629) WithTypeHints._get_or_create_type_hints may return a mutable copy of the class type hints.

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8629?focusedWorklogId=345734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345734
 ]

ASF GitHub Bot logged work on BEAM-8629:


Author: ASF GitHub Bot
Created on: 19/Nov/19 01:02
Start Date: 19/Nov/19 01:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10080: [BEAM-8629] Don't 
return mutable class type hints.
URL: https://github.com/apache/beam/pull/10080#issuecomment-555282487
 
 
   Ping @udim 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345734)
Time Spent: 20m  (was: 10m)

> WithTypeHints._get_or_create_type_hints may return a mutable copy of the 
> class type hints.
> --
>
> Key: BEAM-8629
> URL: https://issues.apache.org/jira/browse/BEAM-8629
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345735&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345735
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 19/Nov/19 01:02
Start Date: 19/Nov/19 01:02
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10132: [BEAM-8016] 
Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r347686447
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/pipeline_instrument.py
 ##
 @@ -314,10 +316,29 @@ def cache_key(self, pcoll):
cacheable['producer_version']))
 return ''
 
+  def cacheable_var_by_pcoll_id(self, pcoll_id):
+"""Retrieves the variable name of a PCollection.
+
+In source code, PCollection variables are defined in the user pipeline. 
When
+it's converted to the runner api representation, each PCollection 
referenced
+in the user pipeline is assigned a unique-within-pipeline pcoll_id. Given
+such pcoll_id, retrieves the str variable name defined in user pipeline for
+that referenced PCollection. If the PCollection is anonymous, return ''.
+"""
+return self._cacheable_var_by_pcoll_id.get(pcoll_id, '')
+
 
 def pin(pipeline, options=None):
-  """Creates PipelineInstrument for a pipeline and its options with cache."""
+  """Creates PipelineInstrument for a pipeline and its options with cache.
+
+  This is the shorthand for doing 3 steps: 1) compute once for metadata of 
given
 
 Review comment:
   Thanks, done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345735)
Time Spent: 2.5h  (was: 2h 20m)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345736&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345736
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 19/Nov/19 01:02
Start Date: 19/Nov/19 01:02
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10132: [BEAM-8016] 
Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r347686453
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/pipeline_instrument.py
 ##
 @@ -314,10 +316,29 @@ def cache_key(self, pcoll):
cacheable['producer_version']))
 return ''
 
+  def cacheable_var_by_pcoll_id(self, pcoll_id):
+"""Retrieves the variable name of a PCollection.
+
+In source code, PCollection variables are defined in the user pipeline. 
When
+it's converted to the runner api representation, each PCollection 
referenced
+in the user pipeline is assigned a unique-within-pipeline pcoll_id. Given
+such pcoll_id, retrieves the str variable name defined in user pipeline for
+that referenced PCollection. If the PCollection is anonymous, return ''.
+"""
+return self._cacheable_var_by_pcoll_id.get(pcoll_id, '')
+
 
 def pin(pipeline, options=None):
-  """Creates PipelineInstrument for a pipeline and its options with cache."""
+  """Creates PipelineInstrument for a pipeline and its options with cache.
+
+  This is the shorthand for doing 3 steps: 1) compute once for metadata of 
given
+  runner pipeline and everything watched from user pipelines; 2) associate info
+  between runner pipeline and its corresponding user pipeline, eliminate data
 
 Review comment:
   Thanks, done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345736)
Time Spent: 2h 40m  (was: 2.5h)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345732&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345732
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:52
Start Date: 19/Nov/19 00:52
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10132: [BEAM-8016] 
Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r347684208
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/pipeline_instrument.py
 ##
 @@ -80,7 +80,10 @@ def __init__(self, pipeline, options=None):
 # A mapping from PCollection id to python id() value in user defined
 # pipeline instance.
 (self._pcoll_version_map,
- self._cacheables) = cacheables(self.pcolls_to_pcoll_id)
+ self._cacheables,
 
 Review comment:
   Yes, I agree! It's similar to what pipeline_analyzer was doing but using 
`PipelineVisitor` and some `globals` instead of pipeline protos. We haven't 
achieved a consensus about what this should be called. We also have `cache 
augmented pipeline` and etc.. But none of the names describe what the module 
does accurately because the implementation details cover so many different 
execution routes and scenarios.
   We'll definitely get back to the documentation here once we make the final 
naming decision. Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345732)
Time Spent: 2h 20m  (was: 2h 10m)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8739) Consistently use with Pipeline(...) syntax

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8739?focusedWorklogId=345726&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345726
 ]

ASF GitHub Bot logged work on BEAM-8739:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:51
Start Date: 19/Nov/19 00:51
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10149: [BEAM-8739] 
Consistently use with Pipeline(...) syntax
URL: https://github.com/apache/beam/pull/10149
 
 
   
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![B

[jira] [Created] (BEAM-8739) Consistently use with Pipeline(...) syntax

2019-11-18 Thread Robert Bradshaw (Jira)
Robert Bradshaw created BEAM-8739:
-

 Summary: Consistently use with Pipeline(...) syntax
 Key: BEAM-8739
 URL: https://issues.apache.org/jira/browse/BEAM-8739
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Robert Bradshaw


I've run into a couple of tests that forgot to do p.run(). In addition, I'm 
seeing new tests written in this old style. We should consistently use the with 
syntax where possible for our examples and tests. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345724&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345724
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:44
Start Date: 19/Nov/19 00:44
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10132: [BEAM-8016] 
Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r347682578
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/interactive_runner.py
 ##
 @@ -117,123 +119,43 @@ def apply(self, transform, pvalueish, options):
 return self._underlying_runner.apply(transform, pvalueish, options)
 
   def run_pipeline(self, pipeline, options):
-if not hasattr(self, '_desired_cache_labels'):
-  self._desired_cache_labels = set()
-
-# Invoke a round trip through the runner API. This makes sure the Pipeline
-# proto is stable.
-pipeline = beam.pipeline.Pipeline.from_runner_api(
-pipeline.to_runner_api(use_fake_coders=True),
-pipeline.runner,
-options)
-
-# Snapshot the pipeline in a portable proto before mutating it.
-pipeline_proto, original_context = pipeline.to_runner_api(
-return_context=True, use_fake_coders=True)
-pcolls_to_pcoll_id = self._pcolls_to_pcoll_id(pipeline, original_context)
-
-analyzer = pipeline_analyzer.PipelineAnalyzer(self._cache_manager,
-  pipeline_proto,
-  self._underlying_runner,
-  options,
-  self._desired_cache_labels)
-# Should be only accessed for debugging purpose.
-self._analyzer = analyzer
+pin = inst.pin(pipeline, options)
 
 pipeline_to_execute = beam.pipeline.Pipeline.from_runner_api(
-analyzer.pipeline_proto_to_execute(),
+pin.instrumented_pipeline_proto(),
 self._underlying_runner,
 options)
 
 if not self._skip_display:
-  display = display_manager.DisplayManager(
-  pipeline_proto=pipeline_proto,
-  pipeline_analyzer=analyzer,
-  cache_manager=self._cache_manager,
-  pipeline_graph_renderer=self._renderer)
-  display.start_periodic_update()
+  pg = pipeline_graph.PipelineGraph(pin.original_pipeline,
 
 Review comment:
   Sure! I'll use the full name `pipeline_graph`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345724)
Time Spent: 2h 10m  (was: 2h)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345723&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345723
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:43
Start Date: 19/Nov/19 00:43
Worklog Time Spent: 10m 
  Work Description: liumomo315 commented on issue #10145: [BEAM-8575] Add a 
Python test to test windowing in DoFn finish_bundle()
URL: https://github.com/apache/beam/pull/10145#issuecomment-555277935
 
 
   Thanks for the quick review!
   
   >move the timestamp assignment to beam.Create()
   
   This can be done, but does not make much difference, since we still want 
both the process() and finish_bundle() to do something in this test, and see 
the reason below.
   
   >and combine the map function into the ParDo.
   
   I think it's clearer to separate the Map from the test DoFn. The purpose of 
this test is to verify that after a DoFn with finish_bundle() implemented, it 
will produce results both from process() and finish_bundle(). More 
specifically, it wants to make sure that when windowing is involved, the output 
will be correct after the DoFn. The last Map is simply to print out all outputs 
from the test DoFn. Thoughts?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345723)
Time Spent: 10h 50m  (was: 10h 40m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345722&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345722
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:43
Start Date: 19/Nov/19 00:43
Worklog Time Spent: 10m 
  Work Description: liumomo315 commented on issue #10145: [BEAM-8575] Add a 
Python test to test windowing in DoFn finish_bundle()
URL: https://github.com/apache/beam/pull/10145#issuecomment-555277935
 
 
   Thanks for the quick review!
   
   >move the timestamp assignment to beam.Create()
   This can be done, but does not make much difference, since we still want 
both the process() and finish_bundle() to do something in this test, and see 
the reason below.
   
   >and combine the map function into the ParDo.
   I think it's clearer to separate the Map from the test DoFn. The purpose of 
this test is to verify that after a DoFn with finish_bundle() implemented, it 
will produce results both from process() and finish_bundle(). More 
specifically, it wants to make sure that when windowing is involved, the output 
will be correct after the DoFn. The last Map is simply to print out all outputs 
from the test DoFn. Thoughts?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345722)
Time Spent: 10h 40m  (was: 10.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345720&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345720
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:37
Start Date: 19/Nov/19 00:37
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #10132: [BEAM-8016] Pipeline 
Graph
URL: https://github.com/apache/beam/pull/10132#issuecomment-555276401
 
 
   Thanks for the comments!
   > * Are we getting rid of the tooltips displaying the intermediate results? 
Do they not fit in the new model?
   We are offering a `show` API to users so that they can visualize a larger 
set of their data dynamically instead of peeking through a random static sample 
(which we still offer if the user calls `show` in an ipython terminal not a 
notebook web frontend).
   And with new Beam pipeline graph proposal and some pipeline graph library 
WIP, the tooltip in the future might show other metadata such as 
elapse/throughput to provide a consistent user experience that is similar to 
what users have on Dataflow.
   > * What do the PCollections look like if the user did not specify the 
PCollection name as a variable?
   Those PCollections will not be cached. The idea is when building a pipeline, 
if the user does not assign a PCollection to a variable, they would not be able 
to build the pipeline further from it and they cannot invoke `show(pcoll)` 
because they don't have access to `pcoll` in their code.
   Before, we have had the `leaf pcollection` concept for PCollections who have 
never been used as inputs. It doesn't work for PCollections consumed by sinks 
(with input but no output) even if the user has assigned them to a variable and 
they look like hanging PCollections. It also doesn't work in a notebook 
environment where new transforms can be added at different locations and 
PCollections can be re-evaluated due to cell-re-execution.
   
   
   
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345720)
Time Spent: 2h  (was: 1h 50m)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345715&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345715
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:30
Start Date: 19/Nov/19 00:30
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10070: 
[BEAM-8575] Added a unit test for Reshuffle to test that Reshuffle pr…
URL: https://github.com/apache/beam/pull/10070#discussion_r347679000
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util_test.py
 ##
 @@ -276,6 +279,13 @@ def process(self, element):
 with self.assertRaisesRegex(ValueError, r'window.*None.*add_timestamps2'):
   pipeline.run()
 
+class AddTimestamp(beam.DoFn):
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345715)
Time Spent: 10h 20m  (was: 10h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345716
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:30
Start Date: 19/Nov/19 00:30
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10070: 
[BEAM-8575] Added a unit test for Reshuffle to test that Reshuffle pr…
URL: https://github.com/apache/beam/pull/10070#discussion_r347679045
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util_test.py
 ##
 @@ -276,6 +279,13 @@ def process(self, element):
 with self.assertRaisesRegex(ValueError, r'window.*None.*add_timestamps2'):
   pipeline.run()
 
+class AddTimestamp(beam.DoFn):
+  def process(self, element, timestamp=beam.DoFn.TimestampParam):
+yield beam.window.TimestampedValue(element, timestamp)
+
+class GetTimestamp(beam.DoFn):
+  def process(self, element, timestamp=beam.DoFn.TimestampParam):
+yield '{} - {}'.format(timestamp, element['name'])
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345716)
Time Spent: 10.5h  (was: 10h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345713&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345713
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:29
Start Date: 19/Nov/19 00:29
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10070: 
[BEAM-8575] Added a unit test for Reshuffle to test that Reshuffle pr…
URL: https://github.com/apache/beam/pull/10070#discussion_r347678737
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util_test.py
 ##
 @@ -276,6 +279,13 @@ def process(self, element):
 with self.assertRaisesRegex(ValueError, r'window.*None.*add_timestamps2'):
   pipeline.run()
 
+class AddTimestamp(beam.DoFn):
+  def process(self, element, timestamp=beam.DoFn.TimestampParam):
+yield beam.window.TimestampedValue(element, timestamp)
 
 Review comment:
   My understanding is that it should be a no-op, if Reshuffle preserves 
timestamps. This is what this test is testing. 
   Its Java parity is the testReshufflePreservesTimestamps in file
   
beam/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ReshuffleTest.java
   which wrapped the string element into TimestampedValue twice. The first time 
the element becomes TimestampedValue; the second time the element 
becomes 
   TimestampedValue>.
   Python doesn't have nested TimestampedValue type and doesn't have 
getTimestamp() either, so I used beam.DoFn.TimestampParam to get the timestamp 
twice, before and after Reshuffle.  Assuming beam.DoFn.TimestampParam is always 
the current timestamp bounded with an element, then it should work.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345713)
Time Spent: 10h  (was: 9h 50m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345714
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:29
Start Date: 19/Nov/19 00:29
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10070: 
[BEAM-8575] Added a unit test for Reshuffle to test that Reshuffle pr…
URL: https://github.com/apache/beam/pull/10070#discussion_r347678737
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util_test.py
 ##
 @@ -276,6 +279,13 @@ def process(self, element):
 with self.assertRaisesRegex(ValueError, r'window.*None.*add_timestamps2'):
   pipeline.run()
 
+class AddTimestamp(beam.DoFn):
+  def process(self, element, timestamp=beam.DoFn.TimestampParam):
+yield beam.window.TimestampedValue(element, timestamp)
 
 Review comment:
   My understanding is that it should be a no-op, if Reshuffle preserves 
timestamps. This is what this test is testing. 
   
   Its Java parity is the testReshufflePreservesTimestamps in file
   
beam/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ReshuffleTest.java
   which wrapped the string element into TimestampedValue twice. 
   The first time the element becomes TimestampedValue; 
   the second time the element becomes 
   TimestampedValue>.
   
   Python doesn't have nested TimestampedValue type and doesn't have 
getTimestamp() either, so I used beam.DoFn.TimestampParam to get the timestamp 
twice, before and after Reshuffle.  Assuming beam.DoFn.TimestampParam is always 
the current timestamp bounded with an element, then it should work.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345714)
Time Spent: 10h 10m  (was: 10h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345712
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:28
Start Date: 19/Nov/19 00:28
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10145: [BEAM-8575] Add a 
Python test to test windowing in DoFn finish_bundle()
URL: https://github.com/apache/beam/pull/10145#issuecomment-555273207
 
 
   LGTM, I wonder would it be better to move the timestamp assignment to 
beam.Create() and combine the map function into the MyDoFn.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345712)
Time Spent: 9h 50m  (was: 9h 40m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6756) Support lazy iterables in schemas

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6756?focusedWorklogId=345710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345710
 ]

ASF GitHub Bot logged work on BEAM-6756:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:28
Start Date: 19/Nov/19 00:28
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #10003: [BEAM-6756] 
Create Iterable type for Schema
URL: https://github.com/apache/beam/pull/10003#discussion_r347678482
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java
 ##
 @@ -292,7 +293,7 @@ private static Schema 
fromTableFieldSchema(List tableFieldSche
   if (!schemaField.getType().getNullable()) {
 field.setMode(Mode.REQUIRED.toString());
   }
-  if (TypeName.ARRAY == type.getTypeName()) {
+  if (type.getTypeName().isCollectionType()) {
 
 Review comment:
   I did another deep search - I found one case in SQL's codegen which I fixed, 
but can't find anything else.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345710)
Time Spent: 2h  (was: 1h 50m)

> Support lazy iterables in schemas
> -
>
> Key: BEAM-6756
> URL: https://issues.apache.org/jira/browse/BEAM-6756
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The iterables returned by GroupByKey and CoGroupByKey are lazy; this allows a 
> runner to page data into memory if the full iterable is too large. We 
> currently don't support this in Schemas, so the Schema Group and CoGroup 
> transforms materialize all data into memory. We should add support for this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6756) Support lazy iterables in schemas

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6756?focusedWorklogId=345711&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345711
 ]

ASF GitHub Bot logged work on BEAM-6756:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:28
Start Date: 19/Nov/19 00:28
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #10003: [BEAM-6756] 
Create Iterable type for Schema
URL: https://github.com/apache/beam/pull/10003#discussion_r347678490
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/values/Row.java
 ##
 @@ -450,28 +469,28 @@ static int deepHashCodeForMap(
   return h;
 }
 
-static boolean deepEqualsForList(List a, List b, 
Schema.FieldType elementType) {
+static boolean deepEqualsForIterable(
+Iterable a, Iterable b, Schema.FieldType elementType) {
   if (a == b) {
 return true;
   }
 
-  if (a.size() != b.size()) {
-return false;
-  }
 
 Review comment:
   good catch - fixed.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345711)
Time Spent: 2h 10m  (was: 2h)

> Support lazy iterables in schemas
> -
>
> Key: BEAM-6756
> URL: https://issues.apache.org/jira/browse/BEAM-6756
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The iterables returned by GroupByKey and CoGroupByKey are lazy; this allows a 
> runner to page data into memory if the full iterable is too large. We 
> currently don't support this in Schemas, so the Schema Group and CoGroup 
> transforms materialize all data into memory. We should add support for this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345708&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345708
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:25
Start Date: 19/Nov/19 00:25
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10145: [BEAM-8575] Add a 
Python test to test windowing in DoFn finish_bundle()
URL: https://github.com/apache/beam/pull/10145#issuecomment-555273207
 
 
   LGTM, I wonder would it be better to move the timestamp assignment to 
beam.Create() and combine the map function into the ParDo.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345708)
Time Spent: 9h 40m  (was: 9.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6756) Support lazy iterables in schemas

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6756?focusedWorklogId=345706&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345706
 ]

ASF GitHub Bot logged work on BEAM-6756:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:24
Start Date: 19/Nov/19 00:24
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #10003: [BEAM-6756] Create 
Iterable type for Schema
URL: https://github.com/apache/beam/pull/10003#issuecomment-555273059
 
 
   run dataflow validatesrunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345706)
Time Spent: 1h 40m  (was: 1.5h)

> Support lazy iterables in schemas
> -
>
> Key: BEAM-6756
> URL: https://issues.apache.org/jira/browse/BEAM-6756
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The iterables returned by GroupByKey and CoGroupByKey are lazy; this allows a 
> runner to page data into memory if the full iterable is too large. We 
> currently don't support this in Schemas, so the Schema Group and CoGroup 
> transforms materialize all data into memory. We should add support for this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6756) Support lazy iterables in schemas

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6756?focusedWorklogId=345707&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345707
 ]

ASF GitHub Bot logged work on BEAM-6756:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:24
Start Date: 19/Nov/19 00:24
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #10003: [BEAM-6756] Create 
Iterable type for Schema
URL: https://github.com/apache/beam/pull/10003#issuecomment-555273094
 
 
   run sql postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345707)
Time Spent: 1h 50m  (was: 1h 40m)

> Support lazy iterables in schemas
> -
>
> Key: BEAM-6756
> URL: https://issues.apache.org/jira/browse/BEAM-6756
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The iterables returned by GroupByKey and CoGroupByKey are lazy; this allows a 
> runner to page data into memory if the full iterable is too large. We 
> currently don't support this in Schemas, so the Schema Group and CoGroup 
> transforms materialize all data into memory. We should add support for this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345699&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345699
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:22
Start Date: 19/Nov/19 00:22
Worklog Time Spent: 10m 
  Work Description: qinyeli commented on pull request #10132: [BEAM-8016] 
Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r347667912
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/interactive_runner.py
 ##
 @@ -117,123 +119,43 @@ def apply(self, transform, pvalueish, options):
 return self._underlying_runner.apply(transform, pvalueish, options)
 
   def run_pipeline(self, pipeline, options):
-if not hasattr(self, '_desired_cache_labels'):
-  self._desired_cache_labels = set()
-
-# Invoke a round trip through the runner API. This makes sure the Pipeline
-# proto is stable.
-pipeline = beam.pipeline.Pipeline.from_runner_api(
-pipeline.to_runner_api(use_fake_coders=True),
-pipeline.runner,
-options)
-
-# Snapshot the pipeline in a portable proto before mutating it.
-pipeline_proto, original_context = pipeline.to_runner_api(
-return_context=True, use_fake_coders=True)
-pcolls_to_pcoll_id = self._pcolls_to_pcoll_id(pipeline, original_context)
-
-analyzer = pipeline_analyzer.PipelineAnalyzer(self._cache_manager,
-  pipeline_proto,
-  self._underlying_runner,
-  options,
-  self._desired_cache_labels)
-# Should be only accessed for debugging purpose.
-self._analyzer = analyzer
+pin = inst.pin(pipeline, options)
 
 pipeline_to_execute = beam.pipeline.Pipeline.from_runner_api(
-analyzer.pipeline_proto_to_execute(),
+pin.instrumented_pipeline_proto(),
 self._underlying_runner,
 options)
 
 if not self._skip_display:
-  display = display_manager.DisplayManager(
-  pipeline_proto=pipeline_proto,
-  pipeline_analyzer=analyzer,
-  cache_manager=self._cache_manager,
-  pipeline_graph_renderer=self._renderer)
-  display.start_periodic_update()
+  pg = pipeline_graph.PipelineGraph(pin.original_pipeline,
+render_option=self._render_option)
+  pg.display_graph()
 
 result = pipeline_to_execute.run()
 result.wait_until_finish()
 
-if not self._skip_display:
-  display.stop_periodic_update()
-
-return PipelineResult(result, self, self._analyzer.pipeline_info(),
-  self._cache_manager, pcolls_to_pcoll_id)
-
-  def _pcolls_to_pcoll_id(self, pipeline, original_context):
-"""Returns a dict mapping PCollections string to PCollection IDs.
-
-Using a PipelineVisitor to iterate over every node in the pipeline,
-records the mapping from PCollections to PCollections IDs. This mapping
-will be used to query cached PCollections.
-
-Args:
-  pipeline: (pipeline.Pipeline)
-  original_context: (pipeline_context.PipelineContext)
-
-Returns:
-  (dict from str to str) a dict mapping str(pcoll) to pcoll_id.
-"""
-pcolls_to_pcoll_id = {}
-
-from apache_beam.pipeline import PipelineVisitor  # pylint: 
disable=import-error
-
-class PCollVisitor(PipelineVisitor):  # pylint: 
disable=used-before-assignment
-  A visitor that records input and output values to be replaced.
-
-  Input and output values that should be updated are recorded in maps
-  input_replacements and output_replacements respectively.
-
-  We cannot update input and output values while visiting since that
-  results in validation errors.
-  """
-
-  def enter_composite_transform(self, transform_node):
-self.visit_transform(transform_node)
-
-  def visit_transform(self, transform_node):
-for pcoll in transform_node.outputs.values():
-  pcolls_to_pcoll_id[str(pcoll)] = 
original_context.pcollections.get_id(
-  pcoll)
-
-pipeline.visit(PCollVisitor())
-return pcolls_to_pcoll_id
+return PipelineResult(result, pin)
 
 
 class PipelineResult(beam.runners.runner.PipelineResult):
   """Provides access to information about a pipeline."""
 
-  def __init__(self, underlying_result, runner, pipeline_info, cache_manager,
-   pcolls_to_pcoll_id):
+  def __init__(self, underlying_result, pin):
 super(PipelineResult, self).__init__(underlying_result.state)
-self._runner = runner
-self._pipeline_info = pipeline_info
-self._cache_manager = cache_manager
-self._pcolls_to_pcoll_id = pcolls_to_pcoll_id
-
-  def _cache_label(self, pcoll):
-

[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345702&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345702
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:22
Start Date: 19/Nov/19 00:22
Worklog Time Spent: 10m 
  Work Description: qinyeli commented on pull request #10132: [BEAM-8016] 
Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r347674135
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/pipeline_instrument.py
 ##
 @@ -314,10 +316,29 @@ def cache_key(self, pcoll):
cacheable['producer_version']))
 return ''
 
+  def cacheable_var_by_pcoll_id(self, pcoll_id):
+"""Retrieves the variable name of a PCollection.
+
+In source code, PCollection variables are defined in the user pipeline. 
When
+it's converted to the runner api representation, each PCollection 
referenced
+in the user pipeline is assigned a unique-within-pipeline pcoll_id. Given
+such pcoll_id, retrieves the str variable name defined in user pipeline for
+that referenced PCollection. If the PCollection is anonymous, return ''.
+"""
+return self._cacheable_var_by_pcoll_id.get(pcoll_id, '')
+
 
 def pin(pipeline, options=None):
-  """Creates PipelineInstrument for a pipeline and its options with cache."""
+  """Creates PipelineInstrument for a pipeline and its options with cache.
+
+  This is the shorthand for doing 3 steps: 1) compute once for metadata of 
given
 
 Review comment:
   nit: of the given
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345702)
Time Spent: 1h 40m  (was: 1.5h)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345704
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:22
Start Date: 19/Nov/19 00:22
Worklog Time Spent: 10m 
  Work Description: qinyeli commented on pull request #10132: [BEAM-8016] 
Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r347674214
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/pipeline_instrument.py
 ##
 @@ -314,10 +316,29 @@ def cache_key(self, pcoll):
cacheable['producer_version']))
 return ''
 
+  def cacheable_var_by_pcoll_id(self, pcoll_id):
+"""Retrieves the variable name of a PCollection.
+
+In source code, PCollection variables are defined in the user pipeline. 
When
+it's converted to the runner api representation, each PCollection 
referenced
+in the user pipeline is assigned a unique-within-pipeline pcoll_id. Given
+such pcoll_id, retrieves the str variable name defined in user pipeline for
+that referenced PCollection. If the PCollection is anonymous, return ''.
+"""
+return self._cacheable_var_by_pcoll_id.get(pcoll_id, '')
+
 
 def pin(pipeline, options=None):
-  """Creates PipelineInstrument for a pipeline and its options with cache."""
+  """Creates PipelineInstrument for a pipeline and its options with cache.
+
+  This is the shorthand for doing 3 steps: 1) compute once for metadata of 
given
+  runner pipeline and everything watched from user pipelines; 2) associate info
+  between runner pipeline and its corresponding user pipeline, eliminate data
 
 Review comment:
   nit: between the runner pipeline
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345704)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345701&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345701
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:22
Start Date: 19/Nov/19 00:22
Worklog Time Spent: 10m 
  Work Description: qinyeli commented on pull request #10132: [BEAM-8016] 
Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r347673119
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/interactive_runner.py
 ##
 @@ -117,123 +119,43 @@ def apply(self, transform, pvalueish, options):
 return self._underlying_runner.apply(transform, pvalueish, options)
 
   def run_pipeline(self, pipeline, options):
-if not hasattr(self, '_desired_cache_labels'):
-  self._desired_cache_labels = set()
-
-# Invoke a round trip through the runner API. This makes sure the Pipeline
-# proto is stable.
-pipeline = beam.pipeline.Pipeline.from_runner_api(
-pipeline.to_runner_api(use_fake_coders=True),
-pipeline.runner,
-options)
-
-# Snapshot the pipeline in a portable proto before mutating it.
-pipeline_proto, original_context = pipeline.to_runner_api(
-return_context=True, use_fake_coders=True)
-pcolls_to_pcoll_id = self._pcolls_to_pcoll_id(pipeline, original_context)
-
-analyzer = pipeline_analyzer.PipelineAnalyzer(self._cache_manager,
-  pipeline_proto,
-  self._underlying_runner,
-  options,
-  self._desired_cache_labels)
-# Should be only accessed for debugging purpose.
-self._analyzer = analyzer
+pin = inst.pin(pipeline, options)
 
 pipeline_to_execute = beam.pipeline.Pipeline.from_runner_api(
-analyzer.pipeline_proto_to_execute(),
+pin.instrumented_pipeline_proto(),
 self._underlying_runner,
 options)
 
 if not self._skip_display:
-  display = display_manager.DisplayManager(
-  pipeline_proto=pipeline_proto,
-  pipeline_analyzer=analyzer,
-  cache_manager=self._cache_manager,
-  pipeline_graph_renderer=self._renderer)
-  display.start_periodic_update()
+  pg = pipeline_graph.PipelineGraph(pin.original_pipeline,
 
 Review comment:
   Let's rename this into something more descriptive.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345701)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345703&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345703
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:22
Start Date: 19/Nov/19 00:22
Worklog Time Spent: 10m 
  Work Description: qinyeli commented on pull request #10132: [BEAM-8016] 
Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r347675402
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/pipeline_instrument.py
 ##
 @@ -337,6 +358,7 @@ def cacheables(pcolls_to_pcoll_id):
   """
   pcoll_version_map = {}
   cacheables = {}
+  cacheable_var_by_pcoll_id = {}
 
 Review comment:
   Can't comment on already-committed lines, so commenting here.
   I didn't get what a "cache desired PCollection" is. cache-desired 
PCollection looks slightly better, but still doesn't tell what it is. 
PCollections that need to be cached?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345703)
Time Spent: 1h 40m  (was: 1.5h)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345705&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345705
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:22
Start Date: 19/Nov/19 00:22
Worklog Time Spent: 10m 
  Work Description: qinyeli commented on pull request #10132: [BEAM-8016] 
Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r347676584
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/pipeline_instrument.py
 ##
 @@ -80,7 +80,10 @@ def __init__(self, pipeline, options=None):
 # A mapping from PCollection id to python id() value in user defined
 # pipeline instance.
 (self._pcoll_version_map,
- self._cacheables) = cacheables(self.pcolls_to_pcoll_id)
+ self._cacheables,
 
 Review comment:
   Adding comment here since GitHub does not allow comments on unchanged lines.
   It's probably not obvious what it means to "instrument" a pipeline. From my 
context back when working on Interactive Beam, I would guess it's something 
similar to the original pipeline_analyzer, (and I have to admit that "analyze" 
pipeline was not a much better.), but it's still fuzzy.
   
   Maybe adding some comment summarizing what pipeline_instrument does would be 
good. In a separate PR
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345705)
Time Spent: 1h 50m  (was: 1h 40m)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345700
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:22
Start Date: 19/Nov/19 00:22
Worklog Time Spent: 10m 
  Work Description: qinyeli commented on pull request #10132: [BEAM-8016] 
Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r347666766
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pipeline_graph.py
 ##
 @@ -92,9 +104,20 @@ def __init__(self,
   default_vertex_attrs,
   default_edge_attrs)
 
+self._renderer = pipeline_graph_renderer.get_renderer(render_option)
+
   def get_dot(self):
 return self._get_graph().to_string()
 
+  def display_graph(self):
+rendered_graph = self._renderer.render_pipeline_graph(self)
+if ie.current_env().is_in_notebook:
+  try:
+from IPython.core import display
+display.display(display.HTML(rendered_graph))
+  except ImportError:  # Unlikely to happen when is_in_notebook.
+pass
 
 Review comment:
   Maybe print some error message instead of silently failing?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345700)
Time Spent: 1.5h  (was: 1h 20m)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345698&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345698
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:21
Start Date: 19/Nov/19 00:21
Worklog Time Spent: 10m 
  Work Description: qinyeli commented on issue #10132: [BEAM-8016] Pipeline 
Graph
URL: https://github.com/apache/beam/pull/10132#issuecomment-555272354
 
 
   Thanks Ning! I haven't been catching up with the latest changes in 
Interactive Beam, so consider my comments as from someone with somewhat context 
but not all. A lot of my comments are "This is not very obvious, can you 
elaborate?"
   
   So it looks good in general. Some comments in line. Also a couple of 
questions out of curiosity.
   * Are we getting rid of the tooltips displaying the intermediate results? Do 
they not fit in the new model?
   * What do the PCollections look like if the user did not specify the 
PCollection name as a variable?
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345698)
Time Spent: 1h 20m  (was: 1h 10m)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=345697&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345697
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Nov/19 00:17
Start Date: 19/Nov/19 00:17
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #10119: [BEAM-8335] Adds 
the StreamingCache
URL: https://github.com/apache/beam/pull/10119#issuecomment-555271279
 
 
   Cool thank you, I pulled out one more PR from the previous mega-PR into 
https://github.com/apache/beam/pull/10146. It adds the timestamp proto 
translation utils.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345697)
Time Spent: 32h 40m  (was: 32.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 32h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=345677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345677
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 18/Nov/19 23:53
Start Date: 18/Nov/19 23:53
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #10146: [BEAM-8335] Add 
timestamp to/from protos to Python SDK
URL: https://github.com/apache/beam/pull/10146#issuecomment-555265250
 
 
   > It would also be useful to cover the duration proto as well since both are 
used. Do you mind doing it in this or a follow-up PR?
   
   I don't mind, it's a small enough change.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345677)
Time Spent: 32.5h  (was: 32h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 32.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=345676&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345676
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 18/Nov/19 23:52
Start Date: 18/Nov/19 23:52
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #10146: 
[BEAM-8335] Add timestamp to/from protos to Python SDK
URL: https://github.com/apache/beam/pull/10146#discussion_r347669663
 
 

 ##
 File path: sdks/python/apache_beam/utils/timestamp.py
 ##
 @@ -140,6 +141,23 @@ def to_rfc3339(self):
 # Append 'Z' for UTC timezone.
 return self.to_utc_datetime().isoformat() + 'Z'
 
+  def to_proto(self):
+"""Returns the `google.protobuf.timestamp_pb2` representation."""
+secs = self.micros // 100
+nanos = (self.micros - (secs * 100)) * 1000
+return timestamp_pb2.Timestamp(seconds=secs, nanos=nanos)
+
+  @staticmethod
+  def from_proto(timestamp_proto):
+"""Creates a Timestamp from a `google.protobuf.timestamp_pb2`.
+
+Note that the google has a sub-second resolution of nanoseconds whereas 
this
+class has a resolution of microsends. This class will truncate the
+nanosecond resolution down to the microsecond.
+"""
+return Timestamp(seconds=timestamp_proto.seconds,
+ micros=timestamp_proto.nanos // 1000)
 
 Review comment:
   Ack, it now raises a ValueError with a reference to a JIRA as well as a TODO 
comment.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345676)
Time Spent: 32h 20m  (was: 32h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 32h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=345675&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345675
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 18/Nov/19 23:51
Start Date: 18/Nov/19 23:51
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #10146: 
[BEAM-8335] Add timestamp to/from protos to Python SDK
URL: https://github.com/apache/beam/pull/10146#discussion_r347669525
 
 

 ##
 File path: sdks/python/apache_beam/utils/timestamp.py
 ##
 @@ -140,6 +141,23 @@ def to_rfc3339(self):
 # Append 'Z' for UTC timezone.
 return self.to_utc_datetime().isoformat() + 'Z'
 
+  def to_proto(self):
+"""Returns the `google.protobuf.timestamp_pb2` representation."""
+secs = self.micros // 100
+nanos = (self.micros - (secs * 100)) * 1000
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345675)
Time Spent: 32h 10m  (was: 32h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 32h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8738) Revisit timestamp and duration representation

2019-11-18 Thread Sam Rohde (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Rohde updated BEAM-8738:

Description: 
The current proto representation of timesetamp and durations in Beam is either 
raw int64s or the well-known Google protobuf types "google.protobuf.timestamp" 
and "google.protobuf.duration". Apache Beam uses int64 MAX and MIN as sentinel 
values for an +inf watermark and -inf watermark. However, the 
google.protobuf.timestamp is compliant with RFC3339, meaning it can only 
represent date-times between 0001-01-01 and -12-31. This is not the same as 
the int64 MAX and MIN representation. The questions remain:
 * What should the timestamp and duration representations be?
 * What units should the timestamps and duration be? Nanos? Micros?
 * What algebra is allowed when dealing with timestamps and durations? What is 
needed?

See:
 *  
[https://lists.apache.org/thread.html/c8e7d8dc7d94487fae23fa2b74ee61aec93c94abbcbef3329ffbf3bd@%3Cdev.beam.apache.org%3E]
 * 
[https://lists.apache.org/thread.html/27fe9aa5b33dbee97db1bc924ee410048137e4fe97d9c79d131c010c@%3Cdev.beam.apache.org%3E]

 

  was:
The current proto representation of timesetamp and durations in Beam is either 
raw int64s or the well-known Google protobuf types "google.protobuf.timestamp" 
and "google.protobuf.duration". Apache Beam uses int64 MAX and MIN as sentinel 
values for an +inf watermark and -inf watermark. However, the 
google.protobuf.timestamp is compliant with RFC3339, meaning it can only 
represent date-times between 0001-01-01 and -12-31. This is not the same as 
the int64 MAX and MIN representation. The questions remain:
 * What should the timestamp and duration representations be?
 * What units should the timestamps and duration be? Nanos? Micros?
 * What algebra is allowed when dealing with timestamps and durations? What is 
needed?


> Revisit timestamp and duration representation
> -
>
> Key: BEAM-8738
> URL: https://issues.apache.org/jira/browse/BEAM-8738
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Sam Rohde
>Priority: Minor
>
> The current proto representation of timesetamp and durations in Beam is 
> either raw int64s or the well-known Google protobuf types 
> "google.protobuf.timestamp" and "google.protobuf.duration". Apache Beam uses 
> int64 MAX and MIN as sentinel values for an +inf watermark and -inf 
> watermark. However, the google.protobuf.timestamp is compliant with RFC3339, 
> meaning it can only represent date-times between 0001-01-01 and -12-31. 
> This is not the same as the int64 MAX and MIN representation. The questions 
> remain:
>  * What should the timestamp and duration representations be?
>  * What units should the timestamps and duration be? Nanos? Micros?
>  * What algebra is allowed when dealing with timestamps and durations? What 
> is needed?
> See:
>  *  
> [https://lists.apache.org/thread.html/c8e7d8dc7d94487fae23fa2b74ee61aec93c94abbcbef3329ffbf3bd@%3Cdev.beam.apache.org%3E]
>  * 
> [https://lists.apache.org/thread.html/27fe9aa5b33dbee97db1bc924ee410048137e4fe97d9c79d131c010c@%3Cdev.beam.apache.org%3E]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8738) Revisit timestamp and duration representation

2019-11-18 Thread Sam Rohde (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Rohde updated BEAM-8738:

Description: 
The current proto representation of timesetamp and durations in Beam is either 
raw int64s or the well-known Google protobuf types "google.protobuf.timestamp" 
and "google.protobuf.duration". Apache Beam uses int64 MAX and MIN as sentinel 
values for an +inf watermark and -inf watermark. However, the 
google.protobuf.timestamp is compliant with RFC3339, meaning it can only 
represent date-times between 0001-01-01 and -12-31. This is not the same as 
the int64 MAX and MIN representation. The questions remain:
 * What should the timestamp and duration representations be?
 * What units should the timestamps and duration be? Nanos? Micros?
 * What algebra is allowed when dealing with timestamps and durations? What is 
needed?

See:
 * 
[https://lists.apache.org/thread.html/c8e7d8dc7d94487fae23fa2b74ee61aec93c94abbcbef3329ffbf3bd@%3Cdev.beam.apache.org%3E]
 * 
[https://lists.apache.org/thread.html/27fe9aa5b33dbee97db1bc924ee410048137e4fe97d9c79d131c010c@%3Cdev.beam.apache.org%3E]

 

  was:
The current proto representation of timesetamp and durations in Beam is either 
raw int64s or the well-known Google protobuf types "google.protobuf.timestamp" 
and "google.protobuf.duration". Apache Beam uses int64 MAX and MIN as sentinel 
values for an +inf watermark and -inf watermark. However, the 
google.protobuf.timestamp is compliant with RFC3339, meaning it can only 
represent date-times between 0001-01-01 and -12-31. This is not the same as 
the int64 MAX and MIN representation. The questions remain:
 * What should the timestamp and duration representations be?
 * What units should the timestamps and duration be? Nanos? Micros?
 * What algebra is allowed when dealing with timestamps and durations? What is 
needed?

See:
 *  
[https://lists.apache.org/thread.html/c8e7d8dc7d94487fae23fa2b74ee61aec93c94abbcbef3329ffbf3bd@%3Cdev.beam.apache.org%3E]
 * 
[https://lists.apache.org/thread.html/27fe9aa5b33dbee97db1bc924ee410048137e4fe97d9c79d131c010c@%3Cdev.beam.apache.org%3E]

 


> Revisit timestamp and duration representation
> -
>
> Key: BEAM-8738
> URL: https://issues.apache.org/jira/browse/BEAM-8738
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Sam Rohde
>Priority: Minor
>
> The current proto representation of timesetamp and durations in Beam is 
> either raw int64s or the well-known Google protobuf types 
> "google.protobuf.timestamp" and "google.protobuf.duration". Apache Beam uses 
> int64 MAX and MIN as sentinel values for an +inf watermark and -inf 
> watermark. However, the google.protobuf.timestamp is compliant with RFC3339, 
> meaning it can only represent date-times between 0001-01-01 and -12-31. 
> This is not the same as the int64 MAX and MIN representation. The questions 
> remain:
>  * What should the timestamp and duration representations be?
>  * What units should the timestamps and duration be? Nanos? Micros?
>  * What algebra is allowed when dealing with timestamps and durations? What 
> is needed?
> See:
>  * 
> [https://lists.apache.org/thread.html/c8e7d8dc7d94487fae23fa2b74ee61aec93c94abbcbef3329ffbf3bd@%3Cdev.beam.apache.org%3E]
>  * 
> [https://lists.apache.org/thread.html/27fe9aa5b33dbee97db1bc924ee410048137e4fe97d9c79d131c010c@%3Cdev.beam.apache.org%3E]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8738) Revisit timestamp and duration representation

2019-11-18 Thread Sam Rohde (Jira)
Sam Rohde created BEAM-8738:
---

 Summary: Revisit timestamp and duration representation
 Key: BEAM-8738
 URL: https://issues.apache.org/jira/browse/BEAM-8738
 Project: Beam
  Issue Type: Improvement
  Components: beam-model
Reporter: Sam Rohde


The current proto representation of timesetamp and durations in Beam is either 
raw int64s or the well-known Google protobuf types "google.protobuf.timestamp" 
and "google.protobuf.duration". Apache Beam uses int64 MAX and MIN as sentinel 
values for an +inf watermark and -inf watermark. However, the 
google.protobuf.timestamp is compliant with RFC3339, meaning it can only 
represent date-times between 0001-01-01 and -12-31. This is not the same as 
the int64 MAX and MIN representation. The questions remain:
 * What should the timestamp and duration representations be?
 * What units should the timestamps and duration be? Nanos? Micros?
 * What algebra is allowed when dealing with timestamps and durations? What is 
needed?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=345665&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345665
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 18/Nov/19 23:24
Start Date: 18/Nov/19 23:24
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #10143: [BEAM-8645] To test 
state backed iterable coder in py sdk.
URL: https://github.com/apache/beam/pull/10143#issuecomment-555257322
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345665)
Time Spent: 3h 40m  (was: 3.5h)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345664&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345664
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Nov/19 23:24
Start Date: 18/Nov/19 23:24
Worklog Time Spent: 10m 
  Work Description: liumomo315 commented on issue #10102: [BEAM-8575] Test 
flatten a single PC and test flatten a flattened PC
URL: https://github.com/apache/beam/pull/10102#issuecomment-555257137
 
 
   Thanks, Robert! PTAL:)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345664)
Time Spent: 9.5h  (was: 9h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345658&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345658
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Nov/19 23:22
Start Date: 18/Nov/19 23:22
Worklog Time Spent: 10m 
  Work Description: liumomo315 commented on pull request #10102: 
[BEAM-8575] Test flatten a single PC and test flatten a flattened PC
URL: https://github.com/apache/beam/pull/10102#discussion_r347661431
 
 

 ##
 File path: sdks/python/apache_beam/transforms/ptransform_test.py
 ##
 @@ -550,11 +563,23 @@ def test_flatten_pcollections_in_iterable(self):
 assert_that(result, equal_to([0, 1, 2, 3, 4, 5, 6, 7]))
 pipeline.run()
 
+  @attr('ValidatesRunner')
+  def test_flatten_a_flattened_pcollection(self):
+pipeline = TestPipeline()
+pcoll_1 = pipeline | 'Start 1' >> beam.Create([0, 1, 2, 3])
+pcoll_2 = pipeline | 'Start 2' >> beam.Create([4, 5, 6, 7])
+result = ((pcoll_1, pcoll_2) | 'Flatten' >> beam.Flatten(),
 
 Review comment:
   Refactored as suggested:)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345658)
Time Spent: 9h  (was: 8h 50m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=345661&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345661
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 18/Nov/19 23:22
Start Date: 18/Nov/19 23:22
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10146: [BEAM-8335] 
Add timestamp to/from protos to Python SDK
URL: https://github.com/apache/beam/pull/10146#discussion_r347661383
 
 

 ##
 File path: sdks/python/apache_beam/utils/timestamp.py
 ##
 @@ -140,6 +141,23 @@ def to_rfc3339(self):
 # Append 'Z' for UTC timezone.
 return self.to_utc_datetime().isoformat() + 'Z'
 
+  def to_proto(self):
+"""Returns the `google.protobuf.timestamp_pb2` representation."""
+secs = self.micros // 100
+nanos = (self.micros - (secs * 100)) * 1000
+return timestamp_pb2.Timestamp(seconds=secs, nanos=nanos)
+
+  @staticmethod
+  def from_proto(timestamp_proto):
+"""Creates a Timestamp from a `google.protobuf.timestamp_pb2`.
+
+Note that the google has a sub-second resolution of nanoseconds whereas 
this
+class has a resolution of microsends. This class will truncate the
+nanosecond resolution down to the microsecond.
+"""
+return Timestamp(seconds=timestamp_proto.seconds,
+ micros=timestamp_proto.nanos // 1000)
 
 Review comment:
   I think we should fail here if we ever perform any rounding with a TODO 
stating that we should support nanos.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345661)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 32h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345659&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345659
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Nov/19 23:22
Start Date: 18/Nov/19 23:22
Worklog Time Spent: 10m 
  Work Description: liumomo315 commented on pull request #10102: 
[BEAM-8575] Test flatten a single PC and test flatten a flattened PC
URL: https://github.com/apache/beam/pull/10102#discussion_r347661504
 
 

 ##
 File path: sdks/python/apache_beam/transforms/ptransform_test.py
 ##
 @@ -550,11 +563,23 @@ def test_flatten_pcollections_in_iterable(self):
 assert_that(result, equal_to([0, 1, 2, 3, 4, 5, 6, 7]))
 pipeline.run()
 
+  @attr('ValidatesRunner')
+  def test_flatten_a_flattened_pcollection(self):
+pipeline = TestPipeline()
+pcoll_1 = pipeline | 'Start 1' >> beam.Create([0, 1, 2, 3])
+pcoll_2 = pipeline | 'Start 2' >> beam.Create([4, 5, 6, 7])
+result = ((pcoll_1, pcoll_2) | 'Flatten' >> beam.Flatten(),
+  ) |'Flatten again' >> beam.Flatten()
+assert_that(result, equal_to([x for x in range(8)]))
+pipeline.run()
+
+  @attr('ValidatesRunner')
 
 Review comment:
   I see. Reverted this one too.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345659)
Time Spent: 9h 10m  (was: 9h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=345660&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345660
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 18/Nov/19 23:22
Start Date: 18/Nov/19 23:22
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10146: [BEAM-8335] 
Add timestamp to/from protos to Python SDK
URL: https://github.com/apache/beam/pull/10146#discussion_r347661181
 
 

 ##
 File path: sdks/python/apache_beam/utils/timestamp.py
 ##
 @@ -140,6 +141,23 @@ def to_rfc3339(self):
 # Append 'Z' for UTC timezone.
 return self.to_utc_datetime().isoformat() + 'Z'
 
+  def to_proto(self):
+"""Returns the `google.protobuf.timestamp_pb2` representation."""
+secs = self.micros // 100
+nanos = (self.micros - (secs * 100)) * 1000
 
 Review comment:
   ```suggestion
   nanos = (self.micros % 100) * 1000
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345660)
Time Spent: 32h  (was: 31h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 32h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345662&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345662
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Nov/19 23:22
Start Date: 18/Nov/19 23:22
Worklog Time Spent: 10m 
  Work Description: liumomo315 commented on pull request #10102: 
[BEAM-8575] Test flatten a single PC and test flatten a flattened PC
URL: https://github.com/apache/beam/pull/10102#discussion_r347661571
 
 

 ##
 File path: sdks/python/apache_beam/transforms/ptransform_test.py
 ##
 @@ -550,11 +563,23 @@ def test_flatten_pcollections_in_iterable(self):
 assert_that(result, equal_to([0, 1, 2, 3, 4, 5, 6, 7]))
 pipeline.run()
 
+  @attr('ValidatesRunner')
+  def test_flatten_a_flattened_pcollection(self):
+pipeline = TestPipeline()
+pcoll_1 = pipeline | 'Start 1' >> beam.Create([0, 1, 2, 3])
+pcoll_2 = pipeline | 'Start 2' >> beam.Create([4, 5, 6, 7])
+result = ((pcoll_1, pcoll_2) | 'Flatten' >> beam.Flatten(),
+  ) |'Flatten again' >> beam.Flatten()
+assert_that(result, equal_to([x for x in range(8)]))
+pipeline.run()
+
+  @attr('ValidatesRunner')
   def test_flatten_input_type_must_be_iterable(self):
 # Inputs to flatten *must* be an iterable.
 with self.assertRaises(ValueError):
   4 | beam.Flatten()
 
+  @attr('ValidatesRunner')
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345662)
Time Spent: 9h 20m  (was: 9h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345657&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345657
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Nov/19 23:21
Start Date: 18/Nov/19 23:21
Worklog Time Spent: 10m 
  Work Description: liumomo315 commented on pull request #10102: 
[BEAM-8575] Test flatten a single PC and test flatten a flattened PC
URL: https://github.com/apache/beam/pull/10102#discussion_r347661278
 
 

 ##
 File path: sdks/python/apache_beam/transforms/ptransform_test.py
 ##
 @@ -536,12 +538,23 @@ def test_flatten_no_pcollections(self):
 assert_that(result, equal_to([]))
 pipeline.run()
 
+  @attr('ValidatesRunner')
+  def test_flatten_one_single_pcollection(self):
+pipeline = TestPipeline()
+input = [0, 1, 2, 3]
+pcoll = pipeline | 'Input' >> beam.Create(input)
+result = (pcoll,)| 'Single Flatten' >> beam.Flatten()
+assert_that(result, equal_to(input))
+pipeline.run()
+
+  @attr('ValidatesRunner')
   def test_flatten_same_pcollections(self):
 pipeline = TestPipeline()
 pc = pipeline | beam.Create(['a', 'b'])
 assert_that((pc, pc, pc) | beam.Flatten(), equal_to(['a', 'b'] * 3))
 pipeline.run()
 
+  @attr('ValidatesRunner')
 
 Review comment:
   Reverted.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345657)
Time Spent: 8h 50m  (was: 8h 40m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-7886) Make row coder a standard coder and implement in python

2019-11-18 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976961#comment-16976961
 ] 

Brian Hulette commented on BEAM-7886:
-

I don't think we have a jira for it yet.

The way it works in Java is that every logical type needs to have a base type, 
and there's some well known way to convert between the logical type and the 
base type (e.g. timestamp would be converted to int64 millis since the epoch 
and encoded with varint coder). Java also currently has some support for 
Java-only logical types that rely on putting a serialized class in the pipeline 
graph, but I'd like to move away from that.



> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 16h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=345641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345641
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:46
Start Date: 18/Nov/19 22:46
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10132: [BEAM-8016] 
Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r347031607
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pipeline_graph.py
 ##
 @@ -136,14 +159,50 @@ def _generate_graph_dicts(self):
   vertex_dict[invisible_leaf] = {'style': 'invis'}
   self._edge_to_vertex_pairs[pcoll_id].append(
   (transform.unique_name, invisible_leaf))
-  edge_dict[(transform.unique_name, invisible_leaf)] = {}
+  if self._pin:
+edge_label = {'label':
+self._pin.cacheable_var_by_pcoll_id(pcoll_id)}
+edge_dict[(transform.unique_name, invisible_leaf)] = edge_label
+  else:
+edge_dict[(transform.unique_name, invisible_leaf)] = {}
+# For PCollections with more than one consuming PTransform, we also add
+# an invisible dummy node to diverge the edge in the middle as the
+# single output is used by multiple down stream PTransforms as inputs
+# instead of emitting multiple edges.
+elif len(self._consumers[pcoll_id]) > 1:
+  intermediate_dummy = 'diverge{}'.format(
+  hash(pcoll_id) % 1)
+  vertex_dict[intermediate_dummy] = {'shape': 'point',
+ 'width': '0'}
+  for consumer in self._consumers[pcoll_id]:
+producer_name = transform.unique_name
+consumer_name = transforms[consumer].unique_name
+self._edge_to_vertex_pairs[pcoll_id].append(
+(producer_name, intermediate_dummy))
+if self._pin:
+  edge_dict[(producer_name, intermediate_dummy)] = {
+  'arrowhead': 'none',
+  'label':
+self._pin.cacheable_var_by_pcoll_id(pcoll_id)}
+else:
+  edge_dict[(producer_name, intermediate_dummy)] = {
+  'arrowhead': 'none'}
+self._edge_to_vertex_pairs[pcoll_id].append(
+(intermediate_dummy, consumer_name))
+edge_dict[(intermediate_dummy, consumer_name)] = {}
 else:
   for consumer in self._consumers[pcoll_id]:
 producer_name = transform.unique_name
 consumer_name = transforms[consumer].unique_name
 self._edge_to_vertex_pairs[pcoll_id].append(
 (producer_name, consumer_name))
-edge_dict[(producer_name, consumer_name)] = {}
+if self._pin:
+  edge_dict[(producer_name, consumer_name)] = {
+  'label':
+self._pin.cacheable_var_by_pcoll_id(pcoll_id)
+  }
+else:
+  edge_dict[(producer_name, consumer_name)] = {}
 
 Review comment:
   The entire logic will be simplified once data-centric user flow is in-place 
and pcollections are rendered as nodes rather than labels on edges.
   
   The difference introduced:
   Before:
   
![1](https://user-images.githubusercontent.com/4423149/68979012-364f6480-07b1-11ea-9ec3-8022154a8499.png)
   After (edge diverge for single output, user variable names labelled 
PCollections):
   
![2](https://user-images.githubusercontent.com/4423149/69100500-2bdfd580-0a12-11ea-9ef3-cb735fe69775.png)
   
   In the future:
   With data-centric user flow, notebook cell execution metadata and new Beam 
pipeline graph proposal:
   
![3](https://user-images.githubusercontent.com/4423149/68979139-93e3b100-07b1-11ea-8931-a6825b39b7a4.png)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345641)
Time Spent: 1h 10m  (was: 1h)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipe

[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=345640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345640
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:44
Start Date: 18/Nov/19 22:44
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #10135: [BEAM-8645] Create 
a py test case for Re-iteration on GBK result. 
URL: https://github.com/apache/beam/pull/10135#issuecomment-555245258
 
 
   Thanks for reviewing. 
   
   Updated. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345640)
Time Spent: 3.5h  (was: 3h 20m)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=345638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345638
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:40
Start Date: 18/Nov/19 22:40
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on pull request #10135: [BEAM-8645] 
Create a py test case for Re-iteration on GBK result. 
URL: https://github.com/apache/beam/pull/10135#discussion_r347647686
 
 

 ##
 File path: sdks/python/apache_beam/transforms/ptransform_test.py
 ##
 @@ -471,6 +471,24 @@ def test_group_by_key(self):
 assert_that(result, equal_to([(1, [1, 2, 3]), (2, [1, 2]), (3, [1])]))
 pipeline.run()
 
+  def test_group_by_key_reiteration(self):
+class MyDoFn(beam.DoFn):
+  def process(self, gbk_result):
+value_list = gbk_result[1]
+sum_val = 0
+# Iterate the GBK result for multiple times.
+for _ in range(0, 17):
+  sum_val += sum(value_list)
+return (gbk_result[0], sum_val)
+
+pipeline = TestPipeline()
+pcoll = pipeline | 'start' >> beam.Create(
+[(1, 1), (1, 2), (1, 3), (1, 4)])
+result = (pcoll | 'Group' >> beam.GroupByKey()
+  | 'Reiteration-Sum' >> beam.ParDo(MyDoFn()))
+assert_that(result, equal_to([1, 170]))
 
 Review comment:
   Ah, turns out the DoFn above is supposed to return a list in py SDK.
   
   Thanks for catching this.  Fixed.  
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345638)
Time Spent: 3h 20m  (was: 3h 10m)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=345637&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345637
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:39
Start Date: 18/Nov/19 22:39
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on pull request #10135: [BEAM-8645] 
Create a py test case for Re-iteration on GBK result. 
URL: https://github.com/apache/beam/pull/10135#discussion_r347647417
 
 

 ##
 File path: sdks/python/apache_beam/transforms/ptransform_test.py
 ##
 @@ -471,6 +471,24 @@ def test_group_by_key(self):
 assert_that(result, equal_to([(1, [1, 2, 3]), (2, [1, 2]), (3, [1])]))
 pipeline.run()
 
+  def test_group_by_key_reiteration(self):
+class MyDoFn(beam.DoFn):
+  def process(self, gbk_result):
+value_list = gbk_result[1]
 
 Review comment:
   Thanks! Fixed. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345637)
Time Spent: 3h 10m  (was: 3h)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8592) DataCatalogTableProvider should not squash table components together into a string

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8592?focusedWorklogId=345635&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345635
 ]

ASF GitHub Bot logged work on BEAM-8592:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:32
Start Date: 18/Nov/19 22:32
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #10021: [BEAM-8592] 
Adjusting ZetaSQL table resolution to standard
URL: https://github.com/apache/beam/pull/10021#issuecomment-555240238
 
 
   run sql postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345635)
Time Spent: 1h 10m  (was: 1h)

> DataCatalogTableProvider should not squash table components together into a 
> string
> --
>
> Key: BEAM-8592
> URL: https://issues.apache.org/jira/browse/BEAM-8592
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently, if a user writes a table name like \{{foo.`baz.bar`.bizzle}} 
> representing the components \{{"foo", "baz.bar", "bizzle"}} the 
> DataCatalogTableProvider will concatenate the components into a string and 
> resolve the identifier as if it represented \{{"foo", "baz", "bar", 
> "bizzle"}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345632&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345632
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:25
Start Date: 18/Nov/19 22:25
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10070: [BEAM-8575] 
Added a unit test for Reshuffle to test that Reshuffle pr…
URL: https://github.com/apache/beam/pull/10070#discussion_r347641647
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util_test.py
 ##
 @@ -276,6 +279,13 @@ def process(self, element):
 with self.assertRaisesRegex(ValueError, r'window.*None.*add_timestamps2'):
   pipeline.run()
 
+class AddTimestamp(beam.DoFn):
 
 Review comment:
   As these are used only locally, define then locally (i.e. in the test 
method). 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345632)
Time Spent: 8.5h  (was: 8h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345631&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345631
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:25
Start Date: 18/Nov/19 22:25
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10070: [BEAM-8575] 
Added a unit test for Reshuffle to test that Reshuffle pr…
URL: https://github.com/apache/beam/pull/10070#discussion_r347641178
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util_test.py
 ##
 @@ -276,6 +279,13 @@ def process(self, element):
 with self.assertRaisesRegex(ValueError, r'window.*None.*add_timestamps2'):
   pipeline.run()
 
+class AddTimestamp(beam.DoFn):
+  def process(self, element, timestamp=beam.DoFn.TimestampParam):
+yield beam.window.TimestampedValue(element, timestamp)
 
 Review comment:
   This is a no-op. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345631)
Time Spent: 8.5h  (was: 8h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345633
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:25
Start Date: 18/Nov/19 22:25
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10070: [BEAM-8575] 
Added a unit test for Reshuffle to test that Reshuffle pr…
URL: https://github.com/apache/beam/pull/10070#discussion_r347642132
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util_test.py
 ##
 @@ -276,6 +279,13 @@ def process(self, element):
 with self.assertRaisesRegex(ValueError, r'window.*None.*add_timestamps2'):
   pipeline.run()
 
+class AddTimestamp(beam.DoFn):
+  def process(self, element, timestamp=beam.DoFn.TimestampParam):
+yield beam.window.TimestampedValue(element, timestamp)
+
+class GetTimestamp(beam.DoFn):
+  def process(self, element, timestamp=beam.DoFn.TimestampParam):
+yield '{} - {}'.format(timestamp, element['name'])
 
 Review comment:
   (Optional) I would either call this FormatWithTimestamp, or have it return a 
tuple. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345633)
Time Spent: 8h 40m  (was: 8.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=345625&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345625
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:23
Start Date: 18/Nov/19 22:23
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #10146: [BEAM-8335] Add 
timestamp to/from protos to Python SDK
URL: https://github.com/apache/beam/pull/10146#issuecomment-555237183
 
 
   R: @lukecwik Hey Luke, are you able to review this please?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345625)
Time Spent: 31h 50m  (was: 31h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 31h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345622&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345622
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:22
Start Date: 18/Nov/19 22:22
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10102: [BEAM-8575] 
Test flatten a single PC and test flatten a flattened PC
URL: https://github.com/apache/beam/pull/10102#discussion_r347640781
 
 

 ##
 File path: sdks/python/apache_beam/transforms/ptransform_test.py
 ##
 @@ -550,11 +563,23 @@ def test_flatten_pcollections_in_iterable(self):
 assert_that(result, equal_to([0, 1, 2, 3, 4, 5, 6, 7]))
 pipeline.run()
 
+  @attr('ValidatesRunner')
+  def test_flatten_a_flattened_pcollection(self):
+pipeline = TestPipeline()
+pcoll_1 = pipeline | 'Start 1' >> beam.Create([0, 1, 2, 3])
+pcoll_2 = pipeline | 'Start 2' >> beam.Create([4, 5, 6, 7])
+result = ((pcoll_1, pcoll_2) | 'Flatten' >> beam.Flatten(),
+  ) |'Flatten again' >> beam.Flatten()
+assert_that(result, equal_to([x for x in range(8)]))
+pipeline.run()
+
+  @attr('ValidatesRunner')
   def test_flatten_input_type_must_be_iterable(self):
 # Inputs to flatten *must* be an iterable.
 with self.assertRaises(ValueError):
   4 | beam.Flatten()
 
+  @attr('ValidatesRunner')
 
 Review comment:
   Same. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345622)
Time Spent: 8h 20m  (was: 8h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345620
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:22
Start Date: 18/Nov/19 22:22
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10102: [BEAM-8575] 
Test flatten a single PC and test flatten a flattened PC
URL: https://github.com/apache/beam/pull/10102#discussion_r347639407
 
 

 ##
 File path: sdks/python/apache_beam/transforms/ptransform_test.py
 ##
 @@ -536,12 +538,23 @@ def test_flatten_no_pcollections(self):
 assert_that(result, equal_to([]))
 pipeline.run()
 
+  @attr('ValidatesRunner')
+  def test_flatten_one_single_pcollection(self):
+pipeline = TestPipeline()
+input = [0, 1, 2, 3]
+pcoll = pipeline | 'Input' >> beam.Create(input)
+result = (pcoll,)| 'Single Flatten' >> beam.Flatten()
+assert_that(result, equal_to(input))
+pipeline.run()
+
+  @attr('ValidatesRunner')
   def test_flatten_same_pcollections(self):
 pipeline = TestPipeline()
 pc = pipeline | beam.Create(['a', 'b'])
 assert_that((pc, pc, pc) | beam.Flatten(), equal_to(['a', 'b'] * 3))
 pipeline.run()
 
+  @attr('ValidatesRunner')
 
 Review comment:
   This one doesn't need to be ValidatesRunner as it's an SDK feature. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345620)
Time Spent: 8h 10m  (was: 8h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345621&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345621
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:22
Start Date: 18/Nov/19 22:22
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10102: [BEAM-8575] 
Test flatten a single PC and test flatten a flattened PC
URL: https://github.com/apache/beam/pull/10102#discussion_r347640740
 
 

 ##
 File path: sdks/python/apache_beam/transforms/ptransform_test.py
 ##
 @@ -550,11 +563,23 @@ def test_flatten_pcollections_in_iterable(self):
 assert_that(result, equal_to([0, 1, 2, 3, 4, 5, 6, 7]))
 pipeline.run()
 
+  @attr('ValidatesRunner')
+  def test_flatten_a_flattened_pcollection(self):
+pipeline = TestPipeline()
+pcoll_1 = pipeline | 'Start 1' >> beam.Create([0, 1, 2, 3])
+pcoll_2 = pipeline | 'Start 2' >> beam.Create([4, 5, 6, 7])
+result = ((pcoll_1, pcoll_2) | 'Flatten' >> beam.Flatten(),
+  ) |'Flatten again' >> beam.Flatten()
+assert_that(result, equal_to([x for x in range(8)]))
+pipeline.run()
+
+  @attr('ValidatesRunner')
 
 Review comment:
   This raises an error on construction, should not be ValidatesRunner. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345621)
Time Spent: 8h 20m  (was: 8h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345619
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:22
Start Date: 18/Nov/19 22:22
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10102: [BEAM-8575] 
Test flatten a single PC and test flatten a flattened PC
URL: https://github.com/apache/beam/pull/10102#discussion_r347640596
 
 

 ##
 File path: sdks/python/apache_beam/transforms/ptransform_test.py
 ##
 @@ -550,11 +563,23 @@ def test_flatten_pcollections_in_iterable(self):
 assert_that(result, equal_to([0, 1, 2, 3, 4, 5, 6, 7]))
 pipeline.run()
 
+  @attr('ValidatesRunner')
+  def test_flatten_a_flattened_pcollection(self):
+pipeline = TestPipeline()
+pcoll_1 = pipeline | 'Start 1' >> beam.Create([0, 1, 2, 3])
+pcoll_2 = pipeline | 'Start 2' >> beam.Create([4, 5, 6, 7])
+result = ((pcoll_1, pcoll_2) | 'Flatten' >> beam.Flatten(),
 
 Review comment:
   This is a bit hard to read. 
   
   It also would probably be good to have a third collection in the mix, e.g. 
   
   ```
   pcoll_12 = (pcoll_1, pcoll_2) | beam.Flatten()
   pcoll_123 = (pcoll_12, pcoll_3) | 'FlattenAgain' >> beam.Flatten()
   ```
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345619)
Time Spent: 8h 10m  (was: 8h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=345623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345623
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:22
Start Date: 18/Nov/19 22:22
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #10146: 
[BEAM-8335] Add timestamp to/from protos to Python SDK
URL: https://github.com/apache/beam/pull/10146
 
 
   Adds extra utility in the Python SDK Timestamp object to convert between it 
and the well-known google.protobuf.timestamp.Timestamp object.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.or

[jira] [Work logged] (BEAM-8691) Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8691?focusedWorklogId=345616&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345616
 ]

ASF GitHub Bot logged work on BEAM-8691:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:19
Start Date: 18/Nov/19 22:19
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10144: [BEAM-8691] 
Upgrading bigtable-client-core to latest 1.12.1
URL: https://github.com/apache/beam/pull/10144#issuecomment-555235926
 
 
   R: @chamikaramj, @lukecwik 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345616)
Time Spent: 1.5h  (was: 1h 20m)

> Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core
> --
>
> Key: BEAM-8691
> URL: https://issues.apache.org/jira/browse/BEAM-8691
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
>  - 2019-11-15 19:39:51.523448 
> -
> Please consider upgrading the dependency 
> com.google.cloud.bigtable:bigtable-client-core. 
> The current version is 1.8.0. The latest version is 1.12.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7473) Update RestrictionTracker within Python to not be required to be thread safe

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7473?focusedWorklogId=345615&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345615
 ]

ASF GitHub Bot logged work on BEAM-7473:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:17
Start Date: 18/Nov/19 22:17
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10118: [BEAM-7473] 
Pack RangeTracker into restriction
URL: https://github.com/apache/beam/pull/10118#discussion_r347636628
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -1402,20 +1402,36 @@ class _SDFBoundedSourceWrapper(ptransform.PTransform):
 
   NOTE: This transform can only be used with beam_fn_api enabled.
   """
+
+  class _SDFBoundedSourceRestriction(object):
+""" A restriction wraps SourceBundle and RangeTracker. """
+def __init__(self, source_bundle, range_tracker=None):
+  self.source_bundle = source_bundle
+  self.range_tracker = range_tracker
+
+def __reduce__(self):
+  # The instance of RangeTracker shouldn't be serialized.
+  return (self.__class__, (self.source_bundle, ))
+
+
   class _SDFBoundedSourceRestrictionTracker(RestrictionTracker):
 """An `iobase.RestrictionTracker` implementations for wrapping 
BoundedSource
-with SDF.
+with SDF. The tracked restriction is a (SourceBundle, RangeTracker) pair.
+In order to save bytes sent across the wire, the RangeTracker is set as
+system tracking RangeTracker only when current_restriction is called.
 
 Delegated RangeTracker guarantees synchronization safety.
 """
 def __init__(self, restriction):
-  if not isinstance(restriction, SourceBundle):
+  if not isinstance(restriction,
+_SDFBoundedSourceWrapper._SDFBoundedSourceRestriction):
 raise ValueError('Initializing SDFBoundedSourceRestrictionTracker'
- 'requires a SourceBundle')
-  self._delegate_range_tracker = restriction.source.get_range_tracker(
-  restriction.start_position, restriction.stop_position)
-  self._source = restriction.source
-  self._weight = restriction.weight
+ ' requires a _SDFBoundedSourceRestriction')
+  self._source = restriction.source_bundle.source
+  self._weight = restriction.source_bundle.weight
+  self._delegate_range_tracker = self._source.get_range_tracker(
 
 Review comment:
   How about just adding a range_tracker() method to 
_SDFBoundedSourceRestriction? Then _delegate_range_tracker would not have to be 
tracked here at all. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345615)
Time Spent: 1h 20m  (was: 1h 10m)

> Update RestrictionTracker within Python to not be required to be thread safe
> 
>
> Key: BEAM-7473
> URL: https://issues.apache.org/jira/browse/BEAM-7473
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The commit 
> [https://github.com/apache/beam/commit/8faffb2bcf28ccab5e9a95322743cc60df65077c#diff-ed95abb6bc30a9ed07faef5c3fea93f0]
>  modified the Java SDK removed the need for users to be thread safe and 
> instead made the framework provide the necessary locking around a restriction 
> tracker.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7473) Update RestrictionTracker within Python to not be required to be thread safe

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7473?focusedWorklogId=345614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345614
 ]

ASF GitHub Bot logged work on BEAM-7473:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:17
Start Date: 18/Nov/19 22:17
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10118: [BEAM-7473] 
Pack RangeTracker into restriction
URL: https://github.com/apache/beam/pull/10118#discussion_r347636857
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -1424,11 +1440,12 @@ def current_progress(self):
 def current_restriction(self):
   start_pos = self._delegate_range_tracker.start_position()
   stop_pos = self._delegate_range_tracker.stop_position()
-  return SourceBundle(
-  self._weight,
-  self._source,
-  start_pos,
-  stop_pos)
+  return _SDFBoundedSourceWrapper._SDFBoundedSourceRestriction(
 
 Review comment:
   Rather than re-creating it here, how about just storing (and returning) the 
initial _SDFBoundedSourceRestriction object (which does lazy initialization in 
range_tracker())? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345614)
Time Spent: 1h 10m  (was: 1h)

> Update RestrictionTracker within Python to not be required to be thread safe
> 
>
> Key: BEAM-7473
> URL: https://issues.apache.org/jira/browse/BEAM-7473
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The commit 
> [https://github.com/apache/beam/commit/8faffb2bcf28ccab5e9a95322743cc60df65077c#diff-ed95abb6bc30a9ed07faef5c3fea93f0]
>  modified the Java SDK removed the need for users to be thread safe and 
> instead made the framework provide the necessary locking around a restriction 
> tracker.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7473) Update RestrictionTracker within Python to not be required to be thread safe

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7473?focusedWorklogId=345613&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345613
 ]

ASF GitHub Bot logged work on BEAM-7473:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:17
Start Date: 18/Nov/19 22:17
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10118: [BEAM-7473] 
Pack RangeTracker into restriction
URL: https://github.com/apache/beam/pull/10118#discussion_r347637047
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -1456,10 +1473,17 @@ def try_split(self, fraction_of_remainder):
 residual_weight = self._weight - primary_weight
 # Update self._weight to primary weight
 self._weight = primary_weight
-return (SourceBundle(primary_weight, self._source, start_pos,
- split_pos),
-SourceBundle(residual_weight, self._source, split_pos,
- stop_pos))
+return (
 
 Review comment:
   You could add a split method to the _SDFBoundedSourceRestriction that would 
contain most of the logic of this method. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345613)
Time Spent: 1h 10m  (was: 1h)

> Update RestrictionTracker within Python to not be required to be thread safe
> 
>
> Key: BEAM-7473
> URL: https://issues.apache.org/jira/browse/BEAM-7473
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The commit 
> [https://github.com/apache/beam/commit/8faffb2bcf28ccab5e9a95322743cc60df65077c#diff-ed95abb6bc30a9ed07faef5c3fea93f0]
>  modified the Java SDK removed the need for users to be thread safe and 
> instead made the framework provide the necessary locking around a restriction 
> tracker.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7473) Update RestrictionTracker within Python to not be required to be thread safe

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7473?focusedWorklogId=345612&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345612
 ]

ASF GitHub Bot logged work on BEAM-7473:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:17
Start Date: 18/Nov/19 22:17
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10118: [BEAM-7473] 
Pack RangeTracker into restriction
URL: https://github.com/apache/beam/pull/10118#discussion_r347635742
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -1404,18 +1404,21 @@ class _SDFBoundedSourceWrapper(ptransform.PTransform):
   """
   class _SDFBoundedSourceRestrictionTracker(RestrictionTracker):
 """An `iobase.RestrictionTracker` implementations for wrapping 
BoundedSource
-with SDF.
+with SDF. The tracked restriction is a (SourceBundle, RangeTracker) pair.
+In order to save bytes sent across the wire, the RangeTracker is set as
+system tracking RangeTracker only when current_restriction is called.
 
 Review comment:
   Update comment?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345612)
Time Spent: 1h 10m  (was: 1h)

> Update RestrictionTracker within Python to not be required to be thread safe
> 
>
> Key: BEAM-7473
> URL: https://issues.apache.org/jira/browse/BEAM-7473
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The commit 
> [https://github.com/apache/beam/commit/8faffb2bcf28ccab5e9a95322743cc60df65077c#diff-ed95abb6bc30a9ed07faef5c3fea93f0]
>  modified the Java SDK removed the need for users to be thread safe and 
> instead made the framework provide the necessary locking around a restriction 
> tracker.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-4775) JobService should support returning metrics

2019-11-18 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy resolved BEAM-4775.
-
Fix Version/s: Not applicable
   Resolution: Fixed

the JobServer part is done - all kinds of MonitroingInfos are forwarded through 
grpc to PortableRunners (sdk side). The PortableRunners can decide how to 
digest them (e.g. create MetricsResult from monitoring infos where possible). 

> JobService should support returning metrics
> ---
>
> Key: BEAM-4775
> URL: https://issues.apache.org/jira/browse/BEAM-4775
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Eugene Kirpichov
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 55h
>  Remaining Estimate: 0h
>
> Design doc: [https://s.apache.org/get-metrics-api].
> Further discussion is ongoing on [this 
> doc|https://docs.google.com/document/d/1m83TsFvJbOlcLfXVXprQm1B7vUakhbLZMzuRrOHWnTg/edit?ts=5c826bb4#heading=h.faqan9rjc6dm].
> We want to report job metrics back to the portability harness from the runner 
> harness, for displaying to users.
> h1. Relevant PRs in flight:
> h2. Ready for Review:
>  * [#8022|https://github.com/apache/beam/pull/8022]: correct the Job RPC 
> protos from [#8018|https://github.com/apache/beam/pull/8018].
> h2. Iterating / Discussing:
>  * [#7971|https://github.com/apache/beam/pull/7971]: Flink portable metrics: 
> get ptransform from MonitoringInfo, not stage name
>  ** this is a simpler, Flink-specific PR that is basically duplicated inside 
> each of the following two, so may be worth trying to merge in first
>  * #[7915|https://github.com/apache/beam/pull/7915]: use MonitoringInfo data 
> model in Java SDK metrics
>  * [#7868|https://github.com/apache/beam/pull/7868]: MonitoringInfo URN tweaks
> h2. Merged
>  * [#8018|https://github.com/apache/beam/pull/8018]: add job metrics RPC 
> protos
>  * [#7867|https://github.com/apache/beam/pull/7867]: key MetricResult by a 
> MetricKey
>  * [#7938|https://github.com/apache/beam/pull/7938]: move MonitoringInfo 
> protos to model/pipeline module
>  * [#7883|https://github.com/apache/beam/pull/7883]: Add 
> MetricQueryResults.allMetrics() helper
>  * [#7866|https://github.com/apache/beam/pull/7866]: move function helpers 
> from fn-harness to sdks/java/core
>  * [#7890|https://github.com/apache/beam/pull/7890]: consolidate MetricResult 
> implementations
> h2. Closed
>  * [#7934|https://github.com/apache/beam/pull/7934]: job metrics RPC + SDK 
> support
>  * [#7876|https://github.com/apache/beam/pull/7876]: Clean up metric protos; 
> support integer distributions, gauges



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

2019-11-18 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976938#comment-16976938
 ] 

Valentyn Tymofieiev commented on BEAM-8651:
---

It is possible that the error is caused by https://bugs.python.org/issue35943, 
which is fixed on cpython master, but not on 3.7 branch. This would explain why 
I can still reproduce the error on Python. 3.7.5rc1. Also, as per 
https://bugs.python.org/issue34572, pickling fixes will not be backported to 
Python 3.5, 3.6.

> Python 3 portable pipelines sometimes fail with errors in 
> StockUnpickler.find_class()
> -
>
> Key: BEAM-8651
> URL: https://issues.apache.org/jira/browse/BEAM-8651
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Attachments: beam8651.py
>
>
> Several Beam users [1,2] reported an error which happens on Python 3 in 
> StockUnpickler.find_class.
> So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink 
> and Dataflow runners. On Dataflow runner so far I have seen this in streaming 
> pipelines only, which use portable SDK worker.
> Typical stack trace:
> {noformat}
> File 
> "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", 
> line 1148, in _create_pardo_operation
>     dofn_data = pickler.loads(serialized_fn)  
>  
>   File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, 
> in loads
>     return dill.loads(s)  
>  
>   File "python3.5/site-packages/dill/_dill.py", line 317, in loads
>  
>     return load(file, ignore) 
>  
>   File "python3.5/site-packages/dill/_dill.py", line 305, in load 
>  
>     obj = pik.load()  
>  
>   File "python3.5/site-packages/dill/_dill.py", line 474, in find_class   
>  
>     return StockUnpickler.find_class(self, module, name)  
>  
> AttributeError: Can't get attribute 'ClassName' on  'python3.5/site-packages/filename.py'>
> {noformat}
> According to Guenther from [1]:
> {quote}
> This looks exactly like a race condition that we've encountered on Python
> 3.7.1: There's a bug in some older 3.7.x releases that breaks the
> thread-safety of the unpickler, as concurrent unpickle threads can access a
> module before it has been fully imported. See
> https://bugs.python.org/issue34572 for more information.
> The traceback shows a Python 3.6 venv so this could be a different issue
> (the unpickle bug was introduced in version 3.7). If it's the same bug then
> upgrading to Python 3.7.3 or higher should fix that issue. One potential
> workaround is to ensure that all of the modules get imported during the
> initialization of the sdk_worker, as this bug only affects imports done by
> the unpickler.
> {quote}
> Opening this for visibility. Current open questions are:
> 1. Find a minimal example to reproduce this issue.
> 2. Figure out whether users are still affected by this issue on Python 3.7.3.
> 3. Communicate a workarounds for 3.5, 3.6 users affected by this.
> [1] 
> https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-4775) JobService should support returning metrics

2019-11-18 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-4775:
---

Assignee: Lukasz Gajowy  (was: Kamil Wasilewski)

> JobService should support returning metrics
> ---
>
> Key: BEAM-4775
> URL: https://issues.apache.org/jira/browse/BEAM-4775
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Eugene Kirpichov
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 55h
>  Remaining Estimate: 0h
>
> Design doc: [https://s.apache.org/get-metrics-api].
> Further discussion is ongoing on [this 
> doc|https://docs.google.com/document/d/1m83TsFvJbOlcLfXVXprQm1B7vUakhbLZMzuRrOHWnTg/edit?ts=5c826bb4#heading=h.faqan9rjc6dm].
> We want to report job metrics back to the portability harness from the runner 
> harness, for displaying to users.
> h1. Relevant PRs in flight:
> h2. Ready for Review:
>  * [#8022|https://github.com/apache/beam/pull/8022]: correct the Job RPC 
> protos from [#8018|https://github.com/apache/beam/pull/8018].
> h2. Iterating / Discussing:
>  * [#7971|https://github.com/apache/beam/pull/7971]: Flink portable metrics: 
> get ptransform from MonitoringInfo, not stage name
>  ** this is a simpler, Flink-specific PR that is basically duplicated inside 
> each of the following two, so may be worth trying to merge in first
>  * #[7915|https://github.com/apache/beam/pull/7915]: use MonitoringInfo data 
> model in Java SDK metrics
>  * [#7868|https://github.com/apache/beam/pull/7868]: MonitoringInfo URN tweaks
> h2. Merged
>  * [#8018|https://github.com/apache/beam/pull/8018]: add job metrics RPC 
> protos
>  * [#7867|https://github.com/apache/beam/pull/7867]: key MetricResult by a 
> MetricKey
>  * [#7938|https://github.com/apache/beam/pull/7938]: move MonitoringInfo 
> protos to model/pipeline module
>  * [#7883|https://github.com/apache/beam/pull/7883]: Add 
> MetricQueryResults.allMetrics() helper
>  * [#7866|https://github.com/apache/beam/pull/7866]: move function helpers 
> from fn-harness to sdks/java/core
>  * [#7890|https://github.com/apache/beam/pull/7890]: consolidate MetricResult 
> implementations
> h2. Closed
>  * [#7934|https://github.com/apache/beam/pull/7934]: job metrics RPC + SDK 
> support
>  * [#7876|https://github.com/apache/beam/pull/7876]: Clean up metric protos; 
> support integer distributions, gauges



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-4777) Python PortableRunner should support metrics

2019-11-18 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-4777:
---

Assignee: Kamil Wasilewski  (was: Lukasz Gajowy)

> Python PortableRunner should support metrics
> 
>
> Key: BEAM-4777
> URL: https://issues.apache.org/jira/browse/BEAM-4777
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Kamil Wasilewski
>Priority: Major
>
> BEAM-4775 concerns adding metrics to the JobService API; the current issue is 
> about making Python PortableRunner understand them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-5994) Publish metrics from load tests to BigQuery database

2019-11-18 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski resolved BEAM-5994.

Resolution: Fixed

> Publish metrics from load tests to BigQuery database
> 
>
> Key: BEAM-5994
> URL: https://issues.apache.org/jira/browse/BEAM-5994
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Kasia Kucharczyk
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=345610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345610
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:08
Start Date: 18/Nov/19 22:08
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10135: [BEAM-8645] 
Create a py test case for Re-iteration on GBK result. 
URL: https://github.com/apache/beam/pull/10135#discussion_r347634526
 
 

 ##
 File path: sdks/python/apache_beam/transforms/ptransform_test.py
 ##
 @@ -471,6 +471,24 @@ def test_group_by_key(self):
 assert_that(result, equal_to([(1, [1, 2, 3]), (2, [1, 2]), (3, [1])]))
 pipeline.run()
 
+  def test_group_by_key_reiteration(self):
+class MyDoFn(beam.DoFn):
+  def process(self, gbk_result):
+value_list = gbk_result[1]
 
 Review comment:
   More idiomatic to do `key, value_list = gbk_result`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345610)
Time Spent: 2h 50m  (was: 2h 40m)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=345611&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345611
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 18/Nov/19 22:08
Start Date: 18/Nov/19 22:08
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10135: [BEAM-8645] 
Create a py test case for Re-iteration on GBK result. 
URL: https://github.com/apache/beam/pull/10135#discussion_r347635063
 
 

 ##
 File path: sdks/python/apache_beam/transforms/ptransform_test.py
 ##
 @@ -471,6 +471,24 @@ def test_group_by_key(self):
 assert_that(result, equal_to([(1, [1, 2, 3]), (2, [1, 2]), (3, [1])]))
 pipeline.run()
 
+  def test_group_by_key_reiteration(self):
+class MyDoFn(beam.DoFn):
+  def process(self, gbk_result):
+value_list = gbk_result[1]
+sum_val = 0
+# Iterate the GBK result for multiple times.
+for _ in range(0, 17):
+  sum_val += sum(value_list)
+return (gbk_result[0], sum_val)
+
+pipeline = TestPipeline()
+pcoll = pipeline | 'start' >> beam.Create(
+[(1, 1), (1, 2), (1, 3), (1, 4)])
+result = (pcoll | 'Group' >> beam.GroupByKey()
+  | 'Reiteration-Sum' >> beam.ParDo(MyDoFn()))
+assert_that(result, equal_to([1, 170]))
 
 Review comment:
   `equal_to` takes a list of PCollection elements. This should be 
`equal_to([(1, 170)])`. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345611)
Time Spent: 3h  (was: 2h 50m)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345603&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345603
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Nov/19 21:43
Start Date: 18/Nov/19 21:43
Worklog Time Spent: 10m 
  Work Description: liumomo315 commented on issue #10145: [BEAM-8575] Add a 
Python test to test windowing in DoFn finish_bundle()
URL: https://github.com/apache/beam/pull/10145#issuecomment-555223170
 
 
   R: @y1chi
   
   PTAL Yichi! Thanks:)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345603)
Time Spent: 8h  (was: 7h 50m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-4132) Element type inference doesn't work for multi-output DoFns

2019-11-18 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976906#comment-16976906
 ] 

Udi Meiri commented on BEAM-4132:
-

PCollections have element_type, inherited from PValue. The default value is 
None, which is not a valid value for a type.
Currently, the solution for DoOutputsTuple is to always set the type to Any.

The types should be derived from one of:
1. Tagged type hints. This doesn't exist yet.
2. Inferred result/output type. Pipeline._infer_result_type doesn't seem to 
work on undeclared tags (when DoOutputsTuple._tags is empty).


> Element type inference doesn't work for multi-output DoFns
> --
>
> Key: BEAM-4132
> URL: https://issues.apache.org/jira/browse/BEAM-4132
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.4.0
>Reporter: Chuan Yu Foo
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> TLDR: if you have a multi-output DoFn, then the non-main PCollections with 
> incorrectly have their element types set to None. This affects type checking 
> for pipelines involving these PCollections.
> Minimal example:
> {code}
> import apache_beam as beam
> class TripleDoFn(beam.DoFn):
>   def process(self, elem):
> yield_elem
> if elem % 2 == 0:
>   yield beam.pvalue.TaggedOutput('ten_times', elem * 10)
> if elem % 3 == 0:
>   yield beam.pvalue.TaggedOutput('hundred_times', elem * 100)
>   
> @beam.typehints.with_input_types(int)
> @beam.typehints.with_output_types(int)
> class MultiplyBy(beam.DoFn):
>   def __init__(self, multiplier):
> self._multiplier = multiplier
>   def process(self, elem):
> return elem * self._multiplier
>   
> def main():
>   with beam.Pipeline() as p:
> x, a, b = (
>   p
>   | 'Create' >> beam.Create([1, 2, 3])
>   | 'TripleDo' >> beam.ParDo(TripleDoFn()).with_outputs(
> 'ten_times', 'hundred_times', main='main_output'))
> _ = a | 'MultiplyBy2' >> beam.ParDo(MultiplyBy(2))
> if __name__ == '__main__':
>   main()
> {code}
> Running this yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 
> 'MultiplyBy2': requires  but got None for elem
> {noformat}
> Replacing {{a}} with {{b}} yields the same error. Replacing {{a}} with {{x}} 
> instead yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 
> 'MultiplyBy2': requires  but got Union[TaggedOutput, int] for elem
> {noformat}
> I would expect Beam to correctly infer that {{a}} and {{b}} have element 
> types of {{int}} rather than {{None}}, and I would also expect Beam to 
> correctly figure out that the element types of {{x}} are compatible with 
> {{int}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=345597&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345597
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Nov/19 21:34
Start Date: 18/Nov/19 21:34
Worklog Time Spent: 10m 
  Work Description: liumomo315 commented on pull request #10145: 
[BEAM-8575] Add a Python test to test windowing in DoFn finish_bundle()
URL: https://github.com/apache/beam/pull/10145
 
 
   This test is the Python parity for the Java ParDo test 
testWindowingInStartAndFinishBundle. 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=345589&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-345589
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 18/Nov/19 21:22
Start Date: 18/Nov/19 21:22
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10119: [BEAM-8335] Adds 
the StreamingCache
URL: https://github.com/apache/beam/pull/10119#issuecomment-555215039
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 345589)
Time Spent: 31.5h  (was: 31h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 31.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8668) TypeError: _create_function() takes from 2 to 6 positional arguments but 7 were given

2019-11-18 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976890#comment-16976890
 ] 

Valentyn Tymofieiev commented on BEAM-8668:
---

Correction: This happens when SDK uses dill >=0.3.1.1 for pickling and worker 
uses dill <= 0.3.0 for unpickling.



> TypeError: _create_function() takes from 2 to 6 positional arguments but 7 
> were given
> -
>
> Key: BEAM-8668
> URL: https://issues.apache.org/jira/browse/BEAM-8668
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>
> This error looks to me a lot like a mismatch between SDK versions (Dill 
> dependencies), but I've had a couple reports of it so I thought it worth 
> making it discoverable here.
>  
> Traceback (most recent call last):
> File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 158, in _execute
> response = task()
> File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 191, in 
> self._execute(lambda: worker.do_instruction(work), work)
> File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 343, in do_instruction
> request.instruction_id)
> File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 363, in process_bundle
> instruction_id, request.process_bundle_descriptor_reference)
> File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 306, in get
> self.data_channel_factory)
> File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 578, in __init__
> self.ops = self.create_execution_tree(self.process_bundle_descriptor)
> File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 622, in create_execution_tree
> descriptor.transforms, key=topological_height, reverse=True)])
> File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 621, in 
> for transform_id in sorted(
> File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 546, in wrapper
> result = cache[args] = func(*args)
> File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 605, in get_operation
> in descriptor.transforms[transform_id].outputs.items()
> File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 604, in 
> for tag, pcoll_id
> File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 603, in 
> tag: [get_operation(op) for op in pcoll_consumers[pcoll_id]]
> File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 546, in wrapper
> result = cache[args] = func(*args)
> File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 608, in get_operation
> transform_id, transform_consumers)
> File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 867, in create_operation
> return creator(self, transform_id, transform_proto, payload, consumers)
> File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 1110, in create
> serialized_fn, parameter)
> File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 1148, in _create_pardo_operation
> dofn_data = pickler.loads(serialized_fn)
> File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", 
> line 265, in loads
> return dill.loads(s)
> File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 317, in 
> loads
> return load(file, ignore)
> File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 305, in load
> obj = pik.load()
> TypeError: _create_function() takes from 2 to 6 positional arguments but 7 
> were given
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   3   4   >