[jira] [Work logged] (BEAM-8941) Create a common place for Load Tests configuration
[ https://issues.apache.org/jira/browse/BEAM-8941?focusedWorklogId=375415&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375415 ] ASF GitHub Bot logged work on BEAM-8941: Author: ASF GitHub Bot Created on: 22/Jan/20 05:54 Start Date: 22/Jan/20 05:54 Worklog Time Spent: 10m Work Description: pawelpasterz commented on issue #10543: [BEAM-8941] Implement simple DSL for load tests URL: https://github.com/apache/beam/pull/10543#issuecomment-577020379 Hey @kkucharc, I fixed the typo; it should work now :) Please rerun the seed job and smoke test, thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375415) Time Spent: 4h 10m (was: 4h) > Create a common place for Load Tests configuration > -- > > Key: BEAM-8941 > URL: https://issues.apache.org/jira/browse/BEAM-8941 > Project: Beam > Issue Type: Sub-task > Components: testing > Reporter: Kamil Wasilewski > Assignee: Pawel Pasterz > Priority: Minor > Fix For: Not applicable > > Time Spent: 4h 10m > Remaining Estimate: 0h > > The Apache Beam community maintains different versions of each Load Test. For example, right now there are two versions of all Python Load Tests: the first one runs on the Dataflow runner, and the second one runs on Flink. With the lack of a common place where configuration for the tests can be stored, the configuration is duplicated many times with minimal differences. > The goal is to create a common place for the configuration, so that it can be passed to the different files with tests (.test-infra/jenkins/*.groovy) and filtered according to needs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
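The "common configuration, filtered according to needs" idea can be sketched roughly as follows. This is a hypothetical Python illustration only (the actual Beam implementation lives in the Groovy Jenkins DSL files mentioned above; `COMMON_CONFIG` and `for_runner` are invented names):

```python
# Hypothetical sketch: one shared list of load-test scenarios, each tagged
# with the runners it applies to, filtered per Jenkins job instead of being
# copy-pasted into every *.groovy file.

COMMON_CONFIG = [
    {'title': 'GroupByKey: 2GB of 10B records',
     'test': 'group_by_key',
     'runners': ['dataflow', 'flink'],
     'pipeline_options': {'num_records': 200_000_000}},
    {'title': 'Combine: 2GB with fanout 4',
     'test': 'combine',
     'runners': ['dataflow'],
     'pipeline_options': {'fanout': 4}},
]

def for_runner(configs, runner):
    """Keep only the scenarios that apply to the given runner."""
    return [c for c in configs if runner in c['runners']]

flink_tests = for_runner(COMMON_CONFIG, 'flink')
```

Each Jenkins job would then call the filter with its own runner name, so a scenario is defined once and reused everywhere.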
[jira] [Work logged] (BEAM-8618) Tear down unused DoFns periodically in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8618?focusedWorklogId=375408&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375408 ] ASF GitHub Bot logged work on BEAM-8618: Author: ASF GitHub Bot Created on: 22/Jan/20 05:39 Start Date: 22/Jan/20 05:39 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on issue #10655: [BEAM-8618] Tear down unused DoFns periodically in Python SDK harness. URL: https://github.com/apache/beam/pull/10655#issuecomment-577016962 R: @robertwb @lukecwik Issue Time Tracking --- Worklog Id: (was: 375408) Time Spent: 20m (was: 10m) > Tear down unused DoFns periodically in Python SDK harness > - > > Key: BEAM-8618 > URL: https://issues.apache.org/jira/browse/BEAM-8618 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness > Reporter: sunjincheng > Assignee: sunjincheng > Priority: Major > Fix For: 2.20.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Per the discussion on the mailing list (details can be found at [1]), the teardown of DoFns should be supported in the portability framework. It happens in two places: > 1) Upon control service termination > 2) Tearing down unused DoFns periodically > The aim of this JIRA is to add support for tearing down unused DoFns periodically in the Python SDK harness. > [1] https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E
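The periodic-teardown idea can be sketched as a small cache that evicts and tears down processors that have been idle longer than a threshold. This is an invented illustration, not the PR's actual SDK-harness code (`IdleDoFnCache` and its methods are hypothetical names):

```python
import time

class IdleDoFnCache:
    """Hypothetical sketch: tear down DoFns unused for more than `ttl` seconds."""

    def __init__(self, ttl_seconds=60.0):
        self._ttl = ttl_seconds
        self._entries = {}  # dofn_id -> (dofn, last_used_timestamp)

    def get(self, dofn_id, factory):
        """Return a cached DoFn, creating it on first use; refresh its timestamp."""
        dofn, _ = self._entries.get(dofn_id, (None, None))
        if dofn is None:
            dofn = factory()
        self._entries[dofn_id] = (dofn, time.monotonic())
        return dofn

    def evict_idle(self):
        """Tear down every DoFn idle beyond the TTL; call this periodically."""
        now = time.monotonic()
        for dofn_id, (dofn, last_used) in list(self._entries.items()):
            if now - last_used > self._ttl:
                dofn.teardown()  # release the user's resources
                del self._entries[dofn_id]
```

A background timer in the harness would call `evict_idle()` on a fixed interval, so long-lived workers do not keep resources for bundles that stopped arriving.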
[jira] [Work logged] (BEAM-8618) Tear down unused DoFns periodically in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8618?focusedWorklogId=375407&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375407 ] ASF GitHub Bot logged work on BEAM-8618: Author: ASF GitHub Bot Created on: 22/Jan/20 05:39 Start Date: 22/Jan/20 05:39 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on pull request #10655: [BEAM-8618] Tear down unused DoFns periodically in Python SDK harness. URL: https://github.com/apache/beam/pull/10655 Tear down unused DoFns periodically in Python SDK harness. Post-Commit Tests Status (on master branch): [table of Jenkins build-status badges for the Go, Java, and Python SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners omitted; the message is truncated in the archive]
[jira] [Work logged] (BEAM-7847) Generate Python SDK docs using Python 3
[ https://issues.apache.org/jira/browse/BEAM-7847?focusedWorklogId=375403&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375403 ] ASF GitHub Bot logged work on BEAM-7847: Author: ASF GitHub Bot Created on: 22/Jan/20 05:21 Start Date: 22/Jan/20 05:21 Worklog Time Spent: 10m Work Description: lazylynx commented on pull request #10141: [BEAM-7847] enabled to generate SDK docs with Python3 URL: https://github.com/apache/beam/pull/10141#discussion_r369374288 ## File path: sdks/python/tox.ini ## @@ -290,7 +290,7 @@ commands = [testenv:docs] extras = test,gcp,docs,interactive deps = - Sphinx==1.6.5 + Sphinx==1.8.5 sphinx_rtd_theme==0.2.4 Review comment: Got it. Thank you! Issue Time Tracking --- Worklog Id: (was: 375403) Time Spent: 5h 50m (was: 5h 40m) > Generate Python SDK docs using Python 3 > > > Key: BEAM-7847 > URL: https://issues.apache.org/jira/browse/BEAM-7847 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core > Reporter: Valentyn Tymofieiev > Assignee: yoshiki obata > Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > Currently the scripts/generate_pydoc.sh script fails on Python 3 with "RuntimeError: empty_like method already has a docstring" errors: > {noformat} > pip install -e .[gcp,test] > pip install Sphinx==1.6.5 > pip install sphinx_rtd_theme==0.2.4 > ./scripts/generate_pydoc.sh > /home/valentyn/projects/beam/beam/beam/sdks/python/target/docs/source/apache_beam.testing.benchmarks.nexmark.queries.query0.rst:4: > WARNING: autodoc: failed to import module 'apache_beam.testing.benchmarks.nexmark.queries.query0'; the following exception was raised: > Traceback (most recent call last): > File > 
"/home/valentyn/tmp/venv/py3/lib/python3.6/site-packages/sphinx/ext/autodoc.py", > line 658, in import_object > __import__(self.modname) > File > "/home/valentyn/projects/beam/beam/beam/sdks/python/apache_beam/__init__.py", > line 98, in <module> > from apache_beam import io > File > "/home/valentyn/projects/beam/beam/beam/sdks/python/apache_beam/io/__init__.py", > line 22, in <module> > from apache_beam.io.avroio import * > File > "/home/valentyn/projects/beam/beam/beam/sdks/python/apache_beam/io/avroio.py", > line 61, in <module> > from apache_beam.io import filebasedsink > File > "/home/valentyn/projects/beam/beam/beam/sdks/python/apache_beam/io/filebasedsink.py", > line 34, in <module> > from apache_beam.io import iobase > File > "/home/valentyn/projects/beam/beam/beam/sdks/python/apache_beam/io/iobase.py", > line 50, in <module> > from apache_beam.transforms import core > File > "/home/valentyn/projects/beam/beam/beam/sdks/python/apache_beam/transforms/__init__.py", > line 29, in <module> > from apache_beam.transforms.util import * > File > "/home/valentyn/projects/beam/beam/beam/sdks/python/apache_beam/transforms/util.py", > line 228, in <module> > class _BatchSizeEstimator(object): > File > "/home/valentyn/projects/beam/beam/beam/sdks/python/apache_beam/transforms/util.py", > line 359, in _BatchSizeEstimator > import numpy as np > File > "/home/valentyn/tmp/venv/py3/lib/python3.6/site-packages/numpy/__init__.py", > line 142, in <module> > from . import core > File > "/home/valentyn/tmp/venv/py3/lib/python3.6/site-packages/numpy/core/__init__.py", > line 17, in <module> > from . import multiarray > File > "/home/valentyn/tmp/venv/py3/lib/python3.6/site-packages/numpy/core/multiarray.py", > line 78, in <module> > def empty_like(prototype, dtype=None, order=None, subok=None, shape=None): > File > "/home/valentyn/tmp/venv/py3/lib/python3.6/site-packages/numpy/core/overrides.py", > line 203, in decorator > docs_from_dispatcher=docs_from_dispatcher)(implementation) > File > "/home/valentyn/tmp/venv/py3/lib/python3.6/site-packages/numpy/core/overrides.py", > line 159, in decorator > add_docstring(implementation, dispatcher.__doc__) > RuntimeError: empty_like method already has a docstring > {noformat}
[jira] [Work logged] (BEAM-9008) Add readAll() method to CassandraIO
[ https://issues.apache.org/jira/browse/BEAM-9008?focusedWorklogId=375391&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375391 ] ASF GitHub Bot logged work on BEAM-9008: Author: ASF GitHub Bot Created on: 22/Jan/20 04:09 Start Date: 22/Jan/20 04:09 Worklog Time Spent: 10m Work Description: vmarquez commented on issue #10546: [BEAM-9008] Add CassandraIO readAll method URL: https://github.com/apache/beam/pull/10546#issuecomment-576998681 Thanks for taking a look. I've pushed a new commit (I won't be squashing anymore until we finalize everything) that hopefully addresses all the minor style issues. LMK if I missed anything. I do want to make sure we keep CassandraIO 'idiomatic' with the rest of the IO connectors, but I don't think modeling this after the SOLR one will work. For one thing, if we want to share the `ReadFn` class between both Read and ReadAll, we have to have some way of having both use it and pass in 'connection' information, which we can't do if the signature of ReadFn is `ReadFn<A> extends DoFn<Read<A>, A>`. I think another class to look at for something that has both Read and ReadAll PTransforms is SpannerIO, which is modeled similarly to how I did it here (though not exactly). https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java#L315 They have a configuration class that is public (which we can't do, since we want to keep backwards compatibility with the current way `Read` works), it has two different PTransforms, `Read` uses `ReadAll` internally, etc. I do think that instead of taking a collection of RingRanges, taking some sort of 'Query' object makes sense, and the fact that it doesn't have to tie in to the actual connection means we can split up the CassandraConfig class. Thoughts on that? Issue Time Tracking --- Worklog Id: (was: 375391) Time Spent: 2h 40m (was: 2.5h) > Add readAll() method to CassandraIO > --- > > Key: BEAM-9008 > URL: https://issues.apache.org/jira/browse/BEAM-9008 > Project: Beam > Issue Type: New Feature > Components: io-java-cassandra > Affects Versions: 2.16.0 > Reporter: vincent marquez > Assignee: vincent marquez > Priority: Minor > Time Spent: 2h 40m > Remaining Estimate: 0h > > When querying a large Cassandra database, it's often *much* more useful to programmatically generate the queries that need to be run rather than reading all partitions and attempting some filtering. > As an example: > {code:java} > public class Event { >   @PartitionKey(0) public UUID accountId; >   @PartitionKey(1) public String yearMonthDay; >   @ClusteringKey public UUID eventId; >   // other data... > }{code} > If there is ten years' worth of data, you may want to only query one year's worth. Here each token range would represent one 'token' but all events for the day. > {code:java} > Set<UUID> accounts = getRelevantAccounts(); > Set<String> dateRange = generateDateRange("2018-01-01", "2019-01-01"); > PCollection<TokenRange> tokens = generateTokens(accounts, dateRange); > {code} > I propose an additional _readAll()_ PTransform that can take a PCollection of token ranges and can return a PCollection of what the query would return. > *Question: How much code should be in common between both methods?* > Currently the read connector already groups all partitions into a List of Token Ranges, so it would be simple to refactor the current read() based method to a 'ParDo' based one and have them both share the same function. > Reasons against sharing code between read and readAll: > * Not having the read based method return a BoundedSource connector would mean losing the ability to know the size of the data returned > * Currently the CassandraReader executes all the grouped TokenRange queries *asynchronously*, which is (maybe?) fine when all that's happening is splitting up all the partition ranges, but terrible for executing potentially millions of queries. > Reasons _for_ sharing code would be a simplified code base, and that both of the above issues would most likely have a negligible performance impact.
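The query-generation idea in the description — enumerating (account, day) partitions for one year instead of scanning every token range — can be sketched conceptually as follows. This is a Python illustration of the idea only, not the proposed Java PTransform; `generate_date_range`, `generate_queries`, and the `events` table are invented names:

```python
from datetime import date, timedelta
from itertools import product

def generate_date_range(start: date, end: date):
    """Yield each day in [start, end) as a yyyy-MM-dd string."""
    for offset in range((end - start).days):
        yield (start + timedelta(days=offset)).isoformat()

def generate_queries(account_ids, days, table='events'):
    """Build one targeted query per (account, day) partition instead of a full scan."""
    return [
        f"SELECT * FROM {table} WHERE account_id = {acc} "
        f"AND year_month_day = '{day}'"
        for acc, day in product(account_ids, days)
    ]
```

A readAll()-style transform would then take such a collection of queries (or query-like objects) as input and fan out their execution, rather than deriving everything from the table's full set of token ranges.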
[jira] [Work logged] (BEAM-6550) ParDo Async Java API
[ https://issues.apache.org/jira/browse/BEAM-6550?focusedWorklogId=375390&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375390 ] ASF GitHub Bot logged work on BEAM-6550: Author: ASF GitHub Bot Created on: 22/Jan/20 04:04 Start Date: 22/Jan/20 04:04 Worklog Time Spent: 10m Work Description: mynameborat commented on pull request #10651: [BEAM-6550] ParDo Async Java API URL: https://github.com/apache/beam/pull/10651 Issue Time Tracking --- Worklog Id: (was: 375390) Time Spent: 20m (was: 10m) > ParDo Async Java API > > > Key: BEAM-6550 > URL: https://issues.apache.org/jira/browse/BEAM-6550 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core > Reporter: Xinyu Liu > Assignee: Bharath Kumarasubramanian > Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > This ticket is to track the work on adding the ParDo async API. The motivation for this is: > - Many users are experienced in asynchronous programming. With async frameworks such as Netty and ParSeq, and libraries like the async Jersey client, they are able to make remote calls efficiently, and the libraries help manage the execution threads underneath. Async remote calls are very common in most of our streaming applications today. > - Many jobs are running on a multi-tenancy cluster. Async processing helps reduce resource usage and speeds up computation (fewer context switches). > This API has become one of the most requested Java APIs among SamzaRunner users.
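The motivation — overlapping many slow remote calls instead of blocking one element at a time — can be illustrated with a small concept sketch. Python asyncio is used here purely for illustration (the ticket proposes a Java API), and `fake_remote_call` is a stand-in for an async RPC:

```python
import asyncio

async def fake_remote_call(element):
    """Stand-in for a slow remote call; returns an enriched element."""
    await asyncio.sleep(0.01)  # simulated network latency
    return element * 2

async def process_bundle(elements):
    # All calls are in flight concurrently, so a bundle of N elements takes
    # roughly one RPC latency instead of N latencies, without dedicating a
    # blocked thread per element.
    return await asyncio.gather(*(fake_remote_call(e) for e in elements))

results = asyncio.run(process_bundle(range(5)))
```

An async ParDo API would give DoFn authors the same shape: return a future/completion-stage per element and let the runner manage the outstanding calls.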
[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite
[ https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=375364&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375364 ] ASF GitHub Bot logged work on BEAM-7961: Author: ASF GitHub Bot Created on: 22/Jan/20 02:06 Start Date: 22/Jan/20 02:06 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10051: [BEAM-7961] Add tests for all runner native transforms for XLang URL: https://github.com/apache/beam/pull/10051#discussion_r369338933 ## File path: runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ValidateRunnerXlangTest.java ## @@ -0,0 +1,227 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.core.construction; + +import static org.hamcrest.Matchers.equalTo; +import static org.junit.Assert.assertThat; + +import java.io.Serializable; +import java.nio.charset.StandardCharsets; +import java.util.Arrays; +import org.apache.beam.sdk.PipelineResult; +import org.apache.beam.sdk.options.ExperimentalOptions; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.testing.UsesCrossLanguageTransforms; +import org.apache.beam.sdk.testing.ValidatesRunner; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionList; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.TypeDescriptors; +import org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ConnectivityState; +import org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ManagedChannelBuilder; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.junit.After; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.Test; +import org.junit.experimental.categories.Category; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Test External transforms. 
*/ +@RunWith(JUnit4.class) +public class ValidateRunnerXlangTest implements Serializable { + @Rule public transient TestPipeline testPipeline = TestPipeline.create(); + private PipelineResult pipelineResult; + + private static final String TEST_PREFIX_URN = "beam:transforms:xlang:test:prefix"; + private static final String TEST_MULTI_URN = "beam:transforms:xlang:test:multi"; + private static final String TEST_GBK_URN = "beam:transforms:xlang:test:gbk"; + private static final String TEST_CGBK_URN = "beam:transforms:xlang:test:cgbk"; + private static final String TEST_COMGL_URN = "beam:transforms:xlang:test:comgl"; + private static final String TEST_COMPK_URN = "beam:transforms:xlang:test:compk"; + private static final String TEST_FLATTEN_URN = "beam:transforms:xlang:test:flatten"; + private static final String TEST_PARTITION_URN = "beam:transforms:xlang:test:partition"; + + private static String expansionAddr; + private static String expansionJar; + + @BeforeClass + public static void setUpClass() { +expansionAddr = +String.format("localhost:%s", Integer.valueOf(System.getProperty("expansionPort"))); +expansionJar = System.getProperty("expansionJar"); + } + + @Before + public void setUp() { +testPipeline +.getOptions() +.as(ExperimentalOptions.class) +.setExperiments(ImmutableList.of("jar_packages=" + expansionJar)); +waitForReady(); + } + + @After + public void tearDown() { +pipelineResult.waitUntilFinish(); +assertThat(pipelineResult.getState(), equalTo(PipelineResult.State.DONE)); + } + + private void waitForReady() { +try { + ManagedChannel channel = ManagedChannelBuilder.forTarget(expansionAddr).build(); + ConnectivityState state
[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite
[ https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=375363&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375363 ] ASF GitHub Bot logged work on BEAM-7961: Author: ASF GitHub Bot Created on: 22/Jan/20 02:05 Start Date: 22/Jan/20 02:05 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10051: [BEAM-7961] Add tests for all runner native transforms for XLang URL: https://github.com/apache/beam/pull/10051#discussion_r369338651 ## File path: runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ValidateRunnerXlangTest.java ## @@ -0,0 +1,227 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.core.construction; + +import static org.hamcrest.Matchers.equalTo; +import static org.junit.Assert.assertThat; + +import java.io.Serializable; +import java.nio.charset.StandardCharsets; +import java.util.Arrays; +import org.apache.beam.sdk.PipelineResult; +import org.apache.beam.sdk.options.ExperimentalOptions; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.testing.UsesCrossLanguageTransforms; +import org.apache.beam.sdk.testing.ValidatesRunner; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionList; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.TypeDescriptors; +import org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ConnectivityState; +import org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ManagedChannelBuilder; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.junit.After; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.Test; +import org.junit.experimental.categories.Category; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Test External transforms. */ +@RunWith(JUnit4.class) +public class ValidateRunnerXlangTest implements Serializable { + @Rule public transient TestPipeline testPipeline = TestPipeline.create(); + private PipelineResult pipelineResult; + + private static final String TEST_PREFIX_URN = "beam:transforms:xlang:test:prefix"; Review comment: I put the link to the design doc. 
Issue Time Tracking --- Worklog Id: (was: 375363) Time Spent: 16h 20m (was: 16h 10m) > Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite > -- > > Key: BEAM-7961 > URL: https://issues.apache.org/jira/browse/BEAM-7961 > Project: Beam > Issue Type: Improvement > Components: testing > Reporter: Heejong Lee > Assignee: Heejong Lee > Priority: Major > Time Spent: 16h 20m > Remaining Estimate: 0h > > Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite
[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite
[ https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=375362&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375362 ] ASF GitHub Bot logged work on BEAM-7961: Author: ASF GitHub Bot Created on: 22/Jan/20 02:04 Start Date: 22/Jan/20 02:04 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10051: [BEAM-7961] Add tests for all runner native transforms for XLang URL: https://github.com/apache/beam/pull/10051#discussion_r369338544 ## File path: runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExternalTest.java ## @@ -56,19 +57,18 @@ @RunWith(JUnit4.class) public class ExternalTest implements Serializable { @Rule public transient TestPipeline testPipeline = TestPipeline.create(); + private PipelineResult pipelineResult; private static final String TEST_URN_SIMPLE = "simple"; private static final String TEST_URN_LE = "le"; private static final String TEST_URN_MULTI = "multi"; - private static Integer expansionPort; private static String localExpansionAddr; private static Server localExpansionServer; @BeforeClass - public static void setUp() throws IOException { -expansionPort = Integer.valueOf(System.getProperty("expansionPort")); -int localExpansionPort = expansionPort + 100; + public static void setUpClass() throws IOException { +int localExpansionPort = Integer.parseInt(System.getProperty("expansionPort")) + 100; Review comment: Done. Issue Time Tracking --- Worklog Id: (was: 375362) Time Spent: 16h 10m (was: 16h) > Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite > -- > > Key: BEAM-7961 > URL: https://issues.apache.org/jira/browse/BEAM-7961 > Project: Beam > Issue Type: Improvement > Components: testing > Reporter: Heejong Lee > Assignee: Heejong Lee > Priority: Major > Time Spent: 16h 10m > Remaining Estimate: 0h > > Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite
[jira] [Commented] (BEAM-9154) Move Chicago Taxi Example to Python 3
[ https://issues.apache.org/jira/browse/BEAM-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020714#comment-17020714 ] Udi Meiri commented on BEAM-9154: - How did you run the example? I'm trying this (having copied the 2 tasks from py2 to the py37/build.gradle file): {code} ./gradlew :sdks:python:test-suites:dataflow:py37:chicagoTaxiExample -PgcsRoot=gs://BUCKET/chicago-taxi {code} > Move Chicago Taxi Example to Python 3 > - > > Key: BEAM-9154 > URL: https://issues.apache.org/jira/browse/BEAM-9154 > Project: Beam > Issue Type: Improvement > Components: testing > Reporter: Kamil Wasilewski > Assignee: Kamil Wasilewski > Priority: Major > > The Chicago Taxi Example[1] should be moved to the latest version of Python supported by Beam (currently it's Python 3.7). > At the moment, the following error occurs when running the benchmark on Python 3.7 (requires further investigation): > {code:java} > Traceback (most recent call last): > File "preprocess.py", line 259, in <module> > main() > File "preprocess.py", line 254, in main > project=known_args.metric_reporting_project > File "preprocess.py", line 155, in transform_data > ('Analyze' >> tft_beam.AnalyzeDataset(preprocessing_fn))) > File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 987, in __ror__ > return self.transform.__ror__(pvalueish, self.label) > File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 547, in __ror__ > result = p.apply(self, pvalueish, label) > File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 532, in apply > return self.apply(transform, pvalueish) > File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 573, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply > return m(transform, 
input, options) > File > "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", > line 223, in apply_PTransform > return transform.expand(input) > File > "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", > line 825, in expand > input_metadata)) > File > "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", > line 716, in expand > output_signature = self._preprocessing_fn(copied_inputs) > File "preprocess.py", line 102, in preprocessing_fn > _fill_in_missing(inputs[key]), > KeyError: 'company' > {code} > [1] sdks/python/apache_beam/testing/benchmarks/chicago_taxi -- This message was sent by Atlassian Jira (v8.3.4#803005)
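The `KeyError: 'company'` above is the generic failure mode of indexing a dict with a key the input schema no longer contains. A minimal stdlib sketch of that difference (column names and the `fill_in_missing` helper are hypothetical stand-ins for the benchmark's real schema and `_fill_in_missing`):

```python
# Hypothetical stand-in for the dict of input columns in preprocessing_fn;
# the real benchmark builds this from the BigQuery schema.
inputs = {"trip_miles": [1.2, 3.4], "fare": [10.0, 20.5]}

def fill_in_missing(values):
    """Placeholder for the benchmark's _fill_in_missing helper."""
    return [v if v is not None else 0.0 for v in values]

# inputs[key] raises KeyError before fill_in_missing even runs,
# because the argument is evaluated first.
try:
    fill_in_missing(inputs["company"])
except KeyError as e:
    missing = e.args[0]  # 'company'

# A guarded lookup makes the absence explicit instead of raising:
company = inputs.get("company", [])
```

Whether the fix belongs here or in the schema generation is exactly what the issue's "requires further investigation" note is about; the sketch only illustrates the mechanism of the error.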
[jira] [Work logged] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?focusedWorklogId=375361&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375361 ]

ASF GitHub Bot logged work on BEAM-9168:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:54
Start Date: 22/Jan/20 01:54
Worklog Time Spent: 10m

Work Description: udim commented on issue #10652: [BEAM-9168] Temporary fix for RunnerAPIPTransformHolder usage
URL: https://github.com/apache/beam/pull/10652#issuecomment-576971789

still waiting for tests (just starting now)

Issue Time Tracking
-------------------
Worklog Id: (was: 375361)
Time Spent: 1h 20m (was: 1h 10m)

> AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with
> PAR_DO.urn
> --------------------------------------------------------------------------
>
>                 Key: BEAM-9168
>                 URL: https://issues.apache.org/jira/browse/BEAM-9168
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Udi Meiri
>            Assignee: Chamikara Madhusanka Jayalath
>            Priority: Major
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is failing on a google-internal test.
> Unexpected class is apache_beam.transforms.core.RunnerAPIPTransformHolder.
> Failed assertion:
> https://github.com/apache/beam/blob/a59f897a64b0006ef3fcbe5a750d5f46499cfe61/sdks/python/apache_beam/pipeline.py#L1052-L1053
[jira] [Work logged] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?focusedWorklogId=375360&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375360 ]

ASF GitHub Bot logged work on BEAM-9168:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:48
Start Date: 22/Jan/20 01:48
Worklog Time Spent: 10m

Work Description: chamikaramj commented on pull request #10652: [BEAM-9168] Temporary fix for RunnerAPIPTransformHolder usage
URL: https://github.com/apache/beam/pull/10652#discussion_r369335166

File path: sdks/python/apache_beam/pipeline.py
{code}
@@ -1050,7 +1051,9 @@ def is_side_input(tag):
            for tag, id in proto.outputs.items()}
     # This annotation is expected by some runners.
     if proto.spec.urn == common_urns.primitives.PAR_DO.urn:
-      assert isinstance(result.transform, ParDo)
+      # TODO(BEAM-9168): Figure out what to do for RunnerAPIPTransformHolder.
+      assert isinstance(result.transform, (ParDo, RunnerAPIPTransformHolder)),\
{code}

Review comment: I see. Nevermind then.

Issue Time Tracking
-------------------
Worklog Id: (was: 375360)
Time Spent: 1h 10m (was: 1h)
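The widened assertion in the diff relies on `isinstance` accepting a tuple of types, which matches any one of them. A quick sketch with local stub classes (these merely mirror the Beam names; they are not the real `apache_beam` classes):

```python
# Stub classes standing in for apache_beam.transforms.core.ParDo and
# apache_beam.transforms.core.RunnerAPIPTransformHolder.
class ParDo:
    pass

class RunnerAPIPTransformHolder:
    pass

class SomethingElse:
    pass

def looks_like_pardo(transform):
    # isinstance with a tuple: True if the object is an instance of
    # ANY listed type -- the shape of the widened assert in the diff.
    return isinstance(transform, (ParDo, RunnerAPIPTransformHolder))

ok_pardo = looks_like_pardo(ParDo())
ok_holder = looks_like_pardo(RunnerAPIPTransformHolder())
rejected = looks_like_pardo(SomethingElse())
```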
[jira] [Work logged] (BEAM-9167) Reduce overhead of Go SDK side metrics
[ https://issues.apache.org/jira/browse/BEAM-9167?focusedWorklogId=375359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375359 ]

ASF GitHub Bot logged work on BEAM-9167:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:45
Start Date: 22/Jan/20 01:45
Worklog Time Spent: 10m

Work Description: lostluck commented on issue #10654: [BEAM-9167] Reduce Go SDK metric overhead
URL: https://github.com/apache/beam/pull/10654#issuecomment-576969875

R: @youngoli

Issue Time Tracking
-------------------
Worklog Id: (was: 375359)
Time Spent: 20m (was: 10m)

> Reduce overhead of Go SDK side metrics
> --------------------------------------
>
>                 Key: BEAM-9167
>                 URL: https://issues.apache.org/jira/browse/BEAM-9167
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-go
>            Reporter: Robert Burke
>            Assignee: Robert Burke
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Locking overhead due to the global store and local caches of SDK counter data
> can dominate certain workloads, which means we can do better.
> Instead of having a global store of metrics data to extract counters, we
> should use per-ptransform (or per-bundle) counter sets, which would avoid
> requiring locking per counter operation. The main detriment compared to the
> current implementation is that a user would need to add their own locking if
> they were to spawn multiple goroutines to process a Bundle's work in a DoFn.
> Given that self-multithreaded DoFns aren't recommended/safe in Java, largely
> impossible in Python, and the other Beam Go SDK constructs (like Iterators
> and Emitters) are not thread safe, this is a small concern, provided the
> documentation is clear on this.
> Removing the locking and switching to atomic ops reduces the overhead
> significantly in example jobs and in the benchmarks.
> A second part of this change should be to move the exec package to manage
> its own per-bundle state, rather than relying on a global datastore to
> extract the per-bundle, per-ptransform values.
> Related: https://issues.apache.org/jira/browse/BEAM-6541
[jira] [Work logged] (BEAM-9167) Reduce overhead of Go SDK side metrics
[ https://issues.apache.org/jira/browse/BEAM-9167?focusedWorklogId=375358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375358 ]

ASF GitHub Bot logged work on BEAM-9167:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:44
Start Date: 22/Jan/20 01:44
Worklog Time Spent: 10m

Work Description: lostluck commented on pull request #10654: [BEAM-9167] Reduce Go SDK metric overhead
URL: https://github.com/apache/beam/pull/10654

This PR dramatically reduces the overhead of metrics in the Go SDK. A contemporary side-by-side comparison of the benchmark in the metrics package on my current machine:

{code}
benchmark                                      old ns/op    new ns/op    delta
BenchmarkMetrics/counter_inplace-12            585          249          -57.44%
BenchmarkMetrics/distribution_inplace-12       622          270          -56.59%
BenchmarkMetrics/gauge_inplace-12              812          311          -61.70%
BenchmarkMetrics/counter_predeclared-12        227          15.8         -93.04%
BenchmarkMetrics/distribution_predeclared-12   282          24.0         -91.49%
BenchmarkMetrics/gauge_predeclared-12          389          63.7         -83.62%

benchmark                                      old allocs   new allocs   delta
BenchmarkMetrics/counter_inplace-12            4            1            -75.00%
BenchmarkMetrics/distribution_inplace-12       4            1            -75.00%
BenchmarkMetrics/gauge_inplace-12              4            1            -75.00%
BenchmarkMetrics/counter_predeclared-12        3            0            -100.00%
BenchmarkMetrics/distribution_predeclared-12   3            0            -100.00%
BenchmarkMetrics/gauge_predeclared-12          3            0            -100.00%

benchmark                                      old bytes    new bytes    delta
BenchmarkMetrics/counter_inplace-12            160          48           -70.00%
BenchmarkMetrics/distribution_inplace-12       192          48           -75.00%
BenchmarkMetrics/gauge_inplace-12              192          48           -75.00%
BenchmarkMetrics/counter_predeclared-12        48           0            -100.00%
BenchmarkMetrics/distribution_predeclared-12   80           0            -100.00%
BenchmarkMetrics/gauge_predeclared-12          80           0            -100.00%
{code}

In particular, this PR moves away from a global datastore for all metrics towards per-bundle counter sets. This allows for the removal of the per-layer locks, and of the global lock that needed to be checked since all bundles had to consult the same datastore. Now they only store a metric cell in the global store on first creation (still stored per bundle and per ptransform). A subsequent change will remove the global store altogether in favour of better exposing the metrics per bundle, and allowing a callback visitor to thread-safely access the data inside each metric. This will also permit removing the dependency on the protos from the package, which was a mistake I made when I first wrote the package.

Further, Counters now use atomic operations rather than locks, which additionally speeds them up vs the previous mutex approach. Counter "names" are hashed ahead of time and the hash value cached in the proxy to increase the speed of subsequent lookups using the same proxy object. This does make the proxies unsafe to use concurrently within the same bundle prior to first use, but this matches the general rule of Beam runners managing the concurrency for efficient processing, and that framework constructs are not safe for concurrent use by user code without user-managed locks.

As an exploration, I did try using sync.Map to avoid the above restriction, but the overhead of the additional interface wrapping and unwrapping was significant enough that this approach was worthwhile. This may be worth revisiting if Go gains generics, as that would probably avoid this cost.

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
- [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor
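The hash-once-and-cache optimization described above is language-agnostic. A rough Python analogue (the Go SDK's actual types and locking story differ; names here are illustrative) of a metric proxy that computes its lookup key on first use and reuses it afterwards:

```python
class CounterProxy:
    """Sketch of a metric proxy that caches its lookup key after first use.

    Mirrors the idea in the PR: compute the hash of (namespace, name) once
    and reuse it, so repeated updates through the same proxy avoid
    recomputing the key. As the PR notes, lazy initialization like this is
    not safe for concurrent first use without external synchronization.
    """

    def __init__(self, namespace, name):
        self.namespace = namespace
        self.name = name
        self._key = None  # filled in lazily on first use

    def key(self):
        if self._key is None:
            self._key = hash((self.namespace, self.name))
        return self._key


# Stand-in for a per-bundle counter set: a plain dict keyed by the
# cached hash, so each update is a single dict lookup.
store = {}

def inc(proxy, delta):
    store[proxy.key()] = store.get(proxy.key(), 0) + delta

c = CounterProxy("example", "elements")
inc(c, 1)
inc(c, 41)  # both updates reuse the hash computed on the first call
```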
[jira] [Work logged] (BEAM-7739) Add ValueState in Python sdk
[ https://issues.apache.org/jira/browse/BEAM-7739?focusedWorklogId=375356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375356 ]

ASF GitHub Bot logged work on BEAM-7739:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:27
Start Date: 22/Jan/20 01:27
Worklog Time Spent: 10m

Work Description: stale[bot] commented on pull request #9067: [BEAM-7739] Implement ReadModifyWriteState in Python SDK
URL: https://github.com/apache/beam/pull/9067

Issue Time Tracking
-------------------
Worklog Id: (was: 375356)
Time Spent: 6.5h (was: 6h 20m)

> Add ValueState in Python sdk
> ----------------------------
>
>                 Key: BEAM-7739
>                 URL: https://issues.apache.org/jira/browse/BEAM-7739
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py-core
>            Reporter: Rakesh Kumar
>            Assignee: Rakesh Kumar
>            Priority: Major
>          Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Currently ValueState is missing from the Python SDK, but it exists in the
> Java SDK.
[jira] [Work logged] (BEAM-7739) Add ValueState in Python sdk
[ https://issues.apache.org/jira/browse/BEAM-7739?focusedWorklogId=375355&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375355 ]

ASF GitHub Bot logged work on BEAM-7739:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:27
Start Date: 22/Jan/20 01:27
Worklog Time Spent: 10m

Work Description: stale[bot] commented on issue #9067: [BEAM-7739] Implement ReadModifyWriteState in Python SDK
URL: https://github.com/apache/beam/pull/9067#issuecomment-576965575

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

Issue Time Tracking
-------------------
Worklog Id: (was: 375355)
Time Spent: 6h 20m (was: 6h 10m)
[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite
[ https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=375354&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375354 ]

ASF GitHub Bot logged work on BEAM-7961:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:22
Start Date: 22/Jan/20 01:22
Worklog Time Spent: 10m

Work Description: ihji commented on pull request #10051: [BEAM-7961] Add tests for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#discussion_r369329159

File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
{code}
@@ -1684,22 +1688,21 @@ class BeamModulePlugin implements Plugin {
         args '-c', "$pythonDir/scripts/run_expansion_services.sh stop --group_id ${project.name}"
       }
       setupTask.finalizedBy cleanupTask
+      config.startJobServer.finalizedBy config.cleanupJobServer
       // Task for running testcases in Java SDK
       def beamJavaTestPipelineOptions = [
-        "--runner=org.apache.beam.runners.portability.testing.TestPortableRunner",
-        "--jobServerDriver=${config.jobServerDriver}",
+        "--runner=PortableRunner",
{code}

Review comment: The change enabled remote execution of portable pipelines. `TestPortableRunner` provides `--jobServerDriver` just for easier testing (otherwise there needs to be a separate jobserver process to run integration tests). The validates-xlang test suite already implements the setup/shutdown scripts managing external processes, so I modified the code to adopt the original `PortableRunner`. I didn't check, but OOM shouldn't happen with the new tests here, even with in-process job server drivers.

Issue Time Tracking
-------------------
Worklog Id: (was: 375354)
Time Spent: 16h (was: 15h 50m)

> Add tests for all runner native transforms and some widely used composite
> transforms to cross-language validates runner test suite
> -------------------------------------------------------------------------
>
>                 Key: BEAM-7961
>                 URL: https://issues.apache.org/jira/browse/BEAM-7961
>             Project: Beam
>          Issue Type: Improvement
>          Components: testing
>            Reporter: Heejong Lee
>            Assignee: Heejong Lee
>            Priority: Major
>          Time Spent: 16h
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite
> transforms to cross-language validates runner test suite
[jira] [Closed] (BEAM-6541) Consider converting bundle & ptransform ids to ints eagerly.
[ https://issues.apache.org/jira/browse/BEAM-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Burke closed BEAM-6541.
------------------------------
Fix Version/s: Not applicable
     Assignee: Robert Burke
   Resolution: Won't Fix

I'm taking a different approach in https://issues.apache.org/jira/browse/BEAM-9167, which better relies on the structure of bundles and ptransforms to reduce the overhead. Granted, I'm also using the technique mentioned here, but with hashing the metric names rather than the higher-level structs.

> Consider converting bundle & ptransform ids to ints eagerly.
> ------------------------------------------------------------
>
>                 Key: BEAM-6541
>                 URL: https://issues.apache.org/jira/browse/BEAM-6541
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-go
>            Reporter: Robert Burke
>            Assignee: Robert Burke
>            Priority: Minor
>             Fix For: Not applicable
>
> BundleIDs and PTransformIDs necessary for communicating with the Runner
> interface in the Go SDK are currently strings, and are used as-is for metrics
> contexts. We use them for getting bundle- and ptransform-specific metrics, and
> transmitting the same. We could instead eagerly assign them a local index
> that is then converted back when communicating metrics over the FnAPI; this
> would reduce overhead on metric lookups in the various maps.
> Note: the same could be done for the user's metric name, completing the
> optimization. Measuring the per-report overhead for tentative/final metric
> reporting is required before committing to this approach.
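The technique the issue considers — eagerly mapping string IDs to small local ints so hot-path maps can be keyed by int — is a standard interning pattern. A minimal Python sketch (illustrative only; the Go SDK would do this with its own map types):

```python
class Interner:
    """Assigns each distinct string a small, stable integer index.

    Hot-path lookups can then key on the int; the string is only needed
    again when converting back (e.g. when reporting over the FnAPI).
    """

    def __init__(self):
        self._ids = {}      # string -> int
        self._strings = []  # int -> string, for the reverse mapping

    def intern(self, s):
        # setdefault assigns the next index on first sight and returns
        # the existing index on every later call.
        idx = self._ids.setdefault(s, len(self._ids))
        if idx == len(self._strings):
            self._strings.append(s)
        return idx

    def lookup(self, idx):
        return self._strings[idx]

interner = Interner()
a = interner.intern("ptransform-1")
b = interner.intern("ptransform-2")
again = interner.intern("ptransform-1")  # same index as the first call
```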
[jira] [Work logged] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?focusedWorklogId=375353&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375353 ]

ASF GitHub Bot logged work on BEAM-9168:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:20
Start Date: 22/Jan/20 01:20
Worklog Time Spent: 10m

Work Description: udim commented on pull request #10652: [BEAM-9168] Temporary fix for RunnerAPIPTransformHolder usage
URL: https://github.com/apache/beam/pull/10652#discussion_r369328701

File path: sdks/python/apache_beam/pipeline.py
{code}
@@ -1050,7 +1051,9 @@ def is_side_input(tag):
            for tag, id in proto.outputs.items()}
     # This annotation is expected by some runners.
     if proto.spec.urn == common_urns.primitives.PAR_DO.urn:
-      assert isinstance(result.transform, ParDo)
+      # TODO(BEAM-9168): Figure out what to do for RunnerAPIPTransformHolder.
+      assert isinstance(result.transform, (ParDo, RunnerAPIPTransformHolder)),\
{code}

Review comment:
{code}
>>> assert (False, 'blah')
<stdin>:1: SyntaxWarning: assertion is always true, perhaps remove parentheses?
{code}

Issue Time Tracking
-------------------
Worklog Id: (was: 375353)
Time Spent: 1h (was: 50m)
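The review comment above points at a classic Python pitfall: a parenthesized `assert` with a message asserts a non-empty tuple, which is always truthy, so it can never fail (CPython warns about exactly this with the `SyntaxWarning` shown). A minimal demonstration:

```python
# assert with a message must put the comma OUTSIDE any parentheses:
#   assert cond, 'message'     -- correct
#   assert (cond, 'message')   -- asserts a 2-tuple, which is always true

def check_buggy(flag):
    assert (flag, 'flag must be set')  # BUG: never raises, even for False
    return True

def check_fixed(flag):
    assert flag, 'flag must be set'
    return True

buggy_passes = check_buggy(False)  # no AssertionError: the tuple is truthy

try:
    check_fixed(False)
    fixed_raised = False
except AssertionError:
    fixed_raised = True
```

This is why the backslash continuation in the diff matters: the assert expression and its message must stay one statement without wrapping the pair in parentheses.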
[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=375352&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375352 ]

ASF GitHub Bot logged work on BEAM-7926:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:19
Start Date: 22/Jan/20 01:19
Worklog Time Spent: 10m

Work Description: pabloem commented on issue #10346: [BEAM-7926] Data-centric Interactive Part2
URL: https://github.com/apache/beam/pull/10346#issuecomment-576963609

looking

Issue Time Tracking
-------------------
Worklog Id: (was: 375352)
Time Spent: 33h 50m (was: 33h 40m)

> Show PCollection with Interactive Beam in a data-centric user flow
> ------------------------------------------------------------------
>
>                 Key: BEAM-7926
>                 URL: https://issues.apache.org/jira/browse/BEAM-7926
>             Project: Beam
>          Issue Type: New Feature
>          Components: runner-py-interactive
>            Reporter: Ning Kang
>            Assignee: Ning Kang
>            Priority: Major
>          Time Spent: 33h 50m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...
> {code}
> The user can call a single function and get auto-magical charting of the
> data, e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
> Detailed
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].
[jira] [Work logged] (BEAM-7516) Add a watermark manager for the fn_api_runner
[ https://issues.apache.org/jira/browse/BEAM-7516?focusedWorklogId=375351&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375351 ]

ASF GitHub Bot logged work on BEAM-7516:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:16
Start Date: 22/Jan/20 01:16
Worklog Time Spent: 10m

Work Description: pabloem commented on issue #10291: [BEAM-7516][BEAM-8823] FnApiRunner works with work queues, and a primitive watermark manager
URL: https://github.com/apache/beam/pull/10291#issuecomment-576962848

The failed test in precommit is dataflow_runner related. I'll leave it as is for review.

Issue Time Tracking
-------------------
Worklog Id: (was: 375351)
Time Spent: 4h 10m (was: 4h)

> Add a watermark manager for the fn_api_runner
> ---------------------------------------------
>
>                 Key: BEAM-7516
>                 URL: https://issues.apache.org/jira/browse/BEAM-7516
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Pablo Estrada
>            Assignee: Pablo Estrada
>            Priority: Major
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> To track watermarks for each stage
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=375350&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375350 ]

ASF GitHub Bot logged work on BEAM-8335:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:07
Start Date: 22/Jan/20 01:07
Worklog Time Spent: 10m

Work Description: pabloem commented on pull request #10368: [BEAM-8335] Modify PipelineInstrument to add TestStream for unbounded PCollections
URL: https://github.com/apache/beam/pull/10368#discussion_r369325488

File path: sdks/python/apache_beam/runners/interactive/pipeline_instrument.py
{code}
@@ -418,6 +423,45 @@ def _replace_with_cached_inputs(self, pipeline):
       cache, noop.
     """
+    # Find all cached unbounded PCollections.
+    class CacheableUnboundedPCollectionVisitor(PipelineVisitor):
+      def __init__(self, pin):
+        self._pin = pin
+        self.unbounded_pcolls = set()
+
+      def enter_composite_transform(self, transform_node):
+        self.visit_transform(transform_node)
+
+      def visit_transform(self, transform_node):
+        if transform_node.inputs:
+          for input_pcoll in transform_node.inputs:
+            key = self._pin.cache_key(input_pcoll)
+            if (key in self._pin._cached_pcoll_read and
+                not input_pcoll.is_bounded):
+              self.unbounded_pcolls.add(key)
+
+    v = CacheableUnboundedPCollectionVisitor(self)
+    pipeline.visit(v)
+
+    # The set of keys from the cached unbounded PCollections will be used as the
+    # output tags for the TestStream. This is to remember what cache-key is
+    # associated with which PCollection.
+    unbounded_cacheables = v.unbounded_pcolls
+    output_tags = unbounded_cacheables
+
+    # Take the PCollections that will be read from the TestStream and insert
+    # them back into the dictionary of cached PCollections. The next step will
+    # replace the downstream consumer of the non-cached PCollections with these
+    # PCollections.
+    if output_tags:
+      output_pcolls = pipeline | test_stream.TestStream(output_tags=output_tags)
+      if len(output_tags) == 1:
+        self._cached_pcoll_read[None] = output_pcolls
+      else:
+        for tag, pcoll in output_pcolls.items():
+          self._cached_pcoll_read[tag] = pcoll
{code}

Review comment: nit `self._cached_pcoll_read.update(output_pcolls)`

Issue Time Tracking
-------------------
Worklog Id: (was: 375350)
Time Spent: 53h 50m (was: 53h 40m)

> Add streaming support to Interactive Beam
> -----------------------------------------
>
>                 Key: BEAM-8335
>                 URL: https://issues.apache.org/jira/browse/BEAM-8335
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-py-interactive
>            Reporter: Sam Rohde
>            Assignee: Sam Rohde
>            Priority: Major
>          Time Spent: 53h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the
> Interactive Beam experience. This will allow users to:
> * Write and run a streaming job in IPython
> * Automatically cache records from unbounded sources
> * Add a replay experience that replays all cached records to simulate the
> original pipeline execution
> * Add controls to play/pause/stop/step individual elements from the cached
> records
> * Add ability to inspect/visualize unbounded PCollections
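The `CacheableUnboundedPCollectionVisitor` in the diff follows a standard visitor pattern: walk a DAG of transforms and collect the cache keys of inputs matching a predicate. A stdlib-only analogue of the same shape (no Beam dependency; the `Node` class and names are illustrative, not Beam's API):

```python
class Node:
    """Hypothetical stand-in for a transform node with input PCollections."""

    def __init__(self, name, bounded=True, inputs=()):
        self.name = name
        self.is_bounded = bounded
        self.inputs = list(inputs)

class UnboundedCollector:
    """Collects names of cached, unbounded inputs -- the same shape as
    the CacheableUnboundedPCollectionVisitor sketched in the diff."""

    def __init__(self, cached):
        self._cached = cached       # set of cache keys already materialized
        self.unbounded = set()

    def visit(self, node):
        # Check this node's inputs against the predicate...
        for inp in node.inputs:
            if inp.name in self._cached and not inp.is_bounded:
                self.unbounded.add(inp.name)
        # ...then recurse into the rest of the graph.
        for inp in node.inputs:
            self.visit(inp)

uncached_src = Node("read", bounded=False)      # unbounded but not cached
cached_src = Node("cached_read", bounded=False) # unbounded and cached
root = Node("sum", inputs=[uncached_src, cached_src])

collector = UnboundedCollector(cached={"cached_read"})
collector.visit(root)
# collector.unbounded now holds only the cached, unbounded input's key,
# i.e. the set that the real code uses as TestStream output tags.
```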
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=375348&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375348 ]

ASF GitHub Bot logged work on BEAM-8335:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:07
Start Date: 22/Jan/20 01:07
Worklog Time Spent: 10m

Work Description: pabloem commented on pull request #10368: [BEAM-8335] Modify PipelineInstrument to add TestStream for unbounded PCollections
URL: https://github.com/apache/beam/pull/10368#discussion_r369325196

File path: sdks/python/apache_beam/runners/interactive/pipeline_instrument.py
{code}
+    # The set of keys from the cached unbounded PCollections will be used as the
+    # output tags for the TestStream. This is to remember what cache-key is
+    # associated with which PCollection.
+    unbounded_cacheables = v.unbounded_pcolls
{code}

Review comment: `unbounded_cacheables` seems like a superfluous variable? It's not used anywhere else?

Issue Time Tracking
-------------------
Worklog Id: (was: 375348)
Time Spent: 53h 40m (was: 53.5h)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=375349=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375349 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Jan/20 01:07 Start Date: 22/Jan/20 01:07 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10368: [BEAM-8335] Modify PipelineInstrument to add TestStream for unbounded PCollections URL: https://github.com/apache/beam/pull/10368#discussion_r369322292 ## File path: sdks/python/apache_beam/testing/test_stream.py ## @@ -172,13 +172,14 @@ class TestStream(PTransform): output. """ - def __init__(self, coder=coders.FastPrimitivesCoder(), events=None): + def __init__(self, coder=coders.FastPrimitivesCoder(), events=None, + output_tags=None): super(TestStream, self).__init__() assert coder is not None self.coder = coder self.watermarks = {None: timestamp.MIN_TIMESTAMP} -self._events = [] if events is None else list(events) -self.output_tags = set() +self._events = list(events) if events is not None else [] +self.output_tags = set(output_tags) if output_tags is not None else set() Review comment: You can do `list(var) if var else list()` and `set(var) if var else set()` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375349) Time Spent: 53h 50m (was: 53h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 53h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
> This will allow users to:
> * Write and run a streaming job in IPython
> * Automatically cache records from unbounded sources
> * Add a replay experience that replays all cached records to simulate the original pipeline execution
> * Add controls to play/pause/stop/step individual elements from the cached records
> * Add ability to inspect/visualize unbounded PCollections

-- This message was sent by Atlassian Jira (v8.3.4#803005)
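The constructor diff in the review thread above centers on a common Python pattern: copy caller-supplied containers instead of storing them (or a shared mutable default) directly. A minimal sketch of the pattern under discussion — `FakeStream` is a hypothetical stand-in, not Beam's actual TestStream:

```python
class FakeStream:
    def __init__(self, events=None, output_tags=None):
        # Copying guards against the caller mutating the argument later,
        # and avoids a mutable default shared across instances.
        self._events = list(events) if events is not None else []
        self.output_tags = set(output_tags) if output_tags is not None else set()

shared = ['a', 'b']
stream = FakeStream(events=shared)
shared.append('c')  # does not leak into the stream's private copy
```

The reviewer's `list(var) if var else list()` variant yields the same observable result, since copying an empty container and creating a fresh one are indistinguishable; it merely skips the copy for empty inputs.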
[jira] [Commented] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020694#comment-17020694 ] Yueyang Qiu commented on BEAM-9169: --- Sounds good. Thanks for helping! > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Kirill Kozlov >Priority: Minor > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). > This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. -- This message was sent by Atlassian Jira (v8.3.4#803005)
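The failure described above is a general hazard of over-eager string escaping: escaping a character (here `/`) that the consuming dialect does not treat as escapable produces illegal sequences like `\/`. A hedged Python sketch of the narrower approach — escape only what the target actually requires; `escape_literal` is a hypothetical name, not the Beam/Calcite fix:

```python
def escape_literal(s):
    # Escape only characters the target dialect requires; '/' is left
    # alone, so no illegal "\/" sequence can appear. Backslash must be
    # handled first, or later replacements would be double-escaped.
    return (s.replace('\\', '\\\\')
             .replace('"', '\\"')
             .replace('\n', '\\n'))

escape_literal('America/Los_Angeles')  # unchanged: no escaping needed
```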
[jira] [Work logged] (BEAM-3419) Enable iterable side input for beam runners.
[ https://issues.apache.org/jira/browse/BEAM-3419?focusedWorklogId=375346=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375346 ] ASF GitHub Bot logged work on BEAM-3419: Author: ASF GitHub Bot Created on: 22/Jan/20 00:56 Start Date: 22/Jan/20 00:56 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10648: [BEAM-3419] Support iterable on Dataflow runner when using the unified worker. URL: https://github.com/apache/beam/pull/10648#issuecomment-576958103 R: @tvalentyn @ananvay @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375346) Time Spent: 4h 20m (was: 4h 10m) > Enable iterable side input for beam runners. > > > Key: BEAM-3419 > URL: https://issues.apache.org/jira/browse/BEAM-3419 > Project: Beam > Issue Type: Improvement > Components: runner-core >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?focusedWorklogId=375345=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375345 ]

ASF GitHub Bot logged work on BEAM-9168:
Author: ASF GitHub Bot
Created on: 22/Jan/20 00:53
Start Date: 22/Jan/20 00:53
Worklog Time Spent: 10m

Work Description: chamikaramj commented on pull request #10652: [BEAM-9168] Temporary fix for RunnerAPIPTransformHolder usage
URL: https://github.com/apache/beam/pull/10652#discussion_r369322255

## File path: sdks/python/apache_beam/pipeline.py ##

@@ -1050,7 +1051,9 @@ def is_side_input(tag):
                for tag, id in proto.outputs.items()}
     # This annotation is expected by some runners.
     if proto.spec.urn == common_urns.primitives.PAR_DO.urn:
-      assert isinstance(result.transform, ParDo)
+      # TODO(BEAM-9168): Figure out what to do for RunnerAPIPTransformHolder.
+      assert isinstance(result.transform, (ParDo, RunnerAPIPTransformHolder)),\

Review comment: nit: use parentheses instead of \

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 375345)
Time Spent: 50m (was: 40m)

> AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
> -
>
> Key: BEAM-9168
> URL: https://issues.apache.org/jira/browse/BEAM-9168
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Udi Meiri
> Assignee: Chamikara Madhusanka Jayalath
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>
> This is failing on a google-internal test.
> Unexpected class is apache_beam.transforms.core.RunnerAPIPTransformHolder.
> Failed assertion:
> https://github.com/apache/beam/blob/a59f897a64b0006ef3fcbe5a750d5f46499cfe61/sdks/python/apache_beam/pipeline.py#L1052-L1053

-- This message was sent by Atlassian Jira (v8.3.4#803005)
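The review nit above (prefer parentheses to a trailing `\` for line continuation) can be illustrated as below; note that the parentheses must wrap the asserted expression only, since `assert (cond, msg)` would test a non-empty tuple and always pass:

```python
result = [1, 2, 3]

# Backslash continuation, as in the diff above:
assert isinstance(result, (list, tuple)), \
    "unexpected transform type"

# Parenthesized continuation, per the nit. The parentheses belong to the
# isinstance() call, never to the whole `assert cond, msg` pair -- wrapping
# the pair would create a truthy tuple and defeat the assertion.
assert isinstance(
    result, (list, tuple)
), "unexpected transform type"
```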
[jira] [Work logged] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?focusedWorklogId=375344=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375344 ] ASF GitHub Bot logged work on BEAM-9168: Author: ASF GitHub Bot Created on: 22/Jan/20 00:52 Start Date: 22/Jan/20 00:52 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10652: [BEAM-9168] Temporary fix for RunnerAPIPTransformHolder usage URL: https://github.com/apache/beam/pull/10652#issuecomment-576957411 LGTM. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375344) Time Spent: 40m (was: 0.5h) > AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with > PAR_DO.urn > - > > Key: BEAM-9168 > URL: https://issues.apache.org/jira/browse/BEAM-9168 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > This is failing on a google-internal test. > Unexpected class is apache_beam.transforms.core.RunnerAPIPTransformHolder. > Failed assertion: > https://github.com/apache/beam/blob/a59f897a64b0006ef3fcbe5a750d5f46499cfe61/sdks/python/apache_beam/pipeline.py#L1052-L1053 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?focusedWorklogId=375342=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375342 ] ASF GitHub Bot logged work on BEAM-9168: Author: ASF GitHub Bot Created on: 22/Jan/20 00:49 Start Date: 22/Jan/20 00:49 Worklog Time Spent: 10m Work Description: udim commented on issue #10652: [BEAM-9168] Temporary fix for RunnerAPIPTransformHolder usage URL: https://github.com/apache/beam/pull/10652#issuecomment-576956473 R: @aaltay This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375342) Time Spent: 0.5h (was: 20m) > AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with > PAR_DO.urn > - > > Key: BEAM-9168 > URL: https://issues.apache.org/jira/browse/BEAM-9168 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > This is failing on a google-internal test. > Unexpected class is apache_beam.transforms.core.RunnerAPIPTransformHolder. > Failed assertion: > https://github.com/apache/beam/blob/a59f897a64b0006ef3fcbe5a750d5f46499cfe61/sdks/python/apache_beam/pipeline.py#L1052-L1053 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?focusedWorklogId=375340=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375340 ] ASF GitHub Bot logged work on BEAM-9168: Author: ASF GitHub Bot Created on: 22/Jan/20 00:41 Start Date: 22/Jan/20 00:41 Worklog Time Spent: 10m Work Description: udim commented on issue #10652: [BEAM-9168] Temporary fix for RunnerAPIPTransformHolder usage URL: https://github.com/apache/beam/pull/10652#issuecomment-576954653 R: @chamikaramj This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375340) Time Spent: 20m (was: 10m) > AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with > PAR_DO.urn > - > > Key: BEAM-9168 > URL: https://issues.apache.org/jira/browse/BEAM-9168 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > This is failing on a google-internal test. > Unexpected class is apache_beam.transforms.core.RunnerAPIPTransformHolder. > Failed assertion: > https://github.com/apache/beam/blob/a59f897a64b0006ef3fcbe5a750d5f46499cfe61/sdks/python/apache_beam/pipeline.py#L1052-L1053 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020690#comment-17020690 ] Kirill Kozlov commented on BEAM-9169: - Yes, we might be able to fix this by specifying what needs escaping and what does not (or using a different escaping method). Not sure how to achieve that, but I will look into it. > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Kirill Kozlov >Priority: Minor > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). > This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020689#comment-17020689 ] Udi Meiri commented on BEAM-9168: - Note that the output_tags attribute is required here: https://github.com/apache/beam/blob/6a6adc8433deff10a5594bbf77cc9148ce0a951a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L856 > AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with > PAR_DO.urn > - > > Key: BEAM-9168 > URL: https://issues.apache.org/jira/browse/BEAM-9168 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This is failing on a google-internal test. > Unexpected class is apache_beam.transforms.core.RunnerAPIPTransformHolder. > Failed assertion: > https://github.com/apache/beam/blob/a59f897a64b0006ef3fcbe5a750d5f46499cfe61/sdks/python/apache_beam/pipeline.py#L1052-L1053 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=375338=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375338 ]

ASF GitHub Bot logged work on BEAM-8889:
Author: ASF GitHub Bot
Created on: 22/Jan/20 00:35
Start Date: 22/Jan/20 00:35
Worklog Time Spent: 10m

Work Description: veblush commented on pull request #10617: [BEAM-8889] adding gRPC connectivity to Beam/GCS connector
URL: https://github.com/apache/beam/pull/10617#discussion_r369317768

## File path: sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java ##

@@ -427,22 +440,14 @@ public WritableByteChannel create(GcsPath path, String type) throws IOException
    */
   public WritableByteChannel create(GcsPath path, String type, Integer uploadBufferSizeBytes)
       throws IOException {
-    GoogleCloudStorageWriteChannel channel =
-        new GoogleCloudStorageWriteChannel(
-            executorService,
-            storageClient,
-            new ClientRequestHelper<>(),
-            path.getBucket(),
-            path.getObject(),
-            type,
-            /* kmsKeyName= */ null,
-            AsyncWriteChannelOptions.newBuilder().build(),
-            new ObjectWriteConditions(),
-            Collections.emptyMap());
+    WritableByteChannel channel = getCloudStorage().create(new StorageResourceId(path.getBucket()));
     if (uploadBufferSizeBytes != null) {
-      channel.setUploadBufferSize(uploadBufferSizeBytes);
+      if (channel instanceof GoogleCloudStorageWriteChannel) {

Review comment: This seems to be replaced with using `AsyncWriteChannelOptions.setUploadChunkSize` in `GoogleCloudStorageOptions.WriteChannelOptions`.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375338) Remaining Estimate: 167h (was: 167h 10m) Time Spent: 1h (was: 50m) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 1h > Remaining Estimate: 167h > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is the primary class for accessing Google Cloud Storage in Apache Beam. The current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability that eventually > creates the channel objects. This request is about updating GcsUtil to use > GoogleCloudStorage to create read and write channels, which is expected > to be more flexible because it can easily pick up new changes; e.g. a new channel > implementation using a new protocol, without code changes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020687#comment-17020687 ] Udi Meiri commented on BEAM-9168: - Please add appropriate test cases that cover this assert. > AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with > PAR_DO.urn > - > > Key: BEAM-9168 > URL: https://issues.apache.org/jira/browse/BEAM-9168 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This is failing on a google-internal test. > Unexpected class is apache_beam.transforms.core.RunnerAPIPTransformHolder. > Failed assertion: > https://github.com/apache/beam/blob/a59f897a64b0006ef3fcbe5a750d5f46499cfe61/sdks/python/apache_beam/pipeline.py#L1052-L1053 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020686#comment-17020686 ] Udi Meiri commented on BEAM-9168: - Possible options (I'm not that familiar with the semantics): 1. When replacing a ParDo with RunnerAPIPTransformHolder, change the URN as well (add a new one?). 2. Make a special-case RunnerAPIPTransformHolder class that also inherits from ParDo, or make ParDo and RunnerAPIPTransformHolder inherit from the same base class and assert isinstance on that base. 3. Explicitly handle cases in the code where RunnerAPIPTransformHolder may come instead of ParDo. > AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with > PAR_DO.urn > - > > Key: BEAM-9168 > URL: https://issues.apache.org/jira/browse/BEAM-9168 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This is failing on a google-internal test. > Unexpected class is apache_beam.transforms.core.RunnerAPIPTransformHolder. > Failed assertion: > https://github.com/apache/beam/blob/a59f897a64b0006ef3fcbe5a750d5f46499cfe61/sdks/python/apache_beam/pipeline.py#L1052-L1053 -- This message was sent by Atlassian Jira (v8.3.4#803005)
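Option 2 above (a shared base class that both ParDo and the holder inherit from, with the assertion made against the base) can be sketched as follows; all names here are hypothetical illustrations, not Beam's actual class hierarchy:

```python
class PTransformWithParDoUrn:
    """Hypothetical marker base for transforms that carry PAR_DO.urn."""

class ParDo(PTransformWithParDoUrn):
    pass

class RunnerAPIPTransformHolder(PTransformWithParDoUrn):
    pass

# The runner-API check then asserts on the base, accepting either subclass:
for transform in (ParDo(), RunnerAPIPTransformHolder()):
    assert isinstance(transform, PTransformWithParDoUrn)
```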
[jira] [Work logged] (BEAM-7861) Make it easy to change between multi-process and multi-thread mode for Python Direct runners
[ https://issues.apache.org/jira/browse/BEAM-7861?focusedWorklogId=375335=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375335 ] ASF GitHub Bot logged work on BEAM-7861: Author: ASF GitHub Bot Created on: 22/Jan/20 00:33 Start Date: 22/Jan/20 00:33 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #10616: [BEAM-7861] update documentation about --direct_running_mode option with direct runner. URL: https://github.com/apache/beam/pull/10616 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375335) Time Spent: 4h 50m (was: 4h 40m) > Make it easy to change between multi-process and multi-thread mode for Python > Direct runners > > > Key: BEAM-7861 > URL: https://issues.apache.org/jira/browse/BEAM-7861 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.19.0 > > Time Spent: 4h 50m > Remaining Estimate: 0h > > BEAM-3645 makes it possible to run a map task in parallel. > However, users need to change the runner when switching between multithreading and > multiprocessing modes. > We want to add a flag (ex: --use-multiprocess) to make the switch easy > without changing the runner each time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
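The `--direct_running_mode` option named in the PR title realizes the single-flag switch the issue asks for. Below is a rough stdlib sketch of the idea — one flag selects the execution strategy, with no runner swap. This is not Beam's direct runner; the mode names are assumptions carried over from the flag name in the PR title:

```python
import argparse
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def executor_for(mode):
    # One flag picks the execution strategy; the "runner" stays the same.
    return ProcessPoolExecutor if mode == 'multi_processing' else ThreadPoolExecutor

parser = argparse.ArgumentParser()
parser.add_argument('--direct_running_mode',
                    choices=['in_memory', 'multi_threading', 'multi_processing'],
                    default='in_memory')

args = parser.parse_args(['--direct_running_mode=multi_processing'])
executor_cls = executor_for(args.direct_running_mode)
```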
[jira] [Work logged] (BEAM-8626) Implement status api handler in python sdk harness
[ https://issues.apache.org/jira/browse/BEAM-8626?focusedWorklogId=375332=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375332 ] ASF GitHub Bot logged work on BEAM-8626: Author: ASF GitHub Bot Created on: 22/Jan/20 00:32 Start Date: 22/Jan/20 00:32 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10598: [BEAM-8626] Implement status fn api handler in python sdk URL: https://github.com/apache/beam/pull/10598#issuecomment-576952415 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375332) Time Spent: 4h 50m (was: 4h 40m) > Implement status api handler in python sdk harness > -- > > Key: BEAM-8626 > URL: https://issues.apache.org/jira/browse/BEAM-8626 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7861) Make it easy to change between multi-process and multi-thread mode for Python Direct runners
[ https://issues.apache.org/jira/browse/BEAM-7861?focusedWorklogId=375334=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375334 ] ASF GitHub Bot logged work on BEAM-7861: Author: ASF GitHub Bot Created on: 22/Jan/20 00:32 Start Date: 22/Jan/20 00:32 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10616: [BEAM-7861] update documentation about --direct_running_mode option with direct runner. URL: https://github.com/apache/beam/pull/10616#issuecomment-576952554 LGTM, thank you! In the future, please don't squash reviewed and unreviewed commits before a review has finalized, see: https://beam.apache.org/contribute/#make-reviewers-job-easier. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375334) Time Spent: 4h 40m (was: 4.5h) > Make it easy to change between multi-process and multi-thread mode for Python > Direct runners > > > Key: BEAM-7861 > URL: https://issues.apache.org/jira/browse/BEAM-7861 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.19.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > BEAM-3645 makes it possible to run a map task in parallel. > However, users need to change the runner when switching between multithreading and > multiprocessing modes. > We want to add a flag (ex: --use-multiprocess) to make the switch easy > without changing the runner each time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8626) Implement status api handler in python sdk harness
[ https://issues.apache.org/jira/browse/BEAM-8626?focusedWorklogId=375329=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375329 ] ASF GitHub Bot logged work on BEAM-8626: Author: ASF GitHub Bot Created on: 22/Jan/20 00:31 Start Date: 22/Jan/20 00:31 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10598: [BEAM-8626] Implement status fn api handler in python sdk URL: https://github.com/apache/beam/pull/10598#issuecomment-576952149 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375329) Time Spent: 4.5h (was: 4h 20m) > Implement status api handler in python sdk harness > -- > > Key: BEAM-8626 > URL: https://issues.apache.org/jira/browse/BEAM-8626 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8626) Implement status api handler in python sdk harness
[ https://issues.apache.org/jira/browse/BEAM-8626?focusedWorklogId=375331=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375331 ] ASF GitHub Bot logged work on BEAM-8626: Author: ASF GitHub Bot Created on: 22/Jan/20 00:31 Start Date: 22/Jan/20 00:31 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10598: [BEAM-8626] Implement status fn api handler in python sdk URL: https://github.com/apache/beam/pull/10598#issuecomment-576952309 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375331) Time Spent: 4h 40m (was: 4.5h) > Implement status api handler in python sdk harness > -- > > Key: BEAM-8626 > URL: https://issues.apache.org/jira/browse/BEAM-8626 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020681#comment-17020681 ] Kirill Kozlov edited comment on BEAM-9169 at 1/22/20 12:30 AM: --- [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to insure that strings with special characters (like: ' {noformat} SELECT 'abc\\n'{noformat} ') get escaped. It looks like `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped. was (Author: kirillkozlov): [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to insure that strings with special characters (like: 'abc\\n') get escaped. It looks like `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped. > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Kirill Kozlov >Priority: Minor > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). > This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020681#comment-17020681 ] Kirill Kozlov edited comment on BEAM-9169 at 1/22/20 12:30 AM: --- [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to insure that strings with special characters (like: {noformat} SELECT 'abc\\n'{noformat} ) get escaped. It looks like `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped. was (Author: kirillkozlov): [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to insure that strings with special characters (like: ' {noformat} SELECT 'abc\\n'{noformat} ') get escaped. It looks like `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped. > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Kirill Kozlov >Priority: Minor > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). 
> This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020684#comment-17020684 ] Yueyang Qiu edited comment on BEAM-9169 at 1/22/20 12:30 AM: - Thanks for the quick reply. This seems to be a bug in Calcite util functions. Can we have a work around for it in Beam? I am currently assigning this bug to you since you have more context on the issue. was (Author: robinyqiu): Thanks for the quick reply. This seems to be a bug in Calcite util functions. Can we have a work around for it in Beam? I am currently assigning this bug to you since you have more context on the issue. > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Kirill Kozlov >Priority: Minor > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). > This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yueyang Qiu reassigned BEAM-9169: - Assignee: Kirill Kozlov (was: Yueyang Qiu) > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Kirill Kozlov >Priority: Minor > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). > This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020684#comment-17020684 ] Yueyang Qiu commented on BEAM-9169: --- Thanks for the quick reply. This seems to be a bug in Calcite util functions. Can we have a workaround for it in Beam? I am currently assigning this bug to you since you have more context on the issue.
[jira] [Comment Edited] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020681#comment-17020681 ] Kirill Kozlov edited comment on BEAM-9169 at 1/22/20 12:28 AM: --- [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to ensure that strings with special characters (like: 'abc\\n') get escaped. It looks like the `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped. was (Author: kirillkozlov): [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52]
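The fix direction Kirill describes can be sketched as a conservative escaper: handle only the characters ZetaSQL actually requires to be escaped (backslash, quotes, control characters) and pass '/' through untouched, so "America/Los_Angeles" round-trips without a stray backslash. This is an illustrative sketch only, not Beam's actual change; `escapeForZetaSql` is a hypothetical helper name.

```java
// Hypothetical sketch: escape only backslash, quotes, and common control
// characters; '/' (and every other printable character) passes through
// unchanged, unlike StringEscapeUtils.escapeJava's broader escaping.
public class ZetaSqlEscapeSketch {
  static String escapeForZetaSql(String s) {
    StringBuilder out = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '\\': out.append("\\\\"); break;
        case '"':  out.append("\\\""); break;
        case '\'': out.append("\\'");  break;
        case '\n': out.append("\\n");  break;
        case '\r': out.append("\\r");  break;
        case '\t': out.append("\\t");  break;
        default:   out.append(c);      // '/' falls through unescaped
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // The time-zone string must survive unchanged -- no stray backslash.
    System.out.println(escapeForZetaSql("America/Los_Angeles")); // America/Los_Angeles
    System.out.println(escapeForZetaSql("abc\n")); // abc\n (literal backslash + n)
  }
}
```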
[jira] [Updated] (BEAM-9149) Support ZetaSQL positional parameters
[ https://issues.apache.org/jira/browse/BEAM-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-9149: -- Description: While they are not yet exposed to the end user, ZetaSQL query parameters are currently being passed internally. However, the existing code assumes that all parameters are named parameters, not positional parameters. To support positional parameters, we will need to make at least the following changes: 1) Set mode to PARAMETER_POSITIONAL and use addPositionalQueryParameter instead of addQueryParameter in SqlAnalyzer: https://github.com/apache/beam/blob/671b02ac5f1be87a591de8f5f456d0e5a199d771/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java#L119 2) Code currently takes a Map everywhere parameters are provided. This is not suitable for positional parameters, which are better represented as an ordered collection such as a list. was: While they are not yet exposed to the end user, ZetaSQL query parameters are currently being passed internally. However, the existing code assumes that all parameters are named parameters, not positional parameters. To support positional parameters, we will need to make at least the following changes: 1) Set mode to PARAMETER_POSITIONAL and use addPositionalQueryParameter instead of addQueryParameter in SqlAnalyzer: https://github.com/apache/beam/blob/671b02ac5f1be87a591de8f5f456d0e5a199d771/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java#L119 2) Code currently assumes that resolved parameters are named. While even positional parameters must be named when they are used as inputs, after they are resolved their names are removed. 
Thus this check will deref a null pointer and must be fixed: https://github.com/apache/beam/blob/8915d6e95c405aeee0f29152545d3210e8e09f1f/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ExpressionConverter.java#L1004
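Point 2 of the description can be sketched as a small data-model change: named parameters naturally fit a Map keyed by name, while positional parameters need an ordered collection whose i-th element binds to the i-th `?` in the query. The `QueryParametersSketch` class below is hypothetical and for illustration only, not Beam's actual API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a holder that carries either named or positional
// parameters, mirroring the representation change described above.
public class QueryParametersSketch {
  enum Kind { NAMED, POSITIONAL }

  final Kind kind;
  final Map<String, Object> named;   // name -> value, for @name parameters
  final List<Object> positional;     // element i binds to the i-th '?'

  private QueryParametersSketch(Kind kind, Map<String, Object> named, List<Object> positional) {
    this.kind = kind;
    this.named = named;
    this.positional = positional;
  }

  static QueryParametersSketch named(Map<String, Object> params) {
    return new QueryParametersSketch(Kind.NAMED, new LinkedHashMap<>(params), null);
  }

  static QueryParametersSketch positional(List<Object> params) {
    return new QueryParametersSketch(Kind.POSITIONAL, null, new ArrayList<>(params));
  }

  public static void main(String[] args) {
    // "SELECT ? + ?" with positional parameters: order is all that matters.
    QueryParametersSketch p = positional(List.of(1L, 2L));
    System.out.println(p.kind + " " + p.positional);
  }
}
```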
[jira] [Work logged] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?focusedWorklogId=375326=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375326 ] ASF GitHub Bot logged work on BEAM-9168: Author: ASF GitHub Bot Created on: 22/Jan/20 00:28 Start Date: 22/Jan/20 00:28 Worklog Time Spent: 10m Work Description: udim commented on pull request #10652: [BEAM-9168] Temporary fix for RunnerAPIPTransformHolder usage URL: https://github.com/apache/beam/pull/10652
[jira] [Comment Edited] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020681#comment-17020681 ] Kirill Kozlov edited comment on BEAM-9169 at 1/22/20 12:28 AM: --- [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to ensure that strings with special characters (like: 'abc\\n') get escaped. It looks like the `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped.
[jira] [Work logged] (BEAM-6550) ParDo Async Java API
[ https://issues.apache.org/jira/browse/BEAM-6550?focusedWorklogId=375325=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375325 ] ASF GitHub Bot logged work on BEAM-6550: Author: ASF GitHub Bot Created on: 22/Jan/20 00:26 Start Date: 22/Jan/20 00:26 Worklog Time Spent: 10m Work Description: mynameborat commented on pull request #10651: [BEAM-6550] ParDo Async Java API URL: https://github.com/apache/beam/pull/10651 * Introduced support for async Java ParDo * Added direct runner support * Unit tests and sample
[jira] [Commented] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020681#comment-17020681 ] Kirill Kozlov commented on BEAM-9169: - [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52]
[jira] [Updated] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yueyang Qiu updated BEAM-9169: -- Description: When I am testing query string {code:java} "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", \"America/Los_Angeles\")"{code} on BeamZetaSqlCalcRel I found that the second string parameter to the function is unparsed to {code:java} America\/Los_Angeles{code} (note an extra backslash character is added). This breaks the ZetaSQL evaluator with error {code:java} Syntax error: Illegal escape sequence: \/{code} From what I can see now this character is introduced during the Calcite unparsing step. was: When I am testing query string {code:java} "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", \"America/Los_Angeles\")"{code} on BeamZetaSqlCalcRel I found that the second string parameter to the function is unparsed to {code:java} America\/Los_Angeles{code} (note an extra backslash character is added). This breaks the ZetaSQL evaluator with error {code:java} Syntax error: Illegal escape sequence: \/{code}
[jira] [Commented] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020677#comment-17020677 ] Yueyang Qiu commented on BEAM-9169: --- [~kirillkozlov] [~apilloud] do you have any idea why this happened?
[jira] [Updated] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yueyang Qiu updated BEAM-9169: -- Description: When I am testing query string {code:java} "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", \"America/Los_Angeles\")"{code} on BeamZetaSqlCalcRel I found that the second string parameter to the function is unparsed to {code:java} America\/Los_Angeles{code} (note an extra backslash character is added). This breaks the ZetaSQL evaluator with error {code:java} Syntax error: Illegal escape sequence: \/{code} was: When I am testing query string {code:java} "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", \"America/Los_Angeles\")"{code} on BeamZetaSqlCalcRel I found that the second string parameter to the function is unparsed to {code:java} America\/Los_Angeles{code} (note an extra backslash character is added). This breaks the ZetaSQL evaluator with error {code:java} Syntax error: Illegal escape sequence: \\{code}
[jira] [Updated] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yueyang Qiu updated BEAM-9169: -- Description: When I am testing query string {code:java} "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", \"America/Los_Angeles\")"{code} on BeamZetaSqlCalcRel I found that the second string parameter to the function is unparsed to {code:java} America\/Los_Angeles{code} (note an extra backslash character is added). This breaks the ZetaSQL evaluator with error {code:java} Syntax error: Illegal escape sequence: \\{code} was: When I am testing query string {code:java} "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", \"America/Los_Angeles\")"{code} on BeamZetaSqlCalcRel I found that the second string parameter to the function is unparsed to {code:java} America\/Los_Angeles{code} (note an extra backslash character is added). This breaks the ZetaSQL evaluator with error {code:java} Syntax error: Illegal escape sequence: \{code}
[jira] [Updated] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yueyang Qiu updated BEAM-9169: -- Description: When I am testing query string {code:java} "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", \"America/Los_Angeles\")"{code} on BeamZetaSqlCalcRel I found that the second string parameter to the function is unparsed to {code:java} America\/Los_Angeles{code} (note an extra backslash character is added). This breaks the ZetaSQL evaluator with error {code:java} Syntax error: Illegal escape sequence: \{code} was: When I am testing query `SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", \"America/Los_Angeles\")` on `BeamZetaSqlCalcRel` I found that the second string parameter to the function is unparsed to `America\/Los_Angeles` (note an extra backslash character is added). This breaks the ZetaSQL evaluator with error `Syntax error: Illegal escape sequence: \`
[jira] [Created] (BEAM-9169) Extra character introduced during Calcite unparsing
Yueyang Qiu created BEAM-9169: - Summary: Extra character introduced during Calcite unparsing Key: BEAM-9169 URL: https://issues.apache.org/jira/browse/BEAM-9169 Project: Beam Issue Type: Improvement Components: dsl-sql-zetasql Reporter: Yueyang Qiu Assignee: Yueyang Qiu When I am testing query `SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", \"America/Los_Angeles\")` on `BeamZetaSqlCalcRel` I found that the second string parameter to the function is unparsed to `America\/Los_Angeles` (note an extra backslash character is added). This breaks the ZetaSQL evaluator with error `Syntax error: Illegal escape sequence: \` -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
Udi Meiri created BEAM-9168: --- Summary: AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn Key: BEAM-9168 URL: https://issues.apache.org/jira/browse/BEAM-9168 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Udi Meiri Assignee: Chamikara Madhusanka Jayalath This is failing on a google-internal test. Unexpected class is apache_beam.transforms.core.RunnerAPIPTransformHolder. Failed assertion: https://github.com/apache/beam/blob/a59f897a64b0006ef3fcbe5a750d5f46499cfe61/sdks/python/apache_beam/pipeline.py#L1052-L1053 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri updated BEAM-9168: Status: Open (was: Triage Needed)
[jira] [Work logged] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?focusedWorklogId=375319=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375319 ] ASF GitHub Bot logged work on BEAM-8042: Author: ASF GitHub Bot Created on: 22/Jan/20 00:13 Start Date: 22/Jan/20 00:13 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10649: [BEAM-8042] [ZetaSQL] Fix aggregate column reference URL: https://github.com/apache/beam/pull/10649#issuecomment-576947355 Run SQL postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375319) Time Spent: 1h 40m (was: 1.5h) > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 1h 40m > Remaining Estimate: 0h > > {code} > @Rule > public TestPipeline pipeline = > TestPipeline.fromOptions(createPipelineOptions()); > private static PipelineOptions createPipelineOptions() { > BeamSqlPipelineOptions opts = > PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class); > opts.setPlannerName(ZetaSQLQueryPlanner.class.getName()); > return opts; > } > @Test > public void testAggregate() { > Schema inputSchema = Schema.builder() > .addByteArrayField("id") > .addInt64Field("has_f1") > .addInt64Field("has_f2") > .addInt64Field("has_f3") > .addInt64Field("has_f4") > .addInt64Field("has_f5") > .addInt64Field("has_f6") > .build(); > String sql = "SELECT \n" + > " id, \n" + > " COUNT(*) as count, \n" + > " SUM(has_f1) as f1_count, \n" + > " SUM(has_f2) as f2_count, \n" + > " SUM(has_f3) as f3_count, \n" + > " SUM(has_f4) as f4_count, \n" + > " SUM(has_f5) as f5_count, \n" + > " 
SUM(has_f6) as f6_count \n" + > "FROM PCOLLECTION \n" + > "GROUP BY id"; > pipeline > .apply(Create.empty(inputSchema)) > .apply(SqlTransform.query(sql)); > pipeline.run(); > } > {code} > {code} > Caused by: java.lang.RuntimeException: Error while applying rule > AggregateProjectMergeRule, args > [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)), > > rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)] > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637) > at > org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104) > at > ... 
39 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205) > ... 48 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
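The ArrayIndexOutOfBoundsException above arises when AggregateProjectMergeRule remaps the aggregate's input field references through the projection it is merging with. A minimal sketch of that failure mode (illustrative only, not Calcite's actual code; `remap`, `aggRefs`, and `projectOutputs` are names invented for this example): the plan references the group key $0 plus SUM($2)..SUM($7), but the LogicalProject only produces seven outputs ($0..$6), because COUNT(*) consumes no input column.

```java
import java.util.ArrayList;
import java.util.List;

public class AggregateMergeSketch {
    // Illustrative stand-in for the index remapping AggregateProjectMergeRule
    // performs: each aggregate input reference is looked up in the project's
    // output list. A reference past the end fails like the trace above.
    static List<Integer> remap(int[] aggRefs, int projectOutputs) {
        List<Integer> mapped = new ArrayList<>();
        for (int ref : aggRefs) {
            if (ref >= projectOutputs) {
                throw new ArrayIndexOutOfBoundsException(
                    "field reference $" + ref + " exceeds " + projectOutputs + " project outputs");
            }
            mapped.add(ref);
        }
        return mapped;
    }

    public static void main(String[] args) {
        // $0 (group key) plus SUM($2)..SUM($7); the project has 7 outputs, $0..$6.
        int[] refs = {0, 2, 3, 4, 5, 6, 7};
        try {
            remap(refs, 7);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running this prints `field reference $7 exceeds 7 project outputs`, mirroring the `ArrayIndexOutOfBoundsException: 7` in the trace.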
[jira] [Work logged] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?focusedWorklogId=375317=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375317 ] ASF GitHub Bot logged work on BEAM-8042: Author: ASF GitHub Bot Created on: 22/Jan/20 00:12 Start Date: 22/Jan/20 00:12 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10649: [BEAM-8042] [ZetaSQL] Fix aggregate column reference URL: https://github.com/apache/beam/pull/10649#issuecomment-576947174 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375317) Time Spent: 1.5h (was: 1h 20m) > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 1.5h > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-8042 started by Kirill Kozlov. --- > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 1h 20m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9151) Dataflow legacy worker tests are mis-configured
[ https://issues.apache.org/jira/browse/BEAM-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyuan Zhang resolved BEAM-9151. Resolution: Fixed > Dataflow legacy worker tests are mis-configured > --- > > Key: BEAM-9151 > URL: https://issues.apache.org/jira/browse/BEAM-9151 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.19.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Please refer to the last comment of https://github.com/apache/beam/pull/8183 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9151) Dataflow legacy worker tests are mis-configured
[ https://issues.apache.org/jira/browse/BEAM-9151?focusedWorklogId=375315=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375315 ] ASF GitHub Bot logged work on BEAM-9151: Author: ASF GitHub Bot Created on: 22/Jan/20 00:05 Start Date: 22/Jan/20 00:05 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #10647: [BEAM-9151] Cherry-pick: Fix misconfigured legacy dataflow tests URL: https://github.com/apache/beam/pull/10647 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375315) Time Spent: 1h 10m (was: 1h) > Dataflow legacy worker tests are mis-configured > --- > > Key: BEAM-9151 > URL: https://issues.apache.org/jira/browse/BEAM-9151 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.19.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Please refer to the last comment of https://github.com/apache/beam/pull/8183 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9167) Reduce overhead of Go SDK side metrics
[ https://issues.apache.org/jira/browse/BEAM-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9167: --- Description: Locking overhead due to the global store and local caches of SDK counter data can dominate certain workloads, which means we can do better. Instead of having a global store of metrics data to extract counters, we should use per-ptransform (or per-bundle) counter sets, which would avoid requiring locking per counter operation. The main detriment compared to the current implementation is that a user would need to add their own locking if they were to spawn multiple goroutines to process a Bundle's work in a DoFn. Given that self-multithreaded DoFns aren't recommended/safe in Java, largely impossible in Python, and the other Beam Go SDK-provided constructs (like Iterators and Emitters) are not thread safe, this is a small concern, provided the documentation is clear on this. Removing the locking and switching to atomic ops reduces the overhead significantly in example jobs and in the benchmarks. A second part of this change should be to move the exec package to manage its own per-bundle state, rather than relying on a global datastore to extract the per-bundle, per-ptransform values. Related: https://issues.apache.org/jira/browse/BEAM-6541 was: Locking overhead due to the global store and local caches of SDK counter data can dominate certain workloads, which means we can do better. Instead of having a global store of metrics data to extract counters, we should use per-ptransform (or per-bundle) counter sets, which would avoid requiring locking per counter operation. The main detriment compared to the current implementation is that a user would need to add their own locking if they were to spawn multiple goroutines to process a Bundle's work in a DoFn. Given that self-multithreaded DoFns aren't recommended/safe in Java, largely impossible in Python, and the other Beam Go SDK-provided constructs (like Iterators and Emitters) are not thread safe, this is a small concern, provided the documentation is clear on this. Removing the locking and switching to atomic ops reduces the overhead significantly in example jobs and in the benchmarks. Related: https://issues.apache.org/jira/browse/BEAM-6541 > Reduce overhead of Go SDK side metrics > -- > > Key: BEAM-9167 > URL: https://issues.apache.org/jira/browse/BEAM-9167 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
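The tradeoff the description names, a lock acquired per counter increment versus atomic operations, can be illustrated with a small sketch. This is not the Beam Go SDK's metrics code (the issue targets sdk-go); it is a generic Java illustration with invented class names:

```java
import java.util.concurrent.atomic.AtomicLong;

public class CounterSketch {
    // Per-operation locking, the scheme the issue describes moving away from.
    static class LockedCounter {
        private long n;
        synchronized void inc() { n++; }
        synchronized long get() { return n; }
    }

    // Lock-free increments via atomic ops. As the description notes, this is
    // safe as long as a bundle is processed by a single worker thread; a DoFn
    // that fans work out to its own threads would need extra synchronization.
    static class AtomicCounter {
        private final AtomicLong n = new AtomicLong();
        void inc() { n.incrementAndGet(); }
        long get() { return n.get(); }
    }

    public static void main(String[] args) {
        LockedCounter locked = new LockedCounter();
        AtomicCounter atomic = new AtomicCounter();
        for (int i = 0; i < 1_000_000; i++) {
            locked.inc();
            atomic.inc();
        }
        System.out.println(locked.get() + " " + atomic.get());
    }
}
```

Both counters end at the same value; the atomic version simply avoids acquiring a monitor on every increment, which is where the measured overhead reduction comes from.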
[jira] [Assigned] (BEAM-9167) Reduce overhead of Go SDK side metrics
[ https://issues.apache.org/jira/browse/BEAM-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-9167: -- Assignee: Robert Burke > Reduce overhead of Go SDK side metrics > -- > > Key: BEAM-9167 > URL: https://issues.apache.org/jira/browse/BEAM-9167 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > > Locking overhead due to the global store and local caches of SDK counter data > can dominate certain workloads, which means we can do better. > Instead of having a global store of metrics data to extract counters, we > should use per ptransform (or per bundle) counter sets, which would avoid > requiring locking per counter operation. The main detriment compared to the > current implementation is that a user would need to add their own locking if > they were to spawn multiple goroutines to process a Bundle's work in a DoFn. > Given that self multithreaded DoFns aren't recommended/safe in Java, largely > impossible in Python, and the other beam Go SDK provided constructs (like > Iterators and Emitters) are not thread safe, this is a small concern, > provided the documentation is clear on this. > Removing the locking and switching to atomic ops reduces the overhead > significantly in example jobs and in the benchmarks. > Related: https://issues.apache.org/jira/browse/BEAM-6541 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9072) [SQL] Add support for Datastore source
[ https://issues.apache.org/jira/browse/BEAM-9072?focusedWorklogId=375303=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375303 ] ASF GitHub Bot logged work on BEAM-9072: Author: ASF GitHub Bot Created on: 21/Jan/20 23:46 Start Date: 21/Jan/20 23:46 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #10440: [BEAM-9072] [SQL] DataStoreV1 IO connector URL: https://github.com/apache/beam/pull/10440#issuecomment-576940136 Run JavaBeamZetaSQL PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375303) Time Spent: 6h 10m (was: 6h) > [SQL] Add support for Datastore source > -- > > Key: BEAM-9072 > URL: https://issues.apache.org/jira/browse/BEAM-9072 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > > * Create a Datastore table and table provider > * Conversion between Datastore and Beam data types > * Implement buildIOReader > * Implement buildIOWrite > * Implement getTableStatistics > Doc: > [https://docs.google.com/document/d/1FxuEGewJ3GPDl0IKglfOYf1edwa2m_wryFZYRMpRNbA/edit?pli=1] -- This message was sent by Atlassian Jira (v8.3.4#803005)
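The checklist in the issue (a Datastore table and table provider, buildIOReader, buildIOWrite, getTableStatistics) can be sketched as an interface. The signatures below are illustrative placeholders only, not the Beam SQL table API; real implementations return Beam transforms and statistics objects rather than strings:

```java
public class DatastoreTableSketch {
    // Method names mirror the issue checklist; return types are simplified
    // placeholders rather than Beam's transform and statistics types.
    interface SqlTable {
        String buildIOReader();      // describes the source reading rows
        String buildIOWrite();       // describes the sink writing rows
        long getTableStatistics();   // row-count estimate for the planner
    }

    // Hypothetical Datastore-backed table keyed by an entity kind.
    static class DatastoreTable implements SqlTable {
        private final String kind;
        DatastoreTable(String kind) { this.kind = kind; }
        public String buildIOReader() { return "read entities of kind " + kind; }
        public String buildIOWrite() { return "write entities of kind " + kind; }
        public long getTableStatistics() { return 0L; } // unknown until implemented
    }

    public static void main(String[] args) {
        SqlTable table = new DatastoreTable("orders");
        System.out.println(table.buildIOReader());
    }
}
```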
[jira] [Work logged] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?focusedWorklogId=375301=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375301 ] ASF GitHub Bot logged work on BEAM-8042: Author: ASF GitHub Bot Created on: 21/Jan/20 23:46 Start Date: 21/Jan/20 23:46 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #10649: [BEAM-8042] [ZetaSQL] Fix aggregate column reference URL: https://github.com/apache/beam/pull/10649#discussion_r369304269 ## File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java ## @@ -1347,6 +1347,44 @@ public void testZetaSQLStructFieldAccessInTumble() { pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } + @Test + public void testAggregateWithAndWithoutColumnRefs() { Review comment: Great suggestions! Will add a message in a commit. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375301) Time Spent: 1h 20m (was: 1h 10m) > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 1h 20m > Remaining Estimate: 0h
[jira] [Work logged] (BEAM-9072) [SQL] Add support for Datastore source
[ https://issues.apache.org/jira/browse/BEAM-9072?focusedWorklogId=375302=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375302 ] ASF GitHub Bot logged work on BEAM-9072: Author: ASF GitHub Bot Created on: 21/Jan/20 23:46 Start Date: 21/Jan/20 23:46 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #10440: [BEAM-9072] [SQL] DataStoreV1 IO connector URL: https://github.com/apache/beam/pull/10440#issuecomment-576940105 Run SQL postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375302) Time Spent: 6h (was: 5h 50m) > [SQL] Add support for Datastore source > -- > > Key: BEAM-9072 > URL: https://issues.apache.org/jira/browse/BEAM-9072 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 6h > Remaining Estimate: 0h > > * Create a Datastore table and table provider > * Conversion between Datastore and Beam data types > * Implement buildIOReader > * Implement buildIOWrite > * Implement getTableStatistics > Doc: > [https://docs.google.com/document/d/1FxuEGewJ3GPDl0IKglfOYf1edwa2m_wryFZYRMpRNbA/edit?pli=1] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?focusedWorklogId=375299=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375299 ] ASF GitHub Bot logged work on BEAM-8042: Author: ASF GitHub Bot Created on: 21/Jan/20 23:44 Start Date: 21/Jan/20 23:44 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #10649: [BEAM-8042] [ZetaSQL] Fix aggregate column reference URL: https://github.com/apache/beam/pull/10649#discussion_r369303857 ## File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java ## @@ -1347,6 +1347,44 @@ public void testZetaSQLStructFieldAccessInTumble() { pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } + @Test + public void testAggregateWithAndWithoutColumnRefs() { Review comment: Nit: you might want to include a comment like "this test is used to fix BEAM-8042", which can provide some context for readers. But it's not required. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375299) Time Spent: 1h (was: 50m) > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 1h > Remaining Estimate: 0h
[jira] [Work logged] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?focusedWorklogId=375298=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375298 ] ASF GitHub Bot logged work on BEAM-8042: Author: ASF GitHub Bot Created on: 21/Jan/20 23:44 Start Date: 21/Jan/20 23:44 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #10649: [BEAM-8042] [ZetaSQL] Fix aggregate column reference URL: https://github.com/apache/beam/pull/10649#discussion_r369303722 ## File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java ## @@ -1347,6 +1347,44 @@ public void testZetaSQLStructFieldAccessInTumble() { pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } + @Test + public void testAggregateWithAndWithoutColumnRefs() { +ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + +String sql = +"SELECT \n" ++ " id, \n" ++ " SUM(has_f1) as f1_count, \n" ++ " SUM(has_f2) as f2_count, \n" ++ " SUM(has_f3) as f3_count, \n" ++ " SUM(has_f4) as f4_count, \n" ++ " SUM(has_f5) as f5_count, \n" ++ " COUNT(*) as count, \n" ++ " SUM(has_f6) as f6_count \n" ++ "FROM (select 0 as id, 1 as has_f1, 2 as has_f2, 3 as has_f3, 4 as has_f4, 5 as has_f5, 6 as has_f6)\n" Review comment: Might want to use named parameters to not rely on hard-coded values. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375298) Time Spent: 50m (was: 40m) > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 50m > Remaining Estimate: 0h
[jira] [Work logged] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?focusedWorklogId=375300&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375300 ] ASF GitHub Bot logged work on BEAM-8042: Author: ASF GitHub Bot Created on: 21/Jan/20 23:44 Start Date: 21/Jan/20 23:44 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #10649: [BEAM-8042] [ZetaSQL] Fix aggregate column reference URL: https://github.com/apache/beam/pull/10649#discussion_r369303857 ## File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java ## @@ -1347,6 +1347,44 @@ public void testZetaSQLStructFieldAccessInTumble() { pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } + @Test + public void testAggregateWithAndWithoutColumnRefs() { Review comment: Nit: you might include a comment like "this test is used to verify BEAM-8042", which can provide some context for readers. But it's not required. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375300) Time Spent: 1h 10m (was: 1h) > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 1h 10m > Remaining Estimate: 0h > > {code} > @Rule > public TestPipeline pipeline = > TestPipeline.fromOptions(createPipelineOptions()); > private static PipelineOptions createPipelineOptions() { > BeamSqlPipelineOptions opts = > PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class); > opts.setPlannerName(ZetaSQLQueryPlanner.class.getName()); > return opts; > } > @Test > public void testAggregate() { > Schema inputSchema = Schema.builder() > .addByteArrayField("id") > .addInt64Field("has_f1") > .addInt64Field("has_f2") > .addInt64Field("has_f3") > .addInt64Field("has_f4") > .addInt64Field("has_f5") > .addInt64Field("has_f6") > .build(); > String sql = "SELECT \n" + > " id, \n" + > " COUNT(*) as count, \n" + > " SUM(has_f1) as f1_count, \n" + > " SUM(has_f2) as f2_count, \n" + > " SUM(has_f3) as f3_count, \n" + > " SUM(has_f4) as f4_count, \n" + > " SUM(has_f5) as f5_count, \n" + > " SUM(has_f6) as f6_count \n" + > "FROM PCOLLECTION \n" + > "GROUP BY id"; > pipeline > .apply(Create.empty(inputSchema)) > .apply(SqlTransform.query(sql)); > pipeline.run(); > } > {code} > {code} > Caused by: java.lang.RuntimeException: Error while applying rule > AggregateProjectMergeRule, args > [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)), > > rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)] > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232) > at > 
org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637) > at > org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104) > at > ... 39 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96) > at >
[jira] [Work logged] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?focusedWorklogId=375291&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375291 ] ASF GitHub Bot logged work on BEAM-8042: Author: ASF GitHub Bot Created on: 21/Jan/20 23:37 Start Date: 21/Jan/20 23:37 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #10649: [BEAM-8042] [ZetaSQL] Fix aggregate column reference URL: https://github.com/apache/beam/pull/10649 Some aggregate operations do not require a column reference (Ex: `COUNT(*)`, unlike `COUNT(id)`). Such expressions should not increment the reference offset when constructing `LogicalAggregate`. R: @amaliujia CC: @kanterov CC: @apilloud Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
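The off-by-one the PR describes can be sketched in a few lines of Python (this is a hypothetical illustration, not Beam's planner code; `resolve_refs`, `AGG_CALLS`, and `buggy` are invented names): if the reference offset advances even for aggregate calls that take no input column, every reference after a `COUNT(*)` points one column too far.

```python
# Aggregate calls from the BEAM-8042 repro query; input has columns
# id (the GROUP BY key, index 0) plus has_f1..has_f6 (indices 1..6).
AGG_CALLS = ["SUM(has_f1)", "SUM(has_f2)", "SUM(has_f3)", "SUM(has_f4)",
             "SUM(has_f5)", "COUNT(*)", "SUM(has_f6)"]

def resolve_refs(agg_calls, buggy):
    """Map each aggregate call to the input column index it references."""
    refs, offset = [], 1  # column 0 is the GROUP BY key
    for call in agg_calls:
        if call == "COUNT(*)":
            refs.append(None)   # no input column reference
            if buggy:
                offset += 1     # bug: offset advances anyway
        else:
            refs.append(offset)
            offset += 1
    return refs

# Buggy resolution makes SUM(has_f6) reference column 7 of a 7-column
# input (valid indices 0..6) -- mirroring ArrayIndexOutOfBoundsException: 7.
assert resolve_refs(AGG_CALLS, buggy=True)[-1] == 7
assert resolve_refs(AGG_CALLS, buggy=False)[-1] == 6
```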
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build
[jira] [Created] (BEAM-9167) Reduce overhead of Go SDK side metrics
Robert Burke created BEAM-9167: -- Summary: Reduce overhead of Go SDK side metrics Key: BEAM-9167 URL: https://issues.apache.org/jira/browse/BEAM-9167 Project: Beam Issue Type: Improvement Components: sdk-go Reporter: Robert Burke Locking overhead due to the global store and local caches of SDK counter data can dominate certain workloads, which means we can do better. Instead of having a global store of metrics data to extract counters, we should use per ptransform (or per bundle) counter sets, which would avoid requiring locking per counter operation. The main detriment compared to the current implementation is that a user would need to add their own locking if they were to spawn multiple goroutines to process a Bundle's work in a DoFn. Given that self multithreaded DoFns aren't recommended/safe in Java, largely impossible in Python, and the other beam Go SDK provided constructs (like Iterators and Emitters) are not thread safe, this is a small concern, provided the documentation is clear on this. Removing the locking and switching to atomic ops reduces the overhead significantly in example jobs and in the benchmarks. Related: https://issues.apache.org/jira/browse/BEAM-6541 -- This message was sent by Atlassian Jira (v8.3.4#803005)
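The trade-off described above — a lock around a global metrics store versus per-bundle counter sets merged once at bundle completion — can be sketched in Python (illustrative names; the actual proposal is for the Go SDK and would additionally use atomic operations):

```python
import threading

class GlobalStore:
    """Shared store: every counter update pays for the lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self.counters = {}

    def add(self, key, delta):
        with self._lock:
            self.counters[key] = self.counters.get(key, 0) + delta

class BundleCounters:
    """Per-bundle counter set: bundle processing is single-threaded, so
    increments need no lock; totals are merged into the global store once,
    when the bundle finishes."""
    def __init__(self):
        self._local = {}

    def inc(self, name, delta=1):
        self._local[name] = self._local.get(name, 0) + delta

    def commit(self, store, ptransform):
        for name, value in self._local.items():
            store.add((ptransform, name), value)
```

The cost model is the point: with per-bundle sets, the lock is taken once per counter per bundle rather than once per element — and, as the issue notes, a DoFn that spawns its own goroutines would then be responsible for its own synchronization.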
[jira] [Work logged] (BEAM-3419) Enable iterable side input for beam runners.
[ https://issues.apache.org/jira/browse/BEAM-3419?focusedWorklogId=375288=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375288 ] ASF GitHub Bot logged work on BEAM-3419: Author: ASF GitHub Bot Created on: 21/Jan/20 23:31 Start Date: 21/Jan/20 23:31 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10648: [BEAM-3419] Support iterable on Dataflow runner when using the unified worker. URL: https://github.com/apache/beam/pull/10648#issuecomment-576936184 Ran tests internal to Google to validate this change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375288) Time Spent: 4h 10m (was: 4h) > Enable iterable side input for beam runners. > > > Key: BEAM-3419 > URL: https://issues.apache.org/jira/browse/BEAM-3419 > Project: Beam > Issue Type: Improvement > Components: runner-core >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov reassigned BEAM-8042: --- Assignee: Kirill Kozlov > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 0.5h > Remaining Estimate: 0h > > {code} > @Rule > public TestPipeline pipeline = > TestPipeline.fromOptions(createPipelineOptions()); > private static PipelineOptions createPipelineOptions() { > BeamSqlPipelineOptions opts = > PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class); > opts.setPlannerName(ZetaSQLQueryPlanner.class.getName()); > return opts; > } > @Test > public void testAggregate() { > Schema inputSchema = Schema.builder() > .addByteArrayField("id") > .addInt64Field("has_f1") > .addInt64Field("has_f2") > .addInt64Field("has_f3") > .addInt64Field("has_f4") > .addInt64Field("has_f5") > .addInt64Field("has_f6") > .build(); > String sql = "SELECT \n" + > " id, \n" + > " COUNT(*) as count, \n" + > " SUM(has_f1) as f1_count, \n" + > " SUM(has_f2) as f2_count, \n" + > " SUM(has_f3) as f3_count, \n" + > " SUM(has_f4) as f4_count, \n" + > " SUM(has_f5) as f5_count, \n" + > " SUM(has_f6) as f6_count \n" + > "FROM PCOLLECTION \n" + > "GROUP BY id"; > pipeline > .apply(Create.empty(inputSchema)) > .apply(SqlTransform.query(sql)); > pipeline.run(); > } > {code} > {code} > Caused by: java.lang.RuntimeException: Error while applying rule > AggregateProjectMergeRule, args > [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)), > > rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)] > at > 
org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637) > at > org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104) > at > ... 39 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205) > ... 48 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3419) Enable iterable side input for beam runners.
[ https://issues.apache.org/jira/browse/BEAM-3419?focusedWorklogId=375285=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375285 ] ASF GitHub Bot logged work on BEAM-3419: Author: ASF GitHub Bot Created on: 21/Jan/20 23:24 Start Date: 21/Jan/20 23:24 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10648: [BEAM-3419] Support iterable on Dataflow runner when using the unified worker. URL: https://github.com/apache/beam/pull/10648 Note that all other portable runners are using iterable side inputs. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Commented] (BEAM-8584) Remove TestPipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020644#comment-17020644 ] Brian Hulette commented on BEAM-8584: - Sorry I didn't get back to you with that link Mark. I've been out of the office a bit dealing with a medical issue (all is well now :)). I think the reasoning qualifies as language-specific; I just noted that much of the functionality provided by TestPipelineOptions could be provided in other ways. > Remove TestPipelineOptions > -- > > Key: BEAM-8584 > URL: https://issues.apache.org/jira/browse/BEAM-8584 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > > See [ML > thread|https://lists.apache.org/thread.html/cc2ac6db764e0d750688f8bae540728e38759365b86ba6f3fabfa6dd@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9063) Migrate docker images to apache namespace.
[ https://issues.apache.org/jira/browse/BEAM-9063?focusedWorklogId=375277=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375277 ] ASF GitHub Bot logged work on BEAM-9063: Author: ASF GitHub Bot Created on: 21/Jan/20 23:07 Start Date: 21/Jan/20 23:07 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10612: [NOT READY TO MERGE][BEAM-9063] migrate docker images to apache URL: https://github.com/apache/beam/pull/10612#issuecomment-576929102 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375277) Time Spent: 3h 20m (was: 3h 10m) > Migrate docker images to apache namespace. > -- > > Key: BEAM-9063 > URL: https://issues.apache.org/jira/browse/BEAM-9063 > Project: Beam > Issue Type: Task > Components: beam-community >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: Not applicable > > Time Spent: 3h 20m > Remaining Estimate: 0h > > https://hub.docker.com/u/apache -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9166) Add unit tests for ZetaSQL time-related functions that depends on system clock
Yueyang Qiu created BEAM-9166: - Summary: Add unit tests for ZetaSQL time-related functions that depends on system clock Key: BEAM-9166 URL: https://issues.apache.org/jira/browse/BEAM-9166 Project: Beam Issue Type: Improvement Components: dsl-sql-zetasql Reporter: Yueyang Qiu Assignee: Yueyang Qiu Example test query: "SELECT CURRENT_TIMESTAMP()" We cannot test this because currently the Java ZetaSQL evaluator wrapper does not support configuring clocks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
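The testability gap described above is the standard clock-injection problem; a minimal Python sketch (hypothetical names — this is not the ZetaSQL evaluator API, which per the issue does not yet support configurable clocks) of how an injectable clock makes a `CURRENT_TIMESTAMP()`-style function deterministic in tests:

```python
import datetime

def current_timestamp(clock=datetime.datetime.now):
    """Evaluate a CURRENT_TIMESTAMP()-like function against an injectable
    clock. Production code uses the real clock; tests pass a fixed one."""
    return clock()

# With a fixed clock the expected value is known exactly, so the function
# can be unit tested without depending on the system clock.
fixed = lambda: datetime.datetime(2020, 1, 21, 23, 0, 0)
assert current_timestamp(fixed) == datetime.datetime(2020, 1, 21, 23, 0, 0)
```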
[jira] [Closed] (BEAM-9080) [Go SDK] beam.Partition should support PCollection>s and not just PCollection
[ https://issues.apache.org/jira/browse/BEAM-9080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke closed BEAM-9080. -- Fix Version/s: Not applicable Resolution: Fixed > [Go SDK] beam.Partition should support PCollection>s and not just > PCollection > > > Key: BEAM-9080 > URL: https://issues.apache.org/jira/browse/BEAM-9080 > Project: Beam > Issue Type: New Feature > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Minor > Labels: beginner, noob, starter > Fix For: Not applicable > > Time Spent: 0.5h > Remaining Estimate: 0h > > beam.Partition should support PCollection>s and not just > PCollection > If StructualDynFns also present themselves, the emitters can be optimized too. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9029) Two bugs in Python SDK S3 filesystem support
[ https://issues.apache.org/jira/browse/BEAM-9029?focusedWorklogId=375263=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375263 ] ASF GitHub Bot logged work on BEAM-9029: Author: ASF GitHub Bot Created on: 21/Jan/20 22:48 Start Date: 21/Jan/20 22:48 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10459: [BEAM-9029]Fix two bugs in Python SDK S3 filesystem support URL: https://github.com/apache/beam/pull/10459 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375263) Remaining Estimate: 21h (was: 21h 10m) Time Spent: 3h (was: 2h 50m) > Two bugs in Python SDK S3 filesystem support > > > Key: BEAM-9029 > URL: https://issues.apache.org/jira/browse/BEAM-9029 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Wenhai Pan >Assignee: Wenhai Pan >Priority: Major > Labels: pull-request-available > Original Estimate: 24h > Time Spent: 3h > Remaining Estimate: 21h > > Hi :) > There seem to be 2 bugs in the S3 filesystem support. > I tried to use S3 storage for a simple wordcount demo with DirectRunner. 
> The demo script: > {code:java} > def main(): > options = PipelineOptions().view_as(StandardOptions) > options.runner = 'DirectRunner' > pipeline = beam.Pipeline(options = options) > ( > pipeline > | ReadFromText("s3://mx-machine-learning/panwenhai/beam_test/test_data") > | "extract_words" >> beam.FlatMap(lambda x: re.findall(r" [A-Za-z\']+", x)) > | beam.combiners.Count.PerElement() > | beam.MapTuple(lambda word, count: "%s: %s" % (word, count)) > | WriteToText("s3://mx-machine-learning/panwenhai/beam_test/output") > ) > result = pipeline.run() > result.wait_until_finish() > return > {code} > > Error message 1: > {noformat} > apache_beam.io.filesystem.BeamIOError: Match operation failed with exceptions > {'s3://mx-machine-learning/panwenhai/beam_test/output-*-of-1': > BeamIOError("List operation failed with exceptions > {'s3://mx-machine-learning/panwenhai/beam_test/output-': S3ClientError('Tried > to list nonexistent S3 path: > s3://mx-machine-learning/panwenhai/beam_test/output-', 404)}")} [while > running 'WriteToText/Write/WriteImpl/PreFinalize'] with exceptions > None{noformat} > > After digging into the code, it seems the Boto3 client's list function will > raise an exception when trying to list a nonexistent S3 path > (beam/sdks/python/apache_beam/io/aws/clients/s3/boto3_client.py line 111). And > the S3IO class does not handle this exception in the list_prefix function > (beam/sdks/python/apache_beam/io/aws/s3io.py line 121). > When the runner tries to list and delete the existing output file, if there > is no existing output file, it will try to list a nonexistent S3 path and > will trigger the exception. > This should not be an issue here. I think we can ignore this exception safely > in the S3IO list_prefix function. 
> Error Message 2: > {noformat} > File > "/Users/wenhai.pan/venvs/tfx/lib/python3.7/site-packages/apache_beam-2.19.0.dev0-py3.7.egg/apache_beam/io/aws/s3filesystem.py", > line 272, in delete > exceptions = {path: error for (path, error) in results > File > "/Users/wenhai.pan/venvs/tfx/lib/python3.7/site-packages/apache_beam-2.19.0.dev0-py3.7.egg/apache_beam/io/aws/s3filesystem.py", > line 272, in > exceptions = {path: error for (path, error) in results > ValueError: too many values to unpack (expected 2) [while running > 'WriteToText/Write/WriteImpl/FinalizeWrite']{noformat} > > When the runner tries to delete the temporary output directory, it will > trigger this exception. This exception is caused by parsing (path, error) > directly from the "results" which is a dict > (beam/sdks/python/apache_beam/io/aws/s3filesystem.py line 272). I think we > should use results.items() here. > I have submitted a patch for these 2 bugs: > https://github.com/apache/beam/pull/10459 > > Thank you. -- This message was sent by Atlassian Jira (v8.3.4#803005)
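Error message 2 reproduces in a few lines (illustrative path and error values): iterating a dict yields its keys, so the comprehension tries to unpack each path string character by character, and any path longer than two characters raises exactly the `ValueError: too many values to unpack (expected 2)` shown above.

```python
results = {"s3://bucket/beam-temp/output-0000": IOError("delete failed")}

# Buggy form from s3filesystem.py line 272: iterates the dict's keys.
try:
    {path: error for (path, error) in results}
    raised = False
except ValueError:
    raised = True  # the key string cannot be unpacked into (path, error)

# Fixed form, as the reporter suggests: .items() yields the pairs.
exceptions = {path: error for (path, error) in results.items()}
```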
[jira] [Work logged] (BEAM-9029) Two bugs in Python SDK S3 filesystem support
[ https://issues.apache.org/jira/browse/BEAM-9029?focusedWorklogId=375262=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375262 ] ASF GitHub Bot logged work on BEAM-9029: Author: ASF GitHub Bot Created on: 21/Jan/20 22:47 Start Date: 21/Jan/20 22:47 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10459: [BEAM-9029]Fix two bugs in Python SDK S3 filesystem support URL: https://github.com/apache/beam/pull/10459#issuecomment-576922546 LGTM. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375262) Remaining Estimate: 21h 10m (was: 21h 20m) Time Spent: 2h 50m (was: 2h 40m) > Two bugs in Python SDK S3 filesystem support > > > Key: BEAM-9029 > URL: https://issues.apache.org/jira/browse/BEAM-9029 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Wenhai Pan >Assignee: Wenhai Pan >Priority: Major > Labels: pull-request-available > Original Estimate: 24h > Time Spent: 2h 50m > Remaining Estimate: 21h 10m > > Hi :) > There seem to be 2 bugs in the S3 filesystem support. > I tried to use S3 storage for a simple wordcount demo with DirectRunner. 
> The demo script: > {code:java} > def main(): > options = PipelineOptions().view_as(StandardOptions) > options.runner = 'DirectRunner' > pipeline = beam.Pipeline(options = options) > ( > pipeline > | ReadFromText("s3://mx-machine-learning/panwenhai/beam_test/test_data") > | "extract_words" >> beam.FlatMap(lambda x: re.findall(r" [A-Za-z\']+", x)) > | beam.combiners.Count.PerElement() > | beam.MapTuple(lambda word, count: "%s: %s" % (word, count)) > | WriteToText("s3://mx-machine-learning/panwenhai/beam_test/output") > ) > result = pipeline.run() > result.wait_until_finish() > return > {code} > > Error message 1: > {noformat} > apache_beam.io.filesystem.BeamIOError: Match operation failed with exceptions > {'s3://mx-machine-learning/panwenhai/beam_test/output-*-of-1': > BeamIOError("List operation failed with exceptions > {'s3://mx-machine-learning/panwenhai/beam_test/output-': S3ClientError('Tried > to list nonexistent S3 path: > s3://mx-machine-learning/panwenhai/beam_test/output-', 404)}")} [while > running 'WriteToText/Write/WriteImpl/PreFinalize'] with exceptions > None{noformat} > > After digging into the code, it seems the Boto3 client's list function will > raise an exception when trying to list a nonexistent S3 path > (beam/sdks/python/apache_beam/io/aws/clients/s3/boto3_client.py line 111). And > the S3IO class does not handle this exception in the list_prefix function > (beam/sdks/python/apache_beam/io/aws/s3io.py line 121). > When the runner tries to list and delete the existing output file, if there > is no existing output file, it will try to list a nonexistent S3 path and > will trigger the exception. > This should not be an issue here. I think we can ignore this exception safely > in the S3IO list_prefix function. 
> Error Message 2: > {noformat} > File > "/Users/wenhai.pan/venvs/tfx/lib/python3.7/site-packages/apache_beam-2.19.0.dev0-py3.7.egg/apache_beam/io/aws/s3filesystem.py", > line 272, in delete > exceptions = {path: error for (path, error) in results > File > "/Users/wenhai.pan/venvs/tfx/lib/python3.7/site-packages/apache_beam-2.19.0.dev0-py3.7.egg/apache_beam/io/aws/s3filesystem.py", > line 272, in > exceptions = {path: error for (path, error) in results > ValueError: too many values to unpack (expected 2) [while running > 'WriteToText/Write/WriteImpl/FinalizeWrite']{noformat} > > When the runner tries to delete the temporary output directory, it will > trigger this exception. This exception is caused by parsing (path, error) > directly from the "results" which is a dict > (beam/sdks/python/apache_beam/io/aws/s3filesystem.py line 272). I think we > should use results.items() here. > I have submitted a patch for these 2 bugs: > https://github.com/apache/beam/pull/10459 > > Thank you. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9120) Deprecate onSuccessMatcher, onCreateMatcher
[ https://issues.apache.org/jira/browse/BEAM-9120?focusedWorklogId=375261=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375261 ] ASF GitHub Bot logged work on BEAM-9120: Author: ASF GitHub Bot Created on: 21/Jan/20 22:37 Start Date: 21/Jan/20 22:37 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #10589: [WIP] [BEAM-9120] Deprecate onSuccessMatcher, onCreateMatcher URL: https://github.com/apache/beam/pull/10589#discussion_r369281313 ## File path: runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/TestDataflowRunner.java ## @@ -102,8 +101,6 @@ DataflowPipelineJob run(Pipeline pipeline, DataflowRunner runner) { job.getJobId(), expectedNumberOfAssertions); -assertThat(job, testPipelineOptions.getOnCreateMatcher()); Review comment: Does it do that? My understanding was that this would be a no-op by default, since OnCreateMatcher is an instance of AlwaysPassMatcher. It will only fail fast if the user sets an OnCreateMatcher, which I don't think is happening anywhere outside of TestDataflowRunnerTest. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375261) Time Spent: 2h (was: 1h 50m) > Deprecate onSuccessMatcher, onCreateMatcher > --- > > Key: BEAM-9120 > URL: https://issues.apache.org/jira/browse/BEAM-9120 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Instead of creating matchers on PipelineResult we should just make assertions > on real matchers after waiting for the pipeline to finish. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9120) Deprecate onSuccessMatcher, onCreateMatcher
[ https://issues.apache.org/jira/browse/BEAM-9120?focusedWorklogId=375258=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375258 ] ASF GitHub Bot logged work on BEAM-9120: Author: ASF GitHub Bot Created on: 21/Jan/20 22:31 Start Date: 21/Jan/20 22:31 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #10586: [BEAM-9120] Make BigqueryMatcher extend TypeSafeMatcher URL: https://github.com/apache/beam/pull/10586 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375258) Time Spent: 1h 40m (was: 1.5h) > Deprecate onSuccessMatcher, onCreateMatcher > --- > > Key: BEAM-9120 > URL: https://issues.apache.org/jira/browse/BEAM-9120 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > Instead of creating matchers on PipelineResult we should just make assertions > on real matchers after waiting for the pipeline to finish. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9120) Deprecate onSuccessMatcher, onCreateMatcher
[ https://issues.apache.org/jira/browse/BEAM-9120?focusedWorklogId=375259=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375259 ] ASF GitHub Bot logged work on BEAM-9120: Author: ASF GitHub Bot Created on: 21/Jan/20 22:31 Start Date: 21/Jan/20 22:31 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #10588: [BEAM-9120] Make FileChecksumMatcher extend TypeSafeMatcher URL: https://github.com/apache/beam/pull/10588 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375259) Time Spent: 1h 50m (was: 1h 40m) > Deprecate onSuccessMatcher, onCreateMatcher > --- > > Key: BEAM-9120 > URL: https://issues.apache.org/jira/browse/BEAM-9120 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > Instead of creating matchers on PipelineResult we should just make assertions > on real matchers after waiting for the pipeline to finish. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9151) Dataflow legacy worker tests are mis-configured
[ https://issues.apache.org/jira/browse/BEAM-9151?focusedWorklogId=375255=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375255 ] ASF GitHub Bot logged work on BEAM-9151: Author: ASF GitHub Bot Created on: 21/Jan/20 22:29 Start Date: 21/Jan/20 22:29 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #10647: [BEAM-9151] Cherry-pick: Fix misconfigured legacy dataflow tests URL: https://github.com/apache/beam/pull/10647#issuecomment-576915611 R: @aaltay This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375255) Time Spent: 1h (was: 50m) > Dataflow legacy worker tests are mis-configured > --- > > Key: BEAM-9151 > URL: https://issues.apache.org/jira/browse/BEAM-9151 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.19.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Please refer to the last comment of https://github.com/apache/beam/pull/8183 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9151) Dataflow legacy worker tests are mis-configured
[ https://issues.apache.org/jira/browse/BEAM-9151?focusedWorklogId=375254=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375254 ] ASF GitHub Bot logged work on BEAM-9151: Author: ASF GitHub Bot Created on: 21/Jan/20 22:26 Start Date: 21/Jan/20 22:26 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #10647: [BEAM-9151] Cherry-pick: Fix misconfigured legacy dataflow tests URL: https://github.com/apache/beam/pull/10647#issuecomment-576914509 Run Java_Examples_Dataflow PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375254) Time Spent: 50m (was: 40m) > Dataflow legacy worker tests are mis-configured > --- > > Key: BEAM-9151 > URL: https://issues.apache.org/jira/browse/BEAM-9151 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.19.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Please refer to the last comment of https://github.com/apache/beam/pull/8183 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9063) Migrate docker images to apache namespace.
[ https://issues.apache.org/jira/browse/BEAM-9063?focusedWorklogId=375253=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375253 ] ASF GitHub Bot logged work on BEAM-9063: Author: ASF GitHub Bot Created on: 21/Jan/20 22:25 Start Date: 21/Jan/20 22:25 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10612: [NOT READY TO MERGE][BEAM-9063] migrate docker images to apache URL: https://github.com/apache/beam/pull/10612#issuecomment-576914372 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375253) Time Spent: 3h 10m (was: 3h) > Migrate docker images to apache namespace. > -- > > Key: BEAM-9063 > URL: https://issues.apache.org/jira/browse/BEAM-9063 > Project: Beam > Issue Type: Task > Components: beam-community >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: Not applicable > > Time Spent: 3h 10m > Remaining Estimate: 0h > > https://hub.docker.com/u/apache -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9151) Dataflow legacy worker tests are mis-configured
[ https://issues.apache.org/jira/browse/BEAM-9151?focusedWorklogId=375252=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375252 ] ASF GitHub Bot logged work on BEAM-9151: Author: ASF GitHub Bot Created on: 21/Jan/20 22:22 Start Date: 21/Jan/20 22:22 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #10647: [BEAM-9151] Cherry-pick: Fix misconfigured legacy dataflow tests URL: https://github.com/apache/beam/pull/10647 (cherry picked from commit f99c7d0a3ef55161797d6d00c7acf3a67ae0ee6e) **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Work logged] (BEAM-9151) Dataflow legacy worker tests are mis-configured
[ https://issues.apache.org/jira/browse/BEAM-9151?focusedWorklogId=375250=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375250 ] ASF GitHub Bot logged work on BEAM-9151: Author: ASF GitHub Bot Created on: 21/Jan/20 22:14 Start Date: 21/Jan/20 22:14 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10635: [BEAM-9151] Fix misconfigured legacy dataflow tests. URL: https://github.com/apache/beam/pull/10635 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375250) Time Spent: 0.5h (was: 20m) > Dataflow legacy worker tests are mis-configured > --- > > Key: BEAM-9151 > URL: https://issues.apache.org/jira/browse/BEAM-9151 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.19.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Please refer to the last comment of https://github.com/apache/beam/pull/8183 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9140) Update to ZetaSQL 2020.01.1
[ https://issues.apache.org/jira/browse/BEAM-9140?focusedWorklogId=375246=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375246 ] ASF GitHub Bot logged work on BEAM-9140: Author: ASF GitHub Bot Created on: 21/Jan/20 22:10 Start Date: 21/Jan/20 22:10 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #10620: [BEAM-9140] Upgrade to ZetaSQL 2020.01.1 URL: https://github.com/apache/beam/pull/10620#issuecomment-576908170 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375246) Time Spent: 1h 50m (was: 1h 40m) > Update to ZetaSQL 2020.01.1 > --- > > Key: BEAM-9140 > URL: https://issues.apache.org/jira/browse/BEAM-9140 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Andrew Pilloud >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > I hear ZetaSQL 2020.01.1 will be coming out in the next few hours. We should > upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9093) Pipeline options which with different underlying store variable does not get over written
[ https://issues.apache.org/jira/browse/BEAM-9093?focusedWorklogId=375240=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375240 ] ASF GitHub Bot logged work on BEAM-9093: Author: ASF GitHub Bot Created on: 21/Jan/20 21:59 Start Date: 21/Jan/20 21:59 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10613: [BEAM-9093] Log invalid overwrites in pipeline options URL: https://github.com/apache/beam/pull/10613#discussion_r369264919 ## File path: sdks/python/apache_beam/options/pipeline_options_test.py ## @@ -264,6 +264,18 @@ def test_override_options(self): self.assertEqual(options.get_all_options()['num_workers'], 5) self.assertTrue(options.get_all_options()['mock_flag']) + def test_override_init_options(self): +base_flags = ['--num_workers', '5'] +options = PipelineOptions(base_flags, mock_flag=True) +self.assertEqual(options.get_all_options()['num_workers'], 5) +self.assertEqual(options.get_all_options()['mock_flag'], True) + + def test_invalid_override_init_options(self): +base_flags = ['--num_workers', '5'] +options = PipelineOptions(base_flags, mock_invalid_flag=True) +self.assertEqual(options.get_all_options()['num_workers'], 5) +self.assertEqual(options.get_all_options()['mock_flag'], False) Review comment: It would be great if we could also capture the logging in addition to only checking `mock_flag` is untouched. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375240) Time Spent: 40m (was: 0.5h) > Pipeline options which with different underlying store variable does not get > over written > - > > Key: BEAM-9093 > URL: https://issues.apache.org/jira/browse/BEAM-9093 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ankur Goenka >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Example: > PipelineOptions(flags=[],**\{'no_use_public_ips': True,}) > Expectation: use_public_ips should be set False. > Actual: the value is not used as its not passed through argparser > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9093) Pipeline options which with different underlying store variable does not get over written
[ https://issues.apache.org/jira/browse/BEAM-9093?focusedWorklogId=375241=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375241 ] ASF GitHub Bot logged work on BEAM-9093: Author: ASF GitHub Bot Created on: 21/Jan/20 21:59 Start Date: 21/Jan/20 21:59 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10613: [BEAM-9093] Log invalid overwrites in pipeline options URL: https://github.com/apache/beam/pull/10613#discussion_r369260182 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -288,15 +288,20 @@ def get_all_options(self, _LOGGER.warning("Discarding unparseable args: %s", unknown_args) result = vars(known_args) +overrides = self._all_options.copy() # Apply the overrides if any for k in list(result): + overrides.pop(k, None) if k in self._all_options: result[k] = self._all_options[k] if (drop_default and parser.get_default(k) == result[k] and not isinstance(parser.get_default(k), ValueProvider)): del result[k] +if overrides: Review comment: Looks like there are two spaces. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375241) Time Spent: 40m (was: 0.5h) > Pipeline options which with different underlying store variable does not get > over written > - > > Key: BEAM-9093 > URL: https://issues.apache.org/jira/browse/BEAM-9093 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ankur Goenka >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Example: > PipelineOptions(flags=[],**\{'no_use_public_ips': True,}) > Expectation: use_public_ips should be set False. > Actual: the value is not used as its not passed through argparser > -- This message was sent by Atlassian Jira (v8.3.4#803005)
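The BEAM-9093 mechanism can be shown with a small self-contained sketch of how `get_all_options` applies kwarg overrides (this mirrors the loop quoted in the review above, but the parser setup and names here are illustrative, not the real PipelineOptions code). The kwarg name `no_use_public_ips` never matches the argparse dest `use_public_ips`, so the override is silently dropped.

```python
# Why PipelineOptions(flags=[], **{'no_use_public_ips': True}) has no
# effect: overrides are matched by parsed-result key, and the flag's
# dest ('use_public_ips') differs from the kwarg name given by the user.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--no_use_public_ips', dest='use_public_ips',
                    action='store_false', default=True)

known, _ = parser.parse_known_args([])
result = vars(known)                      # {'use_public_ips': True}
overrides = {'no_use_public_ips': True}   # user-supplied kwarg

for k in list(result):
    if k in overrides:                    # 'use_public_ips' not in overrides
        result[k] = overrides[k]

# The override was silently ignored; the PR adds logging for this case.
assert result == {'use_public_ips': True}
assert overrides                          # leftover, unapplied override
```

This is exactly why the reviewed change pops applied keys off a copy of the overrides and logs whatever remains.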
[jira] [Work logged] (BEAM-8626) Implement status api handler in python sdk harness
[ https://issues.apache.org/jira/browse/BEAM-8626?focusedWorklogId=375239=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375239 ] ASF GitHub Bot logged work on BEAM-8626: Author: ASF GitHub Bot Created on: 21/Jan/20 21:57 Start Date: 21/Jan/20 21:57 Worklog Time Spent: 10m Work Description: y1chi commented on pull request #10598: [BEAM-8626] Implement status fn api handler in python sdk URL: https://github.com/apache/beam/pull/10598#discussion_r369264960 ## File path: sdks/python/apache_beam/runners/worker/worker_status.py ## @@ -0,0 +1,148 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +"""Worker status api handler for reporting SDK harness debug info.""" + +from __future__ import absolute_import +from __future__ import division + +import queue +import sys +import threading +import traceback +from collections import defaultdict + +import grpc + +from apache_beam.portability.api import beam_fn_api_pb2 +from apache_beam.portability.api import beam_fn_api_pb2_grpc +from apache_beam.runners.worker.channel_factory import GRPCChannelFactory +from apache_beam.runners.worker.worker_id_interceptor import WorkerIdInterceptor + + +def thread_dump(): + """Get a thread dump for the current SDK worker harness. 
""" + # deduplicate threads with same stack trace + stack_traces = defaultdict(list) + frames = sys._current_frames() # pylint: disable=protected-access + + for t in threading.enumerate(): +stack_trace = ''.join(traceback.format_stack(frames[t.ident])) +thread_ident_name = (t.ident, t.name) +stack_traces[stack_trace].append(thread_ident_name) + + all_traces = ['=' * 10 + 'THREAD DUMP' + '=' * 10] + for stack, identity in stack_traces.items(): +ident, name = identity[0] +trace = '--- Thread #%s name: %s %s---\n' % ( Review comment: this is already printed in a separated line below. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375239) Time Spent: 4h 20m (was: 4h 10m) > Implement status api handler in python sdk harness > -- > > Key: BEAM-8626 > URL: https://issues.apache.org/jira/browse/BEAM-8626 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9122) Add uses_keyed_state step property to python dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-9122?focusedWorklogId=375238=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375238 ] ASF GitHub Bot logged work on BEAM-9122: Author: ASF GitHub Bot Created on: 21/Jan/20 21:56 Start Date: 21/Jan/20 21:56 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10596: [BEAM-9122] Add uses_keyed_state step property in python dataflow run… URL: https://github.com/apache/beam/pull/10596#issuecomment-576901355 Run Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375238) Time Spent: 2h (was: 1h 50m) > Add uses_keyed_state step property to python dataflow runner > > > Key: BEAM-9122 > URL: https://issues.apache.org/jira/browse/BEAM-9122 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Add additional step property to dataflow job property when a DoFn is stateful > in python sdk. So that the backend runner can recognize stateful steps. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9122) Add uses_keyed_state step property to python dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-9122?focusedWorklogId=375236=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375236 ] ASF GitHub Bot logged work on BEAM-9122: Author: ASF GitHub Bot Created on: 21/Jan/20 21:55 Start Date: 21/Jan/20 21:55 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10596: [BEAM-9122] Add uses_keyed_state step property in python dataflow run… URL: https://github.com/apache/beam/pull/10596#issuecomment-576900982 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375236) Time Spent: 1h 50m (was: 1h 40m) > Add uses_keyed_state step property to python dataflow runner > > > Key: BEAM-9122 > URL: https://issues.apache.org/jira/browse/BEAM-9122 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > Add additional step property to dataflow job property when a DoFn is stateful > in python sdk. So that the backend runner can recognize stateful steps. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7516) Add a watermark manager for the fn_api_runner
[ https://issues.apache.org/jira/browse/BEAM-7516?focusedWorklogId=375235=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375235 ] ASF GitHub Bot logged work on BEAM-7516: Author: ASF GitHub Bot Created on: 21/Jan/20 21:48 Start Date: 21/Jan/20 21:48 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10291: [BEAM-7516][BEAM-8823] FnApiRunner works with work queues, and a primitive watermark manager URL: https://github.com/apache/beam/pull/10291#issuecomment-576898410 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375235) Time Spent: 4h (was: 3h 50m) > Add a watermark manager for the fn_api_runner > - > > Key: BEAM-7516 > URL: https://issues.apache.org/jira/browse/BEAM-7516 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > To track watermarks for each stage -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8626) Implement status api handler in python sdk harness
[ https://issues.apache.org/jira/browse/BEAM-8626?focusedWorklogId=375230=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375230 ] ASF GitHub Bot logged work on BEAM-8626: Author: ASF GitHub Bot Created on: 21/Jan/20 21:44 Start Date: 21/Jan/20 21:44 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #10598: [BEAM-8626] Implement status fn api handler in python sdk URL: https://github.com/apache/beam/pull/10598#discussion_r369252542 ## File path: sdks/python/apache_beam/runners/worker/worker_status.py ## @@ -0,0 +1,148 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +"""Worker status api handler for reporting SDK harness debug info.""" + +from __future__ import absolute_import +from __future__ import division + +import queue +import sys +import threading +import traceback +from collections import defaultdict + +import grpc + +from apache_beam.portability.api import beam_fn_api_pb2 +from apache_beam.portability.api import beam_fn_api_pb2_grpc +from apache_beam.runners.worker.channel_factory import GRPCChannelFactory +from apache_beam.runners.worker.worker_id_interceptor import WorkerIdInterceptor + + +def thread_dump(): + """Get a thread dump for the current SDK worker harness. 
""" + # deduplicate threads with same stack trace + stack_traces = defaultdict(list) + frames = sys._current_frames() # pylint: disable=protected-access + + for t in threading.enumerate(): +stack_trace = ''.join(traceback.format_stack(frames[t.ident])) +thread_ident_name = (t.ident, t.name) +stack_traces[stack_trace].append(thread_ident_name) + + all_traces = ['=' * 10 + 'THREAD DUMP' + '=' * 10] + for stack, identity in stack_traces.items(): Review comment: You are right. It's already in this PR. We can print names of all the threads along with count so that we don't miss any information. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375230) Time Spent: 4h (was: 3h 50m) > Implement status api handler in python sdk harness > -- > > Key: BEAM-8626 > URL: https://issues.apache.org/jira/browse/BEAM-8626 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8626) Implement status api handler in python sdk harness
[ https://issues.apache.org/jira/browse/BEAM-8626?focusedWorklogId=375231=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375231 ] ASF GitHub Bot logged work on BEAM-8626: Author: ASF GitHub Bot Created on: 21/Jan/20 21:44 Start Date: 21/Jan/20 21:44 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #10598: [BEAM-8626] Implement status fn api handler in python sdk URL: https://github.com/apache/beam/pull/10598#discussion_r369257775 ## File path: sdks/python/apache_beam/runners/worker/worker_status.py ## @@ -0,0 +1,148 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +"""Worker status api handler for reporting SDK harness debug info.""" + +from __future__ import absolute_import +from __future__ import division + +import queue +import sys +import threading +import traceback +from collections import defaultdict + +import grpc + +from apache_beam.portability.api import beam_fn_api_pb2 +from apache_beam.portability.api import beam_fn_api_pb2_grpc +from apache_beam.runners.worker.channel_factory import GRPCChannelFactory +from apache_beam.runners.worker.worker_id_interceptor import WorkerIdInterceptor + + +def thread_dump(): + """Get a thread dump for the current SDK worker harness. 
""" + # deduplicate threads with same stack trace + stack_traces = defaultdict(list) + frames = sys._current_frames() # pylint: disable=protected-access + + for t in threading.enumerate(): +stack_trace = ''.join(traceback.format_stack(frames[t.ident])) +thread_ident_name = (t.ident, t.name) +stack_traces[stack_trace].append(thread_ident_name) + + all_traces = ['=' * 10 + 'THREAD DUMP' + '=' * 10] + for stack, identity in stack_traces.items(): +ident, name = identity[0] +trace = '--- Thread #%s name: %s %s---\n' % ( Review comment: ```suggestion trace = '--- Threads (%d) %s --- \n' % (len(identity), [ident+':'+name for (ident, name) in identity]) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375231) Time Spent: 4h 10m (was: 4h) > Implement status api handler in python sdk harness > -- > > Key: BEAM-8626 > URL: https://issues.apache.org/jira/browse/BEAM-8626 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)