[jira] [Work logged] (BEAM-8941) Create a common place for Load Tests configuration
[ https://issues.apache.org/jira/browse/BEAM-8941?focusedWorklogId=375415&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375415 ] ASF GitHub Bot logged work on BEAM-8941: Author: ASF GitHub Bot Created on: 22/Jan/20 05:54 Start Date: 22/Jan/20 05:54 Worklog Time Spent: 10m Work Description: pawelpasterz commented on issue #10543: [BEAM-8941] Implement simple DSL for load tests URL: https://github.com/apache/beam/pull/10543#issuecomment-577020379 Hey @kkucharc, I fixed the typo; it should work now :) Please rerun the seed job and smoke test, thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375415) Time Spent: 4h 10m (was: 4h) > Create a common place for Load Tests configuration > -- > > Key: BEAM-8941 > URL: https://issues.apache.org/jira/browse/BEAM-8941 > Project: Beam > Issue Type: Sub-task > Components: testing > Reporter: Kamil Wasilewski > Assignee: Pawel Pasterz > Priority: Minor > Fix For: Not applicable > > Time Spent: 4h 10m > Remaining Estimate: 0h > > The Apache Beam community maintains different versions of each Load Test. For example, right now there are two versions of all Python Load Tests: the first one runs on the Dataflow runner, and the second one runs on Flink. With the lack of a common place where configuration for the tests can be stored, the configuration is duplicated many times with minimal differences. > The goal is to create a common place for the configuration, so that it can be passed to the different files with tests (.test-infra/jenkins/*.groovy) and filtered according to needs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
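The "common configuration, filtered according to needs" idea can be sketched roughly as follows. This is a hypothetical Python illustration only (the actual Beam implementation lives in the Groovy Jenkins DSL files mentioned above; `COMMON_CONFIG` and `for_runner` are invented names):

```python
# Hypothetical sketch: one shared list of load-test scenarios, each tagged
# with the runners it applies to, filtered per Jenkins job instead of being
# copy-pasted into every *.groovy file.

COMMON_CONFIG = [
    {'title': 'GroupByKey: 2GB of 10B records',
     'test': 'group_by_key',
     'runners': ['dataflow', 'flink'],
     'pipeline_options': {'num_records': 200_000_000}},
    {'title': 'Combine: 2GB with fanout 4',
     'test': 'combine',
     'runners': ['dataflow'],
     'pipeline_options': {'fanout': 4}},
]

def for_runner(configs, runner):
    """Keep only the scenarios that apply to the given runner."""
    return [c for c in configs if runner in c['runners']]

flink_tests = for_runner(COMMON_CONFIG, 'flink')
```

Each Jenkins job would then call the filter with its own runner name, so a scenario is defined once and reused everywhere.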
[jira] [Work logged] (BEAM-8618) Tear down unused DoFns periodically in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8618?focusedWorklogId=375408&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375408 ] ASF GitHub Bot logged work on BEAM-8618: Author: ASF GitHub Bot Created on: 22/Jan/20 05:39 Start Date: 22/Jan/20 05:39 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on issue #10655: [BEAM-8618] Tear down unused DoFns periodically in Python SDK harness. URL: https://github.com/apache/beam/pull/10655#issuecomment-577016962 R: @robertwb @lukecwik Issue Time Tracking --- Worklog Id: (was: 375408) Time Spent: 20m (was: 10m) > Tear down unused DoFns periodically in Python SDK harness > - > > Key: BEAM-8618 > URL: https://issues.apache.org/jira/browse/BEAM-8618 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness > Reporter: sunjincheng > Assignee: sunjincheng > Priority: Major > Fix For: 2.20.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Per the discussion on the mailing list (details can be found at [1]), the teardown of DoFns should be supported in the portability framework. It happens in two places: > 1) Upon control service termination > 2) Tearing down unused DoFns periodically > The aim of this JIRA is to add support for tearing down unused DoFns periodically in the Python SDK harness. > [1] https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E
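The periodic-teardown idea can be sketched as a small cache that evicts and tears down processors that have been idle longer than a threshold. This is an invented illustration, not the PR's actual SDK-harness code (`IdleDoFnCache` and its methods are hypothetical names):

```python
import time

class IdleDoFnCache:
    """Hypothetical sketch: tear down DoFns unused for more than `ttl` seconds."""

    def __init__(self, ttl_seconds=60.0):
        self._ttl = ttl_seconds
        self._entries = {}  # dofn_id -> (dofn, last_used_timestamp)

    def get(self, dofn_id, factory):
        """Return a cached DoFn, creating it on first use; refresh its timestamp."""
        dofn, _ = self._entries.get(dofn_id, (None, None))
        if dofn is None:
            dofn = factory()
        self._entries[dofn_id] = (dofn, time.monotonic())
        return dofn

    def evict_idle(self):
        """Tear down every DoFn idle beyond the TTL; call this periodically."""
        now = time.monotonic()
        for dofn_id, (dofn, last_used) in list(self._entries.items()):
            if now - last_used > self._ttl:
                dofn.teardown()  # release the user's resources
                del self._entries[dofn_id]
```

A background timer in the harness would call `evict_idle()` on a fixed interval, so long-lived workers do not keep resources for bundles that stopped arriving.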
[jira] [Work logged] (BEAM-8618) Tear down unused DoFns periodically in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8618?focusedWorklogId=375407&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375407 ] ASF GitHub Bot logged work on BEAM-8618: Author: ASF GitHub Bot Created on: 22/Jan/20 05:39 Start Date: 22/Jan/20 05:39 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on pull request #10655: [BEAM-8618] Tear down unused DoFns periodically in Python SDK harness. URL: https://github.com/apache/beam/pull/10655 Tear down unused DoFns periodically in Python SDK harness. Post-Commit Tests Status (on master branch): [table of Jenkins build-status badges for the Go, Java, and Python SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners omitted; the message is truncated in the archive]
[jira] [Work logged] (BEAM-7847) Generate Python SDK docs using Python 3
[ https://issues.apache.org/jira/browse/BEAM-7847?focusedWorklogId=375403&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375403 ] ASF GitHub Bot logged work on BEAM-7847: Author: ASF GitHub Bot Created on: 22/Jan/20 05:21 Start Date: 22/Jan/20 05:21 Worklog Time Spent: 10m Work Description: lazylynx commented on pull request #10141: [BEAM-7847] enabled to generate SDK docs with Python3 URL: https://github.com/apache/beam/pull/10141#discussion_r369374288 ## File path: sdks/python/tox.ini ## @@ -290,7 +290,7 @@ commands = [testenv:docs] extras = test,gcp,docs,interactive deps = - Sphinx==1.6.5 + Sphinx==1.8.5 sphinx_rtd_theme==0.2.4 Review comment: Got it. Thank you! Issue Time Tracking --- Worklog Id: (was: 375403) Time Spent: 5h 50m (was: 5h 40m) > Generate Python SDK docs using Python 3 > > > Key: BEAM-7847 > URL: https://issues.apache.org/jira/browse/BEAM-7847 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core > Reporter: Valentyn Tymofieiev > Assignee: yoshiki obata > Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > Currently the scripts/generate_pydoc.sh script fails on Python 3 with "RuntimeError: empty_like method already has a docstring" errors: > {noformat} > pip install -e .[gcp,test] > pip install Sphinx==1.6.5 > pip install sphinx_rtd_theme==0.2.4 > ./scripts/generate_pydoc.sh > /home/valentyn/projects/beam/beam/beam/sdks/python/target/docs/source/apache_beam.testing.benchmarks.nexmark.queries.query0.rst:4: > WARNING: autodoc: failed to import module 'apache_beam.testing.benchmarks.nexmark.queries.query0'; the following exception was raised: > Traceback (most recent call last): > File > 
"/home/valentyn/tmp/venv/py3/lib/python3.6/site-packages/sphinx/ext/autodoc.py", > line 658, in import_object > __import__(self.modname) > File > "/home/valentyn/projects/beam/beam/beam/sdks/python/apache_beam/__init__.py", > line 98, in <module> > from apache_beam import io > File > "/home/valentyn/projects/beam/beam/beam/sdks/python/apache_beam/io/__init__.py", > line 22, in <module> > from apache_beam.io.avroio import * > File > "/home/valentyn/projects/beam/beam/beam/sdks/python/apache_beam/io/avroio.py", > line 61, in <module> > from apache_beam.io import filebasedsink > File > "/home/valentyn/projects/beam/beam/beam/sdks/python/apache_beam/io/filebasedsink.py", > line 34, in <module> > from apache_beam.io import iobase > File > "/home/valentyn/projects/beam/beam/beam/sdks/python/apache_beam/io/iobase.py", > line 50, in <module> > from apache_beam.transforms import core > File > "/home/valentyn/projects/beam/beam/beam/sdks/python/apache_beam/transforms/__init__.py", > line 29, in <module> > from apache_beam.transforms.util import * > File > "/home/valentyn/projects/beam/beam/beam/sdks/python/apache_beam/transforms/util.py", > line 228, in <module> > class _BatchSizeEstimator(object): > File > "/home/valentyn/projects/beam/beam/beam/sdks/python/apache_beam/transforms/util.py", > line 359, in _BatchSizeEstimator > import numpy as np > File > "/home/valentyn/tmp/venv/py3/lib/python3.6/site-packages/numpy/__init__.py", > line 142, in <module> > from . import core > File > "/home/valentyn/tmp/venv/py3/lib/python3.6/site-packages/numpy/core/__init__.py", > line 17, in <module> > from . import multiarray > File > "/home/valentyn/tmp/venv/py3/lib/python3.6/site-packages/numpy/core/multiarray.py", > line 78, in <module> > def empty_like(prototype, dtype=None, order=None, subok=None, shape=None): > File > "/home/valentyn/tmp/venv/py3/lib/python3.6/site-packages/numpy/core/overrides.py", > line 203, in decorator > docs_from_dispatcher=docs_from_dispatcher)(implementation) > File > "/home/valentyn/tmp/venv/py3/lib/python3.6/site-packages/numpy/core/overrides.py", > line 159, in decorator > add_docstring(implementation, dispatcher.__doc__) > RuntimeError: empty_like method already has a docstring > {noformat}
[jira] [Work logged] (BEAM-9008) Add readAll() method to CassandraIO
[ https://issues.apache.org/jira/browse/BEAM-9008?focusedWorklogId=375391&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375391 ] ASF GitHub Bot logged work on BEAM-9008: Author: ASF GitHub Bot Created on: 22/Jan/20 04:09 Start Date: 22/Jan/20 04:09 Worklog Time Spent: 10m Work Description: vmarquez commented on issue #10546: [BEAM-9008] Add CassandraIO readAll method URL: https://github.com/apache/beam/pull/10546#issuecomment-576998681 Thanks for taking a look. I've pushed a new commit (I won't be squashing anymore until we finalize everything) that hopefully addresses all the minor style issues. LMK if I missed anything. I do want to make sure we keep CassandraIO 'idiomatic' with the rest of the IO connectors, but I don't think modeling this after the SOLR one will work. For one thing, if we want to share the `ReadFn` class between both Read and ReadAll, we have to have some way of having both use it and pass in 'connection' information, which we can't do if the signature of ReadFn is `ReadFn<A> extends DoFn<Read<A>, A>`. I think another class to look at for something that has both Read and ReadAll PTransforms is SpannerIO, which is modeled similarly to how I did it here (though not exactly). https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java#L315 They have a configuration class that is public (which we can't do, since we want to keep backwards compatibility with the current way `Read` works), it has two different PTransforms, `Read` uses `ReadAll` internally, etc. I do think that instead of taking a collection of RingRanges, taking some sort of 'Query' object makes sense, and the fact that it doesn't have to tie in to the actual connection means we can split up the CassandraConfig class. Thoughts on that? Issue Time Tracking --- Worklog Id: (was: 375391) Time Spent: 2h 40m (was: 2.5h) > Add readAll() method to CassandraIO > --- > > Key: BEAM-9008 > URL: https://issues.apache.org/jira/browse/BEAM-9008 > Project: Beam > Issue Type: New Feature > Components: io-java-cassandra > Affects Versions: 2.16.0 > Reporter: vincent marquez > Assignee: vincent marquez > Priority: Minor > Time Spent: 2h 40m > Remaining Estimate: 0h > > When querying a large Cassandra database, it's often *much* more useful to programmatically generate the queries that need to be run rather than reading all partitions and attempting some filtering. > As an example: > {code:java} > public class Event { >   @PartitionKey(0) public UUID accountId; >   @PartitionKey(1) public String yearMonthDay; >   @ClusteringKey public UUID eventId; >   // other data... > }{code} > If there is ten years' worth of data, you may want to only query one year's worth. Here each token range would represent one 'token' but all events for the day. > {code:java} > Set<UUID> accounts = getRelevantAccounts(); > Set<String> dateRange = generateDateRange("2018-01-01", "2019-01-01"); > PCollection<TokenRange> tokens = generateTokens(accounts, dateRange); > {code} > I propose an additional _readAll()_ PTransform that can take a PCollection of token ranges and can return a PCollection of what the query would return. > *Question: How much code should be in common between both methods?* > Currently the read connector already groups all partitions into a List of Token Ranges, so it would be simple to refactor the current read() based method to a 'ParDo' based one and have them both share the same function. > Reasons against sharing code between read and readAll: > * Not having the read based method return a BoundedSource connector would mean losing the ability to know the size of the data returned > * Currently the CassandraReader executes all the grouped TokenRange queries *asynchronously*, which is (maybe?) fine when all that's happening is splitting up all the partition ranges, but terrible for executing potentially millions of queries. > Reasons _for_ sharing code would be a simplified code base, and that both of the above issues would most likely have a negligible performance impact.
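The query-generation idea in the description — enumerating (account, day) partitions for one year instead of scanning every token range — can be sketched conceptually as follows. This is a Python illustration of the idea only, not the proposed Java PTransform; `generate_date_range`, `generate_queries`, and the `events` table are invented names:

```python
from datetime import date, timedelta
from itertools import product

def generate_date_range(start: date, end: date):
    """Yield each day in [start, end) as a yyyy-MM-dd string."""
    for offset in range((end - start).days):
        yield (start + timedelta(days=offset)).isoformat()

def generate_queries(account_ids, days, table='events'):
    """Build one targeted query per (account, day) partition instead of a full scan."""
    return [
        f"SELECT * FROM {table} WHERE account_id = {acc} "
        f"AND year_month_day = '{day}'"
        for acc, day in product(account_ids, days)
    ]
```

A readAll()-style transform would then take such a collection of queries (or query-like objects) as input and fan out their execution, rather than deriving everything from the table's full set of token ranges.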
[jira] [Work logged] (BEAM-6550) ParDo Async Java API
[ https://issues.apache.org/jira/browse/BEAM-6550?focusedWorklogId=375390&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375390 ] ASF GitHub Bot logged work on BEAM-6550: Author: ASF GitHub Bot Created on: 22/Jan/20 04:04 Start Date: 22/Jan/20 04:04 Worklog Time Spent: 10m Work Description: mynameborat commented on pull request #10651: [BEAM-6550] ParDo Async Java API URL: https://github.com/apache/beam/pull/10651 Issue Time Tracking --- Worklog Id: (was: 375390) Time Spent: 20m (was: 10m) > ParDo Async Java API > > > Key: BEAM-6550 > URL: https://issues.apache.org/jira/browse/BEAM-6550 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core > Reporter: Xinyu Liu > Assignee: Bharath Kumarasubramanian > Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > This ticket is to track the work on adding the ParDo async API. The motivation for this is: > - Many users are experienced in asynchronous programming. With async frameworks such as Netty and ParSeq, and libraries like the async Jersey client, they are able to make remote calls efficiently, and the libraries help manage the execution threads underneath. Async remote calls are very common in most of our streaming applications today. > - Many jobs are running on a multi-tenancy cluster. Async processing helps reduce resource usage and speeds up computation (fewer context switches). > This API has become one of the most requested Java APIs among SamzaRunner users.
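The motivation — overlapping many slow remote calls instead of blocking one element at a time — can be illustrated with a small concept sketch. Python asyncio is used here purely for illustration (the ticket proposes a Java API), and `fake_remote_call` is a stand-in for an async RPC:

```python
import asyncio

async def fake_remote_call(element):
    """Stand-in for a slow remote call; returns an enriched element."""
    await asyncio.sleep(0.01)  # simulated network latency
    return element * 2

async def process_bundle(elements):
    # All calls are in flight concurrently, so a bundle of N elements takes
    # roughly one RPC latency instead of N latencies, without dedicating a
    # blocked thread per element.
    return await asyncio.gather(*(fake_remote_call(e) for e in elements))

results = asyncio.run(process_bundle(range(5)))
```

An async ParDo API would give DoFn authors the same shape: return a future/completion-stage per element and let the runner manage the outstanding calls.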
[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite
[ https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=375364&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375364 ] ASF GitHub Bot logged work on BEAM-7961: Author: ASF GitHub Bot Created on: 22/Jan/20 02:06 Start Date: 22/Jan/20 02:06 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10051: [BEAM-7961] Add tests for all runner native transforms for XLang URL: https://github.com/apache/beam/pull/10051#discussion_r369338933 ## File path: runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ValidateRunnerXlangTest.java ## @@ -0,0 +1,227 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.core.construction; + +import static org.hamcrest.Matchers.equalTo; +import static org.junit.Assert.assertThat; + +import java.io.Serializable; +import java.nio.charset.StandardCharsets; +import java.util.Arrays; +import org.apache.beam.sdk.PipelineResult; +import org.apache.beam.sdk.options.ExperimentalOptions; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.testing.UsesCrossLanguageTransforms; +import org.apache.beam.sdk.testing.ValidatesRunner; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionList; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.TypeDescriptors; +import org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ConnectivityState; +import org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ManagedChannelBuilder; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.junit.After; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.Test; +import org.junit.experimental.categories.Category; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Test External transforms. 
*/ +@RunWith(JUnit4.class) +public class ValidateRunnerXlangTest implements Serializable { + @Rule public transient TestPipeline testPipeline = TestPipeline.create(); + private PipelineResult pipelineResult; + + private static final String TEST_PREFIX_URN = "beam:transforms:xlang:test:prefix"; + private static final String TEST_MULTI_URN = "beam:transforms:xlang:test:multi"; + private static final String TEST_GBK_URN = "beam:transforms:xlang:test:gbk"; + private static final String TEST_CGBK_URN = "beam:transforms:xlang:test:cgbk"; + private static final String TEST_COMGL_URN = "beam:transforms:xlang:test:comgl"; + private static final String TEST_COMPK_URN = "beam:transforms:xlang:test:compk"; + private static final String TEST_FLATTEN_URN = "beam:transforms:xlang:test:flatten"; + private static final String TEST_PARTITION_URN = "beam:transforms:xlang:test:partition"; + + private static String expansionAddr; + private static String expansionJar; + + @BeforeClass + public static void setUpClass() { +expansionAddr = +String.format("localhost:%s", Integer.valueOf(System.getProperty("expansionPort"))); +expansionJar = System.getProperty("expansionJar"); + } + + @Before + public void setUp() { +testPipeline +.getOptions() +.as(ExperimentalOptions.class) +.setExperiments(ImmutableList.of("jar_packages=" + expansionJar)); +waitForReady(); + } + + @After + public void tearDown() { +pipelineResult.waitUntilFinish(); +assertThat(pipelineResult.getState(), equalTo(PipelineResult.State.DONE)); + } + + private void waitForReady() { +try { + ManagedChannel channel = ManagedChannelBuilder.forTarget(expansionAddr).build(); + ConnectivityState state
[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite
[ https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=375363&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375363 ] ASF GitHub Bot logged work on BEAM-7961: Author: ASF GitHub Bot Created on: 22/Jan/20 02:05 Start Date: 22/Jan/20 02:05 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10051: [BEAM-7961] Add tests for all runner native transforms for XLang URL: https://github.com/apache/beam/pull/10051#discussion_r369338651 ## File path: runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ValidateRunnerXlangTest.java ## @@ -0,0 +1,227 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.core.construction; + +import static org.hamcrest.Matchers.equalTo; +import static org.junit.Assert.assertThat; + +import java.io.Serializable; +import java.nio.charset.StandardCharsets; +import java.util.Arrays; +import org.apache.beam.sdk.PipelineResult; +import org.apache.beam.sdk.options.ExperimentalOptions; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.testing.UsesCrossLanguageTransforms; +import org.apache.beam.sdk.testing.ValidatesRunner; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionList; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.TypeDescriptors; +import org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ConnectivityState; +import org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ManagedChannelBuilder; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; +import org.junit.After; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.Test; +import org.junit.experimental.categories.Category; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Test External transforms. */ +@RunWith(JUnit4.class) +public class ValidateRunnerXlangTest implements Serializable { + @Rule public transient TestPipeline testPipeline = TestPipeline.create(); + private PipelineResult pipelineResult; + + private static final String TEST_PREFIX_URN = "beam:transforms:xlang:test:prefix"; Review comment: I put the link to the design doc. 
Issue Time Tracking --- Worklog Id: (was: 375363) Time Spent: 16h 20m (was: 16h 10m) > Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite > -- > > Key: BEAM-7961 > URL: https://issues.apache.org/jira/browse/BEAM-7961 > Project: Beam > Issue Type: Improvement > Components: testing > Reporter: Heejong Lee > Assignee: Heejong Lee > Priority: Major > Time Spent: 16h 20m > Remaining Estimate: 0h > > Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite
[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite
[ https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=375362&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375362 ] ASF GitHub Bot logged work on BEAM-7961: Author: ASF GitHub Bot Created on: 22/Jan/20 02:04 Start Date: 22/Jan/20 02:04 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10051: [BEAM-7961] Add tests for all runner native transforms for XLang URL: https://github.com/apache/beam/pull/10051#discussion_r369338544 ## File path: runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExternalTest.java ## @@ -56,19 +57,18 @@ @RunWith(JUnit4.class) public class ExternalTest implements Serializable { @Rule public transient TestPipeline testPipeline = TestPipeline.create(); + private PipelineResult pipelineResult; private static final String TEST_URN_SIMPLE = "simple"; private static final String TEST_URN_LE = "le"; private static final String TEST_URN_MULTI = "multi"; - private static Integer expansionPort; private static String localExpansionAddr; private static Server localExpansionServer; @BeforeClass - public static void setUp() throws IOException { -expansionPort = Integer.valueOf(System.getProperty("expansionPort")); -int localExpansionPort = expansionPort + 100; + public static void setUpClass() throws IOException { +int localExpansionPort = Integer.parseInt(System.getProperty("expansionPort")) + 100; Review comment: Done. Issue Time Tracking --- Worklog Id: (was: 375362) Time Spent: 16h 10m (was: 16h) > Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite > -- > > Key: BEAM-7961 > URL: https://issues.apache.org/jira/browse/BEAM-7961 > Project: Beam > Issue Type: Improvement > Components: testing > Reporter: Heejong Lee > Assignee: Heejong Lee > Priority: Major > Time Spent: 16h 10m > Remaining Estimate: 0h > > Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite
[jira] [Commented] (BEAM-9154) Move Chicago Taxi Example to Python 3
[ https://issues.apache.org/jira/browse/BEAM-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020714#comment-17020714 ] Udi Meiri commented on BEAM-9154: - How did you run the example? I'm trying this (having copied the 2 tasks from py2 to the py37/build.gradle file): {code} ./gradlew :sdks:python:test-suites:dataflow:py37:chicagoTaxiExample -PgcsRoot=gs://BUCKET/chicago-taxi {code} > Move Chicago Taxi Example to Python 3 > - > > Key: BEAM-9154 > URL: https://issues.apache.org/jira/browse/BEAM-9154 > Project: Beam > Issue Type: Improvement > Components: testing > Reporter: Kamil Wasilewski > Assignee: Kamil Wasilewski > Priority: Major > > The Chicago Taxi Example[1] should be moved to the latest version of Python supported by Beam (currently it's Python 3.7). > At the moment, the following error occurs when running the benchmark on Python 3.7 (requires further investigation): > {code:java} > Traceback (most recent call last): > File "preprocess.py", line 259, in <module> > main() > File "preprocess.py", line 254, in main > project=known_args.metric_reporting_project > File "preprocess.py", line 155, in transform_data > ('Analyze' >> tft_beam.AnalyzeDataset(preprocessing_fn))) > File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 987, in __ror__ > return self.transform.__ror__(pvalueish, self.label) > File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 547, in __ror__ > result = p.apply(self, pvalueish, label) > File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 532, in apply > return self.apply(transform, pvalueish) > File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 573, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply > return m(transform, 
input, options) > File > "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", > line 223, in apply_PTransform > return transform.expand(input) > File > "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", > line 825, in expand > input_metadata)) > File > "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", > line 716, in expand > output_signature = self._preprocessing_fn(copied_inputs) > File "preprocess.py", line 102, in preprocessing_fn > _fill_in_missing(inputs[key]), > KeyError: 'company' > {code} > [1] sdks/python/apache_beam/testing/benchmarks/chicago_taxi -- This message was sent by Atlassian Jira (v8.3.4#803005)
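The `KeyError: 'company'` above is the generic failure mode of indexing a dict with a key the input schema no longer contains. A minimal stdlib sketch of that difference (column names and the `fill_in_missing` helper are hypothetical stand-ins for the benchmark's real schema and `_fill_in_missing`):

```python
# Hypothetical stand-in for the dict of input columns in preprocessing_fn;
# the real benchmark builds this from the BigQuery schema.
inputs = {"trip_miles": [1.2, 3.4], "fare": [10.0, 20.5]}

def fill_in_missing(values):
    """Placeholder for the benchmark's _fill_in_missing helper."""
    return [v if v is not None else 0.0 for v in values]

# inputs[key] raises KeyError before fill_in_missing even runs,
# because the argument is evaluated first.
try:
    fill_in_missing(inputs["company"])
except KeyError as e:
    missing = e.args[0]  # 'company'

# A guarded lookup makes the absence explicit instead of raising:
company = inputs.get("company", [])
```

Whether the fix belongs here or in the schema generation is exactly what the issue's "requires further investigation" note is about; the sketch only illustrates the mechanism of the error.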
[jira] [Work logged] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?focusedWorklogId=375361&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375361 ]

ASF GitHub Bot logged work on BEAM-9168:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:54
Start Date: 22/Jan/20 01:54
Worklog Time Spent: 10m

Work Description: udim commented on issue #10652: [BEAM-9168] Temporary fix for RunnerAPIPTransformHolder usage
URL: https://github.com/apache/beam/pull/10652#issuecomment-576971789

still waiting for tests (just starting now)

Issue Time Tracking
-------------------
Worklog Id: (was: 375361)
Time Spent: 1h 20m (was: 1h 10m)

> AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with
> PAR_DO.urn
> --------------------------------------------------------------------------
>
>                 Key: BEAM-9168
>                 URL: https://issues.apache.org/jira/browse/BEAM-9168
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Udi Meiri
>            Assignee: Chamikara Madhusanka Jayalath
>            Priority: Major
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is failing on a google-internal test.
> Unexpected class is apache_beam.transforms.core.RunnerAPIPTransformHolder.
> Failed assertion:
> https://github.com/apache/beam/blob/a59f897a64b0006ef3fcbe5a750d5f46499cfe61/sdks/python/apache_beam/pipeline.py#L1052-L1053
[jira] [Work logged] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?focusedWorklogId=375360&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375360 ]

ASF GitHub Bot logged work on BEAM-9168:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:48
Start Date: 22/Jan/20 01:48
Worklog Time Spent: 10m

Work Description: chamikaramj commented on pull request #10652: [BEAM-9168] Temporary fix for RunnerAPIPTransformHolder usage
URL: https://github.com/apache/beam/pull/10652#discussion_r369335166

File path: sdks/python/apache_beam/pipeline.py
{code}
@@ -1050,7 +1051,9 @@ def is_side_input(tag):
            for tag, id in proto.outputs.items()}
     # This annotation is expected by some runners.
     if proto.spec.urn == common_urns.primitives.PAR_DO.urn:
-      assert isinstance(result.transform, ParDo)
+      # TODO(BEAM-9168): Figure out what to do for RunnerAPIPTransformHolder.
+      assert isinstance(result.transform, (ParDo, RunnerAPIPTransformHolder)),\
{code}

Review comment: I see. Nevermind then.

Issue Time Tracking
-------------------
Worklog Id: (was: 375360)
Time Spent: 1h 10m (was: 1h)
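The widened assertion in the diff relies on `isinstance` accepting a tuple of types, which matches any one of them. A quick sketch with local stub classes (these merely mirror the Beam names; they are not the real `apache_beam` classes):

```python
# Stub classes standing in for apache_beam.transforms.core.ParDo and
# apache_beam.transforms.core.RunnerAPIPTransformHolder.
class ParDo:
    pass

class RunnerAPIPTransformHolder:
    pass

class SomethingElse:
    pass

def looks_like_pardo(transform):
    # isinstance with a tuple: True if the object is an instance of
    # ANY listed type -- the shape of the widened assert in the diff.
    return isinstance(transform, (ParDo, RunnerAPIPTransformHolder))

ok_pardo = looks_like_pardo(ParDo())
ok_holder = looks_like_pardo(RunnerAPIPTransformHolder())
rejected = looks_like_pardo(SomethingElse())
```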
[jira] [Work logged] (BEAM-9167) Reduce overhead of Go SDK side metrics
[ https://issues.apache.org/jira/browse/BEAM-9167?focusedWorklogId=375359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375359 ]

ASF GitHub Bot logged work on BEAM-9167:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:45
Start Date: 22/Jan/20 01:45
Worklog Time Spent: 10m

Work Description: lostluck commented on issue #10654: [BEAM-9167] Reduce Go SDK metric overhead
URL: https://github.com/apache/beam/pull/10654#issuecomment-576969875

R: @youngoli

Issue Time Tracking
-------------------
Worklog Id: (was: 375359)
Time Spent: 20m (was: 10m)

> Reduce overhead of Go SDK side metrics
> --------------------------------------
>
>                 Key: BEAM-9167
>                 URL: https://issues.apache.org/jira/browse/BEAM-9167
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-go
>            Reporter: Robert Burke
>            Assignee: Robert Burke
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Locking overhead due to the global store and local caches of SDK counter data
> can dominate certain workloads, which means we can do better.
> Instead of having a global store of metrics data to extract counters, we
> should use per-ptransform (or per-bundle) counter sets, which would avoid
> requiring locking per counter operation. The main detriment compared to the
> current implementation is that a user would need to add their own locking if
> they were to spawn multiple goroutines to process a Bundle's work in a DoFn.
> Given that self-multithreaded DoFns aren't recommended/safe in Java, largely
> impossible in Python, and the other Beam Go SDK constructs (like Iterators
> and Emitters) are not thread safe, this is a small concern, provided the
> documentation is clear on this.
> Removing the locking and switching to atomic ops reduces the overhead
> significantly in example jobs and in the benchmarks.
> A second part of this change should be to move the exec package to manage
> its own per-bundle state, rather than relying on a global datastore to
> extract the per-bundle, per-ptransform values.
> Related: https://issues.apache.org/jira/browse/BEAM-6541
[jira] [Work logged] (BEAM-9167) Reduce overhead of Go SDK side metrics
[ https://issues.apache.org/jira/browse/BEAM-9167?focusedWorklogId=375358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375358 ]

ASF GitHub Bot logged work on BEAM-9167:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:44
Start Date: 22/Jan/20 01:44
Worklog Time Spent: 10m

Work Description: lostluck commented on pull request #10654: [BEAM-9167] Reduce Go SDK metric overhead
URL: https://github.com/apache/beam/pull/10654

This PR dramatically reduces the overhead of metrics in the Go SDK. A contemporary side-by-side comparison of the benchmark in the metrics package on my current machine:

{code}
benchmark                                      old ns/op    new ns/op    delta
BenchmarkMetrics/counter_inplace-12            585          249          -57.44%
BenchmarkMetrics/distribution_inplace-12       622          270          -56.59%
BenchmarkMetrics/gauge_inplace-12              812          311          -61.70%
BenchmarkMetrics/counter_predeclared-12        227          15.8         -93.04%
BenchmarkMetrics/distribution_predeclared-12   282          24.0         -91.49%
BenchmarkMetrics/gauge_predeclared-12          389          63.7         -83.62%

benchmark                                      old allocs   new allocs   delta
BenchmarkMetrics/counter_inplace-12            4            1            -75.00%
BenchmarkMetrics/distribution_inplace-12       4            1            -75.00%
BenchmarkMetrics/gauge_inplace-12              4            1            -75.00%
BenchmarkMetrics/counter_predeclared-12        3            0            -100.00%
BenchmarkMetrics/distribution_predeclared-12   3            0            -100.00%
BenchmarkMetrics/gauge_predeclared-12          3            0            -100.00%

benchmark                                      old bytes    new bytes    delta
BenchmarkMetrics/counter_inplace-12            160          48           -70.00%
BenchmarkMetrics/distribution_inplace-12       192          48           -75.00%
BenchmarkMetrics/gauge_inplace-12              192          48           -75.00%
BenchmarkMetrics/counter_predeclared-12        48           0            -100.00%
BenchmarkMetrics/distribution_predeclared-12   80           0            -100.00%
BenchmarkMetrics/gauge_predeclared-12          80           0            -100.00%
{code}

In particular, this PR moves away from a global datastore for all metrics towards per-bundle counter sets. This allows for the removal of the per-layer locks, and of the global lock that needed to be checked since all bundles had to consult the same datastore. Now they only store a metric cell in the global store on first creation (still stored per bundle and per ptransform). A subsequent change will remove the global store altogether in favour of better exposing the metrics per bundle, and allowing a callback visitor to thread-safely access the data inside each metric. This will also permit removing the dependency on the protos from the package, which was a mistake I made when I first wrote the package.

Further, Counters now use atomic operations rather than locks, which additionally speeds them up vs the previous mutex approach. Counter "names" are hashed ahead of time and the hash value cached in the proxy to increase the speed of subsequent lookups using the same proxy object. This does make the proxies unsafe to use concurrently within the same bundle prior to first use, but this matches the general rule of Beam runners managing the concurrency for efficient processing, and that framework constructs are not safe for concurrent use by user code without user-managed locks.

As an exploration, I did try using sync.Map to avoid the above restriction, but the overhead of the additional interface wrapping and unwrapping was significant enough that this approach was worthwhile. This may be worth revisiting if Go gains generics, as that would probably avoid this cost.

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
- [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor
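The hash-once-and-cache optimization described above is language-agnostic. A rough Python analogue (the Go SDK's actual types and locking story differ; names here are illustrative) of a metric proxy that computes its lookup key on first use and reuses it afterwards:

```python
class CounterProxy:
    """Sketch of a metric proxy that caches its lookup key after first use.

    Mirrors the idea in the PR: compute the hash of (namespace, name) once
    and reuse it, so repeated updates through the same proxy avoid
    recomputing the key. As the PR notes, lazy initialization like this is
    not safe for concurrent first use without external synchronization.
    """

    def __init__(self, namespace, name):
        self.namespace = namespace
        self.name = name
        self._key = None  # filled in lazily on first use

    def key(self):
        if self._key is None:
            self._key = hash((self.namespace, self.name))
        return self._key


# Stand-in for a per-bundle counter set: a plain dict keyed by the
# cached hash, so each update is a single dict lookup.
store = {}

def inc(proxy, delta):
    store[proxy.key()] = store.get(proxy.key(), 0) + delta

c = CounterProxy("example", "elements")
inc(c, 1)
inc(c, 41)  # both updates reuse the hash computed on the first call
```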
[jira] [Work logged] (BEAM-7739) Add ValueState in Python sdk
[ https://issues.apache.org/jira/browse/BEAM-7739?focusedWorklogId=375356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375356 ]

ASF GitHub Bot logged work on BEAM-7739:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:27
Start Date: 22/Jan/20 01:27
Worklog Time Spent: 10m

Work Description: stale[bot] commented on pull request #9067: [BEAM-7739] Implement ReadModifyWriteState in Python SDK
URL: https://github.com/apache/beam/pull/9067

Issue Time Tracking
-------------------
Worklog Id: (was: 375356)
Time Spent: 6.5h (was: 6h 20m)

> Add ValueState in Python sdk
> ----------------------------
>
>                 Key: BEAM-7739
>                 URL: https://issues.apache.org/jira/browse/BEAM-7739
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py-core
>            Reporter: Rakesh Kumar
>            Assignee: Rakesh Kumar
>            Priority: Major
>          Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Currently ValueState is missing from the Python SDK, but it exists in the
> Java SDK.
[jira] [Work logged] (BEAM-7739) Add ValueState in Python sdk
[ https://issues.apache.org/jira/browse/BEAM-7739?focusedWorklogId=375355&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375355 ]

ASF GitHub Bot logged work on BEAM-7739:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:27
Start Date: 22/Jan/20 01:27
Worklog Time Spent: 10m

Work Description: stale[bot] commented on issue #9067: [BEAM-7739] Implement ReadModifyWriteState in Python SDK
URL: https://github.com/apache/beam/pull/9067#issuecomment-576965575

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

Issue Time Tracking
-------------------
Worklog Id: (was: 375355)
Time Spent: 6h 20m (was: 6h 10m)
[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite
[ https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=375354&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375354 ]

ASF GitHub Bot logged work on BEAM-7961:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:22
Start Date: 22/Jan/20 01:22
Worklog Time Spent: 10m

Work Description: ihji commented on pull request #10051: [BEAM-7961] Add tests for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#discussion_r369329159

File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
{code}
@@ -1684,22 +1688,21 @@ class BeamModulePlugin implements Plugin {
         args '-c', "$pythonDir/scripts/run_expansion_services.sh stop --group_id ${project.name}"
       }
       setupTask.finalizedBy cleanupTask
+      config.startJobServer.finalizedBy config.cleanupJobServer
       // Task for running testcases in Java SDK
       def beamJavaTestPipelineOptions = [
-        "--runner=org.apache.beam.runners.portability.testing.TestPortableRunner",
-        "--jobServerDriver=${config.jobServerDriver}",
+        "--runner=PortableRunner",
{code}

Review comment: The change enabled remote execution of portable pipelines. `TestPortableRunner` provides `--jobServerDriver` just for easier testing (otherwise there needs to be a separate jobserver process to run integration tests). The validates-xlang test suite already implements the setup/shutdown scripts managing external processes, so I modified the code to adopt the original `PortableRunner`. I didn't check, but OOM shouldn't happen with the new tests here, even with in-process job server drivers.

Issue Time Tracking
-------------------
Worklog Id: (was: 375354)
Time Spent: 16h (was: 15h 50m)

> Add tests for all runner native transforms and some widely used composite
> transforms to cross-language validates runner test suite
> -------------------------------------------------------------------------
>
>                 Key: BEAM-7961
>                 URL: https://issues.apache.org/jira/browse/BEAM-7961
>             Project: Beam
>          Issue Type: Improvement
>          Components: testing
>            Reporter: Heejong Lee
>            Assignee: Heejong Lee
>            Priority: Major
>          Time Spent: 16h
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite
> transforms to cross-language validates runner test suite
[jira] [Closed] (BEAM-6541) Consider converting bundle & ptransform ids to ints eagerly.
[ https://issues.apache.org/jira/browse/BEAM-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Burke closed BEAM-6541.
------------------------------
Fix Version/s: Not applicable
     Assignee: Robert Burke
   Resolution: Won't Fix

I'm taking a different approach in https://issues.apache.org/jira/browse/BEAM-9167, which better relies on the structure of bundles and ptransforms to reduce the overhead. Granted, I'm also using the technique mentioned here, but with hashing the metric names rather than the higher-level structs.

> Consider converting bundle & ptransform ids to ints eagerly.
> ------------------------------------------------------------
>
>                 Key: BEAM-6541
>                 URL: https://issues.apache.org/jira/browse/BEAM-6541
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-go
>            Reporter: Robert Burke
>            Assignee: Robert Burke
>            Priority: Minor
>             Fix For: Not applicable
>
> BundleIDs and PTransformIDs necessary for communicating with the Runner
> interface in the Go SDK are currently strings, and are used as-is for metrics
> contexts. We use them for getting bundle- and ptransform-specific metrics, and
> transmitting the same. We could instead eagerly assign them a local index
> that is then converted back when communicating metrics over the FnAPI; this
> would reduce overhead on metric lookups in the various maps.
> Note: the same could be done for the user's metric name, completing the
> optimization. Measuring the per-report overhead for tentative/final metric
> reporting is required before committing to this approach.
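The technique the issue considers — eagerly mapping string IDs to small local ints so hot-path maps can be keyed by int — is a standard interning pattern. A minimal Python sketch (illustrative only; the Go SDK would do this with its own map types):

```python
class Interner:
    """Assigns each distinct string a small, stable integer index.

    Hot-path lookups can then key on the int; the string is only needed
    again when converting back (e.g. when reporting over the FnAPI).
    """

    def __init__(self):
        self._ids = {}      # string -> int
        self._strings = []  # int -> string, for the reverse mapping

    def intern(self, s):
        # setdefault assigns the next index on first sight and returns
        # the existing index on every later call.
        idx = self._ids.setdefault(s, len(self._ids))
        if idx == len(self._strings):
            self._strings.append(s)
        return idx

    def lookup(self, idx):
        return self._strings[idx]

interner = Interner()
a = interner.intern("ptransform-1")
b = interner.intern("ptransform-2")
again = interner.intern("ptransform-1")  # same index as the first call
```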
[jira] [Work logged] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?focusedWorklogId=375353&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375353 ]

ASF GitHub Bot logged work on BEAM-9168:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:20
Start Date: 22/Jan/20 01:20
Worklog Time Spent: 10m

Work Description: udim commented on pull request #10652: [BEAM-9168] Temporary fix for RunnerAPIPTransformHolder usage
URL: https://github.com/apache/beam/pull/10652#discussion_r369328701

File path: sdks/python/apache_beam/pipeline.py
{code}
@@ -1050,7 +1051,9 @@ def is_side_input(tag):
            for tag, id in proto.outputs.items()}
     # This annotation is expected by some runners.
     if proto.spec.urn == common_urns.primitives.PAR_DO.urn:
-      assert isinstance(result.transform, ParDo)
+      # TODO(BEAM-9168): Figure out what to do for RunnerAPIPTransformHolder.
+      assert isinstance(result.transform, (ParDo, RunnerAPIPTransformHolder)),\
{code}

Review comment:
{code}
>>> assert (False, 'blah')
<stdin>:1: SyntaxWarning: assertion is always true, perhaps remove parentheses?
{code}

Issue Time Tracking
-------------------
Worklog Id: (was: 375353)
Time Spent: 1h (was: 50m)
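The review comment above points at a classic Python pitfall: a parenthesized `assert` with a message asserts a non-empty tuple, which is always truthy, so it can never fail (CPython warns about exactly this with the `SyntaxWarning` shown). A minimal demonstration:

```python
# assert with a message must put the comma OUTSIDE any parentheses:
#   assert cond, 'message'     -- correct
#   assert (cond, 'message')   -- asserts a 2-tuple, which is always true

def check_buggy(flag):
    assert (flag, 'flag must be set')  # BUG: never raises, even for False
    return True

def check_fixed(flag):
    assert flag, 'flag must be set'
    return True

buggy_passes = check_buggy(False)  # no AssertionError: the tuple is truthy

try:
    check_fixed(False)
    fixed_raised = False
except AssertionError:
    fixed_raised = True
```

This is why the backslash continuation in the diff matters: the assert expression and its message must stay one statement without wrapping the pair in parentheses.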
[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=375352&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375352 ]

ASF GitHub Bot logged work on BEAM-7926:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:19
Start Date: 22/Jan/20 01:19
Worklog Time Spent: 10m

Work Description: pabloem commented on issue #10346: [BEAM-7926] Data-centric Interactive Part2
URL: https://github.com/apache/beam/pull/10346#issuecomment-576963609

looking

Issue Time Tracking
-------------------
Worklog Id: (was: 375352)
Time Spent: 33h 50m (was: 33h 40m)

> Show PCollection with Interactive Beam in a data-centric user flow
> ------------------------------------------------------------------
>
>                 Key: BEAM-7926
>                 URL: https://issues.apache.org/jira/browse/BEAM-7926
>             Project: Beam
>          Issue Type: New Feature
>          Components: runner-py-interactive
>            Reporter: Ning Kang
>            Assignee: Ning Kang
>            Priority: Major
>          Time Spent: 33h 50m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...
> {code}
> The user can call a single function and get auto-magical charting of the
> data, e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
> Detailed
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].
[jira] [Work logged] (BEAM-7516) Add a watermark manager for the fn_api_runner
[ https://issues.apache.org/jira/browse/BEAM-7516?focusedWorklogId=375351&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375351 ]

ASF GitHub Bot logged work on BEAM-7516:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:16
Start Date: 22/Jan/20 01:16
Worklog Time Spent: 10m

Work Description: pabloem commented on issue #10291: [BEAM-7516][BEAM-8823] FnApiRunner works with work queues, and a primitive watermark manager
URL: https://github.com/apache/beam/pull/10291#issuecomment-576962848

The failed test in precommit is dataflow_runner related. I'll leave it as is for review.

Issue Time Tracking
-------------------
Worklog Id: (was: 375351)
Time Spent: 4h 10m (was: 4h)

> Add a watermark manager for the fn_api_runner
> ---------------------------------------------
>
>                 Key: BEAM-7516
>                 URL: https://issues.apache.org/jira/browse/BEAM-7516
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Pablo Estrada
>            Assignee: Pablo Estrada
>            Priority: Major
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> To track watermarks for each stage
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=375350&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375350 ]

ASF GitHub Bot logged work on BEAM-8335:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:07
Start Date: 22/Jan/20 01:07
Worklog Time Spent: 10m

Work Description: pabloem commented on pull request #10368: [BEAM-8335] Modify PipelineInstrument to add TestStream for unbounded PCollections
URL: https://github.com/apache/beam/pull/10368#discussion_r369325488

File path: sdks/python/apache_beam/runners/interactive/pipeline_instrument.py
{code}
@@ -418,6 +423,45 @@ def _replace_with_cached_inputs(self, pipeline):
       cache, noop.
     """
+    # Find all cached unbounded PCollections.
+    class CacheableUnboundedPCollectionVisitor(PipelineVisitor):
+      def __init__(self, pin):
+        self._pin = pin
+        self.unbounded_pcolls = set()
+
+      def enter_composite_transform(self, transform_node):
+        self.visit_transform(transform_node)
+
+      def visit_transform(self, transform_node):
+        if transform_node.inputs:
+          for input_pcoll in transform_node.inputs:
+            key = self._pin.cache_key(input_pcoll)
+            if (key in self._pin._cached_pcoll_read and
+                not input_pcoll.is_bounded):
+              self.unbounded_pcolls.add(key)
+
+    v = CacheableUnboundedPCollectionVisitor(self)
+    pipeline.visit(v)
+
+    # The set of keys from the cached unbounded PCollections will be used as the
+    # output tags for the TestStream. This is to remember what cache-key is
+    # associated with which PCollection.
+    unbounded_cacheables = v.unbounded_pcolls
+    output_tags = unbounded_cacheables
+
+    # Take the PCollections that will be read from the TestStream and insert
+    # them back into the dictionary of cached PCollections. The next step will
+    # replace the downstream consumer of the non-cached PCollections with these
+    # PCollections.
+    if output_tags:
+      output_pcolls = pipeline | test_stream.TestStream(output_tags=output_tags)
+      if len(output_tags) == 1:
+        self._cached_pcoll_read[None] = output_pcolls
+      else:
+        for tag, pcoll in output_pcolls.items():
+          self._cached_pcoll_read[tag] = pcoll
{code}

Review comment: nit `self._cached_pcoll_read.update(output_pcolls)`

Issue Time Tracking
-------------------
Worklog Id: (was: 375350)
Time Spent: 53h 50m (was: 53h 40m)

> Add streaming support to Interactive Beam
> -----------------------------------------
>
>                 Key: BEAM-8335
>                 URL: https://issues.apache.org/jira/browse/BEAM-8335
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-py-interactive
>            Reporter: Sam Rohde
>            Assignee: Sam Rohde
>            Priority: Major
>          Time Spent: 53h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the
> Interactive Beam experience. This will allow users to:
> * Write and run a streaming job in IPython
> * Automatically cache records from unbounded sources
> * Add a replay experience that replays all cached records to simulate the
> original pipeline execution
> * Add controls to play/pause/stop/step individual elements from the cached
> records
> * Add ability to inspect/visualize unbounded PCollections
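The `CacheableUnboundedPCollectionVisitor` in the diff follows a standard visitor pattern: walk a DAG of transforms and collect the cache keys of inputs matching a predicate. A stdlib-only analogue of the same shape (no Beam dependency; the `Node` class and names are illustrative, not Beam's API):

```python
class Node:
    """Hypothetical stand-in for a transform node with input PCollections."""

    def __init__(self, name, bounded=True, inputs=()):
        self.name = name
        self.is_bounded = bounded
        self.inputs = list(inputs)

class UnboundedCollector:
    """Collects names of cached, unbounded inputs -- the same shape as
    the CacheableUnboundedPCollectionVisitor sketched in the diff."""

    def __init__(self, cached):
        self._cached = cached       # set of cache keys already materialized
        self.unbounded = set()

    def visit(self, node):
        # Check this node's inputs against the predicate...
        for inp in node.inputs:
            if inp.name in self._cached and not inp.is_bounded:
                self.unbounded.add(inp.name)
        # ...then recurse into the rest of the graph.
        for inp in node.inputs:
            self.visit(inp)

uncached_src = Node("read", bounded=False)      # unbounded but not cached
cached_src = Node("cached_read", bounded=False) # unbounded and cached
root = Node("sum", inputs=[uncached_src, cached_src])

collector = UnboundedCollector(cached={"cached_read"})
collector.visit(root)
# collector.unbounded now holds only the cached, unbounded input's key,
# i.e. the set that the real code uses as TestStream output tags.
```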
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=375348&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375348 ]

ASF GitHub Bot logged work on BEAM-8335:
Author: ASF GitHub Bot
Created on: 22/Jan/20 01:07
Start Date: 22/Jan/20 01:07
Worklog Time Spent: 10m

Work Description: pabloem commented on pull request #10368: [BEAM-8335] Modify PipelineInstrument to add TestStream for unbounded PCollections
URL: https://github.com/apache/beam/pull/10368#discussion_r369325196

File path: sdks/python/apache_beam/runners/interactive/pipeline_instrument.py
{code}
+    # The set of keys from the cached unbounded PCollections will be used as the
+    # output tags for the TestStream. This is to remember what cache-key is
+    # associated with which PCollection.
+    unbounded_cacheables = v.unbounded_pcolls
{code}

Review comment: `unbounded_cacheables` seems like a superfluous variable? It's not used anywhere else?

Issue Time Tracking
-------------------
Worklog Id: (was: 375348)
Time Spent: 53h 40m (was: 53.5h)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=375349=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375349 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Jan/20 01:07 Start Date: 22/Jan/20 01:07 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10368: [BEAM-8335] Modify PipelineInstrument to add TestStream for unbounded PCollections URL: https://github.com/apache/beam/pull/10368#discussion_r369322292 ## File path: sdks/python/apache_beam/testing/test_stream.py ## @@ -172,13 +172,14 @@ class TestStream(PTransform): output. """ - def __init__(self, coder=coders.FastPrimitivesCoder(), events=None): + def __init__(self, coder=coders.FastPrimitivesCoder(), events=None, + output_tags=None): super(TestStream, self).__init__() assert coder is not None self.coder = coder self.watermarks = {None: timestamp.MIN_TIMESTAMP} -self._events = [] if events is None else list(events) -self.output_tags = set() +self._events = list(events) if events is not None else [] +self.output_tags = set(output_tags) if output_tags is not None else set() Review comment: You can do `list(var) if var else list()` and `set(var) if var else set()` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375349) Time Spent: 53h 50m (was: 53h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 53h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
> This will allow users to:
> * Write and run a streaming job in IPython
> * Automatically cache records from unbounded sources
> * Add a replay experience that replays all cached records to simulate the original pipeline execution
> * Add controls to play/pause/stop/step individual elements from the cached records
> * Add ability to inspect/visualize unbounded PCollections

-- This message was sent by Atlassian Jira (v8.3.4#803005)
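The constructor diff in the review thread above centers on a common Python pattern: copy caller-supplied containers instead of storing them (or a shared mutable default) directly. A minimal sketch of the pattern under discussion — `FakeStream` is a hypothetical stand-in, not Beam's actual TestStream:

```python
class FakeStream:
    def __init__(self, events=None, output_tags=None):
        # Copying guards against the caller mutating the argument later,
        # and avoids a mutable default shared across instances.
        self._events = list(events) if events is not None else []
        self.output_tags = set(output_tags) if output_tags is not None else set()

shared = ['a', 'b']
stream = FakeStream(events=shared)
shared.append('c')  # does not leak into the stream's private copy
```

The reviewer's `list(var) if var else list()` variant yields the same observable result, since copying an empty container and creating a fresh one are indistinguishable; it merely skips the copy for empty inputs.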
[jira] [Commented] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020694#comment-17020694 ] Yueyang Qiu commented on BEAM-9169: --- Sounds good. Thanks for helping! > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Kirill Kozlov >Priority: Minor > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). > This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. -- This message was sent by Atlassian Jira (v8.3.4#803005)
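The failure described above is a general hazard of over-eager string escaping: escaping a character (here `/`) that the consuming dialect does not treat as escapable produces illegal sequences like `\/`. A hedged Python sketch of the narrower approach — escape only what the target actually requires; `escape_literal` is a hypothetical name, not the Beam/Calcite fix:

```python
def escape_literal(s):
    # Escape only characters the target dialect requires; '/' is left
    # alone, so no illegal "\/" sequence can appear. Backslash must be
    # handled first, or later replacements would be double-escaped.
    return (s.replace('\\', '\\\\')
             .replace('"', '\\"')
             .replace('\n', '\\n'))

escape_literal('America/Los_Angeles')  # unchanged: no escaping needed
```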
[jira] [Work logged] (BEAM-3419) Enable iterable side input for beam runners.
[ https://issues.apache.org/jira/browse/BEAM-3419?focusedWorklogId=375346=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375346 ] ASF GitHub Bot logged work on BEAM-3419: Author: ASF GitHub Bot Created on: 22/Jan/20 00:56 Start Date: 22/Jan/20 00:56 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10648: [BEAM-3419] Support iterable on Dataflow runner when using the unified worker. URL: https://github.com/apache/beam/pull/10648#issuecomment-576958103 R: @tvalentyn @ananvay @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375346) Time Spent: 4h 20m (was: 4h 10m) > Enable iterable side input for beam runners. > > > Key: BEAM-3419 > URL: https://issues.apache.org/jira/browse/BEAM-3419 > Project: Beam > Issue Type: Improvement > Components: runner-core >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?focusedWorklogId=375345=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375345 ]

ASF GitHub Bot logged work on BEAM-9168:
Author: ASF GitHub Bot
Created on: 22/Jan/20 00:53
Start Date: 22/Jan/20 00:53
Worklog Time Spent: 10m

Work Description: chamikaramj commented on pull request #10652: [BEAM-9168] Temporary fix for RunnerAPIPTransformHolder usage
URL: https://github.com/apache/beam/pull/10652#discussion_r369322255

## File path: sdks/python/apache_beam/pipeline.py ##

@@ -1050,7 +1051,9 @@ def is_side_input(tag):
                for tag, id in proto.outputs.items()}
     # This annotation is expected by some runners.
     if proto.spec.urn == common_urns.primitives.PAR_DO.urn:
-      assert isinstance(result.transform, ParDo)
+      # TODO(BEAM-9168): Figure out what to do for RunnerAPIPTransformHolder.
+      assert isinstance(result.transform, (ParDo, RunnerAPIPTransformHolder)),\

Review comment: nit: use parentheses instead of \

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 375345)
Time Spent: 50m (was: 40m)

> AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
> -
>
> Key: BEAM-9168
> URL: https://issues.apache.org/jira/browse/BEAM-9168
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Udi Meiri
> Assignee: Chamikara Madhusanka Jayalath
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>
> This is failing on a google-internal test.
> Unexpected class is apache_beam.transforms.core.RunnerAPIPTransformHolder.
> Failed assertion:
> https://github.com/apache/beam/blob/a59f897a64b0006ef3fcbe5a750d5f46499cfe61/sdks/python/apache_beam/pipeline.py#L1052-L1053

-- This message was sent by Atlassian Jira (v8.3.4#803005)
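The review nit above (prefer parentheses to a trailing `\` for line continuation) can be illustrated as below; note that the parentheses must wrap the asserted expression only, since `assert (cond, msg)` would test a non-empty tuple and always pass:

```python
result = [1, 2, 3]

# Backslash continuation, as in the diff above:
assert isinstance(result, (list, tuple)), \
    "unexpected transform type"

# Parenthesized continuation, per the nit. The parentheses belong to the
# isinstance() call, never to the whole `assert cond, msg` pair -- wrapping
# the pair would create a truthy tuple and defeat the assertion.
assert isinstance(
    result, (list, tuple)
), "unexpected transform type"
```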
[jira] [Work logged] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?focusedWorklogId=375344=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375344 ] ASF GitHub Bot logged work on BEAM-9168: Author: ASF GitHub Bot Created on: 22/Jan/20 00:52 Start Date: 22/Jan/20 00:52 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10652: [BEAM-9168] Temporary fix for RunnerAPIPTransformHolder usage URL: https://github.com/apache/beam/pull/10652#issuecomment-576957411 LGTM. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375344) Time Spent: 40m (was: 0.5h) > AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with > PAR_DO.urn > - > > Key: BEAM-9168 > URL: https://issues.apache.org/jira/browse/BEAM-9168 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > This is failing on a google-internal test. > Unexpected class is apache_beam.transforms.core.RunnerAPIPTransformHolder. > Failed assertion: > https://github.com/apache/beam/blob/a59f897a64b0006ef3fcbe5a750d5f46499cfe61/sdks/python/apache_beam/pipeline.py#L1052-L1053 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?focusedWorklogId=375342=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375342 ] ASF GitHub Bot logged work on BEAM-9168: Author: ASF GitHub Bot Created on: 22/Jan/20 00:49 Start Date: 22/Jan/20 00:49 Worklog Time Spent: 10m Work Description: udim commented on issue #10652: [BEAM-9168] Temporary fix for RunnerAPIPTransformHolder usage URL: https://github.com/apache/beam/pull/10652#issuecomment-576956473 R: @aaltay This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375342) Time Spent: 0.5h (was: 20m) > AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with > PAR_DO.urn > - > > Key: BEAM-9168 > URL: https://issues.apache.org/jira/browse/BEAM-9168 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > This is failing on a google-internal test. > Unexpected class is apache_beam.transforms.core.RunnerAPIPTransformHolder. > Failed assertion: > https://github.com/apache/beam/blob/a59f897a64b0006ef3fcbe5a750d5f46499cfe61/sdks/python/apache_beam/pipeline.py#L1052-L1053 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?focusedWorklogId=375340=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375340 ] ASF GitHub Bot logged work on BEAM-9168: Author: ASF GitHub Bot Created on: 22/Jan/20 00:41 Start Date: 22/Jan/20 00:41 Worklog Time Spent: 10m Work Description: udim commented on issue #10652: [BEAM-9168] Temporary fix for RunnerAPIPTransformHolder usage URL: https://github.com/apache/beam/pull/10652#issuecomment-576954653 R: @chamikaramj This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375340) Time Spent: 20m (was: 10m) > AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with > PAR_DO.urn > - > > Key: BEAM-9168 > URL: https://issues.apache.org/jira/browse/BEAM-9168 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > This is failing on a google-internal test. > Unexpected class is apache_beam.transforms.core.RunnerAPIPTransformHolder. > Failed assertion: > https://github.com/apache/beam/blob/a59f897a64b0006ef3fcbe5a750d5f46499cfe61/sdks/python/apache_beam/pipeline.py#L1052-L1053 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020690#comment-17020690 ] Kirill Kozlov commented on BEAM-9169: - Yes, we might be able to fix this by specifying what needs escaping and what does not (or using a different escaping method). Not sure how to achieve that, but I will look into it. > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Kirill Kozlov >Priority: Minor > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). > This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020689#comment-17020689 ] Udi Meiri commented on BEAM-9168: - Note that the output_tags attribute is required here: https://github.com/apache/beam/blob/6a6adc8433deff10a5594bbf77cc9148ce0a951a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L856 > AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with > PAR_DO.urn > - > > Key: BEAM-9168 > URL: https://issues.apache.org/jira/browse/BEAM-9168 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This is failing on a google-internal test. > Unexpected class is apache_beam.transforms.core.RunnerAPIPTransformHolder. > Failed assertion: > https://github.com/apache/beam/blob/a59f897a64b0006ef3fcbe5a750d5f46499cfe61/sdks/python/apache_beam/pipeline.py#L1052-L1053 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=375338=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375338 ]

ASF GitHub Bot logged work on BEAM-8889:
Author: ASF GitHub Bot
Created on: 22/Jan/20 00:35
Start Date: 22/Jan/20 00:35
Worklog Time Spent: 10m

Work Description: veblush commented on pull request #10617: [BEAM-8889] adding gRPC connectivity to Beam/GCS connector
URL: https://github.com/apache/beam/pull/10617#discussion_r369317768

## File path: sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java ##

@@ -427,22 +440,14 @@ public WritableByteChannel create(GcsPath path, String type) throws IOException
    */
   public WritableByteChannel create(GcsPath path, String type, Integer uploadBufferSizeBytes)
       throws IOException {
-    GoogleCloudStorageWriteChannel channel =
-        new GoogleCloudStorageWriteChannel(
-            executorService,
-            storageClient,
-            new ClientRequestHelper<>(),
-            path.getBucket(),
-            path.getObject(),
-            type,
-            /* kmsKeyName= */ null,
-            AsyncWriteChannelOptions.newBuilder().build(),
-            new ObjectWriteConditions(),
-            Collections.emptyMap());
+    WritableByteChannel channel = getCloudStorage().create(new StorageResourceId(path.getBucket()));
     if (uploadBufferSizeBytes != null) {
-      channel.setUploadBufferSize(uploadBufferSizeBytes);
+      if (channel instanceof GoogleCloudStorageWriteChannel) {

Review comment: This seems to be replaced with using `AsyncWriteChannelOptions.setUploadChunkSize` in `GoogleCloudStorageOptions.WriteChannelOptions`.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375338) Remaining Estimate: 167h (was: 167h 10m) Time Spent: 1h (was: 50m) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 1h > Remaining Estimate: 167h > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is the primary class for accessing Google Cloud Storage in Apache Beam. The current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability that eventually > creates the channel objects. This request is about updating GcsUtil to use > GoogleCloudStorage to create read and write channels, which is expected > to be more flexible because it can easily pick up new changes; e.g. a new channel > implementation using a new protocol, without code changes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020687#comment-17020687 ] Udi Meiri commented on BEAM-9168: - Please add appropriate test cases that cover this assert. > AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with > PAR_DO.urn > - > > Key: BEAM-9168 > URL: https://issues.apache.org/jira/browse/BEAM-9168 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This is failing on a google-internal test. > Unexpected class is apache_beam.transforms.core.RunnerAPIPTransformHolder. > Failed assertion: > https://github.com/apache/beam/blob/a59f897a64b0006ef3fcbe5a750d5f46499cfe61/sdks/python/apache_beam/pipeline.py#L1052-L1053 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020686#comment-17020686 ] Udi Meiri commented on BEAM-9168: - Possible options (I'm not that familiar with the semantics): 1. When replacing a ParDo with RunnerAPIPTransformHolder, change the URN as well (add a new one?). 2. Make a special-case RunnerAPIPTransformHolder class that also inherits from ParDo, or make ParDo and RunnerAPIPTransformHolder inherit from the same base class and assert isinstance on that base. 3. Explicitly handle cases in the code where RunnerAPIPTransformHolder may come instead of ParDo. > AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with > PAR_DO.urn > - > > Key: BEAM-9168 > URL: https://issues.apache.org/jira/browse/BEAM-9168 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This is failing on a google-internal test. > Unexpected class is apache_beam.transforms.core.RunnerAPIPTransformHolder. > Failed assertion: > https://github.com/apache/beam/blob/a59f897a64b0006ef3fcbe5a750d5f46499cfe61/sdks/python/apache_beam/pipeline.py#L1052-L1053 -- This message was sent by Atlassian Jira (v8.3.4#803005)
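Option 2 above (a shared base class that both ParDo and the holder inherit from, with the assertion made against the base) can be sketched as follows; all names here are hypothetical illustrations, not Beam's actual class hierarchy:

```python
class PTransformWithParDoUrn:
    """Hypothetical marker base for transforms that carry PAR_DO.urn."""

class ParDo(PTransformWithParDoUrn):
    pass

class RunnerAPIPTransformHolder(PTransformWithParDoUrn):
    pass

# The runner-API check then asserts on the base, accepting either subclass:
for transform in (ParDo(), RunnerAPIPTransformHolder()):
    assert isinstance(transform, PTransformWithParDoUrn)
```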
[jira] [Work logged] (BEAM-7861) Make it easy to change between multi-process and multi-thread mode for Python Direct runners
[ https://issues.apache.org/jira/browse/BEAM-7861?focusedWorklogId=375335=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375335 ] ASF GitHub Bot logged work on BEAM-7861: Author: ASF GitHub Bot Created on: 22/Jan/20 00:33 Start Date: 22/Jan/20 00:33 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #10616: [BEAM-7861] update documentation about --direct_running_mode option with direct runner. URL: https://github.com/apache/beam/pull/10616 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375335) Time Spent: 4h 50m (was: 4h 40m) > Make it easy to change between multi-process and multi-thread mode for Python > Direct runners > > > Key: BEAM-7861 > URL: https://issues.apache.org/jira/browse/BEAM-7861 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.19.0 > > Time Spent: 4h 50m > Remaining Estimate: 0h > > BEAM-3645 makes it possible to run a map task in parallel. > However, users need to change the runner when switching between multithreading and > multiprocessing modes. > We want to add a flag (ex: --use-multiprocess) to make the switch easy > without changing the runner each time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
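The `--direct_running_mode` option named in the PR title realizes the single-flag switch the issue asks for. Below is a rough stdlib sketch of the idea — one flag selects the execution strategy, with no runner swap. This is not Beam's direct runner; the mode names are assumptions carried over from the flag name in the PR title:

```python
import argparse
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def executor_for(mode):
    # One flag picks the execution strategy; the "runner" stays the same.
    return ProcessPoolExecutor if mode == 'multi_processing' else ThreadPoolExecutor

parser = argparse.ArgumentParser()
parser.add_argument('--direct_running_mode',
                    choices=['in_memory', 'multi_threading', 'multi_processing'],
                    default='in_memory')

args = parser.parse_args(['--direct_running_mode=multi_processing'])
executor_cls = executor_for(args.direct_running_mode)
```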
[jira] [Work logged] (BEAM-8626) Implement status api handler in python sdk harness
[ https://issues.apache.org/jira/browse/BEAM-8626?focusedWorklogId=375332=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375332 ] ASF GitHub Bot logged work on BEAM-8626: Author: ASF GitHub Bot Created on: 22/Jan/20 00:32 Start Date: 22/Jan/20 00:32 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10598: [BEAM-8626] Implement status fn api handler in python sdk URL: https://github.com/apache/beam/pull/10598#issuecomment-576952415 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375332) Time Spent: 4h 50m (was: 4h 40m) > Implement status api handler in python sdk harness > -- > > Key: BEAM-8626 > URL: https://issues.apache.org/jira/browse/BEAM-8626 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7861) Make it easy to change between multi-process and multi-thread mode for Python Direct runners
[ https://issues.apache.org/jira/browse/BEAM-7861?focusedWorklogId=375334=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375334 ] ASF GitHub Bot logged work on BEAM-7861: Author: ASF GitHub Bot Created on: 22/Jan/20 00:32 Start Date: 22/Jan/20 00:32 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10616: [BEAM-7861] update documentation about --direct_running_mode option with direct runner. URL: https://github.com/apache/beam/pull/10616#issuecomment-576952554 LGTM, thank you! In the future, please don't squash reviewed and unreviewed commits before a review has finalized, see: https://beam.apache.org/contribute/#make-reviewers-job-easier. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375334) Time Spent: 4h 40m (was: 4.5h) > Make it easy to change between multi-process and multi-thread mode for Python > Direct runners > > > Key: BEAM-7861 > URL: https://issues.apache.org/jira/browse/BEAM-7861 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.19.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > BEAM-3645 makes it possible to run a map task in parallel. > However, users need to change the runner when switching between multithreading and > multiprocessing modes. > We want to add a flag (ex: --use-multiprocess) to make the switch easy > without changing the runner each time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8626) Implement status api handler in python sdk harness
[ https://issues.apache.org/jira/browse/BEAM-8626?focusedWorklogId=375329=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375329 ] ASF GitHub Bot logged work on BEAM-8626: Author: ASF GitHub Bot Created on: 22/Jan/20 00:31 Start Date: 22/Jan/20 00:31 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10598: [BEAM-8626] Implement status fn api handler in python sdk URL: https://github.com/apache/beam/pull/10598#issuecomment-576952149 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375329) Time Spent: 4.5h (was: 4h 20m) > Implement status api handler in python sdk harness > -- > > Key: BEAM-8626 > URL: https://issues.apache.org/jira/browse/BEAM-8626 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8626) Implement status api handler in python sdk harness
[ https://issues.apache.org/jira/browse/BEAM-8626?focusedWorklogId=375331=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375331 ] ASF GitHub Bot logged work on BEAM-8626: Author: ASF GitHub Bot Created on: 22/Jan/20 00:31 Start Date: 22/Jan/20 00:31 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10598: [BEAM-8626] Implement status fn api handler in python sdk URL: https://github.com/apache/beam/pull/10598#issuecomment-576952309 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375331) Time Spent: 4h 40m (was: 4.5h) > Implement status api handler in python sdk harness > -- > > Key: BEAM-8626 > URL: https://issues.apache.org/jira/browse/BEAM-8626 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020681#comment-17020681 ] Kirill Kozlov edited comment on BEAM-9169 at 1/22/20 12:30 AM: --- [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to insure that strings with special characters (like: ' {noformat} SELECT 'abc\\n'{noformat} ') get escaped. It looks like `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped. was (Author: kirillkozlov): [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to insure that strings with special characters (like: 'abc\\n') get escaped. It looks like `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped. > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Kirill Kozlov >Priority: Minor > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). > This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020681#comment-17020681 ] Kirill Kozlov edited comment on BEAM-9169 at 1/22/20 12:30 AM: --- [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to insure that strings with special characters (like: {noformat} SELECT 'abc\\n'{noformat} ) get escaped. It looks like `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped. was (Author: kirillkozlov): [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to insure that strings with special characters (like: ' {noformat} SELECT 'abc\\n'{noformat} ') get escaped. It looks like `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped. > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Kirill Kozlov >Priority: Minor > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). 
> This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020684#comment-17020684 ] Yueyang Qiu edited comment on BEAM-9169 at 1/22/20 12:30 AM: - Thanks for the quick reply. This seems to be a bug in Calcite util functions. Can we have a work around for it in Beam? I am currently assigning this bug to you since you have more context on the issue. was (Author: robinyqiu): Thanks for the quick reply. This seems to be a bug in Calcite util functions. Can we have a work around for it in Beam? I am currently assigning this bug to you since you have more context on the issue. > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Kirill Kozlov >Priority: Minor > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). > This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yueyang Qiu reassigned BEAM-9169: - Assignee: Kirill Kozlov (was: Yueyang Qiu) > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Kirill Kozlov >Priority: Minor > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). > This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020684#comment-17020684 ] Yueyang Qiu commented on BEAM-9169: --- Thanks for the quick reply. This seems to be a bug in Calcite util functions. Can we have a workaround for it in Beam? I am currently assigning this bug to you since you have more context on the issue.
[jira] [Comment Edited] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020681#comment-17020681 ] Kirill Kozlov edited comment on BEAM-9169 at 1/22/20 12:28 AM: --- [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to ensure that strings with special characters (like: 'abc\\n') get escaped. It looks like the `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped. was (Author: kirillkozlov): [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52]
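The fix direction Kirill describes can be sketched as a conservative escaper: handle only the characters ZetaSQL actually requires to be escaped (backslash, quotes, control characters) and pass '/' through untouched, so "America/Los_Angeles" round-trips without a stray backslash. This is an illustrative sketch only, not Beam's actual change; `escapeForZetaSql` is a hypothetical helper name.

```java
// Hypothetical sketch: escape only backslash, quotes, and common control
// characters; '/' (and every other printable character) passes through
// unchanged, unlike StringEscapeUtils.escapeJava's broader escaping.
public class ZetaSqlEscapeSketch {
  static String escapeForZetaSql(String s) {
    StringBuilder out = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '\\': out.append("\\\\"); break;
        case '"':  out.append("\\\""); break;
        case '\'': out.append("\\'");  break;
        case '\n': out.append("\\n");  break;
        case '\r': out.append("\\r");  break;
        case '\t': out.append("\\t");  break;
        default:   out.append(c);      // '/' falls through unescaped
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // The time-zone string must survive unchanged -- no stray backslash.
    System.out.println(escapeForZetaSql("America/Los_Angeles")); // America/Los_Angeles
    System.out.println(escapeForZetaSql("abc\n")); // abc\n (literal backslash + n)
  }
}
```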
[jira] [Updated] (BEAM-9149) Support ZetaSQL positional parameters
[ https://issues.apache.org/jira/browse/BEAM-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-9149: -- Description: While they are not yet exposed to the end user, ZetaSQL query parameters are currently being passed internally. However, the existing code assumes that all parameters are named parameters, not positional parameters. To support positional parameters, we will need to make at least the following changes: 1) Set mode to PARAMETER_POSITIONAL and use addPositionalQueryParameter instead of addQueryParameter in SqlAnalyzer: https://github.com/apache/beam/blob/671b02ac5f1be87a591de8f5f456d0e5a199d771/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java#L119 2) Code currently takes a Map everywhere parameters are provided. This is not suitable for positional parameters, which are better represented as an ordered collection such as a list. was: While they are not yet exposed to the end user, ZetaSQL query parameters are currently being passed internally. However, the existing code assumes that all parameters are named parameters, not positional parameters. To support positional parameters, we will need to make at least the following changes: 1) Set mode to PARAMETER_POSITIONAL and use addPositionalQueryParameter instead of addQueryParameter in SqlAnalyzer: https://github.com/apache/beam/blob/671b02ac5f1be87a591de8f5f456d0e5a199d771/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java#L119 2) Code currently assumes that resolved parameters are named. While even positional parameters must be named when they are used as inputs, after they are resolved their names are removed. 
Thus this check will deref a null pointer and must be fixed: https://github.com/apache/beam/blob/8915d6e95c405aeee0f29152545d3210e8e09f1f/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ExpressionConverter.java#L1004
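Point 2 of the description can be sketched as a small data-model change: named parameters naturally fit a Map keyed by name, while positional parameters need an ordered collection whose i-th element binds to the i-th `?` in the query. The `QueryParametersSketch` class below is hypothetical and for illustration only, not Beam's actual API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a holder that carries either named or positional
// parameters, mirroring the representation change described above.
public class QueryParametersSketch {
  enum Kind { NAMED, POSITIONAL }

  final Kind kind;
  final Map<String, Object> named;   // name -> value, for @name parameters
  final List<Object> positional;     // element i binds to the i-th '?'

  private QueryParametersSketch(Kind kind, Map<String, Object> named, List<Object> positional) {
    this.kind = kind;
    this.named = named;
    this.positional = positional;
  }

  static QueryParametersSketch named(Map<String, Object> params) {
    return new QueryParametersSketch(Kind.NAMED, new LinkedHashMap<>(params), null);
  }

  static QueryParametersSketch positional(List<Object> params) {
    return new QueryParametersSketch(Kind.POSITIONAL, null, new ArrayList<>(params));
  }

  public static void main(String[] args) {
    // "SELECT ? + ?" with positional parameters: order is all that matters.
    QueryParametersSketch p = positional(List.of(1L, 2L));
    System.out.println(p.kind + " " + p.positional);
  }
}
```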
[jira] [Work logged] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?focusedWorklogId=375326=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375326 ] ASF GitHub Bot logged work on BEAM-9168: Author: ASF GitHub Bot Created on: 22/Jan/20 00:28 Start Date: 22/Jan/20 00:28 Worklog Time Spent: 10m Work Description: udim commented on pull request #10652: [BEAM-9168] Temporary fix for RunnerAPIPTransformHolder usage URL: https://github.com/apache/beam/pull/10652
[jira] [Comment Edited] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020681#comment-17020681 ] Kirill Kozlov edited comment on BEAM-9169 at 1/22/20 12:28 AM: --- [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to ensure that strings with special characters (like: 'abc\\n') get escaped. It looks like the `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped.
[jira] [Work logged] (BEAM-6550) ParDo Async Java API
[ https://issues.apache.org/jira/browse/BEAM-6550?focusedWorklogId=375325=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375325 ] ASF GitHub Bot logged work on BEAM-6550: Author: ASF GitHub Bot Created on: 22/Jan/20 00:26 Start Date: 22/Jan/20 00:26 Worklog Time Spent: 10m Work Description: mynameborat commented on pull request #10651: [BEAM-6550] ParDo Async Java API URL: https://github.com/apache/beam/pull/10651 * Introduced support for async Java ParDo * Added direct runner support * Unit tests and sample
[jira] [Commented] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020681#comment-17020681 ] Kirill Kozlov commented on BEAM-9169: - [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52]
[jira] [Updated] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yueyang Qiu updated BEAM-9169: -- Description: When I am testing query string {code:java} "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", \"America/Los_Angeles\")"{code} on BeamZetaSqlCalcRel I found that the second string parameter to the function is unparsed to {code:java} America\/Los_Angeles{code} (note an extra backslash character is added). This breaks the ZetaSQL evaluator with error {code:java} Syntax error: Illegal escape sequence: \/{code} From what I can see now this character is introduced during the Calcite unparsing step. was: When I am testing query string {code:java} "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", \"America/Los_Angeles\")"{code} on BeamZetaSqlCalcRel I found that the second string parameter to the function is unparsed to {code:java} America\/Los_Angeles{code} (note an extra backslash character is added). This breaks the ZetaSQL evaluator with error {code:java} Syntax error: Illegal escape sequence: \/{code}
[jira] [Commented] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020677#comment-17020677 ] Yueyang Qiu commented on BEAM-9169: --- [~kirillkozlov] [~apilloud] do you have any idea why this happened?
[jira] [Updated] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yueyang Qiu updated BEAM-9169: -- Description: When I am testing query string {code:java} "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", \"America/Los_Angeles\")"{code} on BeamZetaSqlCalcRel I found that the second string parameter to the function is unparsed to {code:java} America\/Los_Angeles{code} (note an extra backslash character is added). This breaks the ZetaSQL evaluator with error {code:java} Syntax error: Illegal escape sequence: \/{code} was: When I am testing query string {code:java} "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", \"America/Los_Angeles\")"{code} on BeamZetaSqlCalcRel I found that the second string parameter to the function is unparsed to {code:java} America\/Los_Angeles{code} (note an extra backslash character is added). This breaks the ZetaSQL evaluator with error {code:java} Syntax error: Illegal escape sequence: \\{code}
[jira] [Updated] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yueyang Qiu updated BEAM-9169: -- Description: When I am testing query string {code:java} "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", \"America/Los_Angeles\")"{code} on BeamZetaSqlCalcRel I found that the second string parameter to the function is unparsed to {code:java} America\/Los_Angeles{code} (note an extra backslash character is added). This breaks the ZetaSQL evaluator with error {code:java} Syntax error: Illegal escape sequence: \\{code} was: When I am testing query string {code:java} "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", \"America/Los_Angeles\")"{code} on BeamZetaSqlCalcRel I found that the second string parameter to the function is unparsed to {code:java} America\/Los_Angeles{code} (note an extra backslash character is added). This breaks the ZetaSQL evaluator with error {code:java} Syntax error: Illegal escape sequence: \{code}
[jira] [Updated] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yueyang Qiu updated BEAM-9169: -- Description: When I am testing query string {code:java} "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", \"America/Los_Angeles\")"{code} on BeamZetaSqlCalcRel I found that the second string parameter to the function is unparsed to {code:java} America\/Los_Angeles{code} (note an extra backslash character is added). This breaks the ZetaSQL evaluator with error {code:java} Syntax error: Illegal escape sequence: \{code} was: When I am testing query `SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", \"America/Los_Angeles\")` on `BeamZetaSqlCalcRel` I found that the second string parameter to the function is unparsed to `America\/Los_Angeles` (note an extra backslash character is added). This breaks the ZetaSQL evaluator with error `Syntax error: Illegal escape sequence: \`
[jira] [Created] (BEAM-9169) Extra character introduced during Calcite unparsing
Yueyang Qiu created BEAM-9169: - Summary: Extra character introduced during Calcite unparsing Key: BEAM-9169 URL: https://issues.apache.org/jira/browse/BEAM-9169 Project: Beam Issue Type: Improvement Components: dsl-sql-zetasql Reporter: Yueyang Qiu Assignee: Yueyang Qiu When I am testing query `SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", \"America/Los_Angeles\")` on `BeamZetaSqlCalcRel` I found that the second string parameter to the function is unparsed to `America\/Los_Angeles` (note an extra backslash character is added). This breaks the ZetaSQL evaluator with error `Syntax error: Illegal escape sequence: \` -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
Udi Meiri created BEAM-9168: --- Summary: AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn Key: BEAM-9168 URL: https://issues.apache.org/jira/browse/BEAM-9168 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Udi Meiri Assignee: Chamikara Madhusanka Jayalath This is failing on a google-internal test. Unexpected class is apache_beam.transforms.core.RunnerAPIPTransformHolder. Failed assertion: https://github.com/apache/beam/blob/a59f897a64b0006ef3fcbe5a750d5f46499cfe61/sdks/python/apache_beam/pipeline.py#L1052-L1053 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9168) AppliedPTransform.from_runner_api fails on unexpected non-ParDo class with PAR_DO.urn
[ https://issues.apache.org/jira/browse/BEAM-9168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri updated BEAM-9168: Status: Open (was: Triage Needed)
[jira] [Work logged] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?focusedWorklogId=375319=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375319 ] ASF GitHub Bot logged work on BEAM-8042: Author: ASF GitHub Bot Created on: 22/Jan/20 00:13 Start Date: 22/Jan/20 00:13 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10649: [BEAM-8042] [ZetaSQL] Fix aggregate column reference URL: https://github.com/apache/beam/pull/10649#issuecomment-576947355 Run SQL postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375319) Time Spent: 1h 40m (was: 1.5h) > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 1h 40m > Remaining Estimate: 0h > > {code} > @Rule > public TestPipeline pipeline = > TestPipeline.fromOptions(createPipelineOptions()); > private static PipelineOptions createPipelineOptions() { > BeamSqlPipelineOptions opts = > PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class); > opts.setPlannerName(ZetaSQLQueryPlanner.class.getName()); > return opts; > } > @Test > public void testAggregate() { > Schema inputSchema = Schema.builder() > .addByteArrayField("id") > .addInt64Field("has_f1") > .addInt64Field("has_f2") > .addInt64Field("has_f3") > .addInt64Field("has_f4") > .addInt64Field("has_f5") > .addInt64Field("has_f6") > .build(); > String sql = "SELECT \n" + > " id, \n" + > " COUNT(*) as count, \n" + > " SUM(has_f1) as f1_count, \n" + > " SUM(has_f2) as f2_count, \n" + > " SUM(has_f3) as f3_count, \n" + > " SUM(has_f4) as f4_count, \n" + > " SUM(has_f5) as f5_count, \n" + > " 
SUM(has_f6) as f6_count \n" + > "FROM PCOLLECTION \n" + > "GROUP BY id"; > pipeline > .apply(Create.empty(inputSchema)) > .apply(SqlTransform.query(sql)); > pipeline.run(); > } > {code} > {code} > Caused by: java.lang.RuntimeException: Error while applying rule > AggregateProjectMergeRule, args > [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)), > > rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)] > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637) > at > org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104) > at > ... 
39 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205) > ... 48 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
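The ArrayIndexOutOfBoundsException above arises when AggregateProjectMergeRule remaps the aggregate's input field references through the projection it is merging with. A minimal sketch of that failure mode (illustrative only, not Calcite's actual code; `remap`, `aggRefs`, and `projectOutputs` are names invented for this example): the plan references the group key $0 plus SUM($2)..SUM($7), but the LogicalProject only produces seven outputs ($0..$6), because COUNT(*) consumes no input column.

```java
import java.util.ArrayList;
import java.util.List;

public class AggregateMergeSketch {
    // Illustrative stand-in for the index remapping AggregateProjectMergeRule
    // performs: each aggregate input reference is looked up in the project's
    // output list. A reference past the end fails like the trace above.
    static List<Integer> remap(int[] aggRefs, int projectOutputs) {
        List<Integer> mapped = new ArrayList<>();
        for (int ref : aggRefs) {
            if (ref >= projectOutputs) {
                throw new ArrayIndexOutOfBoundsException(
                    "field reference $" + ref + " exceeds " + projectOutputs + " project outputs");
            }
            mapped.add(ref);
        }
        return mapped;
    }

    public static void main(String[] args) {
        // $0 (group key) plus SUM($2)..SUM($7); the project has 7 outputs, $0..$6.
        int[] refs = {0, 2, 3, 4, 5, 6, 7};
        try {
            remap(refs, 7);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running this prints `field reference $7 exceeds 7 project outputs`, mirroring the `ArrayIndexOutOfBoundsException: 7` in the trace.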
[jira] [Work logged] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?focusedWorklogId=375317=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375317 ] ASF GitHub Bot logged work on BEAM-8042: Author: ASF GitHub Bot Created on: 22/Jan/20 00:12 Start Date: 22/Jan/20 00:12 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10649: [BEAM-8042] [ZetaSQL] Fix aggregate column reference URL: https://github.com/apache/beam/pull/10649#issuecomment-576947174 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375317) Time Spent: 1.5h (was: 1h 20m) > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 1.5h > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-8042 started by Kirill Kozlov. --- > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 1h 20m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9151) Dataflow legacy worker tests are mis-configured
[ https://issues.apache.org/jira/browse/BEAM-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyuan Zhang resolved BEAM-9151. Resolution: Fixed > Dataflow legacy worker tests are mis-configured > --- > > Key: BEAM-9151 > URL: https://issues.apache.org/jira/browse/BEAM-9151 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.19.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Please refer to the last comment of https://github.com/apache/beam/pull/8183 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9151) Dataflow legacy worker tests are mis-configured
[ https://issues.apache.org/jira/browse/BEAM-9151?focusedWorklogId=375315=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375315 ] ASF GitHub Bot logged work on BEAM-9151: Author: ASF GitHub Bot Created on: 22/Jan/20 00:05 Start Date: 22/Jan/20 00:05 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #10647: [BEAM-9151] Cherry-pick: Fix misconfigured legacy dataflow tests URL: https://github.com/apache/beam/pull/10647 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375315) Time Spent: 1h 10m (was: 1h) > Dataflow legacy worker tests are mis-configured > --- > > Key: BEAM-9151 > URL: https://issues.apache.org/jira/browse/BEAM-9151 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.19.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Please refer to the last comment of https://github.com/apache/beam/pull/8183 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9167) Reduce overhead of Go SDK side metrics
[ https://issues.apache.org/jira/browse/BEAM-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9167: --- Description: Locking overhead due to the global store and local caches of SDK counter data can dominate certain workloads, which means we can do better. Instead of having a global store of metrics data to extract counters, we should use per-ptransform (or per-bundle) counter sets, which would avoid requiring locking per counter operation. The main detriment compared to the current implementation is that a user would need to add their own locking if they were to spawn multiple goroutines to process a Bundle's work in a DoFn. Given that self-multithreaded DoFns aren't recommended/safe in Java, largely impossible in Python, and the other Beam Go SDK-provided constructs (like Iterators and Emitters) are not thread safe, this is a small concern, provided the documentation is clear on this. Removing the locking and switching to atomic ops reduces the overhead significantly in example jobs and in the benchmarks. A second part of this change should be to move the exec package to manage its own per-bundle state, rather than relying on a global datastore to extract the per-bundle, per-ptransform values. Related: https://issues.apache.org/jira/browse/BEAM-6541 was: Locking overhead due to the global store and local caches of SDK counter data can dominate certain workloads, which means we can do better. Instead of having a global store of metrics data to extract counters, we should use per-ptransform (or per-bundle) counter sets, which would avoid requiring locking per counter operation. The main detriment compared to the current implementation is that a user would need to add their own locking if they were to spawn multiple goroutines to process a Bundle's work in a DoFn. Given that self-multithreaded DoFns aren't recommended/safe in Java, largely impossible in Python, and the other Beam Go SDK-provided constructs (like Iterators and Emitters) are not thread safe, this is a small concern, provided the documentation is clear on this. Removing the locking and switching to atomic ops reduces the overhead significantly in example jobs and in the benchmarks. Related: https://issues.apache.org/jira/browse/BEAM-6541 > Reduce overhead of Go SDK side metrics > -- > > Key: BEAM-9167 > URL: https://issues.apache.org/jira/browse/BEAM-9167 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
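The tradeoff the description names, a lock acquired per counter increment versus atomic operations, can be illustrated with a small sketch. This is not the Beam Go SDK's metrics code (the issue targets sdk-go); it is a generic Java illustration with invented class names:

```java
import java.util.concurrent.atomic.AtomicLong;

public class CounterSketch {
    // Per-operation locking, the scheme the issue describes moving away from.
    static class LockedCounter {
        private long n;
        synchronized void inc() { n++; }
        synchronized long get() { return n; }
    }

    // Lock-free increments via atomic ops. As the description notes, this is
    // safe as long as a bundle is processed by a single worker thread; a DoFn
    // that fans work out to its own threads would need extra synchronization.
    static class AtomicCounter {
        private final AtomicLong n = new AtomicLong();
        void inc() { n.incrementAndGet(); }
        long get() { return n.get(); }
    }

    public static void main(String[] args) {
        LockedCounter locked = new LockedCounter();
        AtomicCounter atomic = new AtomicCounter();
        for (int i = 0; i < 1_000_000; i++) {
            locked.inc();
            atomic.inc();
        }
        System.out.println(locked.get() + " " + atomic.get());
    }
}
```

Both counters end at the same value; the atomic version simply avoids acquiring a monitor on every increment, which is where the measured overhead reduction comes from.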
[jira] [Assigned] (BEAM-9167) Reduce overhead of Go SDK side metrics
[ https://issues.apache.org/jira/browse/BEAM-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-9167: -- Assignee: Robert Burke > Reduce overhead of Go SDK side metrics > -- > > Key: BEAM-9167 > URL: https://issues.apache.org/jira/browse/BEAM-9167 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > > Locking overhead due to the global store and local caches of SDK counter data > can dominate certain workloads, which means we can do better. > Instead of having a global store of metrics data to extract counters, we > should use per ptransform (or per bundle) counter sets, which would avoid > requiring locking per counter operation. The main detriment compared to the > current implementation is that a user would need to add their own locking if > they were to spawn multiple goroutines to process a Bundle's work in a DoFn. > Given that self multithreaded DoFns aren't recommended/safe in Java, largely > impossible in Python, and the other beam Go SDK provided constructs (like > Iterators and Emitters) are not thread safe, this is a small concern, > provided the documentation is clear on this. > Removing the locking and switching to atomic ops reduces the overhead > significantly in example jobs and in the benchmarks. > Related: https://issues.apache.org/jira/browse/BEAM-6541 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9072) [SQL] Add support for Datastore source
[ https://issues.apache.org/jira/browse/BEAM-9072?focusedWorklogId=375303=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375303 ] ASF GitHub Bot logged work on BEAM-9072: Author: ASF GitHub Bot Created on: 21/Jan/20 23:46 Start Date: 21/Jan/20 23:46 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #10440: [BEAM-9072] [SQL] DataStoreV1 IO connector URL: https://github.com/apache/beam/pull/10440#issuecomment-576940136 Run JavaBeamZetaSQL PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375303) Time Spent: 6h 10m (was: 6h) > [SQL] Add support for Datastore source > -- > > Key: BEAM-9072 > URL: https://issues.apache.org/jira/browse/BEAM-9072 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > > * Create a Datastore table and table provider > * Conversion between Datastore and Beam data types > * Implement buildIOReader > * Implement buildIOWrite > * Implement getTableStatistics > Doc: > [https://docs.google.com/document/d/1FxuEGewJ3GPDl0IKglfOYf1edwa2m_wryFZYRMpRNbA/edit?pli=1] -- This message was sent by Atlassian Jira (v8.3.4#803005)
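The checklist in the issue (a Datastore table and table provider, buildIOReader, buildIOWrite, getTableStatistics) can be sketched as an interface. The signatures below are illustrative placeholders only, not the Beam SQL table API; real implementations return Beam transforms and statistics objects rather than strings:

```java
public class DatastoreTableSketch {
    // Method names mirror the issue checklist; return types are simplified
    // placeholders rather than Beam's transform and statistics types.
    interface SqlTable {
        String buildIOReader();      // describes the source reading rows
        String buildIOWrite();       // describes the sink writing rows
        long getTableStatistics();   // row-count estimate for the planner
    }

    // Hypothetical Datastore-backed table keyed by an entity kind.
    static class DatastoreTable implements SqlTable {
        private final String kind;
        DatastoreTable(String kind) { this.kind = kind; }
        public String buildIOReader() { return "read entities of kind " + kind; }
        public String buildIOWrite() { return "write entities of kind " + kind; }
        public long getTableStatistics() { return 0L; } // unknown until implemented
    }

    public static void main(String[] args) {
        SqlTable table = new DatastoreTable("orders");
        System.out.println(table.buildIOReader());
    }
}
```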
[jira] [Work logged] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?focusedWorklogId=375301=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375301 ] ASF GitHub Bot logged work on BEAM-8042: Author: ASF GitHub Bot Created on: 21/Jan/20 23:46 Start Date: 21/Jan/20 23:46 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #10649: [BEAM-8042] [ZetaSQL] Fix aggregate column reference URL: https://github.com/apache/beam/pull/10649#discussion_r369304269 ## File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java ## @@ -1347,6 +1347,44 @@ public void testZetaSQLStructFieldAccessInTumble() { pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } + @Test + public void testAggregateWithAndWithoutColumnRefs() { Review comment: Great suggestions! Will add a message in a commit. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375301) Time Spent: 1h 20m (was: 1h 10m) > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 1h 20m > Remaining Estimate: 0h
[jira] [Work logged] (BEAM-9072) [SQL] Add support for Datastore source
[ https://issues.apache.org/jira/browse/BEAM-9072?focusedWorklogId=375302=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375302 ] ASF GitHub Bot logged work on BEAM-9072: Author: ASF GitHub Bot Created on: 21/Jan/20 23:46 Start Date: 21/Jan/20 23:46 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #10440: [BEAM-9072] [SQL] DataStoreV1 IO connector URL: https://github.com/apache/beam/pull/10440#issuecomment-576940105 Run SQL postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375302) Time Spent: 6h (was: 5h 50m) > [SQL] Add support for Datastore source > -- > > Key: BEAM-9072 > URL: https://issues.apache.org/jira/browse/BEAM-9072 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 6h > Remaining Estimate: 0h > > * Create a Datastore table and table provider > * Conversion between Datastore and Beam data types > * Implement buildIOReader > * Implement buildIOWrite > * Implement getTableStatistics > Doc: > [https://docs.google.com/document/d/1FxuEGewJ3GPDl0IKglfOYf1edwa2m_wryFZYRMpRNbA/edit?pli=1] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?focusedWorklogId=375299=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375299 ] ASF GitHub Bot logged work on BEAM-8042: Author: ASF GitHub Bot Created on: 21/Jan/20 23:44 Start Date: 21/Jan/20 23:44 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #10649: [BEAM-8042] [ZetaSQL] Fix aggregate column reference URL: https://github.com/apache/beam/pull/10649#discussion_r369303857 ## File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java ## @@ -1347,6 +1347,44 @@ public void testZetaSQLStructFieldAccessInTumble() { pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } + @Test + public void testAggregateWithAndWithoutColumnRefs() { Review comment: Nit: you might want to include a comment like "this test is used to fix BEAM-8042", which can provide some context for readers. But it's not required. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375299) Time Spent: 1h (was: 50m) > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 1h > Remaining Estimate: 0h
[jira] [Work logged] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?focusedWorklogId=375298=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375298 ] ASF GitHub Bot logged work on BEAM-8042: Author: ASF GitHub Bot Created on: 21/Jan/20 23:44 Start Date: 21/Jan/20 23:44 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #10649: [BEAM-8042] [ZetaSQL] Fix aggregate column reference URL: https://github.com/apache/beam/pull/10649#discussion_r369303722 ## File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java ## @@ -1347,6 +1347,44 @@ public void testZetaSQLStructFieldAccessInTumble() { pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } + @Test + public void testAggregateWithAndWithoutColumnRefs() { +ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); + +String sql = +"SELECT \n" ++ " id, \n" ++ " SUM(has_f1) as f1_count, \n" ++ " SUM(has_f2) as f2_count, \n" ++ " SUM(has_f3) as f3_count, \n" ++ " SUM(has_f4) as f4_count, \n" ++ " SUM(has_f5) as f5_count, \n" ++ " COUNT(*) as count, \n" ++ " SUM(has_f6) as f6_count \n" ++ "FROM (select 0 as id, 1 as has_f1, 2 as has_f2, 3 as has_f3, 4 as has_f4, 5 as has_f5, 6 as has_f6)\n" Review comment: Might want to use named parameters to not rely on hard-coded values. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375298) Time Spent: 50m (was: 40m) > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 50m > Remaining Estimate: 0h
[jira] [Work logged] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?focusedWorklogId=375300&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375300 ] ASF GitHub Bot logged work on BEAM-8042: Author: ASF GitHub Bot Created on: 21/Jan/20 23:44 Start Date: 21/Jan/20 23:44 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #10649: [BEAM-8042] [ZetaSQL] Fix aggregate column reference URL: https://github.com/apache/beam/pull/10649#discussion_r369303857 ## File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java ## @@ -1347,6 +1347,44 @@ public void testZetaSQLStructFieldAccessInTumble() { pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } + @Test + public void testAggregateWithAndWithoutColumnRefs() { Review comment: Nit: you might include a comment like "this test is used to verify BEAM-8042", which can provide some context for readers. But it's not required. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375300) Time Spent: 1h 10m (was: 1h) > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 1h 10m > Remaining Estimate: 0h > > {code} > @Rule > public TestPipeline pipeline = > TestPipeline.fromOptions(createPipelineOptions()); > private static PipelineOptions createPipelineOptions() { > BeamSqlPipelineOptions opts = > PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class); > opts.setPlannerName(ZetaSQLQueryPlanner.class.getName()); > return opts; > } > @Test > public void testAggregate() { > Schema inputSchema = Schema.builder() > .addByteArrayField("id") > .addInt64Field("has_f1") > .addInt64Field("has_f2") > .addInt64Field("has_f3") > .addInt64Field("has_f4") > .addInt64Field("has_f5") > .addInt64Field("has_f6") > .build(); > String sql = "SELECT \n" + > " id, \n" + > " COUNT(*) as count, \n" + > " SUM(has_f1) as f1_count, \n" + > " SUM(has_f2) as f2_count, \n" + > " SUM(has_f3) as f3_count, \n" + > " SUM(has_f4) as f4_count, \n" + > " SUM(has_f5) as f5_count, \n" + > " SUM(has_f6) as f6_count \n" + > "FROM PCOLLECTION \n" + > "GROUP BY id"; > pipeline > .apply(Create.empty(inputSchema)) > .apply(SqlTransform.query(sql)); > pipeline.run(); > } > {code} > {code} > Caused by: java.lang.RuntimeException: Error while applying rule > AggregateProjectMergeRule, args > [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)), > > rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)] > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232) > at > 
org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637) > at > org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104) > at > ... 39 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96) > at >
[jira] [Work logged] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?focusedWorklogId=375291&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375291 ] ASF GitHub Bot logged work on BEAM-8042: Author: ASF GitHub Bot Created on: 21/Jan/20 23:37 Start Date: 21/Jan/20 23:37 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #10649: [BEAM-8042] [ZetaSQL] Fix aggregate column reference URL: https://github.com/apache/beam/pull/10649 Some aggregate operations do not require a column reference (Ex: `COUNT(*)`, unlike `COUNT(id)`). Such expressions should not increment the reference offset when constructing `LogicalAggregate`. R: @amaliujia CC: @kanterov CC: @apilloud Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
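The off-by-one the PR describes can be sketched in a few lines of Python (this is a hypothetical illustration, not Beam's planner code; `resolve_refs`, `AGG_CALLS`, and `buggy` are invented names): if the reference offset advances even for aggregate calls that take no input column, every reference after a `COUNT(*)` points one column too far.

```python
# Aggregate calls from the BEAM-8042 repro query; input has columns
# id (the GROUP BY key, index 0) plus has_f1..has_f6 (indices 1..6).
AGG_CALLS = ["SUM(has_f1)", "SUM(has_f2)", "SUM(has_f3)", "SUM(has_f4)",
             "SUM(has_f5)", "COUNT(*)", "SUM(has_f6)"]

def resolve_refs(agg_calls, buggy):
    """Map each aggregate call to the input column index it references."""
    refs, offset = [], 1  # column 0 is the GROUP BY key
    for call in agg_calls:
        if call == "COUNT(*)":
            refs.append(None)   # no input column reference
            if buggy:
                offset += 1     # bug: offset advances anyway
        else:
            refs.append(offset)
            offset += 1
    return refs

# Buggy resolution makes SUM(has_f6) reference column 7 of a 7-column
# input (valid indices 0..6) -- mirroring ArrayIndexOutOfBoundsException: 7.
assert resolve_refs(AGG_CALLS, buggy=True)[-1] == 7
assert resolve_refs(AGG_CALLS, buggy=False)[-1] == 6
```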
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build
[jira] [Created] (BEAM-9167) Reduce overhead of Go SDK side metrics
Robert Burke created BEAM-9167: -- Summary: Reduce overhead of Go SDK side metrics Key: BEAM-9167 URL: https://issues.apache.org/jira/browse/BEAM-9167 Project: Beam Issue Type: Improvement Components: sdk-go Reporter: Robert Burke Locking overhead due to the global store and local caches of SDK counter data can dominate certain workloads, which means we can do better. Instead of having a global store of metrics data to extract counters, we should use per ptransform (or per bundle) counter sets, which would avoid requiring locking per counter operation. The main detriment compared to the current implementation is that a user would need to add their own locking if they were to spawn multiple goroutines to process a Bundle's work in a DoFn. Given that self multithreaded DoFns aren't recommended/safe in Java, largely impossible in Python, and the other beam Go SDK provided constructs (like Iterators and Emitters) are not thread safe, this is a small concern, provided the documentation is clear on this. Removing the locking and switching to atomic ops reduces the overhead significantly in example jobs and in the benchmarks. Related: https://issues.apache.org/jira/browse/BEAM-6541 -- This message was sent by Atlassian Jira (v8.3.4#803005)
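The trade-off described above — a lock around a global metrics store versus per-bundle counter sets merged once at bundle completion — can be sketched in Python (illustrative names; the actual proposal is for the Go SDK and would additionally use atomic operations):

```python
import threading

class GlobalStore:
    """Shared store: every counter update pays for the lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self.counters = {}

    def add(self, key, delta):
        with self._lock:
            self.counters[key] = self.counters.get(key, 0) + delta

class BundleCounters:
    """Per-bundle counter set: bundle processing is single-threaded, so
    increments need no lock; totals are merged into the global store once,
    when the bundle finishes."""
    def __init__(self):
        self._local = {}

    def inc(self, name, delta=1):
        self._local[name] = self._local.get(name, 0) + delta

    def commit(self, store, ptransform):
        for name, value in self._local.items():
            store.add((ptransform, name), value)
```

The cost model is the point: with per-bundle sets, the lock is taken once per counter per bundle rather than once per element — and, as the issue notes, a DoFn that spawns its own goroutines would then be responsible for its own synchronization.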
[jira] [Work logged] (BEAM-3419) Enable iterable side input for beam runners.
[ https://issues.apache.org/jira/browse/BEAM-3419?focusedWorklogId=375288=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375288 ] ASF GitHub Bot logged work on BEAM-3419: Author: ASF GitHub Bot Created on: 21/Jan/20 23:31 Start Date: 21/Jan/20 23:31 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10648: [BEAM-3419] Support iterable on Dataflow runner when using the unified worker. URL: https://github.com/apache/beam/pull/10648#issuecomment-576936184 Ran tests internal to Google to validate this change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375288) Time Spent: 4h 10m (was: 4h) > Enable iterable side input for beam runners. > > > Key: BEAM-3419 > URL: https://issues.apache.org/jira/browse/BEAM-3419 > Project: Beam > Issue Type: Improvement > Components: runner-core >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov reassigned BEAM-8042: --- Assignee: Kirill Kozlov > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 0.5h > Remaining Estimate: 0h > > {code} > @Rule > public TestPipeline pipeline = > TestPipeline.fromOptions(createPipelineOptions()); > private static PipelineOptions createPipelineOptions() { > BeamSqlPipelineOptions opts = > PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class); > opts.setPlannerName(ZetaSQLQueryPlanner.class.getName()); > return opts; > } > @Test > public void testAggregate() { > Schema inputSchema = Schema.builder() > .addByteArrayField("id") > .addInt64Field("has_f1") > .addInt64Field("has_f2") > .addInt64Field("has_f3") > .addInt64Field("has_f4") > .addInt64Field("has_f5") > .addInt64Field("has_f6") > .build(); > String sql = "SELECT \n" + > " id, \n" + > " COUNT(*) as count, \n" + > " SUM(has_f1) as f1_count, \n" + > " SUM(has_f2) as f2_count, \n" + > " SUM(has_f3) as f3_count, \n" + > " SUM(has_f4) as f4_count, \n" + > " SUM(has_f5) as f5_count, \n" + > " SUM(has_f6) as f6_count \n" + > "FROM PCOLLECTION \n" + > "GROUP BY id"; > pipeline > .apply(Create.empty(inputSchema)) > .apply(SqlTransform.query(sql)); > pipeline.run(); > } > {code} > {code} > Caused by: java.lang.RuntimeException: Error while applying rule > AggregateProjectMergeRule, args > [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)), > > rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)] > at > 
org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637) > at > org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104) > at > ... 39 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205) > ... 48 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3419) Enable iterable side input for beam runners.
[ https://issues.apache.org/jira/browse/BEAM-3419?focusedWorklogId=375285=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375285 ] ASF GitHub Bot logged work on BEAM-3419: Author: ASF GitHub Bot Created on: 21/Jan/20 23:24 Start Date: 21/Jan/20 23:24 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10648: [BEAM-3419] Support iterable on Dataflow runner when using the unified worker. URL: https://github.com/apache/beam/pull/10648 Note that all other portable runners are using iterable side inputs. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Commented] (BEAM-8584) Remove TestPipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020644#comment-17020644 ] Brian Hulette commented on BEAM-8584: - Sorry I didn't get back to you with that link Mark. I've been out of the office a bit dealing with a medical issue (all is well now :)). I think the reasoning qualifies as language-specific; I just noted that much of the functionality provided by TestPipelineOptions could be provided in other ways. > Remove TestPipelineOptions > -- > > Key: BEAM-8584 > URL: https://issues.apache.org/jira/browse/BEAM-8584 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > > See [ML > thread|https://lists.apache.org/thread.html/cc2ac6db764e0d750688f8bae540728e38759365b86ba6f3fabfa6dd@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9063) Migrate docker images to apache namespace.
[ https://issues.apache.org/jira/browse/BEAM-9063?focusedWorklogId=375277=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375277 ] ASF GitHub Bot logged work on BEAM-9063: Author: ASF GitHub Bot Created on: 21/Jan/20 23:07 Start Date: 21/Jan/20 23:07 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10612: [NOT READY TO MERGE][BEAM-9063] migrate docker images to apache URL: https://github.com/apache/beam/pull/10612#issuecomment-576929102 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375277) Time Spent: 3h 20m (was: 3h 10m) > Migrate docker images to apache namespace. > -- > > Key: BEAM-9063 > URL: https://issues.apache.org/jira/browse/BEAM-9063 > Project: Beam > Issue Type: Task > Components: beam-community >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: Not applicable > > Time Spent: 3h 20m > Remaining Estimate: 0h > > https://hub.docker.com/u/apache -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9166) Add unit tests for ZetaSQL time-related functions that depends on system clock
Yueyang Qiu created BEAM-9166: - Summary: Add unit tests for ZetaSQL time-related functions that depends on system clock Key: BEAM-9166 URL: https://issues.apache.org/jira/browse/BEAM-9166 Project: Beam Issue Type: Improvement Components: dsl-sql-zetasql Reporter: Yueyang Qiu Assignee: Yueyang Qiu Example test query: "SELECT CURRENT_TIMESTAMP()" We cannot test this because currently the Java ZetaSQL evaluator wrapper does not support configuring clocks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
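The testability gap described above is the standard clock-injection problem; a minimal Python sketch (hypothetical names — this is not the ZetaSQL evaluator API, which per the issue does not yet support configurable clocks) of how an injectable clock makes a `CURRENT_TIMESTAMP()`-style function deterministic in tests:

```python
import datetime

def current_timestamp(clock=datetime.datetime.now):
    """Evaluate a CURRENT_TIMESTAMP()-like function against an injectable
    clock. Production code uses the real clock; tests pass a fixed one."""
    return clock()

# With a fixed clock the expected value is known exactly, so the function
# can be unit tested without depending on the system clock.
fixed = lambda: datetime.datetime(2020, 1, 21, 23, 0, 0)
assert current_timestamp(fixed) == datetime.datetime(2020, 1, 21, 23, 0, 0)
```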
[jira] [Closed] (BEAM-9080) [Go SDK] beam.Partition should support PCollection>s and not just PCollection
[ https://issues.apache.org/jira/browse/BEAM-9080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke closed BEAM-9080. -- Fix Version/s: Not applicable Resolution: Fixed > [Go SDK] beam.Partition should support PCollection>s and not just > PCollection > > > Key: BEAM-9080 > URL: https://issues.apache.org/jira/browse/BEAM-9080 > Project: Beam > Issue Type: New Feature > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Minor > Labels: beginner, noob, starter > Fix For: Not applicable > > Time Spent: 0.5h > Remaining Estimate: 0h > > beam.Partition should support PCollection>s and not just > PCollection > If StructualDynFns also present themselves, the emitters can be optimized too. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9029) Two bugs in Python SDK S3 filesystem support
[ https://issues.apache.org/jira/browse/BEAM-9029?focusedWorklogId=375263=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375263 ] ASF GitHub Bot logged work on BEAM-9029: Author: ASF GitHub Bot Created on: 21/Jan/20 22:48 Start Date: 21/Jan/20 22:48 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10459: [BEAM-9029]Fix two bugs in Python SDK S3 filesystem support URL: https://github.com/apache/beam/pull/10459 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375263) Remaining Estimate: 21h (was: 21h 10m) Time Spent: 3h (was: 2h 50m) > Two bugs in Python SDK S3 filesystem support > > > Key: BEAM-9029 > URL: https://issues.apache.org/jira/browse/BEAM-9029 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Wenhai Pan >Assignee: Wenhai Pan >Priority: Major > Labels: pull-request-available > Original Estimate: 24h > Time Spent: 3h > Remaining Estimate: 21h > > Hi :) > There seem to be 2 bugs in the S3 filesystem support. > I tried to use S3 storage for a simple wordcount demo with DirectRunner. 
> The demo script: > {code:java} > def main(): > options = PipelineOptions().view_as(StandardOptions) > options.runner = 'DirectRunner' > pipeline = beam.Pipeline(options = options) > ( > pipeline > | ReadFromText("s3://mx-machine-learning/panwenhai/beam_test/test_data") > | "extract_words" >> beam.FlatMap(lambda x: re.findall(r" [A-Za-z\']+", x)) > | beam.combiners.Count.PerElement() > | beam.MapTuple(lambda word, count: "%s: %s" % (word, count)) > | WriteToText("s3://mx-machine-learning/panwenhai/beam_test/output") > ) > result = pipeline.run() > result.wait_until_finish() > return > {code} > > Error message 1: > {noformat} > apache_beam.io.filesystem.BeamIOError: Match operation failed with exceptions > {'s3://mx-machine-learning/panwenhai/beam_test/output-*-of-1': > BeamIOError("List operation failed with exceptions > {'s3://mx-machine-learning/panwenhai/beam_test/output-': S3ClientError('Tried > to list nonexistent S3 path: > s3://mx-machine-learning/panwenhai/beam_test/output-', 404)}")} [while > running 'WriteToText/Write/WriteImpl/PreFinalize'] with exceptions > None{noformat} > > After digging into the code, it seems the Boto3 client's list function will > raise an exception when trying to list a nonexistent S3 path > (beam/sdks/python/apache_beam/io/aws/clients/s3/boto3_client.py line 111). And > the S3IO class does not handle this exception in the list_prefix function > (beam/sdks/python/apache_beam/io/aws/s3io.py line 121). > When the runner tries to list and delete the existing output file, if there > is no existing output file, it will try to list a nonexistent S3 path and > will trigger the exception. > This should not be an issue here. I think we can ignore this exception safely > in the S3IO list_prefix function. 
> Error Message 2: > {noformat} > File > "/Users/wenhai.pan/venvs/tfx/lib/python3.7/site-packages/apache_beam-2.19.0.dev0-py3.7.egg/apache_beam/io/aws/s3filesystem.py", > line 272, in delete > exceptions = {path: error for (path, error) in results > File > "/Users/wenhai.pan/venvs/tfx/lib/python3.7/site-packages/apache_beam-2.19.0.dev0-py3.7.egg/apache_beam/io/aws/s3filesystem.py", > line 272, in > exceptions = {path: error for (path, error) in results > ValueError: too many values to unpack (expected 2) [while running > 'WriteToText/Write/WriteImpl/FinalizeWrite']{noformat} > > When the runner tries to delete the temporary output directory, it will > trigger this exception. This exception is caused by parsing (path, error) > directly from the "results" which is a dict > (beam/sdks/python/apache_beam/io/aws/s3filesystem.py line 272). I think we > should use results.items() here. > I have submitted a patch for these 2 bugs: > https://github.com/apache/beam/pull/10459 > > Thank you. -- This message was sent by Atlassian Jira (v8.3.4#803005)
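Error message 2 reproduces in a few lines (illustrative path and error values): iterating a dict yields its keys, so the comprehension tries to unpack each path string character by character, and any path longer than two characters raises exactly the `ValueError: too many values to unpack (expected 2)` shown above.

```python
results = {"s3://bucket/beam-temp/output-0000": IOError("delete failed")}

# Buggy form from s3filesystem.py line 272: iterates the dict's keys.
try:
    {path: error for (path, error) in results}
    raised = False
except ValueError:
    raised = True  # the key string cannot be unpacked into (path, error)

# Fixed form, as the reporter suggests: .items() yields the pairs.
exceptions = {path: error for (path, error) in results.items()}
```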
[jira] [Work logged] (BEAM-9029) Two bugs in Python SDK S3 filesystem support
[ https://issues.apache.org/jira/browse/BEAM-9029?focusedWorklogId=375262=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375262 ] ASF GitHub Bot logged work on BEAM-9029: Author: ASF GitHub Bot Created on: 21/Jan/20 22:47 Start Date: 21/Jan/20 22:47 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10459: [BEAM-9029]Fix two bugs in Python SDK S3 filesystem support URL: https://github.com/apache/beam/pull/10459#issuecomment-576922546 LGTM. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375262) Remaining Estimate: 21h 10m (was: 21h 20m) Time Spent: 2h 50m (was: 2h 40m) > Two bugs in Python SDK S3 filesystem support > > > Key: BEAM-9029 > URL: https://issues.apache.org/jira/browse/BEAM-9029 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Wenhai Pan >Assignee: Wenhai Pan >Priority: Major > Labels: pull-request-available > Original Estimate: 24h > Time Spent: 2h 50m > Remaining Estimate: 21h 10m > > Hi :) > There seem to be 2 bugs in the S3 filesystem support. > I tried to use S3 storage for a simple wordcount demo with DirectRunner. 
> The demo script: > {code:java} > def main(): > options = PipelineOptions().view_as(StandardOptions) > options.runner = 'DirectRunner' > pipeline = beam.Pipeline(options = options) > ( > pipeline > | ReadFromText("s3://mx-machine-learning/panwenhai/beam_test/test_data") > | "extract_words" >> beam.FlatMap(lambda x: re.findall(r" [A-Za-z\']+", x)) > | beam.combiners.Count.PerElement() > | beam.MapTuple(lambda word, count: "%s: %s" % (word, count)) > | WriteToText("s3://mx-machine-learning/panwenhai/beam_test/output") > ) > result = pipeline.run() > result.wait_until_finish() > return > {code} > > Error message 1: > {noformat} > apache_beam.io.filesystem.BeamIOError: Match operation failed with exceptions > {'s3://mx-machine-learning/panwenhai/beam_test/output-*-of-1': > BeamIOError("List operation failed with exceptions > {'s3://mx-machine-learning/panwenhai/beam_test/output-': S3ClientError('Tried > to list nonexistent S3 path: > s3://mx-machine-learning/panwenhai/beam_test/output-', 404)}")} [while > running 'WriteToText/Write/WriteImpl/PreFinalize'] with exceptions > None{noformat} > > After digging into the code, it seems the Boto3 client's list function will > raise an exception when trying to list a nonexistent S3 path > (beam/sdks/python/apache_beam/io/aws/clients/s3/boto3_client.py line 111). And > the S3IO class does not handle this exception in the list_prefix function > (beam/sdks/python/apache_beam/io/aws/s3io.py line 121). > When the runner tries to list and delete the existing output file, if there > is no existing output file, it will try to list a nonexistent S3 path and > will trigger the exception. > This should not be an issue here. I think we can ignore this exception safely > in the S3IO list_prefix function. 
> Error Message 2: > {noformat} > File > "/Users/wenhai.pan/venvs/tfx/lib/python3.7/site-packages/apache_beam-2.19.0.dev0-py3.7.egg/apache_beam/io/aws/s3filesystem.py", > line 272, in delete > exceptions = {path: error for (path, error) in results > File > "/Users/wenhai.pan/venvs/tfx/lib/python3.7/site-packages/apache_beam-2.19.0.dev0-py3.7.egg/apache_beam/io/aws/s3filesystem.py", > line 272, in > exceptions = {path: error for (path, error) in results > ValueError: too many values to unpack (expected 2) [while running > 'WriteToText/Write/WriteImpl/FinalizeWrite']{noformat} > > When the runner tries to delete the temporary output directory, it will > trigger this exception. This exception is caused by parsing (path, error) > directly from the "results" which is a dict > (beam/sdks/python/apache_beam/io/aws/s3filesystem.py line 272). I think we > should use results.items() here. > I have submitted a patch for these 2 bugs: > https://github.com/apache/beam/pull/10459 > > Thank you. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9120) Deprecate onSuccessMatcher, onCreateMatcher
[ https://issues.apache.org/jira/browse/BEAM-9120?focusedWorklogId=375261=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375261 ] ASF GitHub Bot logged work on BEAM-9120: Author: ASF GitHub Bot Created on: 21/Jan/20 22:37 Start Date: 21/Jan/20 22:37 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #10589: [WIP] [BEAM-9120] Deprecate onSuccessMatcher, onCreateMatcher URL: https://github.com/apache/beam/pull/10589#discussion_r369281313 ## File path: runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/TestDataflowRunner.java ## @@ -102,8 +101,6 @@ DataflowPipelineJob run(Pipeline pipeline, DataflowRunner runner) { job.getJobId(), expectedNumberOfAssertions); -assertThat(job, testPipelineOptions.getOnCreateMatcher()); Review comment: Does it do that? My understanding was that this would be a no-op by default, since OnCreateMatcher is an instance of AlwaysPassMatcher. It will only fail fast if the user sets an OnCreateMatcher, which I don't think is happening anywhere outside of TestDataflowRunnerTest. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375261) Time Spent: 2h (was: 1h 50m) > Deprecate onSuccessMatcher, onCreateMatcher > --- > > Key: BEAM-9120 > URL: https://issues.apache.org/jira/browse/BEAM-9120 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Instead of creating matchers on PipelineResult we should just make assertions > on real matchers after waiting for the pipeline to finish. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9120) Deprecate onSuccessMatcher, onCreateMatcher
[ https://issues.apache.org/jira/browse/BEAM-9120?focusedWorklogId=375258=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375258 ] ASF GitHub Bot logged work on BEAM-9120: Author: ASF GitHub Bot Created on: 21/Jan/20 22:31 Start Date: 21/Jan/20 22:31 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #10586: [BEAM-9120] Make BigqueryMatcher extend TypeSafeMatcher URL: https://github.com/apache/beam/pull/10586 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375258) Time Spent: 1h 40m (was: 1.5h) > Deprecate onSuccessMatcher, onCreateMatcher > --- > > Key: BEAM-9120 > URL: https://issues.apache.org/jira/browse/BEAM-9120 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > Instead of creating matchers on PipelineResult we should just make assertions > on real matchers after waiting for the pipeline to finish. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9120) Deprecate onSuccessMatcher, onCreateMatcher
[ https://issues.apache.org/jira/browse/BEAM-9120?focusedWorklogId=375259=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375259 ] ASF GitHub Bot logged work on BEAM-9120: Author: ASF GitHub Bot Created on: 21/Jan/20 22:31 Start Date: 21/Jan/20 22:31 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #10588: [BEAM-9120] Make FileChecksumMatcher extend TypeSafeMatcher URL: https://github.com/apache/beam/pull/10588 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375259) Time Spent: 1h 50m (was: 1h 40m) > Deprecate onSuccessMatcher, onCreateMatcher > --- > > Key: BEAM-9120 > URL: https://issues.apache.org/jira/browse/BEAM-9120 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > Instead of creating matchers on PipelineResult we should just make assertions > on real matchers after waiting for the pipeline to finish. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9151) Dataflow legacy worker tests are mis-configured
[ https://issues.apache.org/jira/browse/BEAM-9151?focusedWorklogId=375255=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375255 ] ASF GitHub Bot logged work on BEAM-9151: Author: ASF GitHub Bot Created on: 21/Jan/20 22:29 Start Date: 21/Jan/20 22:29 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #10647: [BEAM-9151] Cherry-pick: Fix misconfigured legacy dataflow tests URL: https://github.com/apache/beam/pull/10647#issuecomment-576915611 R: @aaltay This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375255) Time Spent: 1h (was: 50m) > Dataflow legacy worker tests are mis-configured > --- > > Key: BEAM-9151 > URL: https://issues.apache.org/jira/browse/BEAM-9151 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.19.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Please refer to the last comment of https://github.com/apache/beam/pull/8183 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9151) Dataflow legacy worker tests are mis-configured
[ https://issues.apache.org/jira/browse/BEAM-9151?focusedWorklogId=375254=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375254 ] ASF GitHub Bot logged work on BEAM-9151: Author: ASF GitHub Bot Created on: 21/Jan/20 22:26 Start Date: 21/Jan/20 22:26 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #10647: [BEAM-9151] Cherry-pick: Fix misconfigured legacy dataflow tests URL: https://github.com/apache/beam/pull/10647#issuecomment-576914509 Run Java_Examples_Dataflow PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375254) Time Spent: 50m (was: 40m) > Dataflow legacy worker tests are mis-configured > --- > > Key: BEAM-9151 > URL: https://issues.apache.org/jira/browse/BEAM-9151 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.19.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Please refer to the last comment of https://github.com/apache/beam/pull/8183 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9063) Migrate docker images to apache namespace.
[ https://issues.apache.org/jira/browse/BEAM-9063?focusedWorklogId=375253=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375253 ] ASF GitHub Bot logged work on BEAM-9063: Author: ASF GitHub Bot Created on: 21/Jan/20 22:25 Start Date: 21/Jan/20 22:25 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10612: [NOT READY TO MERGE][BEAM-9063] migrate docker images to apache URL: https://github.com/apache/beam/pull/10612#issuecomment-576914372 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375253) Time Spent: 3h 10m (was: 3h) > Migrate docker images to apache namespace. > -- > > Key: BEAM-9063 > URL: https://issues.apache.org/jira/browse/BEAM-9063 > Project: Beam > Issue Type: Task > Components: beam-community >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: Not applicable > > Time Spent: 3h 10m > Remaining Estimate: 0h > > https://hub.docker.com/u/apache -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9151) Dataflow legacy worker tests are mis-configured
[ https://issues.apache.org/jira/browse/BEAM-9151?focusedWorklogId=375252=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375252 ] ASF GitHub Bot logged work on BEAM-9151: Author: ASF GitHub Bot Created on: 21/Jan/20 22:22 Start Date: 21/Jan/20 22:22 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #10647: [BEAM-9151] Cherry-pick: Fix misconfigured legacy dataflow tests URL: https://github.com/apache/beam/pull/10647 (cherry picked from commit f99c7d0a3ef55161797d6d00c7acf3a67ae0ee6e) **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Work logged] (BEAM-9151) Dataflow legacy worker tests are mis-configured
[ https://issues.apache.org/jira/browse/BEAM-9151?focusedWorklogId=375250=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375250 ] ASF GitHub Bot logged work on BEAM-9151: Author: ASF GitHub Bot Created on: 21/Jan/20 22:14 Start Date: 21/Jan/20 22:14 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10635: [BEAM-9151] Fix misconfigured legacy dataflow tests. URL: https://github.com/apache/beam/pull/10635 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375250) Time Spent: 0.5h (was: 20m) > Dataflow legacy worker tests are mis-configured > --- > > Key: BEAM-9151 > URL: https://issues.apache.org/jira/browse/BEAM-9151 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.19.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Please refer to the last comment of https://github.com/apache/beam/pull/8183 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9140) Update to ZetaSQL 2020.01.1
[ https://issues.apache.org/jira/browse/BEAM-9140?focusedWorklogId=375246=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375246 ] ASF GitHub Bot logged work on BEAM-9140: Author: ASF GitHub Bot Created on: 21/Jan/20 22:10 Start Date: 21/Jan/20 22:10 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #10620: [BEAM-9140] Upgrade to ZetaSQL 2020.01.1 URL: https://github.com/apache/beam/pull/10620#issuecomment-576908170 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375246) Time Spent: 1h 50m (was: 1h 40m) > Update to ZetaSQL 2020.01.1 > --- > > Key: BEAM-9140 > URL: https://issues.apache.org/jira/browse/BEAM-9140 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Andrew Pilloud >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > I hear ZetaSQL 2020.01.1 will be coming out in the next few hours. We should > upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9093) Pipeline options which with different underlying store variable does not get over written
[ https://issues.apache.org/jira/browse/BEAM-9093?focusedWorklogId=375240=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375240 ] ASF GitHub Bot logged work on BEAM-9093: Author: ASF GitHub Bot Created on: 21/Jan/20 21:59 Start Date: 21/Jan/20 21:59 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10613: [BEAM-9093] Log invalid overwrites in pipeline options URL: https://github.com/apache/beam/pull/10613#discussion_r369264919 ## File path: sdks/python/apache_beam/options/pipeline_options_test.py ## @@ -264,6 +264,18 @@ def test_override_options(self): self.assertEqual(options.get_all_options()['num_workers'], 5) self.assertTrue(options.get_all_options()['mock_flag']) + def test_override_init_options(self): +base_flags = ['--num_workers', '5'] +options = PipelineOptions(base_flags, mock_flag=True) +self.assertEqual(options.get_all_options()['num_workers'], 5) +self.assertEqual(options.get_all_options()['mock_flag'], True) + + def test_invalid_override_init_options(self): +base_flags = ['--num_workers', '5'] +options = PipelineOptions(base_flags, mock_invalid_flag=True) +self.assertEqual(options.get_all_options()['num_workers'], 5) +self.assertEqual(options.get_all_options()['mock_flag'], False) Review comment: It would be great if we could also capture the logging in addition to only checking `mock_flag` is untouched. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375240) Time Spent: 40m (was: 0.5h) > Pipeline options which with different underlying store variable does not get > over written > - > > Key: BEAM-9093 > URL: https://issues.apache.org/jira/browse/BEAM-9093 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ankur Goenka >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Example: > PipelineOptions(flags=[],**\{'no_use_public_ips': True,}) > Expectation: use_public_ips should be set False. > Actual: the value is not used as its not passed through argparser > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9093) Pipeline options which with different underlying store variable does not get over written
[ https://issues.apache.org/jira/browse/BEAM-9093?focusedWorklogId=375241=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375241 ] ASF GitHub Bot logged work on BEAM-9093: Author: ASF GitHub Bot Created on: 21/Jan/20 21:59 Start Date: 21/Jan/20 21:59 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10613: [BEAM-9093] Log invalid overwrites in pipeline options URL: https://github.com/apache/beam/pull/10613#discussion_r369260182 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -288,15 +288,20 @@ def get_all_options(self, _LOGGER.warning("Discarding unparseable args: %s", unknown_args) result = vars(known_args) +overrides = self._all_options.copy() # Apply the overrides if any for k in list(result): + overrides.pop(k, None) if k in self._all_options: result[k] = self._all_options[k] if (drop_default and parser.get_default(k) == result[k] and not isinstance(parser.get_default(k), ValueProvider)): del result[k] +if overrides: Review comment: Looks like there are two spaces. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375241) Time Spent: 40m (was: 0.5h) > Pipeline options which with different underlying store variable does not get > over written > - > > Key: BEAM-9093 > URL: https://issues.apache.org/jira/browse/BEAM-9093 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ankur Goenka >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Example: > PipelineOptions(flags=[],**\{'no_use_public_ips': True,}) > Expectation: use_public_ips should be set False. > Actual: the value is not used as its not passed through argparser > -- This message was sent by Atlassian Jira (v8.3.4#803005)
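The BEAM-9093 mechanism can be shown with a small self-contained sketch of how `get_all_options` applies kwarg overrides (this mirrors the loop quoted in the review above, but the parser setup and names here are illustrative, not the real PipelineOptions code). The kwarg name `no_use_public_ips` never matches the argparse dest `use_public_ips`, so the override is silently dropped.

```python
# Why PipelineOptions(flags=[], **{'no_use_public_ips': True}) has no
# effect: overrides are matched by parsed-result key, and the flag's
# dest ('use_public_ips') differs from the kwarg name given by the user.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--no_use_public_ips', dest='use_public_ips',
                    action='store_false', default=True)

known, _ = parser.parse_known_args([])
result = vars(known)                      # {'use_public_ips': True}
overrides = {'no_use_public_ips': True}   # user-supplied kwarg

for k in list(result):
    if k in overrides:                    # 'use_public_ips' not in overrides
        result[k] = overrides[k]

# The override was silently ignored; the PR adds logging for this case.
assert result == {'use_public_ips': True}
assert overrides                          # leftover, unapplied override
```

This is exactly why the reviewed change pops applied keys off a copy of the overrides and logs whatever remains.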
[jira] [Work logged] (BEAM-8626) Implement status api handler in python sdk harness
[ https://issues.apache.org/jira/browse/BEAM-8626?focusedWorklogId=375239=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375239 ] ASF GitHub Bot logged work on BEAM-8626: Author: ASF GitHub Bot Created on: 21/Jan/20 21:57 Start Date: 21/Jan/20 21:57 Worklog Time Spent: 10m Work Description: y1chi commented on pull request #10598: [BEAM-8626] Implement status fn api handler in python sdk URL: https://github.com/apache/beam/pull/10598#discussion_r369264960 ## File path: sdks/python/apache_beam/runners/worker/worker_status.py ## @@ -0,0 +1,148 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +"""Worker status api handler for reporting SDK harness debug info.""" + +from __future__ import absolute_import +from __future__ import division + +import queue +import sys +import threading +import traceback +from collections import defaultdict + +import grpc + +from apache_beam.portability.api import beam_fn_api_pb2 +from apache_beam.portability.api import beam_fn_api_pb2_grpc +from apache_beam.runners.worker.channel_factory import GRPCChannelFactory +from apache_beam.runners.worker.worker_id_interceptor import WorkerIdInterceptor + + +def thread_dump(): + """Get a thread dump for the current SDK worker harness. 
""" + # deduplicate threads with same stack trace + stack_traces = defaultdict(list) + frames = sys._current_frames() # pylint: disable=protected-access + + for t in threading.enumerate(): +stack_trace = ''.join(traceback.format_stack(frames[t.ident])) +thread_ident_name = (t.ident, t.name) +stack_traces[stack_trace].append(thread_ident_name) + + all_traces = ['=' * 10 + 'THREAD DUMP' + '=' * 10] + for stack, identity in stack_traces.items(): +ident, name = identity[0] +trace = '--- Thread #%s name: %s %s---\n' % ( Review comment: this is already printed in a separated line below. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375239) Time Spent: 4h 20m (was: 4h 10m) > Implement status api handler in python sdk harness > -- > > Key: BEAM-8626 > URL: https://issues.apache.org/jira/browse/BEAM-8626 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9122) Add uses_keyed_state step property to python dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-9122?focusedWorklogId=375238=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375238 ] ASF GitHub Bot logged work on BEAM-9122: Author: ASF GitHub Bot Created on: 21/Jan/20 21:56 Start Date: 21/Jan/20 21:56 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10596: [BEAM-9122] Add uses_keyed_state step property in python dataflow run… URL: https://github.com/apache/beam/pull/10596#issuecomment-576901355 Run Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375238) Time Spent: 2h (was: 1h 50m) > Add uses_keyed_state step property to python dataflow runner > > > Key: BEAM-9122 > URL: https://issues.apache.org/jira/browse/BEAM-9122 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Add additional step property to dataflow job property when a DoFn is stateful > in python sdk. So that the backend runner can recognize stateful steps. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9122) Add uses_keyed_state step property to python dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-9122?focusedWorklogId=375236=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375236 ] ASF GitHub Bot logged work on BEAM-9122: Author: ASF GitHub Bot Created on: 21/Jan/20 21:55 Start Date: 21/Jan/20 21:55 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10596: [BEAM-9122] Add uses_keyed_state step property in python dataflow run… URL: https://github.com/apache/beam/pull/10596#issuecomment-576900982 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375236) Time Spent: 1h 50m (was: 1h 40m) > Add uses_keyed_state step property to python dataflow runner > > > Key: BEAM-9122 > URL: https://issues.apache.org/jira/browse/BEAM-9122 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > Add additional step property to dataflow job property when a DoFn is stateful > in python sdk. So that the backend runner can recognize stateful steps. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7516) Add a watermark manager for the fn_api_runner
[ https://issues.apache.org/jira/browse/BEAM-7516?focusedWorklogId=375235=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375235 ] ASF GitHub Bot logged work on BEAM-7516: Author: ASF GitHub Bot Created on: 21/Jan/20 21:48 Start Date: 21/Jan/20 21:48 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10291: [BEAM-7516][BEAM-8823] FnApiRunner works with work queues, and a primitive watermark manager URL: https://github.com/apache/beam/pull/10291#issuecomment-576898410 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375235) Time Spent: 4h (was: 3h 50m) > Add a watermark manager for the fn_api_runner > - > > Key: BEAM-7516 > URL: https://issues.apache.org/jira/browse/BEAM-7516 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > To track watermarks for each stage -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8626) Implement status api handler in python sdk harness
[ https://issues.apache.org/jira/browse/BEAM-8626?focusedWorklogId=375230=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375230 ] ASF GitHub Bot logged work on BEAM-8626: Author: ASF GitHub Bot Created on: 21/Jan/20 21:44 Start Date: 21/Jan/20 21:44 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #10598: [BEAM-8626] Implement status fn api handler in python sdk URL: https://github.com/apache/beam/pull/10598#discussion_r369252542 ## File path: sdks/python/apache_beam/runners/worker/worker_status.py ## @@ -0,0 +1,148 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +"""Worker status api handler for reporting SDK harness debug info.""" + +from __future__ import absolute_import +from __future__ import division + +import queue +import sys +import threading +import traceback +from collections import defaultdict + +import grpc + +from apache_beam.portability.api import beam_fn_api_pb2 +from apache_beam.portability.api import beam_fn_api_pb2_grpc +from apache_beam.runners.worker.channel_factory import GRPCChannelFactory +from apache_beam.runners.worker.worker_id_interceptor import WorkerIdInterceptor + + +def thread_dump(): + """Get a thread dump for the current SDK worker harness. 
""" + # deduplicate threads with same stack trace + stack_traces = defaultdict(list) + frames = sys._current_frames() # pylint: disable=protected-access + + for t in threading.enumerate(): +stack_trace = ''.join(traceback.format_stack(frames[t.ident])) +thread_ident_name = (t.ident, t.name) +stack_traces[stack_trace].append(thread_ident_name) + + all_traces = ['=' * 10 + 'THREAD DUMP' + '=' * 10] + for stack, identity in stack_traces.items(): Review comment: You are right. It's already in this PR. We can print names of all the threads along with count so that we don't miss any information. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375230) Time Spent: 4h (was: 3h 50m) > Implement status api handler in python sdk harness > -- > > Key: BEAM-8626 > URL: https://issues.apache.org/jira/browse/BEAM-8626 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8626) Implement status api handler in python sdk harness
[ https://issues.apache.org/jira/browse/BEAM-8626?focusedWorklogId=375231=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375231 ] ASF GitHub Bot logged work on BEAM-8626: Author: ASF GitHub Bot Created on: 21/Jan/20 21:44 Start Date: 21/Jan/20 21:44 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #10598: [BEAM-8626] Implement status fn api handler in python sdk URL: https://github.com/apache/beam/pull/10598#discussion_r369257775 ## File path: sdks/python/apache_beam/runners/worker/worker_status.py ## @@ -0,0 +1,148 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +"""Worker status api handler for reporting SDK harness debug info.""" + +from __future__ import absolute_import +from __future__ import division + +import queue +import sys +import threading +import traceback +from collections import defaultdict + +import grpc + +from apache_beam.portability.api import beam_fn_api_pb2 +from apache_beam.portability.api import beam_fn_api_pb2_grpc +from apache_beam.runners.worker.channel_factory import GRPCChannelFactory +from apache_beam.runners.worker.worker_id_interceptor import WorkerIdInterceptor + + +def thread_dump(): + """Get a thread dump for the current SDK worker harness. 
""" + # deduplicate threads with same stack trace + stack_traces = defaultdict(list) + frames = sys._current_frames() # pylint: disable=protected-access + + for t in threading.enumerate(): +stack_trace = ''.join(traceback.format_stack(frames[t.ident])) +thread_ident_name = (t.ident, t.name) +stack_traces[stack_trace].append(thread_ident_name) + + all_traces = ['=' * 10 + 'THREAD DUMP' + '=' * 10] + for stack, identity in stack_traces.items(): +ident, name = identity[0] +trace = '--- Thread #%s name: %s %s---\n' % ( Review comment: ```suggestion trace = '--- Threads (%d) %s --- \n' % (len(identity), [ident+':'+name for (ident, name) in identity]) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 375231) Time Spent: 4h 10m (was: 4h) > Implement status api handler in python sdk harness > -- > > Key: BEAM-8626 > URL: https://issues.apache.org/jira/browse/BEAM-8626 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)