[jira] [Work logged] (BEAM-8113) FlinkRunner: Stage files from context classloader
[ https://issues.apache.org/jira/browse/BEAM-8113?focusedWorklogId=307689=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307689 ] ASF GitHub Bot logged work on BEAM-8113: Author: ASF GitHub Bot Created on: 06/Sep/19 07:59 Start Date: 06/Sep/19 07:59 Worklog Time Spent: 10m Work Description: mxm commented on pull request #9451: [BEAM-8113] Stage files from context classloader URL: https://github.com/apache/beam/pull/9451#discussion_r321618536 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineResources.java ## @@ -40,34 +44,42 @@ public class PipelineResources { /** - * Attempts to detect all the resources the class loader has access to. This does not recurse to - * class loader parents stopping it from pulling in resources from the system class loader. + * Detects all URLs that are present in all class loaders in between context class loader of + * calling thread and class loader of class passed as parameter. It doesn't follow parents above + * this class loader stopping it from pulling in resources from the system class loader. * - * @param classLoader The URLClassLoader to use to detect resources to stage. - * @throws IllegalArgumentException If either the class loader is not a URLClassLoader or one of - * the resources the class loader exposes is not a file resource. + * @param cls Class whose class loader stops recursion into parent loaders + * @throws IllegalArgumentException no classloader in context hierarchy is URLClassloader or if + * one of the resources any class loader exposes is not a file resource. * @return A list of absolute paths to the resources the class loader uses. */ - public static List detectClassPathResourcesToStage(ClassLoader classLoader) { -if (!(classLoader instanceof URLClassLoader)) { - String message = - String.format( - "Unable to use ClassLoader to detect classpath elements. " - + "Current ClassLoader is %s, only URLClassLoaders are supported.", - classLoader); - throw new IllegalArgumentException(message); -} + public static List detectClassPathResourcesToStage(Class cls) { +return detectClassPathResourcesToStage(cls.getClassLoader()); + } -List files = new ArrayList<>(); -for (URL url : ((URLClassLoader) classLoader).getURLs()) { - try { -files.add(new File(url.toURI()).getAbsolutePath()); - } catch (IllegalArgumentException | URISyntaxException e) { -String message = String.format("Unable to convert url (%s) to file.", url); -throw new IllegalArgumentException(message, e); - } + /** + * Detect resources to stage, by using hierarchy between current context classloader and the given + * one. Always include the given class loader in the process. + * + * @param loader the loader to use as stopping condition when traversing context class loaders + * @return A list of absolute paths to the resources the class loader uses. + */ + @VisibleForTesting + static List detectClassPathResourcesToStage(final ClassLoader loader) { +Set files = new HashSet<>(); Review comment: You are right, this is bound to cause trouble. We need to preserve the class loader hierarchy. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307689) Time Spent: 7h 20m (was: 7h 10m) > FlinkRunner: Stage files from context classloader > - > > Key: BEAM-8113 > URL: https://issues.apache.org/jira/browse/BEAM-8113 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Major > Time Spent: 7h 20m > Remaining Estimate: 0h > > Currently, only files from {{FlinkRunner.class.getClassLoader()}} are staged > by default. Add also files from > {{Thread.currentThread().getContextClassLoader()}}. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-5428) Implement cross-bundle state caching.
[ https://issues.apache.org/jira/browse/BEAM-5428?focusedWorklogId=307679=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307679 ] ASF GitHub Bot logged work on BEAM-5428: Author: ASF GitHub Bot Created on: 06/Sep/19 07:41 Start Date: 06/Sep/19 07:41 Worklog Time Spent: 10m Work Description: mxm commented on pull request #9418: [BEAM-5428] Implement cross-bundle user state caching in the Python SDK URL: https://github.com/apache/beam/pull/9418#discussion_r321611844 ## File path: sdks/python/apache_beam/runners/worker/worker_pool_main.py ## @@ -47,21 +47,26 @@ class BeamFnExternalWorkerPoolServicer( beam_fn_api_pb2_grpc.BeamFnExternalWorkerPoolServicer): - def __init__(self, worker_threads, use_process=False, - container_executable=None): + def __init__(self, worker_threads, + use_process=False, + container_executable=None, + state_cache_size=0): self._worker_threads = worker_threads self._use_process = use_process self._container_executable = container_executable +self._state_cache_size = state_cache_size self._worker_processes = {} @classmethod def start(cls, worker_threads=1, use_process=False, port=0, -container_executable=None): +state_cache_size=0, container_executable=None): worker_server = grpc.server(futures.ThreadPoolExecutor(max_workers=10)) worker_address = 'localhost:%s' % worker_server.add_insecure_port( '[::]:%s' % port) -worker_pool = cls(worker_threads, use_process=use_process, - container_executable=container_executable) +worker_pool = cls(worker_threads, + use_process=use_process, + container_executable=container_executable, + state_cache_size=state_cache_size) Review comment: We need this here as well because this class is also used for testing. It will be overridden for the non-testing case. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307679) Time Spent: 18h (was: 17h 50m) > Implement cross-bundle state caching. > - > > Key: BEAM-5428 > URL: https://issues.apache.org/jira/browse/BEAM-5428 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Robert Bradshaw >Assignee: Rakesh Kumar >Priority: Major > Time Spent: 18h > Remaining Estimate: 0h > > Tech spec: > [https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m] > Relevant document: > [https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit] > Mailing list link: > [https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness
[ https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=307745=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307745 ] ASF GitHub Bot logged work on BEAM-7945: Author: ASF GitHub Bot Created on: 06/Sep/19 10:29 Start Date: 06/Sep/19 10:29 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on pull request #9452: [BEAM-7945] Allow runner to configure semi_persist_dir which is used … URL: https://github.com/apache/beam/pull/9452#discussion_r321674314 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/options/RemoteEnvironmentOptions.java ## @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.options; + +import com.google.auto.service.AutoService; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; + +/** Options that are used to control configuration of the remote environment. */ +@Experimental +@Hidden +public interface RemoteEnvironmentOptions extends PipelineOptions { + + @Description("Local semi-persistent directory") + @Default.String("/tmp") Review comment: Currently, we keep the same as the default value of other default configuration, such as:`boot.go`. - https://github.com/apache/beam/blob/d21bbaf4c70986c2dbdbe8f6fce35b2b2cb4843d/sdks/go/container/boot.go#L41 - https://github.com/apache/beam/blob/d21bbaf4c70986c2dbdbe8f6fce35b2b2cb4843d/sdks/python/container/boot.go#L51 So, how about we keep using `/tmp` as default value ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307745) Time Spent: 2h (was: 1h 50m) > Allow runner to configure "semi_persist_dir" which is used in the SDK harness > - > > Key: BEAM-7945 > URL: https://issues.apache.org/jira/browse/BEAM-7945 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution, sdk-go, sdk-java-core, sdk-py-core >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.16.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Currently "semi_persist_dir" is not configurable. This may become a problem > in certain scenarios. For example, the default value of "semi_persist_dir" is > "/tmp" > ([https://github.com/apache/beam/blob/master/sdks/python/container/boot.go#L48]) > in Python SDK harness. When the environment type is "PROCESS", the disk of > "/tmp" may be filled up and unexpected issues will occur in production > environment. We should provide a way to configure "semi_persist_dir" in > EnvironmentFactory at the runner side. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8131) Provide Kubernetes setup with Prometheus
[ https://issues.apache.org/jira/browse/BEAM-8131?focusedWorklogId=307779=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307779 ] ASF GitHub Bot logged work on BEAM-8131: Author: ASF GitHub Bot Created on: 06/Sep/19 11:50 Start Date: 06/Sep/19 11:50 Worklog Time Spent: 10m Work Description: kamilwu commented on pull request #9482: [BEAM-8131] Provide Kubernetes setup for Prometheus URL: https://github.com/apache/beam/pull/9482#discussion_r321690505 ## File path: .test-infra/metrics/README.md ## @@ -84,23 +95,31 @@ docker-compose build # Spinup docker-compose related containers. docker-compose up Review comment: I agree. Combining these two docker-compose files is a good idea. What do you think about Kubernetes deployments (`beamgrafana-deploy.yaml` and `beamprometheus-deploy.yaml`)? It's a similar situation. Prometheus is an independent entity, but Grafana depends on Prometheus. Should we combine them too? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307779) Time Spent: 1h 40m (was: 1.5h) > Provide Kubernetes setup with Prometheus > > > Key: BEAM-8131 > URL: https://issues.apache.org/jira/browse/BEAM-8131 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8100) Add exception handling to Json transforms in Java SDK
[ https://issues.apache.org/jira/browse/BEAM-8100?focusedWorklogId=307816=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307816 ] ASF GitHub Bot logged work on BEAM-8100: Author: ASF GitHub Bot Created on: 06/Sep/19 13:04 Start Date: 06/Sep/19 13:04 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #9499: [BEAM-8100] Exception handling for AsJsons and ParseJsons URL: https://github.com/apache/beam/pull/9499#issuecomment-528846455 R: @jklukas R: @reuvenlax Could you take a look? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307816) Time Spent: 0.5h (was: 20m) > Add exception handling to Json transforms in Java SDK > - > > Key: BEAM-8100 > URL: https://issues.apache.org/jira/browse/BEAM-8100 > Project: Beam > Issue Type: Improvement > Components: extensions-java-json, sdk-java-core >Reporter: Jeff Klukas >Assignee: Alexey Romanenko >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > Follow-up to https://issues.apache.org/jira/browse/BEAM-5638. We have > exception handling for MapElements and FlatMapElements. It should be > straightforward to add parallel signatures to AsJsons and ParseJsons for > catching parsing errors. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-5428) Implement cross-bundle state caching.
[ https://issues.apache.org/jira/browse/BEAM-5428?focusedWorklogId=307860=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307860 ] ASF GitHub Bot logged work on BEAM-5428: Author: ASF GitHub Bot Created on: 06/Sep/19 13:38 Start Date: 06/Sep/19 13:38 Worklog Time Spent: 10m Work Description: mxm commented on issue #9418: [BEAM-5428] Implement cross-bundle user state caching in the Python SDK URL: https://github.com/apache/beam/pull/9418#issuecomment-528858139 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307860) Time Spent: 18h 10m (was: 18h) > Implement cross-bundle state caching. > - > > Key: BEAM-5428 > URL: https://issues.apache.org/jira/browse/BEAM-5428 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Robert Bradshaw >Assignee: Rakesh Kumar >Priority: Major > Time Spent: 18h 10m > Remaining Estimate: 0h > > Tech spec: > [https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m] > Relevant document: > [https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit] > Mailing list link: > [https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8100) Add exception handling to Json transforms in Java SDK
[ https://issues.apache.org/jira/browse/BEAM-8100?focusedWorklogId=307815=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307815 ] ASF GitHub Bot logged work on BEAM-8100: Author: ASF GitHub Bot Created on: 06/Sep/19 13:02 Start Date: 06/Sep/19 13:02 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on pull request #9499: [BEAM-8100] Exception handling for AsJsons and ParseJsons URL: https://github.com/apache/beam/pull/9499#discussion_r321723002 ## File path: sdks/java/extensions/jackson/src/main/java/org/apache/beam/sdk/extensions/jackson/AsJsons.java ## @@ -56,6 +63,35 @@ private AsJsons(Class outputClass) { return newTransform; } + /** + * Returns a new {@link AsJsonsWithFailures} transform that catches exceptions raised while + * writing JSON elements, passing the raised exception instance and the input element being + * processed through the given {@code exceptionHandler} and emitting the result to a failure + * collection. + * + * Example usage: + * + * {@code + * WithFailures.Result, KV>> result = + * pojos.apply( + * AsJsons.of(MyPojo.class) + * .withFailures(new WithFailures.ExceptionAsMapHandler() {})); + * + * PCollection output = result.output(); // valid json elements + * PCollection>> failures = result.failures(); + * } + */ + @Experimental(Experimental.Kind.WITH_EXCEPTIONS) + public AsJsonsWithFailures withFailures( Review comment: I'm not sure that `withFailures` is a good name in this case and I couldn't find out if any agreement about that exists. So, any advices are welcomed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307815) Time Spent: 20m (was: 10m) > Add exception handling to Json transforms in Java SDK > - > > Key: BEAM-8100 > URL: https://issues.apache.org/jira/browse/BEAM-8100 > Project: Beam > Issue Type: Improvement > Components: extensions-java-json, sdk-java-core >Reporter: Jeff Klukas >Assignee: Alexey Romanenko >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > Follow-up to https://issues.apache.org/jira/browse/BEAM-5638. We have > exception handling for MapElements and FlatMapElements. It should be > straightforward to add parallel signatures to AsJsons and ParseJsons for > catching parsing errors. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8100) Add exception handling to Json transforms in Java SDK
[ https://issues.apache.org/jira/browse/BEAM-8100?focusedWorklogId=307847=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307847 ] ASF GitHub Bot logged work on BEAM-8100: Author: ASF GitHub Bot Created on: 06/Sep/19 13:27 Start Date: 06/Sep/19 13:27 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #9499: [BEAM-8100] Exception handling for AsJsons and ParseJsons URL: https://github.com/apache/beam/pull/9499#issuecomment-528854072 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307847) Time Spent: 40m (was: 0.5h) > Add exception handling to Json transforms in Java SDK > - > > Key: BEAM-8100 > URL: https://issues.apache.org/jira/browse/BEAM-8100 > Project: Beam > Issue Type: Improvement > Components: extensions-java-json, sdk-java-core >Reporter: Jeff Klukas >Assignee: Alexey Romanenko >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > Follow-up to https://issues.apache.org/jira/browse/BEAM-5638. We have > exception handling for MapElements and FlatMapElements. It should be > straightforward to add parallel signatures to AsJsons and ParseJsons for > catching parsing errors. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8100) Add exception handling to Json transforms in Java SDK
[ https://issues.apache.org/jira/browse/BEAM-8100?focusedWorklogId=307856=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307856 ] ASF GitHub Bot logged work on BEAM-8100: Author: ASF GitHub Bot Created on: 06/Sep/19 13:37 Start Date: 06/Sep/19 13:37 Worklog Time Spent: 10m Work Description: jklukas commented on pull request #9499: [BEAM-8100] Exception handling for AsJsons and ParseJsons URL: https://github.com/apache/beam/pull/9499#discussion_r321736061 ## File path: sdks/java/extensions/jackson/src/main/java/org/apache/beam/sdk/extensions/jackson/AsJsons.java ## @@ -56,6 +63,35 @@ private AsJsons(Class outputClass) { return newTransform; } + /** + * Returns a new {@link AsJsonsWithFailures} transform that catches exceptions raised while + * writing JSON elements, passing the raised exception instance and the input element being + * processed through the given {@code exceptionHandler} and emitting the result to a failure + * collection. + * + * Example usage: + * + * {@code + * WithFailures.Result, KV>> result = + * pojos.apply( + * AsJsons.of(MyPojo.class) + * .withFailures(new WithFailures.ExceptionAsMapHandler() {})); + * + * PCollection output = result.output(); // valid json elements + * PCollection>> failures = result.failures(); + * } + */ + @Experimental(Experimental.Kind.WITH_EXCEPTIONS) + public AsJsonsWithFailures withFailures( Review comment: And perhaps these JSON cases are constrained enough that we could provide a default exceptionHandler so that the user doesn't need to specify one. Perhaps it would catch only the exceptions we expect to be raised (`JsonProcessingException` for AsJsons and `IOException` for ParseJsons), reraising an other exceptions, and return a KV that contains the input element along with a string containing the exception message. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307856) Time Spent: 1h (was: 50m) > Add exception handling to Json transforms in Java SDK > - > > Key: BEAM-8100 > URL: https://issues.apache.org/jira/browse/BEAM-8100 > Project: Beam > Issue Type: Improvement > Components: extensions-java-json, sdk-java-core >Reporter: Jeff Klukas >Assignee: Alexey Romanenko >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > Follow-up to https://issues.apache.org/jira/browse/BEAM-5638. We have > exception handling for MapElements and FlatMapElements. It should be > straightforward to add parallel signatures to AsJsons and ParseJsons for > catching parsing errors. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8100) Add exception handling to Json transforms in Java SDK
[ https://issues.apache.org/jira/browse/BEAM-8100?focusedWorklogId=307855=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307855 ] ASF GitHub Bot logged work on BEAM-8100: Author: ASF GitHub Bot Created on: 06/Sep/19 13:37 Start Date: 06/Sep/19 13:37 Worklog Time Spent: 10m Work Description: jklukas commented on pull request #9499: [BEAM-8100] Exception handling for AsJsons and ParseJsons URL: https://github.com/apache/beam/pull/9499#discussion_r321737681 ## File path: sdks/java/extensions/jackson/src/main/java/org/apache/beam/sdk/extensions/jackson/AsJsons.java ## @@ -56,6 +63,35 @@ private AsJsons(Class outputClass) { return newTransform; } + /** + * Returns a new {@link AsJsonsWithFailures} transform that catches exceptions raised while + * writing JSON elements, passing the raised exception instance and the input element being + * processed through the given {@code exceptionHandler} and emitting the result to a failure + * collection. + * + * Example usage: + * + * {@code + * WithFailures.Result, KV>> result = + * pojos.apply( + * AsJsons.of(MyPojo.class) + * .withFailures(new WithFailures.ExceptionAsMapHandler() {})); Review comment: Perhaps we should provide specific default exception handlers for AsJsons and ParseJsons that are identical to ExceptionAsMapHandler except that they catch only the exceptions we expect to be raised (`JsonProcessingException` for AsJsons and `IOException` for ParseJsons), reraising any other exceptions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307855) Time Spent: 50m (was: 40m) > Add exception handling to Json transforms in Java SDK > - > > Key: BEAM-8100 > URL: https://issues.apache.org/jira/browse/BEAM-8100 > Project: Beam > Issue Type: Improvement > Components: extensions-java-json, sdk-java-core >Reporter: Jeff Klukas >Assignee: Alexey Romanenko >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > Follow-up to https://issues.apache.org/jira/browse/BEAM-5638. We have > exception handling for MapElements and FlatMapElements. It should be > straightforward to add parallel signatures to AsJsons and ParseJsons for > catching parsing errors. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8100) Add exception handling to Json transforms in Java SDK
[ https://issues.apache.org/jira/browse/BEAM-8100?focusedWorklogId=307858=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307858 ] ASF GitHub Bot logged work on BEAM-8100: Author: ASF GitHub Bot Created on: 06/Sep/19 13:37 Start Date: 06/Sep/19 13:37 Worklog Time Spent: 10m Work Description: jklukas commented on pull request #9499: [BEAM-8100] Exception handling for AsJsons and ParseJsons URL: https://github.com/apache/beam/pull/9499#discussion_r321733747 ## File path: sdks/java/extensions/jackson/src/main/java/org/apache/beam/sdk/extensions/jackson/ParseJsons.java ## @@ -55,6 +61,34 @@ private ParseJsons(Class outputClass) { return newTransform; } + /** + * Returns a new {@link ParseJsonsWithFailures} transform that catches exceptions raised while + * parsing elements, passing the raised exception instance and the input element being processed + * through the given {@code exceptionHandler} and emitting the result to a failure collection. + * + * Example usage: Review comment: ```suggestion * See {@link WithFailures} documentation for usage patterns of the returned {@link * WithFailures.Result}. * * Example usage: ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307858) Time Spent: 1h 20m (was: 1h 10m) > Add exception handling to Json transforms in Java SDK > - > > Key: BEAM-8100 > URL: https://issues.apache.org/jira/browse/BEAM-8100 > Project: Beam > Issue Type: Improvement > Components: extensions-java-json, sdk-java-core >Reporter: Jeff Klukas >Assignee: Alexey Romanenko >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > Follow-up to https://issues.apache.org/jira/browse/BEAM-5638. We have > exception handling for MapElements and FlatMapElements. It should be > straightforward to add parallel signatures to AsJsons and ParseJsons for > catching parsing errors. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8100) Add exception handling to Json transforms in Java SDK
[ https://issues.apache.org/jira/browse/BEAM-8100?focusedWorklogId=307859=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307859 ] ASF GitHub Bot logged work on BEAM-8100: Author: ASF GitHub Bot Created on: 06/Sep/19 13:37 Start Date: 06/Sep/19 13:37 Worklog Time Spent: 10m Work Description: jklukas commented on pull request #9499: [BEAM-8100] Exception handling for AsJsons and ParseJsons URL: https://github.com/apache/beam/pull/9499#discussion_r321730735 ## File path: sdks/java/extensions/jackson/src/main/java/org/apache/beam/sdk/extensions/jackson/AsJsons.java ## @@ -56,6 +63,35 @@ private AsJsons(Class outputClass) { return newTransform; } + /** + * Returns a new {@link AsJsonsWithFailures} transform that catches exceptions raised while + * writing JSON elements, passing the raised exception instance and the input element being + * processed through the given {@code exceptionHandler} and emitting the result to a failure + * collection. + * + * Example usage: + * + * {@code + * WithFailures.Result, KV>> result = + * pojos.apply( + * AsJsons.of(MyPojo.class) + * .withFailures(new WithFailures.ExceptionAsMapHandler() {})); + * + * PCollection output = result.output(); // valid json elements + * PCollection>> failures = result.failures(); + * } + */ + @Experimental(Experimental.Kind.WITH_EXCEPTIONS) + public AsJsonsWithFailures withFailures( Review comment: My bias would be towards following the API in `MapElements` as closely as possible, which would make this method `exceptionsVia` and would suggest that we should also have `exceptionsInto` and `exceptionsVia(ProcessFunction)` to allow users to define exception processing inline. I would love to see this factored out somehow so it's easier to add exception handling to different classes with a stable API and without repeating all this boilerplate, but I haven't found a suitable alternative. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307859) Time Spent: 1h 20m (was: 1h 10m) > Add exception handling to Json transforms in Java SDK > - > > Key: BEAM-8100 > URL: https://issues.apache.org/jira/browse/BEAM-8100 > Project: Beam > Issue Type: Improvement > Components: extensions-java-json, sdk-java-core >Reporter: Jeff Klukas >Assignee: Alexey Romanenko >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > Follow-up to https://issues.apache.org/jira/browse/BEAM-5638. We have > exception handling for MapElements and FlatMapElements. It should be > straightforward to add parallel signatures to AsJsons and ParseJsons for > catching parsing errors. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8100) Add exception handling to Json transforms in Java SDK
[ https://issues.apache.org/jira/browse/BEAM-8100?focusedWorklogId=307857=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307857 ] ASF GitHub Bot logged work on BEAM-8100: Author: ASF GitHub Bot Created on: 06/Sep/19 13:37 Start Date: 06/Sep/19 13:37 Worklog Time Spent: 10m Work Description: jklukas commented on pull request #9499: [BEAM-8100] Exception handling for AsJsons and ParseJsons URL: https://github.com/apache/beam/pull/9499#discussion_r321733602 ## File path: sdks/java/extensions/jackson/src/main/java/org/apache/beam/sdk/extensions/jackson/AsJsons.java ## @@ -56,6 +63,35 @@ private AsJsons(Class outputClass) { return newTransform; } + /** + * Returns a new {@link AsJsonsWithFailures} transform that catches exceptions raised while + * writing JSON elements, passing the raised exception instance and the input element being + * processed through the given {@code exceptionHandler} and emitting the result to a failure + * collection. + * + * Example usage: Review comment: ```suggestion * See {@link WithFailures} documentation for usage patterns of the returned {@link * WithFailures.Result}. * * Example usage: ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307857) Time Spent: 1h 10m (was: 1h) > Add exception handling to Json transforms in Java SDK > - > > Key: BEAM-8100 > URL: https://issues.apache.org/jira/browse/BEAM-8100 > Project: Beam > Issue Type: Improvement > Components: extensions-java-json, sdk-java-core >Reporter: Jeff Klukas >Assignee: Alexey Romanenko >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > Follow-up to https://issues.apache.org/jira/browse/BEAM-5638. We have > exception handling for MapElements and FlatMapElements. It should be > straightforward to add parallel signatures to AsJsons and ParseJsons for > catching parsing errors. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7305) Add first version of Hazelcast Jet Runner
[ https://issues.apache.org/jira/browse/BEAM-7305?focusedWorklogId=307881=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307881 ] ASF GitHub Bot logged work on BEAM-7305: Author: ASF GitHub Bot Created on: 06/Sep/19 14:16 Start Date: 06/Sep/19 14:16 Worklog Time Spent: 10m Work Description: RyanSkraba commented on pull request #9471: [BEAM-7305] Improve Jet Runner related documentation URL: https://github.com/apache/beam/pull/9471#discussion_r321749449 ## File path: website/src/get-started/wordcount-example.md ## @@ -739,6 +755,12 @@ $ mvn package -Pnemo-runner && java -cp target/word-count-beam-bundled-0.1.jar o --runner=NemoRunner --inputFile=`pwd`/pom.xml --output=counts ``` +{:.runner-jet} +``` +$ mvn package -P jet-runner && java -cp target/word-count-beam-bundled-0.1.jar org.apache.beam.examples.DebuggingWordCount \ + --runner=JetRunner --jetLocalMode=3 --inputFile=$pwd/pom.xml --output=counts Review comment: ```suggestion --runner=JetRunner --jetLocalMode=3 --output=counts ``` It looks like most of the runners are running against the default inputFile (the Shakespeare corpus) which demonstrates the PAsserts succeeding in the example. Nemo is running against `pom.xml` which demonstrates failing the PAssert in the example. I suggest following the majority! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307881) Time Spent: 11h 10m (was: 11h) > Add first version of Hazelcast Jet Runner > - > > Key: BEAM-7305 > URL: https://issues.apache.org/jira/browse/BEAM-7305 > Project: Beam > Issue Type: New Feature > Components: runner-jet >Reporter: Maximilian Michels >Assignee: Jozsef Bartok >Priority: Major > Fix For: 2.14.0 > > Time Spent: 11h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7305) Add first version of Hazelcast Jet Runner
[ https://issues.apache.org/jira/browse/BEAM-7305?focusedWorklogId=307880=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307880 ] ASF GitHub Bot logged work on BEAM-7305: Author: ASF GitHub Bot Created on: 06/Sep/19 14:16 Start Date: 06/Sep/19 14:16 Worklog Time Spent: 10m Work Description: RyanSkraba commented on pull request #9471: [BEAM-7305] Improve Jet Runner related documentation URL: https://github.com/apache/beam/pull/9471#discussion_r321748337 ## File path: website/src/get-started/wordcount-example.md ## @@ -390,6 +390,12 @@ $ mvn package -Pnemo-runner && java -cp target/word-count-beam-bundled-0.1.jar o --runner=NemoRunner --inputFile=`pwd`/pom.xml --output=counts ``` +{:.runner-jet} +``` +$ mvn package -P jet-runner && java -cp target/word-count-beam-bundled-0.1.jar org.apache.beam.examples.WordCount \ + --runner=JetRunner --jetLocalMode=3 --inputFile=$pwd/pom.xml --output=counts Review comment: ```suggestion --runner=JetRunner --jetLocalMode=3 --inputFile=`pwd`/pom.xml --output=counts ``` Not a big deal -- `$pwd` doesn't work for my shell (unless I set it of course), and `` `pwd` `` is consistent with the others. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307880) Time Spent: 11h 10m (was: 11h) > Add first version of Hazelcast Jet Runner > - > > Key: BEAM-7305 > URL: https://issues.apache.org/jira/browse/BEAM-7305 > Project: Beam > Issue Type: New Feature > Components: runner-jet >Reporter: Maximilian Michels >Assignee: Jozsef Bartok >Priority: Major > Fix For: 2.14.0 > > Time Spent: 11h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7305) Add first version of Hazelcast Jet Runner
[ https://issues.apache.org/jira/browse/BEAM-7305?focusedWorklogId=307882=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307882 ] ASF GitHub Bot logged work on BEAM-7305: Author: ASF GitHub Bot Created on: 06/Sep/19 14:16 Start Date: 06/Sep/19 14:16 Worklog Time Spent: 10m Work Description: RyanSkraba commented on pull request #9471: [BEAM-7305] Improve Jet Runner related documentation URL: https://github.com/apache/beam/pull/9471#discussion_r321749773 ## File path: website/src/get-started/wordcount-example.md ## @@ -1088,6 +1120,12 @@ $ mvn package -Pnemo-runner && java -cp target/word-count-beam-bundled-0.1.jar o --runner=NemoRunner --inputFile=`pwd`/pom.xml --output=counts ``` +{:.runner-jet} +``` +$ mvn package -P jet-runner && java -cp target/word-count-beam-bundled-0.1.jar org.apache.beam.examples.WindowedWordCount \ + --runner=JetRunner --jetLocalMode=3 --inputFile=$pwd/pom.xml --output=counts Review comment: ```suggestion --runner=JetRunner --jetLocalMode=3 --inputFile=`pwd`/pom.xml --output=counts ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307882) Time Spent: 11h 10m (was: 11h) > Add first version of Hazelcast Jet Runner > - > > Key: BEAM-7305 > URL: https://issues.apache.org/jira/browse/BEAM-7305 > Project: Beam > Issue Type: New Feature > Components: runner-jet >Reporter: Maximilian Michels >Assignee: Jozsef Bartok >Priority: Major > Fix For: 2.14.0 > > Time Spent: 11h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads
[ https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=307845=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307845 ] ASF GitHub Bot logged work on BEAM-8151: Author: ASF GitHub Bot Created on: 06/Sep/19 13:25 Start Date: 06/Sep/19 13:25 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #9477: [BEAM-8151, BEAM-7848] Up the max number of threads inside the SDK harness to a default of 10k URL: https://github.com/apache/beam/pull/9477#issuecomment-528853628 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307845) Time Spent: 4h 50m (was: 4h 40m) > Allow the Python SDK to use many many threads > - > > Key: BEAM-8151 > URL: https://issues.apache.org/jira/browse/BEAM-8151 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > We need to use a thread pool which shrinks the number of active threads when > they are not being used. > > This is to prevent any stuckness issues related to a runner scheduling more > work items then there are "work" threads inside the SDK harness. > > By default the control plane should have all "requests" being processed in > parallel and the runner is responsible for not overloading the SDK with too > much work. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8157) Flink state requests return wrong state in timers when encoded key is length-prefixed
[ https://issues.apache.org/jira/browse/BEAM-8157?focusedWorklogId=307854=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307854 ] ASF GitHub Bot logged work on BEAM-8157: Author: ASF GitHub Bot Created on: 06/Sep/19 13:34 Start Date: 06/Sep/19 13:34 Worklog Time Spent: 10m Work Description: mxm commented on pull request #9484: [BEAM-8157] Remove length prefix from state key for Flink's state backend URL: https://github.com/apache/beam/pull/9484#discussion_r321737041 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java ## @@ -343,11 +343,11 @@ public void clear(K key, W window) { } private void prepareStateBackend(K key) { - // Key for state request is shipped already encoded as ByteString, - // this is mostly a wrapping with ByteBuffer. We still follow the - // usual key encoding procedure. - // final ByteBuffer encodedKey = FlinkKeyUtils.encodeKey(key, keyCoder); - final ByteBuffer encodedKey = ByteBuffer.wrap(key.toByteArray()); + // Key for state request is shipped already encoded as ByteString, but it is Review comment: Here is why this works for Flink <= 1.8 but definitely needs to be fixed moving forward: Keys for state requests are encoded incorrectly for the Flink Runner's state backend because they may contain a length prefix, which is not in lines with how the key is encoded for data partitioning. This causes state to be written to the wrong key group. However, the encoding scheme is consistent, so reading from state in `ProcessElement` or `OnTimer` methods works correctly. Also when inter-playing with timers. However, there are potential issues with checkpoints/savepoints due to this. In Flink 1.9, the key encoding mismatch is visible because reading state from a key not part of the partition _can_ return a `null` value. That is, not always, likely due to the state compaction logic. This explains why `PortableStateExecutionTest` works correctly but `PortableTimersExecutionTest` does not. That said, I think we should merge this fix ASAP for the upcoming release. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307854) Time Spent: 2h 10m (was: 2h) > Flink state requests return wrong state in timers when encoded key is > length-prefixed > - > > Key: BEAM-8157 > URL: https://issues.apache.org/jira/browse/BEAM-8157 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.13.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.16.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Due to multiple changes made in BEAM-7126, the Flink internal key encoding is > broken when the key is encoded with a length prefix. The Flink runner > requires the internal key to be encoded without a length prefix. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8162) NPE error when add flink 1.9 runner
[ https://issues.apache.org/jira/browse/BEAM-8162?focusedWorklogId=308025=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308025 ] ASF GitHub Bot logged work on BEAM-8162: Author: ASF GitHub Bot Created on: 06/Sep/19 17:19 Start Date: 06/Sep/19 17:19 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on issue #9464: [BEAM-8162] Encode keys as NESTED for flink keyselector URL: https://github.com/apache/beam/pull/9464#issuecomment-528938882 Hi @mxm @tweise I think you are much familiar with this code. So I just share my thoughts, and looking forward your decision :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308025) Time Spent: 50m (was: 40m) > NPE error when add flink 1.9 runner > --- > > Key: BEAM-8162 > URL: https://issues.apache.org/jira/browse/BEAM-8162 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > When add flink 1.9 runner in https://github.com/apache/beam/pull/9296, we > find an NPE error when run the `PortableTimersExecutionTest`. > the detail can be found here: > https://github.com/apache/beam/pull/9296#issuecomment-525262607 -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8157) Flink state requests return wrong state in timers when encoded key is length-prefixed
[ https://issues.apache.org/jira/browse/BEAM-8157?focusedWorklogId=308077=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308077 ] ASF GitHub Bot logged work on BEAM-8157: Author: ASF GitHub Bot Created on: 06/Sep/19 18:20 Start Date: 06/Sep/19 18:20 Worklog Time Spent: 10m Work Description: tweise commented on issue #9484: [BEAM-8157] Remove length prefix from state key for Flink's state backend URL: https://github.com/apache/beam/pull/9484#issuecomment-528961315 Unfortunately with this change the test pipeline fails: ``` RuntimeError: java.lang.RuntimeException: Failed to remove nested context from key: XX at org.apache.beam.runners.flink.translation.wrappers.streaming.FlinkKeyUtils.removeNestedContext(FlinkKeyUtils.java:85) at org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator$BagUserStateFactory$1.prepareStateBackend(ExecutableStageDoFnOperator.java:370) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308077) Time Spent: 2.5h (was: 2h 20m) > Flink state requests return wrong state in timers when encoded key is > length-prefixed > - > > Key: BEAM-8157 > URL: https://issues.apache.org/jira/browse/BEAM-8157 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.13.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.16.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Due to multiple changes made in BEAM-7126, the Flink internal key encoding is > broken when the key is encoded with a length prefix. The Flink runner > requires the internal key to be encoded without a length prefix. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar
[ https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=308093=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308093 ] ASF GitHub Bot logged work on BEAM-7967: Author: ASF GitHub Bot Created on: 06/Sep/19 19:03 Start Date: 06/Sep/19 19:03 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9408: [BEAM-7967] Execute portable Flink application jar URL: https://github.com/apache/beam/pull/9408#issuecomment-528976497 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308093) Time Spent: 3.5h (was: 3h 20m) > Execute portable Flink application jar > -- > > Key: BEAM-7967 > URL: https://issues.apache.org/jira/browse/BEAM-7967 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 3.5h > Remaining Estimate: 0h > > [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar
[ https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=308092=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308092 ] ASF GitHub Bot logged work on BEAM-7967: Author: ASF GitHub Bot Created on: 06/Sep/19 19:03 Start Date: 06/Sep/19 19:03 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9408: [BEAM-7967] Execute portable Flink application jar URL: https://github.com/apache/beam/pull/9408#issuecomment-528976408 Run PortableJar_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308092) Time Spent: 3h 20m (was: 3h 10m) > Execute portable Flink application jar > -- > > Key: BEAM-7967 > URL: https://issues.apache.org/jira/browse/BEAM-7967 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 3h 20m > Remaining Estimate: 0h > > [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=308121=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308121 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 06/Sep/19 19:51 Start Date: 06/Sep/19 19:51 Worklog Time Spent: 10m Work Description: davidcavazos commented on pull request #9503: [BEAM-7389] Add code examples for ParDo page URL: https://github.com/apache/beam/pull/9503 Adding code samples for the ParDo page. R: @rosetn [website] R: @aaltay [code/approval] Can you take a look at this whenever you have a chance? Thanks! Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
[jira] [Created] (BEAM-8166) Support Graceful shutdown of worker harness.
Robert Burke created BEAM-8166: -- Summary: Support Graceful shutdown of worker harness. Key: BEAM-8166 URL: https://issues.apache.org/jira/browse/BEAM-8166 Project: Beam Issue Type: Improvement Components: runner-core, sdk-go Reporter: Robert Burke Ideally there should be a clear Shutdown control RPC a runner can send a worker harness to trigger an orderly shutdown. Absent that, errors on the runner side shouldn't manifest as SDK worker harness errors. SDKs should log, and gracefully shutdown from GRPC errors. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.
[ https://issues.apache.org/jira/browse/BEAM-7969?focusedWorklogId=308126=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308126 ] ASF GitHub Bot logged work on BEAM-7969: Author: ASF GitHub Bot Created on: 06/Sep/19 20:10 Start Date: 06/Sep/19 20:10 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #9494: [BEAM-7969] Fix doublecount on GRPC PCollections in streaming jobs. URL: https://github.com/apache/beam/pull/9494#issuecomment-528996585 Run Python Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308126) Time Spent: 6h 20m (was: 6h 10m) > Streaming Dataflow worker doesn't report FnAPI metrics. > --- > > Key: BEAM-7969 > URL: https://issues.apache.org/jira/browse/BEAM-7969 > Project: Beam > Issue Type: Bug > Components: java-fn-execution, runner-dataflow >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 6h 20m > Remaining Estimate: 0h > > EOM -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8166) Support Graceful shutdown of worker harness.
[ https://issues.apache.org/jira/browse/BEAM-8166?focusedWorklogId=308148=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308148 ] ASF GitHub Bot logged work on BEAM-8166: Author: ASF GitHub Bot Created on: 06/Sep/19 20:46 Start Date: 06/Sep/19 20:46 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #9504: [BEAM-8166] Allow in process workers to exit without killing the process. URL: https://github.com/apache/beam/pull/9504 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308148) Time Spent: 50m (was: 40m) > Support Graceful shutdown of worker harness. > > > Key: BEAM-8166 > URL: https://issues.apache.org/jira/browse/BEAM-8166 > Project: Beam > Issue Type: Improvement > Components: runner-core, sdk-go >Reporter: Robert Burke >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > Ideally there should be a clear Shutdown control RPC a runner can send a > worker harness to trigger an orderly shutdown. > Absent that, errors on the runner side shouldn't manifest as SDK worker > harness errors. SDKs should log, and gracefully shutdown from GRPC errors. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Closed] (BEAM-7225) [beam_PostCommit_Py_ValCont ] [test_metrics_fnapi_it] Unable to match metrics for matcher namespace
[ https://issues.apache.org/jira/browse/BEAM-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira closed BEAM-7225. - Fix Version/s: Not applicable Resolution: Cannot Reproduce Looks like this bug is obsolete now. Failure hasn't been popping up. > [beam_PostCommit_Py_ValCont ] [test_metrics_fnapi_it] Unable to match metrics > for matcher namespace > > > Key: BEAM-7225 > URL: https://issues.apache.org/jira/browse/BEAM-7225 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Ahmet Altay >Assignee: Alex Amato >Priority: Major > Fix For: Not applicable > > > https://builds.apache.org/job/beam_PostCommit_Py_ValCont/3115/console > 13:19:41 test_metrics_fnapi_it > (apache_beam.runners.dataflow.dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest) > ... FAIL > 13:19:41 > 13:19:41 > == > 13:19:41 FAIL: test_metrics_fnapi_it > (apache_beam.runners.dataflow.dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest) > 13:19:41 > -- > 13:19:41 Traceback (most recent call last): > 13:19:41 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_ValCont/src/sdks/python/apache_beam/runners/dataflow/dataflow_exercise_metrics_pipeline_test.py", > line 70, in test_metrics_fnapi_it > 13:19:41 self.assertFalse(errors, str(errors)) > 13:19:41 AssertionError: Unable to match metrics for matcher namespace: > 'apache_beam.runners.dataflow.dataflow_exercise_metrics_pipeline.UserMetricsDoFn' > name: 'total_values' step: 'metrics' attempted: <100> committed: <100> > 13:19:41 Actual MetricResults: -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8105) Add container publishing instruction to release manual
[ https://issues.apache.org/jira/browse/BEAM-8105?focusedWorklogId=308218=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308218 ] ASF GitHub Bot logged work on BEAM-8105: Author: ASF GitHub Bot Created on: 06/Sep/19 23:23 Start Date: 06/Sep/19 23:23 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on issue #9510: [WIP][BEAM-8105] update release guide with docker images URL: https://github.com/apache/beam/pull/9510#issuecomment-529044758 Though it's WIP, I would like to give you a heads up about PR size. soyrice This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308218) Time Spent: 20m (was: 10m) > Add container publishing instruction to release manual > -- > > Key: BEAM-8105 > URL: https://issues.apache.org/jira/browse/BEAM-8105 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.16.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-7970) Regenerate Go SDK proto files in correct version
[ https://issues.apache.org/jira/browse/BEAM-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924656#comment-16924656 ] Daniel Oliveira commented on BEAM-7970: --- >From my PR attempt: {quote}Closing because it turns out attempting to regen protos with an updated version of the proto compiler also requires updating a bunch of the Go SDK's dependencies, which is a huge can of worms. And it's mostly unnecessary anyway. The real way to avoid issues is to use an older version of the proto compiler, so the change that really needs to be made is updating the documentation to mention which version to use. {quote} So I'll be making a PR modifying the message instead and closing this for now. > Regenerate Go SDK proto files in correct version > > > Key: BEAM-7970 > URL: https://issues.apache.org/jira/browse/BEAM-7970 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Generated proto files in the Go SDK currently include this bit: > {{// This is a compile-time assertion to ensure that this generated file}} > {{// is compatible with the proto package it is being compiled against.}} > {{// A compilation error at this line likely means your copy of the}} > {{// proto package needs to be updated.}} > {{const _ = proto.ProtoPackageIsVersion2 // please upgrade the proto package}} > > This indicates that the protos are being generated as proto v2 for whatever > reason. Most likely, as mentioned by this post with someone with a similar > issue, because the proto generation binary needs to be rebuilt before > generating the files again: > [https://github.com/golang/protobuf/issues/449#issuecomment-340884839] > This hasn't caused any errors so far, but might eventually cause errors if we > hit version differences between the v2 and v3 protos. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8164) Correct document for building the python SDK harness container
sunjincheng created BEAM-8164: - Summary: Correct document for building the python SDK harness container Key: BEAM-8164 URL: https://issues.apache.org/jira/browse/BEAM-8164 Project: Beam Issue Type: Bug Components: website Reporter: sunjincheng Assignee: sunjincheng In the runner document, it is described that we can use the command: `./gradlew :sdks:python:container:docker` to Build the SDK harness container, see ([https://beam.apache.org/documentation/runners/flink/)]. However, the docker config has been removed with the latest python3 docker related commit [1] the command would failed with the following error message. {code:java} > Task :sdks:python:container:docker FAILED FAILURE: Build failed with an exception. * What went wrong: Execution failed for task ':sdks:python:container:docker'. > name is a required docker configuration item.{code} I think we should also adapt the document with command: `./gradlew :sdks:python:container:py2:docker`? [1] [https://github.com/apache/beam/commit/47feeafb21023e2a60ae51737cc4000a2033719c#diff-1bc5883bcfcc9e883ab7df09e4dcddb0L63] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8162) NPE error when add flink 1.9 runner
[ https://issues.apache.org/jira/browse/BEAM-8162?focusedWorklogId=308096=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308096 ] ASF GitHub Bot logged work on BEAM-8162: Author: ASF GitHub Bot Created on: 06/Sep/19 19:05 Start Date: 06/Sep/19 19:05 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on issue #9464: [BEAM-8162] Encode keys as NESTED for flink keyselector URL: https://github.com/apache/beam/pull/9464#issuecomment-528977118 > > I can see you already do more effort for that PR. > > I don't consider a response time of one day to be long. Please keep in mind that people may be occupied with other tasks. Sorry, maybe I didn't express my meaning correctly. I meant that, we already do our best to fix it. :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308096) Time Spent: 1h (was: 50m) > NPE error when add flink 1.9 runner > --- > > Key: BEAM-8162 > URL: https://issues.apache.org/jira/browse/BEAM-8162 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > When add flink 1.9 runner in https://github.com/apache/beam/pull/9296, we > find an NPE error when run the `PortableTimersExecutionTest`. > the detail can be found here: > https://github.com/apache/beam/pull/9296#issuecomment-525262607 -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Comment Edited] (BEAM-3645) Support multi-process execution on the FnApiRunner
[ https://issues.apache.org/jira/browse/BEAM-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924585#comment-16924585 ] Yifan Mai edited comment on BEAM-3645 at 9/6/19 8:54 PM: - While testing this I noticed that the multi-process runner does not handle SIGINT gracefully. I added the repro steps to BEAM-8149. was (Author: myffi...@gmail.com): While testing this I noticed that the multi-process runner does not handle SIGINT gracefully. To reproduce, run wordcount.py using the "Run with multiprocessing mode" instructions from the comment above (in Python 3). Expected: wordcount terminates gracefully when Ctrl-C is pressed during pipeline execution (similarly to default direct runner) Actual: wordcount hangs forever after printing the following once per worker: {code} Exception in thread run_worker: Traceback (most recent call last): File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner self.run() File "/usr/lib/python3.6/threading.py", line 864, in run self._target(*self._args, **self._kwargs) File "/usr/local/google/home/yifanmai/venv/wordcount/lib/python3.6/site-packages/apache_beam/runners/portability/local_job_service.py", line 216, in run 'Worker subprocess exited with return code %s' % p.returncode) RuntimeError: Worker subprocess exited with return code 1 {code} > Support multi-process execution on the FnApiRunner > -- > > Key: BEAM-3645 > URL: https://issues.apache.org/jira/browse/BEAM-3645 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Affects Versions: 2.2.0, 2.3.0 >Reporter: Charles Chen >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.15.0 > > Time Spent: 35h 20m > Remaining Estimate: 0h > > https://issues.apache.org/jira/browse/BEAM-3644 gave us a 15x performance > gain over the previous DirectRunner. We can do even better in multi-core > environments by supporting multi-process execution in the FnApiRunner, to > scale past Python GIL limitations. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.
[ https://issues.apache.org/jira/browse/BEAM-7969?focusedWorklogId=308156=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308156 ] ASF GitHub Bot logged work on BEAM-7969: Author: ASF GitHub Bot Created on: 06/Sep/19 20:54 Start Date: 06/Sep/19 20:54 Worklog Time Spent: 10m Work Description: Ardagan commented on pull request #9494: [BEAM-7969] Fix doublecount on GRPC PCollections in streaming jobs. URL: https://github.com/apache/beam/pull/9494#discussion_r321906748 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java ## @@ -400,13 +409,19 @@ void updateProgress() { * @param monitoringInfos Usually received from FnApi. */ private void updateMetrics(List monitoringInfos) { + List monitoringInfosCopy = new ArrayList<>(monitoringInfos); + + List misToFilter = + bundleProcessOperation.findIOPCollectionMonitoringInfos(monitoringInfos); Review comment: Synced offline. I'll try to elaborate more on this. When we modify graph for cross-boundary grpc operations, we give two different PCollections same metadata. As a result we report metrics correctly from SDK harness. Both of these PCollections generate metrics that show as ElementCount metric on UI. At this point, the shortest way to fix the issue is to utilize deduping on runner. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308156) Time Spent: 6h 50m (was: 6h 40m) > Streaming Dataflow worker doesn't report FnAPI metrics. > --- > > Key: BEAM-7969 > URL: https://issues.apache.org/jira/browse/BEAM-7969 > Project: Beam > Issue Type: Bug > Components: java-fn-execution, runner-dataflow >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 6h 50m > Remaining Estimate: 0h > > EOM -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-8149) [FnApiRunner]multi-process runner does not terminate cleanly upon receiving SIGINT
[ https://issues.apache.org/jira/browse/BEAM-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Mai updated BEAM-8149: Description: The multi-process runner does not handle SIGINT gracefully. To reproduce, run wordcount.py using the "Run with multiprocessing mode" instructions from the first comment in BEAM-3645 (in Python 3). Expected: wordcount terminates gracefully when Ctrl-C is pressed during pipeline execution (similarly to default direct runner) Actual: wordcount hangs forever after printing the following once per worker: {code} Exception in thread run_worker: Traceback (most recent call last): File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner self.run() File "/usr/lib/python3.6/threading.py", line 864, in run self._target(*self._args, **self._kwargs) File "/usr/local/google/home/yifanmai/venv/wordcount/lib/python3.6/site-packages/apache_beam/runners/portability/local_job_service.py", line 216, in run 'Worker subprocess exited with return code %s' % p.returncode) RuntimeError: Worker subprocess exited with return code 1 {code} was: The multi-process runner does not handle SIGINT gracefully. To reproduce, run wordcount.py using the "Run with multiprocessing mode" instructions from the comment above (in Python 3). Expected: wordcount terminates gracefully when Ctrl-C is pressed during pipeline execution (similarly to default direct runner) Actual: wordcount hangs forever after printing the following once per worker: {code} Exception in thread run_worker: Traceback (most recent call last): File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner self.run() File "/usr/lib/python3.6/threading.py", line 864, in run self._target(*self._args, **self._kwargs) File "/usr/local/google/home/yifanmai/venv/wordcount/lib/python3.6/site-packages/apache_beam/runners/portability/local_job_service.py", line 216, in run 'Worker subprocess exited with return code %s' % p.returncode) RuntimeError: Worker subprocess exited with return code 1 {code} > [FnApiRunner]multi-process runner does not terminate cleanly upon receiving > SIGINT > -- > > Key: BEAM-8149 > URL: https://issues.apache.org/jira/browse/BEAM-8149 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.15.0 >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > > The multi-process runner does not handle SIGINT gracefully. To reproduce, run > wordcount.py using the "Run with multiprocessing mode" instructions from the > first comment in BEAM-3645 (in Python 3). > Expected: wordcount terminates gracefully when Ctrl-C is pressed during > pipeline execution (similarly to default direct runner) > Actual: wordcount hangs forever after printing the following once per worker: > {code} > Exception in thread run_worker: > Traceback (most recent call last): > File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner > self.run() > File "/usr/lib/python3.6/threading.py", line 864, in run > self._target(*self._args, **self._kwargs) > File > "/usr/local/google/home/yifanmai/venv/wordcount/lib/python3.6/site-packages/apache_beam/runners/portability/local_job_service.py", > line 216, in run > 'Worker subprocess exited with return code %s' % p.returncode) > RuntimeError: Worker subprocess exited with return code 1 > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-6185) Upgrade Spark to version 2.4.0
[ https://issues.apache.org/jira/browse/BEAM-6185?focusedWorklogId=308173=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308173 ] ASF GitHub Bot logged work on BEAM-6185: Author: ASF GitHub Bot Created on: 06/Sep/19 21:28 Start Date: 06/Sep/19 21:28 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on issue #9487: [WIP] [BEAM-6185]Change default docker images name URL: https://github.com/apache/beam/pull/9487#issuecomment-529020384 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308173) Time Spent: 2h 10m (was: 2h) > Upgrade Spark to version 2.4.0 > -- > > Key: BEAM-6185 > URL: https://issues.apache.org/jira/browse/BEAM-6185 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.12.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-5820) Vendor Calcite
[ https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=308185=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308185 ] ASF GitHub Bot logged work on BEAM-5820: Author: ASF GitHub Bot Created on: 06/Sep/19 21:50 Start Date: 06/Sep/19 21:50 Worklog Time Spent: 10m Work Description: vectorijk commented on issue #9189: [BEAM-5820] vendor calcite URL: https://github.com/apache/beam/pull/9189#issuecomment-516307123 @kennknowles @iemejia pass unit test locally This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308185) Time Spent: 6h (was: 5h 50m) > Vendor Calcite > -- > > Key: BEAM-5820 > URL: https://issues.apache.org/jira/browse/BEAM-5820 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kai Jiang >Priority: Major > Time Spent: 6h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8105) Add container publishing instruction to release manual
[ https://issues.apache.org/jira/browse/BEAM-8105?focusedWorklogId=308216=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308216 ] ASF GitHub Bot logged work on BEAM-8105: Author: ASF GitHub Bot Created on: 06/Sep/19 23:17 Start Date: 06/Sep/19 23:17 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #9510: [WIP][BEAM-8105] update release guide with docker images URL: https://github.com/apache/beam/pull/9510 This PR is adding release procedure for docker images. Publishing docker images as part of release is tackled with following three PRs. Change default image name. (#9487 ) Add staging and publishing scripts for docker images to release procedure. (#9506) Updating release-guide.md (current PR) Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
[jira] [Created] (BEAM-8165) Change image name
Hannah Jiang created BEAM-8165: -- Summary: Change image name Key: BEAM-8165 URL: https://issues.apache.org/jira/browse/BEAM-8165 Project: Beam Issue Type: Sub-task Components: sdk-go, sdk-java-harness, sdk-py-harness Reporter: Hannah Jiang Assignee: Hannah Jiang change name to apachebeam/\{lang}{ver}_sdk -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-8165) Change image name
[ https://issues.apache.org/jira/browse/BEAM-8165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hannah Jiang updated BEAM-8165: --- Status: Open (was: Triage Needed) > Change image name > - > > Key: BEAM-8165 > URL: https://issues.apache.org/jira/browse/BEAM-8165 > Project: Beam > Issue Type: Sub-task > Components: sdk-go, sdk-java-harness, sdk-py-harness >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > > change name to apachebeam/\{lang}{ver}_sdk -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-8149) [FnApiRunner]multi-process runner does not terminate cleanly upon receiving SIGINT
[ https://issues.apache.org/jira/browse/BEAM-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Mai updated BEAM-8149: Description: The multi-process runner does not handle SIGINT gracefully. To reproduce, run wordcount.py using the "Run with multiprocessing mode" instructions from the comment above (in Python 3). Expected: wordcount terminates gracefully when Ctrl-C is pressed during pipeline execution (similarly to default direct runner) Actual: wordcount hangs forever after printing the following once per worker: {code} Exception in thread run_worker: Traceback (most recent call last): File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner self.run() File "/usr/lib/python3.6/threading.py", line 864, in run self._target(*self._args, **self._kwargs) File "/usr/local/google/home/yifanmai/venv/wordcount/lib/python3.6/site-packages/apache_beam/runners/portability/local_job_service.py", line 216, in run 'Worker subprocess exited with return code %s' % p.returncode) RuntimeError: Worker subprocess exited with return code 1 {code} > [FnApiRunner]multi-process runner does not terminate cleanly upon receiving > SIGINT > -- > > Key: BEAM-8149 > URL: https://issues.apache.org/jira/browse/BEAM-8149 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.15.0 >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > > The multi-process runner does not handle SIGINT gracefully. To reproduce, run > wordcount.py using the "Run with multiprocessing mode" instructions from the > comment above (in Python 3). > Expected: wordcount terminates gracefully when Ctrl-C is pressed during > pipeline execution (similarly to default direct runner) > Actual: wordcount hangs forever after printing the following once per worker: > {code} > Exception in thread run_worker: > Traceback (most recent call last): > File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner > self.run() > File "/usr/lib/python3.6/threading.py", line 864, in run > self._target(*self._args, **self._kwargs) > File > "/usr/local/google/home/yifanmai/venv/wordcount/lib/python3.6/site-packages/apache_beam/runners/portability/local_job_service.py", > line 216, in run > 'Worker subprocess exited with return code %s' % p.returncode) > RuntimeError: Worker subprocess exited with return code 1 > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-3645) Support multi-process execution on the FnApiRunner
[ https://issues.apache.org/jira/browse/BEAM-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924585#comment-16924585 ] Yifan Mai commented on BEAM-3645: - While testing this I noticed that the multi-process runner does not handle SIGINT gracefully. To reproduce, run wordcount.py using the "Run with multiprocessing mode" instructions from the comment above (in Python 3). Expected: wordcount terminates gracefully when Ctrl-C is pressed during pipeline execution (similarly to default direct runner) Actual: wordcount hangs forever after printing the following once per worker: {code} Exception in thread run_worker: Traceback (most recent call last): File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner self.run() File "/usr/lib/python3.6/threading.py", line 864, in run self._target(*self._args, **self._kwargs) File "/usr/local/google/home/yifanmai/venv/wordcount/lib/python3.6/site-packages/apache_beam/runners/portability/local_job_service.py", line 216, in run 'Worker subprocess exited with return code %s' % p.returncode) RuntimeError: Worker subprocess exited with return code 1 {code} > Support multi-process execution on the FnApiRunner > -- > > Key: BEAM-3645 > URL: https://issues.apache.org/jira/browse/BEAM-3645 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Affects Versions: 2.2.0, 2.3.0 >Reporter: Charles Chen >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.15.0 > > Time Spent: 35h 20m > Remaining Estimate: 0h > > https://issues.apache.org/jira/browse/BEAM-3644 gave us a 15x performance > gain over the previous DirectRunner. We can do even better in multi-core > environments by supporting multi-process execution in the FnApiRunner, to > scale past Python GIL limitations. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.
[ https://issues.apache.org/jira/browse/BEAM-7969?focusedWorklogId=308192=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308192 ] ASF GitHub Bot logged work on BEAM-7969: Author: ASF GitHub Bot Created on: 06/Sep/19 22:10 Start Date: 06/Sep/19 22:10 Worklog Time Spent: 10m Work Description: Ardagan commented on pull request #9494: [BEAM-7969] Fix doublecount on GRPC PCollections in streaming jobs. URL: https://github.com/apache/beam/pull/9494 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308192) Time Spent: 7h (was: 6h 50m) > Streaming Dataflow worker doesn't report FnAPI metrics. > --- > > Key: BEAM-7969 > URL: https://issues.apache.org/jira/browse/BEAM-7969 > Project: Beam > Issue Type: Bug > Components: java-fn-execution, runner-dataflow >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 7h > Remaining Estimate: 0h > > EOM -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable IO connector for Python
[ https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=308032=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308032 ] ASF GitHub Bot logged work on BEAM-3342: Author: ASF GitHub Bot Created on: 06/Sep/19 17:24 Start Date: 06/Sep/19 17:24 Worklog Time Spent: 10m Work Description: sduskis commented on pull request #8457: [BEAM-3342] Create a Cloud Bigtable IO connector for Python URL: https://github.com/apache/beam/pull/8457#discussion_r321834329 ## File path: sdks/python/apache_beam/io/gcp/bigtableio.py ## @@ -122,22 +126,145 @@ class WriteToBigTable(beam.PTransform): A PTransform that write a list of `DirectRow` into the Bigtable Table """ - def __init__(self, project_id=None, instance_id=None, - table_id=None): + def __init__(self, project_id=None, instance_id=None, table_id=None): """ The PTransform to access the Bigtable Write connector Args: project_id(str): GCP Project of to write the Rows instance_id(str): GCP Instance to write the Rows table_id(str): GCP Table to write the `DirectRows` """ super(WriteToBigTable, self).__init__() -self.beam_options = {'project_id': project_id, - 'instance_id': instance_id, - 'table_id': table_id} +self._beam_options = {'project_id': project_id, + 'instance_id': instance_id, + 'table_id': table_id} def expand(self, pvalue): -beam_options = self.beam_options +beam_options = self._beam_options return (pvalue | beam.ParDo(_BigTableWriteFn(beam_options['project_id'], beam_options['instance_id'], beam_options['table_id']))) + + +class _BigtableReadFn(beam.DoFn): + """ Creates the connector that can read rows for Beam pipeline + + Args: +project_id(str): GCP Project ID +instance_id(str): GCP Instance ID +table_id(str): GCP Table ID + + """ + + def __init__(self, project_id, instance_id, table_id, filter_=b''): +""" Constructor of the Read connector of Bigtable + +Args: + project_id: [str] GCP Project of to write the Rows + instance_id: [str] GCP Instance to write the Rows + table_id: [str] GCP Table to write the `DirectRows` + filter_: [RowFilter] Filter to apply to columns in a row. +""" +super(self.__class__, self).__init__() +self._initialize({'project_id': project_id, + 'instance_id': instance_id, + 'table_id': table_id, + 'filter_': filter_}) + + def __getstate__(self): +return self._beam_options + + def __setstate__(self, options): +self._initialize(options) + + def _initialize(self, options): +self._beam_options = options +self.table = None +self.sample_row_keys = None +self.row_count = Metrics.counter(self.__class__.__name__, 'Rows read') + + def start_bundle(self): +if self.table is None: + self.table = Client(project=self._beam_options['project_id'])\ +.instance(self._beam_options['instance_id'])\ +.table(self._beam_options['table_id']) + + def process(self, element, **kwargs): +for row in self.table.read_rows(start_key=element.start_position, +end_key=element.end_position, +filter_=self._beam_options['filter_']): + self.row_count.inc() + yield row + + def display_data(self): +return {'projectId': DisplayDataItem(self._beam_options['project_id'], + label='Bigtable Project Id'), +'instanceId': DisplayDataItem(self._beam_options['instance_id'], + label='Bigtable Instance Id'), +'tableId': DisplayDataItem(self._beam_options['table_id'], + label='Bigtable Table Id'), +'filter_': DisplayDataItem(str(self._beam_options['filter_']), + label='Bigtable Filter') + } + + +class ReadFromBigTable(beam.PTransform): + def __init__(self, project_id, instance_id, table_id, filter_=b''): +""" The PTransform to access the Bigtable Read connector + +Args: + project_id: [str] GCP Project of to read the Rows + instance_id): [str] GCP Instance to read the Rows + table_id): [str] GCP Table to read the Rows + filter_: [RowFilter] Filter to apply to columns in a row. +""" +super(self.__class__, self).__init__() +self._beam_options = {'project_id': project_id, + 'instance_id': instance_id, + 'table_id': table_id, + 'filter_': filter_} + + def
[jira] [Work logged] (BEAM-6185) Upgrade Spark to version 2.4.0
[ https://issues.apache.org/jira/browse/BEAM-6185?focusedWorklogId=308109=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308109 ] ASF GitHub Bot logged work on BEAM-6185: Author: ASF GitHub Bot Created on: 06/Sep/19 19:20 Start Date: 06/Sep/19 19:20 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on issue #9487: [WIP] [BEAM-6185]Change default docker images name URL: https://github.com/apache/beam/pull/9487#issuecomment-528982038 Run Python Dataflow ValidatesContainer This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308109) Time Spent: 2h (was: 1h 50m) > Upgrade Spark to version 2.4.0 > -- > > Key: BEAM-6185 > URL: https://issues.apache.org/jira/browse/BEAM-6185 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.12.0 > > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8166) Support Graceful shutdown of worker harness.
[ https://issues.apache.org/jira/browse/BEAM-8166?focusedWorklogId=308131=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308131 ] ASF GitHub Bot logged work on BEAM-8166: Author: ASF GitHub Bot Created on: 06/Sep/19 20:16 Start Date: 06/Sep/19 20:16 Worklog Time Spent: 10m Work Description: lostluck commented on issue #9504: [BEAM-8166] Allow in process workers to exit without killing the process. URL: https://github.com/apache/beam/pull/9504#issuecomment-528999334 Run Go PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308131) Time Spent: 0.5h (was: 20m) > Support Graceful shutdown of worker harness. > > > Key: BEAM-8166 > URL: https://issues.apache.org/jira/browse/BEAM-8166 > Project: Beam > Issue Type: Improvement > Components: runner-core, sdk-go >Reporter: Robert Burke >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > Ideally there should be a clear Shutdown control RPC a runner can send a > worker harness to trigger an orderly shutdown. > Absent that, errors on the runner side shouldn't manifest as SDK worker > harness errors. SDKs should log, and gracefully shutdown from GRPC errors. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work started] (BEAM-8167) Replace old Postcommit_Python_Verify test with new python postcommits in Grafana.
[ https://issues.apache.org/jira/browse/BEAM-8167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-8167 started by Daniel Oliveira. - > Replace old Postcommit_Python_Verify test with new python postcommits in > Grafana. > - > > Key: BEAM-8167 > URL: https://issues.apache.org/jira/browse/BEAM-8167 > Project: Beam > Issue Type: Bug > Components: project-management, testing >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > > The old postcommit was replaced with multiple new postcommits for each > supported python version (Python2, Python35, Python36, Python37). These > aren't being reflected on the Grafana dashboards yet, so that should be fixed. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-8167) Replace old Postcommit_Python_Verify test with new python postcommits in Grafana.
[ https://issues.apache.org/jira/browse/BEAM-8167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-8167: -- Status: Open (was: Triage Needed) > Replace old Postcommit_Python_Verify test with new python postcommits in > Grafana. > - > > Key: BEAM-8167 > URL: https://issues.apache.org/jira/browse/BEAM-8167 > Project: Beam > Issue Type: Bug > Components: project-management, testing >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > > The old postcommit was replaced with multiple new postcommits for each > supported python version (Python2, Python35, Python36, Python37). These > aren't being reflected on the Grafana dashboards yet, so that should be fixed. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8156) Finish migration to standard Python typing
[ https://issues.apache.org/jira/browse/BEAM-8156?focusedWorklogId=308203=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308203 ] ASF GitHub Bot logged work on BEAM-8156: Author: ASF GitHub Bot Created on: 06/Sep/19 22:46 Start Date: 06/Sep/19 22:46 Worklog Time Spent: 10m Work Description: udim commented on pull request #9509: [BEAM-8156] Add convert_to_typing_type URL: https://github.com/apache/beam/pull/9509 First step in being able to compartmentalize Beam typing types to certain functions, and not expose them externally. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
[jira] [Closed] (BEAM-7619) [beam_PostCommit_Py_ValCont] [test_metrics_fnapi_it] KeyError
[ https://issues.apache.org/jira/browse/BEAM-7619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira closed BEAM-7619. - Fix Version/s: Not applicable Resolution: Cannot Reproduce Looks like this specific bug was fixed once Dataflow service releases caught up. But the root cause (the blocking bug) is still unresolved, so the issue could pop back up in the future. > [beam_PostCommit_Py_ValCont] [test_metrics_fnapi_it] KeyError > - > > Key: BEAM-7619 > URL: https://issues.apache.org/jira/browse/BEAM-7619 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Valentyn Tymofieiev >Priority: Major > Labels: currently-failing > Fix For: Not applicable > > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|[https://builds.apache.org/job/beam_PostCommit_Py_ValCont/3625/consoleFull]] > > Initial investigation: > 12:08:13 ERROR: test_wordcount_fnapi_it > (apache_beam.examples.wordcount_it_test.WordCountIT) > 12:08:13 > -- > 12:08:13 Traceback (most recent call last): > 12:08:13 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_ValCont/src/sdks/python/apache_beam/examples/wordcount_it_test.py", > line 52, in test_wordcount_fnapi_it > 12:08:13 self._run_wordcount_it(wordcount.run, experiment='beam_fn_api') > 12:08:13 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_ValCont/src/sdks/python/apache_beam/examples/wordcount_it_test.py", > line 84, in _run_wordcount_it > 12:08:13 run_wordcount(test_pipeline.get_full_options_as_args(**extra_opts)) > 12:08:13 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_ValCont/src/sdks/python/apache_beam/examples/wordcount.py", > line 114, in run > 12:08:13 result = p.run() > 12:08:13 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_ValCont/src/sdks/python/apache_beam/pipeline.py", > line 406, in run > 12:08:13 self._options).run(False) > 12:08:13 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_ValCont/src/sdks/python/apache_beam/pipeline.py", > line 419, in run > 12:08:13 return self.runner.run_pipeline(self, self._options) > 12:08:13 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_ValCont/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", > line 64, in run_pipeline > 12:08:13 self.result.wait_until_finish(duration=wait_duration) > 12:08:13 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_ValCont/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py", > line 1338, in wait_until_finish > 12:08:13 (self.state, getattr(self._runner, 'last_error_msg', None)), self) > 12:08:13 DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, > Error: > 12:08:13 java.util.concurrent.ExecutionException: java.lang.RuntimeException: > Error received from SDK harness for instruction -129: Traceback (most recent > call last): > 12:08:13 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 157, in _execute > 12:08:13 response = task() > 12:08:13 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 190, in > 12:08:13 self._execute(lambda: worker.do_instruction(work), work) > 12:08:13 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 342, in do_instruction > 12:08:13 request.instruction_id) > 12:08:13 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 368, in process_bundle > 12:08:13 bundle_processor.process_bundle(instruction_id)) > 12:08:13 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 593, in process_bundle > 12:08:13 data.ptransform_id].process_encoded(data.data) > 12:08:13 KeyError: u'\n\x04-107\x12\x04-105' > 12:08:13 > 12:08:13 at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > 12:08:13 at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > 12:08:13 at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57) > 12:08:13 at > org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:285) > 12:08:13 at > org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85) > 12:08:13 at > org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125) > 12:08:13 at >
[jira] [Work logged] (BEAM-7600) Spark portable runner: reuse SDK harness
[ https://issues.apache.org/jira/browse/BEAM-7600?focusedWorklogId=308202=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308202 ] ASF GitHub Bot logged work on BEAM-7600: Author: ASF GitHub Bot Created on: 06/Sep/19 22:45 Start Date: 06/Sep/19 22:45 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9095: [BEAM-7600] borrow SDK harness management code into Spark runner URL: https://github.com/apache/beam/pull/9095#issuecomment-529038195 @angoenka I have refactored this quite a bit. Unfortunately I only managed to eliminate a small amount of the complexity as most of it seems necessary. PTAL This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308202) Time Spent: 7h (was: 6h 50m) > Spark portable runner: reuse SDK harness > > > Key: BEAM-7600 > URL: https://issues.apache.org/jira/browse/BEAM-7600 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 7h > Remaining Estimate: 0h > > Right now, we're creating a new SDK harness every time an executable stage is > run [1], which is expensive. We should be able to re-use code from the Flink > runner to re-use the SDK harness [2]. > > [1] > [https://github.com/apache/beam/blob/c9fb261bc7666788402840bb6ce1b0ce2fd445d1/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunction.java#L135] > [2] > [https://github.com/apache/beam/blob/c9fb261bc7666788402840bb6ce1b0ce2fd445d1/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkDefaultExecutableStageContext.java] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8156) Finish migration to standard Python typing
[ https://issues.apache.org/jira/browse/BEAM-8156?focusedWorklogId=308208=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308208 ] ASF GitHub Bot logged work on BEAM-8156: Author: ASF GitHub Bot Created on: 06/Sep/19 23:04 Start Date: 06/Sep/19 23:04 Worklog Time Spent: 10m Work Description: udim commented on issue #9509: [BEAM-8156] Add convert_to_typing_type URL: https://github.com/apache/beam/pull/9509#issuecomment-529041440 R: @robertwb @tvalentyn This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308208) Remaining Estimate: 503h 40m (was: 503h 50m) Time Spent: 20m (was: 10m) > Finish migration to standard Python typing > -- > > Key: BEAM-8156 > URL: https://issues.apache.org/jira/browse/BEAM-8156 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Udi Meiri >Priority: Major > Original Estimate: 504h > Time Spent: 20m > Remaining Estimate: 503h 40m > > We should migrate all Python uses of types to the standard typing module, and > make the typehints.* ones aliases of the Python ones. > > There are three places where we use custom typehints behavior: > (1) is_compatible_with > (2) bind_type_variables/match_type_variables > (3) trivial type inference. > > I would propose that each of these be adapted to a (internal) public > interface that accepts and returns standard typing types, and internally > converts to our (nowhere else exposed) typehints types, performs the logic, > and converts back. Each of these in turn can then be updated, as needed and > orthogonally, to operate on the typing types natively (possibly via deference > to a third-party library). > > I think coder inference could be easily adopted to use typing types directly, > but it may be a fourth place where we do internal conversion first. Another > gotcha is special care may need to be taken if we ever need to pickle these > types (which IIRC may have issues). -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8156) Finish migration to standard Python typing
[ https://issues.apache.org/jira/browse/BEAM-8156?focusedWorklogId=308214=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308214 ] ASF GitHub Bot logged work on BEAM-8156: Author: ASF GitHub Bot Created on: 06/Sep/19 23:16 Start Date: 06/Sep/19 23:16 Worklog Time Spent: 10m Work Description: udim commented on pull request #9509: [BEAM-8156] Add convert_to_typing_type URL: https://github.com/apache/beam/pull/9509#discussion_r321938298 ## File path: sdks/python/apache_beam/typehints/native_type_compatibility.py ## @@ -198,6 +202,10 @@ def convert_to_beam_type(typ): arity=-1, beam_type=typehints.Tuple), _TypeMapEntry(match=_match_is_union, arity=-1, beam_type=typehints.Union), + _TypeMapEntry( + match=_match_issubclass(typing.Iterator), Review comment: Line 170 effectively limits the scope to typing module types. Is that good enough? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308214) Remaining Estimate: 503h 20m (was: 503.5h) Time Spent: 40m (was: 0.5h) > Finish migration to standard Python typing > -- > > Key: BEAM-8156 > URL: https://issues.apache.org/jira/browse/BEAM-8156 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Udi Meiri >Priority: Major > Original Estimate: 504h > Time Spent: 40m > Remaining Estimate: 503h 20m > > We should migrate all Python uses of types to the standard typing module, and > make the typehints.* ones aliases of the Python ones. > > There are three places where we use custom typehints behavior: > (1) is_compatible_with > (2) bind_type_variables/match_type_variables > (3) trivial type inference. > > I would propose that each of these be adapted to a (internal) public > interface that accepts and returns standard typing types, and internally > converts to our (nowhere else exposed) typehints types, performs the logic, > and converts back. Each of these in turn can then be updated, as needed and > orthogonally, to operate on the typing types natively (possibly via deference > to a third-party library). > > I think coder inference could be easily adopted to use typing types directly, > but it may be a fourth place where we do internal conversion first. Another > gotcha is special care may need to be taken if we ever need to pickle these > types (which IIRC may have issues). -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.
[ https://issues.apache.org/jira/browse/BEAM-7969?focusedWorklogId=308130=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308130 ] ASF GitHub Bot logged work on BEAM-7969: Author: ASF GitHub Bot Created on: 06/Sep/19 20:14 Start Date: 06/Sep/19 20:14 Worklog Time Spent: 10m Work Description: ajamato commented on pull request #9494: [BEAM-7969] Fix doublecount on GRPC PCollections in streaming jobs. URL: https://github.com/apache/beam/pull/9494#discussion_r321893709 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java ## @@ -400,13 +409,19 @@ void updateProgress() { * @param monitoringInfos Usually received from FnApi. */ private void updateMetrics(List monitoringInfos) { + List monitoringInfosCopy = new ArrayList<>(monitoringInfos); + + List misToFilter = + bundleProcessOperation.findIOPCollectionMonitoringInfos(monitoringInfos); Review comment: findIOPCollectionMonitoringInfos Could we use a better name here? Not sure what this is referring to. I assume IO referrs to sources/sinks, but I don't think this is the case This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308130) Time Spent: 6.5h (was: 6h 20m) > Streaming Dataflow worker doesn't report FnAPI metrics. > --- > > Key: BEAM-7969 > URL: https://issues.apache.org/jira/browse/BEAM-7969 > Project: Beam > Issue Type: Bug > Components: java-fn-execution, runner-dataflow >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 6.5h > Remaining Estimate: 0h > > EOM -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8166) Support Graceful shutdown of worker harness.
[ https://issues.apache.org/jira/browse/BEAM-8166?focusedWorklogId=308129=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308129 ] ASF GitHub Bot logged work on BEAM-8166: Author: ASF GitHub Bot Created on: 06/Sep/19 20:14 Start Date: 06/Sep/19 20:14 Worklog Time Spent: 10m Work Description: lostluck commented on issue #9504: [BEAM-8166] Allow in process workers to exit without killing the process. URL: https://github.com/apache/beam/pull/9504#issuecomment-528998600 R: @youngoli This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308129) Time Spent: 20m (was: 10m) > Support Graceful shutdown of worker harness. > > > Key: BEAM-8166 > URL: https://issues.apache.org/jira/browse/BEAM-8166 > Project: Beam > Issue Type: Improvement > Components: runner-core, sdk-go >Reporter: Robert Burke >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > Ideally there should be a clear Shutdown control RPC a runner can send a > worker harness to trigger an orderly shutdown. > Absent that, errors on the runner side shouldn't manifest as SDK worker > harness errors. SDKs should log, and gracefully shutdown from GRPC errors. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-6185) Upgrade Spark to version 2.4.0
[ https://issues.apache.org/jira/browse/BEAM-6185?focusedWorklogId=308187=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308187 ] ASF GitHub Bot logged work on BEAM-6185: Author: ASF GitHub Bot Created on: 06/Sep/19 21:54 Start Date: 06/Sep/19 21:54 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on issue #9487: [WIP] [BEAM-6185]Change default docker images name URL: https://github.com/apache/beam/pull/9487#issuecomment-529027241 R: @aaltay Can you please review or find someone else to review it? Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308187) Time Spent: 2.5h (was: 2h 20m) > Upgrade Spark to version 2.4.0 > -- > > Key: BEAM-6185 > URL: https://issues.apache.org/jira/browse/BEAM-6185 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.12.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8167) Replace old Postcommit_Python_Verify test with new python postcommits in Grafana.
[ https://issues.apache.org/jira/browse/BEAM-8167?focusedWorklogId=308186=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308186 ] ASF GitHub Bot logged work on BEAM-8167: Author: ASF GitHub Bot Created on: 06/Sep/19 21:52 Start Date: 06/Sep/19 21:52 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #9505: [BEAM-8167] Replace Postcommit_Python_Verify in Grafana dashboards. URL: https://github.com/apache/beam/pull/9505#issuecomment-529026645 Some thoughts on the change: * Do you have sample of how graph will look like? * Should we keep python verify on the graph in case if we want to see historical data? My understanding is that we discontinued python_verify jobs and created pythonXX instead. This will serve us well in the future, but we lose historical data on the default graph with current change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308186) Time Spent: 0.5h (was: 20m) > Replace old Postcommit_Python_Verify test with new python postcommits in > Grafana. > - > > Key: BEAM-8167 > URL: https://issues.apache.org/jira/browse/BEAM-8167 > Project: Beam > Issue Type: Bug > Components: project-management, testing >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > The old postcommit was replaced with multiple new postcommits for each > supported python version (Python2, Python35, Python36, Python37). These > aren't being reflected on the Grafana dashboards yet, so that should be fixed. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work started] (BEAM-8165) Change image name and add images to release process
[ https://issues.apache.org/jira/browse/BEAM-8165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-8165 started by Hannah Jiang. -- > Change image name and add images to release process > --- > > Key: BEAM-8165 > URL: https://issues.apache.org/jira/browse/BEAM-8165 > Project: Beam > Issue Type: Sub-task > Components: sdk-go, sdk-java-harness, sdk-py-harness >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > > change name to apachebeam/\{lang}{ver}_sdk -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-8165) Change image name and add images to release process
[ https://issues.apache.org/jira/browse/BEAM-8165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hannah Jiang updated BEAM-8165: --- Summary: Change image name and add images to release process (was: Change image name) > Change image name and add images to release process > --- > > Key: BEAM-8165 > URL: https://issues.apache.org/jira/browse/BEAM-8165 > Project: Beam > Issue Type: Sub-task > Components: sdk-go, sdk-java-harness, sdk-py-harness >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > > change name to apachebeam/\{lang}{ver}_sdk -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.
[ https://issues.apache.org/jira/browse/BEAM-7969?focusedWorklogId=308125=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308125 ] ASF GitHub Bot logged work on BEAM-7969: Author: ASF GitHub Bot Created on: 06/Sep/19 20:07 Start Date: 06/Sep/19 20:07 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #9494: [BEAM-7969] Fix doublecount on GRPC PCollections in streaming jobs. URL: https://github.com/apache/beam/pull/9494#issuecomment-528996585 Run Python Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308125) Time Spent: 6h 10m (was: 6h) > Streaming Dataflow worker doesn't report FnAPI metrics. > --- > > Key: BEAM-7969 > URL: https://issues.apache.org/jira/browse/BEAM-7969 > Project: Beam > Issue Type: Bug > Components: java-fn-execution, runner-dataflow >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > > EOM -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.
[ https://issues.apache.org/jira/browse/BEAM-7969?focusedWorklogId=308134=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308134 ] ASF GitHub Bot logged work on BEAM-7969: Author: ASF GitHub Bot Created on: 06/Sep/19 20:22 Start Date: 06/Sep/19 20:22 Worklog Time Spent: 10m Work Description: ajamato commented on pull request #9494: [BEAM-7969] Fix doublecount on GRPC PCollections in streaming jobs. URL: https://github.com/apache/beam/pull/9494#discussion_r321895776 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java ## @@ -400,13 +409,19 @@ void updateProgress() { * @param monitoringInfos Usually received from FnApi. */ private void updateMetrics(List monitoringInfos) { + List monitoringInfosCopy = new ArrayList<>(monitoringInfos); + + List misToFilter = + bundleProcessOperation.findIOPCollectionMonitoringInfos(monitoringInfos); Review comment: Is there a reason why we need to collect counters for these ones at all? Our UI doesn't display the grpc steps, they are an implementation detail. The alternative design I was thinking of here was to try and transform each monitoring info, and drop the ones that do not have a step in the original graph. Though, maybe there is no way to detect this? So they could be dropped int the transformer, as there is no need to send them to DFE. If I understand correctly, you are just avoiding a double count now? Though I think we don't need to report these at all. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308134) Time Spent: 6h 40m (was: 6.5h) > Streaming Dataflow worker doesn't report FnAPI metrics. > --- > > Key: BEAM-7969 > URL: https://issues.apache.org/jira/browse/BEAM-7969 > Project: Beam > Issue Type: Bug > Components: java-fn-execution, runner-dataflow >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 6h 40m > Remaining Estimate: 0h > > EOM -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8166) Support Graceful shutdown of worker harness.
[ https://issues.apache.org/jira/browse/BEAM-8166?focusedWorklogId=308145=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308145 ] ASF GitHub Bot logged work on BEAM-8166: Author: ASF GitHub Bot Created on: 06/Sep/19 20:45 Start Date: 06/Sep/19 20:45 Worklog Time Spent: 10m Work Description: lostluck commented on issue #9504: [BEAM-8166] Allow in process workers to exit without killing the process. URL: https://github.com/apache/beam/pull/9504#issuecomment-529008251 Excellent question, and you're correct the new mode isn't presently available for any public runners at this time. Flink, Spark, Dataflow all rely on containers, which don't run into this. However, native Go runners might choose to make different calls, or runners where the runner half a worker is linked into the same binary run into this, such as the internal Google runner I work on uses. It has an inprocess mode that can take advantage of this as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308145) Time Spent: 40m (was: 0.5h) > Support Graceful shutdown of worker harness. > > > Key: BEAM-8166 > URL: https://issues.apache.org/jira/browse/BEAM-8166 > Project: Beam > Issue Type: Improvement > Components: runner-core, sdk-go >Reporter: Robert Burke >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > Ideally there should be a clear Shutdown control RPC a runner can send a > worker harness to trigger an orderly shutdown. > Absent that, errors on the runner side shouldn't manifest as SDK worker > harness errors. SDKs should log, and gracefully shutdown from GRPC errors. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-7859) Portable Wordcount on Spark runner does not work in DOCKER execution mode.
[ https://issues.apache.org/jira/browse/BEAM-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-7859: -- Description: The error was observed during Beam 2.14.0 release validation, see: https://issues.apache.org/jira/browse/BEAM-7224?focusedCommentId=16896831=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16896831 -Looks like master currently fails with a different error, both in Loopback and Docker modes.- [~ibzib] [~altay] [~robertwb] [~angoenka] was: The error was observed during Beam 2.14.0 release validation, see: https://issues.apache.org/jira/browse/BEAM-7224?focusedCommentId=16896831=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16896831 Looks like master currently fails with a different error, both in Loopback and Docker modes. [~ibzib] [~altay] [~robertwb] [~angoenka] > Portable Wordcount on Spark runner does not work in DOCKER execution mode. > -- > > Key: BEAM-7859 > URL: https://issues.apache.org/jira/browse/BEAM-7859 > Project: Beam > Issue Type: Bug > Components: runner-spark, sdk-py-harness >Reporter: Valentyn Tymofieiev >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 1h > Remaining Estimate: 0h > > The error was observed during Beam 2.14.0 release validation, see: > https://issues.apache.org/jira/browse/BEAM-7224?focusedCommentId=16896831=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16896831 > -Looks like master currently fails with a different error, both in Loopback > and Docker modes.- > [~ibzib] [~altay] [~robertwb] [~angoenka] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-6185) Upgrade Spark to version 2.4.0
[ https://issues.apache.org/jira/browse/BEAM-6185?focusedWorklogId=308177=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308177 ] ASF GitHub Bot logged work on BEAM-6185: Author: ASF GitHub Bot Created on: 06/Sep/19 21:36 Start Date: 06/Sep/19 21:36 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #9506: [WIP][BEAM-6185] Release scripts for docker images URL: https://github.com/apache/beam/pull/9506 This PR adds scripts for staging and publishing docker images to docker hub. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-8167) Replace old Postcommit_Python_Verify test with new python postcommits in Grafana.
[ https://issues.apache.org/jira/browse/BEAM-8167?focusedWorklogId=308198=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308198 ] ASF GitHub Bot logged work on BEAM-8167: Author: ASF GitHub Bot Created on: 06/Sep/19 22:18 Start Date: 06/Sep/19 22:18 Worklog Time Spent: 10m Work Description: youngoli commented on issue #9505: [BEAM-8167] Replace Postcommit_Python_Verify in Grafana dashboards. URL: https://github.com/apache/beam/pull/9505#issuecomment-529032650 I'll modify the commit to put Python_Verify back in. Doesn't seem like a big deal to leave it for historical data for now. As for a sample: ![image](https://user-images.githubusercontent.com/1740865/64463554-4a902a80-d0b9-11e9-9138-db16900d3522.png) Since this is running locally, I don't have a full history like it would have on the website. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308198) Time Spent: 40m (was: 0.5h) > Replace old Postcommit_Python_Verify test with new python postcommits in > Grafana. > - > > Key: BEAM-8167 > URL: https://issues.apache.org/jira/browse/BEAM-8167 > Project: Beam > Issue Type: Bug > Components: project-management, testing >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > The old postcommit was replaced with multiple new postcommits for each > supported python version (Python2, Python35, Python36, Python37). These > aren't being reflected on the Grafana dashboards yet, so that should be fixed. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable IO connector for Python
[ https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=308029=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308029 ] ASF GitHub Bot logged work on BEAM-3342: Author: ASF GitHub Bot Created on: 06/Sep/19 17:22 Start Date: 06/Sep/19 17:22 Worklog Time Spent: 10m Work Description: sduskis commented on pull request #8457: [BEAM-3342] Create a Cloud Bigtable IO connector for Python URL: https://github.com/apache/beam/pull/8457#discussion_r321833569 ## File path: sdks/python/apache_beam/io/gcp/bigtableio.py ## @@ -122,22 +126,145 @@ class WriteToBigTable(beam.PTransform): A PTransform that write a list of `DirectRow` into the Bigtable Table """ - def __init__(self, project_id=None, instance_id=None, - table_id=None): + def __init__(self, project_id=None, instance_id=None, table_id=None): """ The PTransform to access the Bigtable Write connector Args: project_id(str): GCP Project of to write the Rows instance_id(str): GCP Instance to write the Rows table_id(str): GCP Table to write the `DirectRows` """ super(WriteToBigTable, self).__init__() -self.beam_options = {'project_id': project_id, - 'instance_id': instance_id, - 'table_id': table_id} +self._beam_options = {'project_id': project_id, + 'instance_id': instance_id, + 'table_id': table_id} def expand(self, pvalue): -beam_options = self.beam_options +beam_options = self._beam_options return (pvalue | beam.ParDo(_BigTableWriteFn(beam_options['project_id'], beam_options['instance_id'], beam_options['table_id']))) + + +class _BigtableReadFn(beam.DoFn): + """ Creates the connector that can read rows for Beam pipeline + + Args: +project_id(str): GCP Project ID +instance_id(str): GCP Instance ID +table_id(str): GCP Table ID + + """ + + def __init__(self, project_id, instance_id, table_id, filter_=b''): +""" Constructor of the Read connector of Bigtable + +Args: + project_id: [str] GCP Project of to write the Rows + instance_id: [str] GCP Instance to write the Rows + table_id: [str] GCP Table to write the `DirectRows` + filter_: [RowFilter] Filter to apply to columns in a row. +""" +super(self.__class__, self).__init__() +self._initialize({'project_id': project_id, + 'instance_id': instance_id, + 'table_id': table_id, + 'filter_': filter_}) + + def __getstate__(self): +return self._beam_options + + def __setstate__(self, options): +self._initialize(options) + + def _initialize(self, options): +self._beam_options = options +self.table = None +self.sample_row_keys = None +self.row_count = Metrics.counter(self.__class__.__name__, 'Rows read') + + def start_bundle(self): +if self.table is None: + self.table = Client(project=self._beam_options['project_id'])\ +.instance(self._beam_options['instance_id'])\ +.table(self._beam_options['table_id']) + + def process(self, element, **kwargs): +for row in self.table.read_rows(start_key=element.start_position, +end_key=element.end_position, +filter_=self._beam_options['filter_']): + self.row_count.inc() + yield row + + def display_data(self): +return {'projectId': DisplayDataItem(self._beam_options['project_id'], + label='Bigtable Project Id'), +'instanceId': DisplayDataItem(self._beam_options['instance_id'], + label='Bigtable Instance Id'), +'tableId': DisplayDataItem(self._beam_options['table_id'], + label='Bigtable Table Id'), +'filter_': DisplayDataItem(str(self._beam_options['filter_']), + label='Bigtable Filter') + } + + +class ReadFromBigTable(beam.PTransform): + def __init__(self, project_id, instance_id, table_id, filter_=b''): +""" The PTransform to access the Bigtable Read connector + +Args: + project_id: [str] GCP Project of to read the Rows + instance_id): [str] GCP Instance to read the Rows + table_id): [str] GCP Table to read the Rows + filter_: [RowFilter] Filter to apply to columns in a row. +""" +super(self.__class__, self).__init__() +self._beam_options = {'project_id': project_id, + 'instance_id': instance_id, + 'table_id': table_id, + 'filter_': filter_} + + def
[jira] [Work logged] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.
[ https://issues.apache.org/jira/browse/BEAM-7969?focusedWorklogId=308062=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308062 ] ASF GitHub Bot logged work on BEAM-7969: Author: ASF GitHub Bot Created on: 06/Sep/19 18:08 Start Date: 06/Sep/19 18:08 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #9494: [BEAM-7969] Fix doublecount on GRPC PCollections in streaming jobs. URL: https://github.com/apache/beam/pull/9494#issuecomment-528957185 Run Python Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308062) Time Spent: 5.5h (was: 5h 20m) > Streaming Dataflow worker doesn't report FnAPI metrics. > --- > > Key: BEAM-7969 > URL: https://issues.apache.org/jira/browse/BEAM-7969 > Project: Beam > Issue Type: Bug > Components: java-fn-execution, runner-dataflow >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 5.5h > Remaining Estimate: 0h > > EOM -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.
[ https://issues.apache.org/jira/browse/BEAM-7969?focusedWorklogId=308063=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308063 ] ASF GitHub Bot logged work on BEAM-7969: Author: ASF GitHub Bot Created on: 06/Sep/19 18:08 Start Date: 06/Sep/19 18:08 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #9494: [BEAM-7969] Fix doublecount on GRPC PCollections in streaming jobs. URL: https://github.com/apache/beam/pull/9494#issuecomment-528957185 Run Python Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308063) Time Spent: 5h 40m (was: 5.5h) > Streaming Dataflow worker doesn't report FnAPI metrics. > --- > > Key: BEAM-7969 > URL: https://issues.apache.org/jira/browse/BEAM-7969 > Project: Beam > Issue Type: Bug > Components: java-fn-execution, runner-dataflow >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 5h 40m > Remaining Estimate: 0h > > EOM -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.
[ https://issues.apache.org/jira/browse/BEAM-7969?focusedWorklogId=308067=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308067 ] ASF GitHub Bot logged work on BEAM-7969: Author: ASF GitHub Bot Created on: 06/Sep/19 18:09 Start Date: 06/Sep/19 18:09 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #9494: [BEAM-7969] Fix doublecount on GRPC PCollections in streaming jobs. URL: https://github.com/apache/beam/pull/9494#issuecomment-528957722 @lukecwik @y1chi @angoenka This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308067) Time Spent: 5h 50m (was: 5h 40m) > Streaming Dataflow worker doesn't report FnAPI metrics. > --- > > Key: BEAM-7969 > URL: https://issues.apache.org/jira/browse/BEAM-7969 > Project: Beam > Issue Type: Bug > Components: java-fn-execution, runner-dataflow >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > EOM -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8166) Support Graceful shutdown of worker harness.
[ https://issues.apache.org/jira/browse/BEAM-8166?focusedWorklogId=308128=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308128 ] ASF GitHub Bot logged work on BEAM-8166: Author: ASF GitHub Bot Created on: 06/Sep/19 20:13 Start Date: 06/Sep/19 20:13 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #9504: [BEAM-8166] Allow in process workers to exit without killing the process. URL: https://github.com/apache/beam/pull/9504 Allow for runners that do not spawn distinct processes for workers. InProcess runners can be useful for testing, or to support for any future native go runner. When running multiple pipelines simultaneously, it's desireable to allow for some measure of independance as well. Responding fatally on GRPC connection terminations prevents these behaviours. Further, improve the error messages to include additional context when there are GRPC issues. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-8167) Replace old Postcommit_Python_Verify test with new python postcommits in Grafana.
[ https://issues.apache.org/jira/browse/BEAM-8167?focusedWorklogId=308160=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308160 ] ASF GitHub Bot logged work on BEAM-8167: Author: ASF GitHub Bot Created on: 06/Sep/19 21:02 Start Date: 06/Sep/19 21:02 Worklog Time Spent: 10m Work Description: youngoli commented on issue #9505: [BEAM-8167] Replace Postcommit_Python_Verify in Grafana dashboards. URL: https://github.com/apache/beam/pull/9505#issuecomment-529013258 R: @Ardagan This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308160) Time Spent: 20m (was: 10m) > Replace old Postcommit_Python_Verify test with new python postcommits in > Grafana. > - > > Key: BEAM-8167 > URL: https://issues.apache.org/jira/browse/BEAM-8167 > Project: Beam > Issue Type: Bug > Components: project-management, testing >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > The old postcommit was replaced with multiple new postcommits for each > supported python version (Python2, Python35, Python36, Python37). These > aren't being reflected on the Grafana dashboards yet, so that should be fixed. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8167) Replace old Postcommit_Python_Verify test with new python postcommits in Grafana.
[ https://issues.apache.org/jira/browse/BEAM-8167?focusedWorklogId=308159=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308159 ] ASF GitHub Bot logged work on BEAM-8167: Author: ASF GitHub Bot Created on: 06/Sep/19 21:01 Start Date: 06/Sep/19 21:01 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #9505: [BEAM-8167] Replace Postcommit_Python_Verify in Grafana dashboards. URL: https://github.com/apache/beam/pull/9505 Replacing the old postcommit in the "Stability Critical Jobs" graphs with the new Python Postcommits which are split based on Python version. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar
[ https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=308180=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308180 ] ASF GitHub Bot logged work on BEAM-7967: Author: ASF GitHub Bot Created on: 06/Sep/19 21:39 Start Date: 06/Sep/19 21:39 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9408: [BEAM-7967] Execute portable Flink application jar URL: https://github.com/apache/beam/pull/9408#issuecomment-529023081 Run PortableJar_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308180) Time Spent: 3h 40m (was: 3.5h) > Execute portable Flink application jar > -- > > Key: BEAM-7967 > URL: https://issues.apache.org/jira/browse/BEAM-7967 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 3h 40m > Remaining Estimate: 0h > > [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads
[ https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=308206=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308206 ] ASF GitHub Bot logged work on BEAM-8151: Author: ASF GitHub Bot Created on: 06/Sep/19 22:51 Start Date: 06/Sep/19 22:51 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #9477: [BEAM-8151, BEAM-7848] Up the max number of threads inside the SDK harness to a default of 10k URL: https://github.com/apache/beam/pull/9477#issuecomment-529039169 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308206) Time Spent: 5h 10m (was: 5h) > Allow the Python SDK to use many many threads > - > > Key: BEAM-8151 > URL: https://issues.apache.org/jira/browse/BEAM-8151 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > > We need to use a thread pool which shrinks the number of active threads when > they are not being used. > > This is to prevent any stuckness issues related to a runner scheduling more > work items then there are "work" threads inside the SDK harness. > > By default the control plane should have all "requests" being processed in > parallel and the runner is responsible for not overloading the SDK with too > much work. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8161) Upgrade to joda time 2.10.3 to get updated TZDB
[ https://issues.apache.org/jira/browse/BEAM-8161?focusedWorklogId=308205=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308205 ] ASF GitHub Bot logged work on BEAM-8161: Author: ASF GitHub Bot Created on: 06/Sep/19 22:51 Start Date: 06/Sep/19 22:51 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #9495: [BEAM-8161] Update joda time to 2.10.3 to get TZDB 2019b URL: https://github.com/apache/beam/pull/9495 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308205) Time Spent: 20m (was: 10m) > Upgrade to joda time 2.10.3 to get updated TZDB > --- > > Key: BEAM-8161 > URL: https://issues.apache.org/jira/browse/BEAM-8161 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8156) Finish migration to standard Python typing
[ https://issues.apache.org/jira/browse/BEAM-8156?focusedWorklogId=308211=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308211 ] ASF GitHub Bot logged work on BEAM-8156: Author: ASF GitHub Bot Created on: 06/Sep/19 23:08 Start Date: 06/Sep/19 23:08 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9509: [BEAM-8156] Add convert_to_typing_type URL: https://github.com/apache/beam/pull/9509#discussion_r321936389 ## File path: sdks/python/apache_beam/typehints/native_type_compatibility.py ## @@ -198,6 +202,10 @@ def convert_to_beam_type(typ): arity=-1, beam_type=typehints.Tuple), _TypeMapEntry(match=_match_is_union, arity=-1, beam_type=typehints.Union), + _TypeMapEntry( + match=_match_issubclass(typing.Iterator), Review comment: This will math all kinds of things, including strings and dicts, which is not what we want. (We may want to explicitly note that.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308211) Remaining Estimate: 503.5h (was: 503h 40m) Time Spent: 0.5h (was: 20m) > Finish migration to standard Python typing > -- > > Key: BEAM-8156 > URL: https://issues.apache.org/jira/browse/BEAM-8156 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Udi Meiri >Priority: Major > Original Estimate: 504h > Time Spent: 0.5h > Remaining Estimate: 503.5h > > We should migrate all Python uses of types to the standard typing module, and > make the typehints.* ones aliases of the Python ones. > > There are three places where we use custom typehints behavior: > (1) is_compatible_with > (2) bind_type_variables/match_type_variables > (3) trivial type inference. > > I would propose that each of these be adapted to a (internal) public > interface that accepts and returns standard typing types, and internally > converts to our (nowhere else exposed) typehints types, performs the logic, > and converts back. Each of these in turn can then be updated, as needed and > orthogonally, to operate on the typing types natively (possibly via deference > to a third-party library). > > I think coder inference could be easily adopted to use typing types directly, > but it may be a fourth place where we do internal conversion first. Another > gotcha is special care may need to be taken if we ever need to pickle these > types (which IIRC may have issues). -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8167) Replace old Postcommit_Python_Verify test with new python postcommits in Grafana.
[ https://issues.apache.org/jira/browse/BEAM-8167?focusedWorklogId=308212=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308212 ] ASF GitHub Bot logged work on BEAM-8167: Author: ASF GitHub Bot Created on: 06/Sep/19 23:08 Start Date: 06/Sep/19 23:08 Worklog Time Spent: 10m Work Description: Ardagan commented on pull request #9505: [BEAM-8167] Replace Postcommit_Python_Verify in Grafana dashboards. URL: https://github.com/apache/beam/pull/9505 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308212) Time Spent: 50m (was: 40m) > Replace old Postcommit_Python_Verify test with new python postcommits in > Grafana. > - > > Key: BEAM-8167 > URL: https://issues.apache.org/jira/browse/BEAM-8167 > Project: Beam > Issue Type: Bug > Components: project-management, testing >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > The old postcommit was replaced with multiple new postcommits for each > supported python version (Python2, Python35, Python36, Python37). These > aren't being reflected on the Grafana dashboards yet, so that should be fixed. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.
[ https://issues.apache.org/jira/browse/BEAM-7969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924461#comment-16924461 ] Mikhail Gryzykhin commented on BEAM-7969: - Initial PR double counted elements in GRPC PCollections > Streaming Dataflow worker doesn't report FnAPI metrics. > --- > > Key: BEAM-7969 > URL: https://issues.apache.org/jira/browse/BEAM-7969 > Project: Beam > Issue Type: Bug > Components: java-fn-execution, runner-dataflow >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > EOM -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-5820) Vendor Calcite
[ https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=308048=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308048 ] ASF GitHub Bot logged work on BEAM-5820: Author: ASF GitHub Bot Created on: 06/Sep/19 17:49 Start Date: 06/Sep/19 17:49 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #9189: [BEAM-5820] vendor calcite URL: https://github.com/apache/beam/pull/9189#issuecomment-528950626 Run SQL PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308048) Time Spent: 5h 50m (was: 5h 40m) > Vendor Calcite > -- > > Key: BEAM-5820 > URL: https://issues.apache.org/jira/browse/BEAM-5820 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kai Jiang >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.
[ https://issues.apache.org/jira/browse/BEAM-7969?focusedWorklogId=308113=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308113 ] ASF GitHub Bot logged work on BEAM-7969: Author: ASF GitHub Bot Created on: 06/Sep/19 19:27 Start Date: 06/Sep/19 19:27 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #9494: [BEAM-7969] Fix doublecount on GRPC PCollections in streaming jobs. URL: https://github.com/apache/beam/pull/9494#issuecomment-528984343 @ajamato This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308113) Time Spent: 6h (was: 5h 50m) > Streaming Dataflow worker doesn't report FnAPI metrics. > --- > > Key: BEAM-7969 > URL: https://issues.apache.org/jira/browse/BEAM-7969 > Project: Beam > Issue Type: Bug > Components: java-fn-execution, runner-dataflow >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 6h > Remaining Estimate: 0h > > EOM -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8167) Replace old Postcommit_Python_Verify test with new python postcommits in Grafana.
Daniel Oliveira created BEAM-8167: - Summary: Replace old Postcommit_Python_Verify test with new python postcommits in Grafana. Key: BEAM-8167 URL: https://issues.apache.org/jira/browse/BEAM-8167 Project: Beam Issue Type: Bug Components: project-management, testing Reporter: Daniel Oliveira Assignee: Daniel Oliveira The old postcommit was replaced with multiple new postcommits for each supported python version (Python2, Python35, Python36, Python37). These aren't being reflected on the Grafana dashboards yet, so that should be fixed. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8111) SchemaCoder broken on DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8111?focusedWorklogId=308195=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308195 ] ASF GitHub Bot logged work on BEAM-8111: Author: ASF GitHub Bot Created on: 06/Sep/19 22:14 Start Date: 06/Sep/19 22:14 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #9454: [BEAM-8111] Add ValidatesRunner test to AvroSchemaTest URL: https://github.com/apache/beam/pull/9454 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308195) Time Spent: 1h 40m (was: 1.5h) > SchemaCoder broken on DataflowRunner > > > Key: BEAM-8111 > URL: https://issues.apache.org/jira/browse/BEAM-8111 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-java-core >Affects Versions: 2.15.0 >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Blocker > Fix For: 2.16.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > https://github.com/apache/beam/commit/e65c176a9f34e45d408281e1101a2ae54cef0f6c > broke SchemaCoder on Dataflow. When translating a schema that uses logical > types from a cloud object dataflow encounters a runtime error. > This means any pipelines that use SqlTransform or schema transforms will fail > on Dataflow in 2.15.0 -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8168) Python GCSFileSystem failing with gzip content encoding
Daniel Ecer created BEAM-8168: - Summary: Python GCSFileSystem failing with gzip content encoding Key: BEAM-8168 URL: https://issues.apache.org/jira/browse/BEAM-8168 Project: Beam Issue Type: Bug Components: io-py-gcp Affects Versions: 2.15.0 Reporter: Daniel Ecer Google Storage supports gzip content encoding. While Apache Beam (Python) can correctly work with .gz files without content encoding. It however fails to handle .gz files that have content encoding applied. e.g. (the following would work run in a Jupyer notebook) {code:python} file_url_1 = 'gs://some-bucket/test1.gz' file_url_2 = 'gs://some-bucket/test2.gz' !echo 'my content' > /tmp/test # file 1 without content encoding !cat /tmp/test | gzip | gsutil cp - "{file_url_1}" # file 2 with content encoding !gsutil cp -Z /tmp/test "{file_url_2}" !gsutil cat "{file_url_1}" | zcat - # output: my content !gsutil cat "{file_url_2}" | zcat - # output: my content import apache_beam as beam from apache_beam.io.filesystem import CompressionTypes from apache_beam.io.filesystems import FileSystems print(beam.__version__) # output: 2.15.0 with FileSystems.open(file_url_1, compression_type=CompressionTypes.UNCOMPRESSED) as fp: print(fp.read(10)) # output: b'\x1f\x8b\x08\x00\x10\xd6r]\x00\x03' with FileSystems.open(file_url_1) as fp: print(fp.read(10)) # output: b'my content' with FileSystems.open(file_url_2, compression_type=CompressionTypes.UNCOMPRESSED) as fp: print(fp.read(10)) # output: b'my content' # (here I would expect the gzipped byte code) with FileSystems.open(file_url_2) as fp: print(fp.read(10)) # exception: FailedToDecompressContent: Content purported to be compressed with gzip but failed to decompress. {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8172) Add a label field to FunctionSpec proto
Chad Dombrova created BEAM-8172: --- Summary: Add a label field to FunctionSpec proto Key: BEAM-8172 URL: https://issues.apache.org/jira/browse/BEAM-8172 Project: Beam Issue Type: Improvement Components: beam-model Reporter: Chad Dombrova Assignee: Chad Dombrova The FunctionSpec payload is opaque outside of the native environment, which can make debugging pipeline protos difficult. It would be very useful if the FunctionSpec had an optional human readable "label", for debugging and presenting in UIs and error messages. For example, in python, if the payload is an instance of a class, we could attempt to provide a string that represents the dotted path to that class, "mymodule.MyClass". In the case of coders, we could use the label to hold the type hint: "Optional[Iterable[mymodule.MyClass]]". -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness
[ https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=308245=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308245 ] ASF GitHub Bot logged work on BEAM-7945: Author: ASF GitHub Bot Created on: 07/Sep/19 00:58 Start Date: 07/Sep/19 00:58 Worklog Time Spent: 10m Work Description: tweise commented on pull request #9452: [BEAM-7945] Allow runner to configure semi_persist_dir which is used … URL: https://github.com/apache/beam/pull/9452#discussion_r321948469 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/options/RemoteEnvironmentOptions.java ## @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.options; + +import com.google.auto.service.AutoService; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; + +/** Options that are used to control configuration of the remote environment. */ +@Experimental +@Hidden +public interface RemoteEnvironmentOptions extends PipelineOptions { + + @Description("Local semi-persistent directory") + @Default.String("/tmp") Review comment: I think the default should be null, so that the environment can pick its suitable tmp directory when nothing is specified by the user. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308245) Time Spent: 2.5h (was: 2h 20m) > Allow runner to configure "semi_persist_dir" which is used in the SDK harness > - > > Key: BEAM-7945 > URL: https://issues.apache.org/jira/browse/BEAM-7945 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution, sdk-go, sdk-java-core, sdk-py-core >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.16.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Currently "semi_persist_dir" is not configurable. This may become a problem > in certain scenarios. For example, the default value of "semi_persist_dir" is > "/tmp" > ([https://github.com/apache/beam/blob/master/sdks/python/container/boot.go#L48]) > in Python SDK harness. When the environment type is "PROCESS", the disk of > "/tmp" may be filled up and unexpected issues will occur in production > environment. We should provide a way to configure "semi_persist_dir" in > EnvironmentFactory at the runner side. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7990) Add ability to read parquet files into PCollection
[ https://issues.apache.org/jira/browse/BEAM-7990?focusedWorklogId=308254=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308254 ] ASF GitHub Bot logged work on BEAM-7990: Author: ASF GitHub Bot Created on: 07/Sep/19 01:32 Start Date: 07/Sep/19 01:32 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #9361: [BEAM-7990] Add ability to read parquet files into PCollection URL: https://github.com/apache/beam/pull/9361#issuecomment-529059952 R: @chamikaramj would you mind reviewing/merging? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308254) Time Spent: 40m (was: 0.5h) > Add ability to read parquet files into PCollection > - > > Key: BEAM-7990 > URL: https://issues.apache.org/jira/browse/BEAM-7990 > Project: Beam > Issue Type: New Feature > Components: io-py-parquet >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8169) DataCatalogTableProvider should use credentials from PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-8169?focusedWorklogId=308220=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308220 ] ASF GitHub Bot logged work on BEAM-8169: Author: ASF GitHub Bot Created on: 06/Sep/19 23:31 Start Date: 06/Sep/19 23:31 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #9511: [BEAM-8169] DataCatalogTableProvider to use GCP credentials from PipelineOptions URL: https://github.com/apache/beam/pull/9511 This switches the metastore from Google Cloud Data Catalog to use the credentials specified in the options. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-7970) Regenerate Go SDK proto files in correct version
[ https://issues.apache.org/jira/browse/BEAM-7970?focusedWorklogId=308221=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308221 ] ASF GitHub Bot logged work on BEAM-7970: Author: ASF GitHub Bot Created on: 06/Sep/19 23:31 Start Date: 06/Sep/19 23:31 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #9512: [BEAM-7970] Adding correct version of Go protobuf compiler. URL: https://github.com/apache/beam/pull/9512 From personal experience, using a version of the Go protobuf compiler that's too new will cause errors when generating the protobuf code, since it'll be incompatible with the version of the golang/protobuf library used in Beam. So I added a note with that warning and the best version to use. Only worry is that the note will become outdated if we ever update the golock file or switch to modules, but I can't think of a more "evergreen" way to leave a warning. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.
[ https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=308226=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308226 ] ASF GitHub Bot logged work on BEAM-3713: Author: ASF GitHub Bot Created on: 07/Sep/19 00:08 Start Date: 07/Sep/19 00:08 Worklog Time Spent: 10m Work Description: udim commented on issue #7949: [BEAM-3713] Add pytest testing infrastructure URL: https://github.com/apache/beam/pull/7949#issuecomment-529051732 run python 2 postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308226) Time Spent: 4.5h (was: 4h 20m) > Consider moving away from nose to nose2 or pytest. > -- > > Key: BEAM-3713 > URL: https://issues.apache.org/jira/browse/BEAM-3713 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: Robert Bradshaw >Assignee: Udi Meiri >Priority: Minor > Time Spent: 4.5h > Remaining Estimate: 0h > > Per > [https://nose.readthedocs.io/en/latest/|https://nose.readthedocs.io/en/latest/,] > , nose is in maintenance mode. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8171) Python container build fails
[ https://issues.apache.org/jira/browse/BEAM-8171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924678#comment-16924678 ] Kyle Weaver commented on BEAM-8171: --- Seems to be caused by [https://github.com/h5py/h5py/issues/1294] > Python container build fails > > > Key: BEAM-8171 > URL: https://issues.apache.org/jira/browse/BEAM-8171 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: Kyle Weaver >Priority: Major > > > Task :sdks:python:container:py3:docker > The command '/bin/sh -c pip install -r /tmp/base_image_requirements.txt && > python -c "from google.protobuf.internal import api_implementation; assert > api_implementation._default_implementation_type == 'cpp'; print ('Verified > fast protobuf used.')" && rm -rf /root/.cache/pip' returned a non-zero code: 1 > This also happens for py2. > Looks like this dependency might be causing the error: > 16:47:51.893 [INFO] [system.out] Loading library to get version: libhdf5.so > 16:47:51.893 [INFO] [system.out] error: libhdf5.so: cannot open shared object > file: No such file or directory > 16:47:51.893 [INFO] [system.out] > 16:47:51.893 [INFO] [system.out] ERROR: Failed building wheel for h5py -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads
[ https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=308260=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308260 ] ASF GitHub Bot logged work on BEAM-8151: Author: ASF GitHub Bot Created on: 07/Sep/19 01:58 Start Date: 07/Sep/19 01:58 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9477: [BEAM-8151, BEAM-7848] Up the max number of threads inside the SDK harness to a default of 10k URL: https://github.com/apache/beam/pull/9477#issuecomment-529062034 Precommit is failing with: 16:09:15 collapsing-thread-pool-executor 2018.6 has requirement futures==3.2.0, but you have futures 3.3.0. 16:09:15 ERROR: InvocationError for command /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/target/.tox-py27-lint/py27-lint/bin/pip check (exited with code 1) 16:09:15 py27-lint run-test-post: commands[0] | /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/scripts/run_tox_cleanup.sh 16:09:15 ___ summary 16:09:15 ERROR: py27-lint: commands failed The reason is a dependency conflict. collapsing-thread-pool-executor requires futures == 3.2.0 for python 2.7 (https://github.com/ftpsolutions/collapsing-thread-pool-executor/blob/master/setup.py#L22) Beam's setup.py accepts "'futures>=3.2.0,<4.0.0; python_version < "3.0"'," We can do a few things: - Update collapsing-thread-pool-executor to relax the upper bound for futures dependency - Update beam to restict its futures dependency to use 3.2.0 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308260) Time Spent: 5h 20m (was: 5h 10m) > Allow the Python SDK to use many many threads > - > > Key: BEAM-8151 > URL: https://issues.apache.org/jira/browse/BEAM-8151 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > We need to use a thread pool which shrinks the number of active threads when > they are not being used. > > This is to prevent any stuckness issues related to a runner scheduling more > work items then there are "work" threads inside the SDK harness. > > By default the control plane should have all "requests" being processed in > parallel and the runner is responsible for not overloading the SDK with too > much work. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-6185) Upgrade Spark to version 2.4.0
[ https://issues.apache.org/jira/browse/BEAM-6185?focusedWorklogId=308261=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308261 ] ASF GitHub Bot logged work on BEAM-6185: Author: ASF GitHub Bot Created on: 07/Sep/19 01:59 Start Date: 07/Sep/19 01:59 Worklog Time Spent: 10m Work Description: tweise commented on pull request #9487: [BEAM-6185]Change default docker images name URL: https://github.com/apache/beam/pull/9487#discussion_r321950925 ## File path: sdks/python/apache_beam/runners/portability/portable_runner.py ## @@ -84,13 +84,12 @@ def default_docker_image(): 'has Python %d.%d interpreter.' % ( sys.version_info[0], sys.version_info[1])) - # Perhaps also test if this was built? - image = ('{user}-docker-apache.bintray.io/beam/python' - '{version_suffix}:latest'.format( - user=os.environ['USER'], - version_suffix=version_suffix)) + image = ('apachebeam/python{version_suffix}_sdk:latest'.format( Review comment: I have mixed feelings about this. When working with master, wouldn't I expect the locally built image to be used? What does it mean to refer to hub for something that wasn't released. Perhaps do that depending on whether current version is dev/snapshot or a release? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308261) Time Spent: 2h 50m (was: 2h 40m) > Upgrade Spark to version 2.4.0 > -- > > Key: BEAM-6185 > URL: https://issues.apache.org/jira/browse/BEAM-6185 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.12.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-5820) Vendor Calcite
[ https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=308252=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308252 ] ASF GitHub Bot logged work on BEAM-5820: Author: ASF GitHub Bot Created on: 07/Sep/19 01:13 Start Date: 07/Sep/19 01:13 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #9189: [BEAM-5820] vendor calcite URL: https://github.com/apache/beam/pull/9189#issuecomment-529058347 Thanks for replacing calcite in Beam ZetaSQL to vendor calcite, which will unblock us from moving Beam ZetaSQL to a separate module. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308252) Time Spent: 6.5h (was: 6h 20m) > Vendor Calcite > -- > > Key: BEAM-5820 > URL: https://issues.apache.org/jira/browse/BEAM-5820 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kai Jiang >Priority: Major > Time Spent: 6.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-5820) Vendor Calcite
[ https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=308251=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308251 ] ASF GitHub Bot logged work on BEAM-5820: Author: ASF GitHub Bot Created on: 07/Sep/19 01:13 Start Date: 07/Sep/19 01:13 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #9189: [BEAM-5820] vendor calcite URL: https://github.com/apache/beam/pull/9189#issuecomment-529058347 Thanks for replacing calcite in Beam ZetaSQL to vendor calcite, which will unblock us to move Beam ZetaSQL to a separate module. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308251) Time Spent: 6h 20m (was: 6h 10m) > Vendor Calcite > -- > > Key: BEAM-5820 > URL: https://issues.apache.org/jira/browse/BEAM-5820 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kai Jiang >Priority: Major > Time Spent: 6h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-5820) Vendor Calcite
[ https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=308250=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308250 ] ASF GitHub Bot logged work on BEAM-5820: Author: ASF GitHub Bot Created on: 07/Sep/19 01:13 Start Date: 07/Sep/19 01:13 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #9189: [BEAM-5820] vendor calcite URL: https://github.com/apache/beam/pull/9189#issuecomment-529058347 Thanks for replace calcite in Beam ZetaSQL to vendor calcite, which will unblock us to move Beam ZetaSQL to a separate module. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308250) Time Spent: 6h 10m (was: 6h) > Vendor Calcite > -- > > Key: BEAM-5820 > URL: https://issues.apache.org/jira/browse/BEAM-5820 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kai Jiang >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-6185) Upgrade Spark to version 2.4.0
[ https://issues.apache.org/jira/browse/BEAM-6185?focusedWorklogId=308266=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308266 ] ASF GitHub Bot logged work on BEAM-6185: Author: ASF GitHub Bot Created on: 07/Sep/19 02:33 Start Date: 07/Sep/19 02:33 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9487: [BEAM-6185]Change default docker images name URL: https://github.com/apache/beam/pull/9487#discussion_r321951929 ## File path: sdks/python/apache_beam/runners/portability/portable_runner.py ## @@ -84,13 +84,12 @@ def default_docker_image(): 'has Python %d.%d interpreter.' % ( sys.version_info[0], sys.version_info[1])) - # Perhaps also test if this was built? - image = ('{user}-docker-apache.bintray.io/beam/python' - '{version_suffix}:latest'.format( - user=os.environ['USER'], - version_suffix=version_suffix)) + image = ('apachebeam/python{version_suffix}_sdk:latest'.format( Review comment: To build on Thomas's idea, we could do something similar to what dataflowrunner does today (https://github.com/apache/beam/blob/9043b3e0bc138e63123d3fa55d4b7280eb96a82e/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py#L946) - If this is a released version on release branch (i.e. no 'dev' in version') default to image in hub. (release manager will have to build this container as part of the release anyway.) - If it is a dev version --> this is a question. Dataflowrunner's solution is to have a fixed image that is occasionally updated as needed or if provided a flag use that as an override. (Here we might just simplify and require a locally built version. This occasionally built image is convenient but is error-prone and a hassle.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308266) Time Spent: 3h 10m (was: 3h) > Upgrade Spark to version 2.4.0 > -- > > Key: BEAM-6185 > URL: https://issues.apache.org/jira/browse/BEAM-6185 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.12.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8173) Filesystems.matchSingleFileSpec throws away the actual failure exception
Kenneth Knowles created BEAM-8173: - Summary: Filesystems.matchSingleFileSpec throws away the actual failure exception Key: BEAM-8173 URL: https://issues.apache.org/jira/browse/BEAM-8173 Project: Beam Issue Type: Bug Components: sdk-java-core Reporter: Kenneth Knowles At [1] the result of a non-OK match causes an exception to be thrown. But the exception does not include the actual cause of the failure, so it cannot be efficiently debugged. It appears that the design of MatchResult is that it should call metadata() without bothering to check status, so that the underlying exception can be re-raised and caught and put in the chain of causes, as it should be. [1] https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java#L190 -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.
[ https://issues.apache.org/jira/browse/BEAM-5878?focusedWorklogId=308283=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308283 ] ASF GitHub Bot logged work on BEAM-5878: Author: ASF GitHub Bot Created on: 07/Sep/19 03:33 Start Date: 07/Sep/19 03:33 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #9517: Revert "[BEAM-5878] support DoFns with Keyword-only arguments" URL: https://github.com/apache/beam/pull/9517 Reverts apache/beam#9237 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308283) Time Spent: 11h 20m (was: 11h 10m) > Support DoFns with Keyword-only arguments in Python 3. > -- > > Key: BEAM-5878 > URL: https://issues.apache.org/jira/browse/BEAM-5878 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: yoshiki obata >Priority: Minor > Fix For: 2.16.0 > > Time Spent: 11h 20m > Remaining Estimate: 0h > > Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to > define functions with keyword-only arguments. > Currently Beam does not handle them correctly. [~ruoyu] pointed out [one > place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118] > in our codebase that we should fix: in Python in 3.0 inspect.getargspec() > will fail on functions with keyword-only arguments, but a new method > [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec] > supports them. > There may be implications for our (best-effort) type-hints machinery. > We should also add a Py3-only unit tests that covers DoFn's with keyword-only > arguments once Beam Python 3 tests are in a good shape. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.
[ https://issues.apache.org/jira/browse/BEAM-5878?focusedWorklogId=308282=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308282 ] ASF GitHub Bot logged work on BEAM-5878: Author: ASF GitHub Bot Created on: 07/Sep/19 03:33 Start Date: 07/Sep/19 03:33 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9237: [BEAM-5878] support DoFns with Keyword-only arguments URL: https://github.com/apache/beam/pull/9237#issuecomment-529068152 One Beam SDK users who use master branch reported that monkey-patch caused their pipeline to fail with what appears to be an infinite recursion from new_save_reduce. If line 163 is removed, the problem disappears. The problem persists with the monkey patch, even if the body of new_save_reduce is changed to ``` StockPickler.save_reduce(self, func, args, *other_args, **kwargs) ``` @lazylynx I have not investigated that report yet but I'd suggest to revert this change in the mean time until we understand a better way to patch, which may not be necessary soon given your outstanding changes to dill. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308282) Time Spent: 11h 10m (was: 11h) > Support DoFns with Keyword-only arguments in Python 3. > -- > > Key: BEAM-5878 > URL: https://issues.apache.org/jira/browse/BEAM-5878 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: yoshiki obata >Priority: Minor > Fix For: 2.16.0 > > Time Spent: 11h 10m > Remaining Estimate: 0h > > Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to > define functions with keyword-only arguments. > Currently Beam does not handle them correctly. [~ruoyu] pointed out [one > place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118] > in our codebase that we should fix: in Python in 3.0 inspect.getargspec() > will fail on functions with keyword-only arguments, but a new method > [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec] > supports them. > There may be implications for our (best-effort) type-hints machinery. > We should also add a Py3-only unit tests that covers DoFn's with keyword-only > arguments once Beam Python 3 tests are in a good shape. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-6185) Upgrade Spark to version 2.4.0
[ https://issues.apache.org/jira/browse/BEAM-6185?focusedWorklogId=308285=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308285 ] ASF GitHub Bot logged work on BEAM-6185: Author: ASF GitHub Bot Created on: 07/Sep/19 03:34 Start Date: 07/Sep/19 03:34 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #9487: [BEAM-6185]Change default docker images name URL: https://github.com/apache/beam/pull/9487#discussion_r321953571 ## File path: sdks/python/apache_beam/runners/portability/portable_runner.py ## @@ -84,13 +84,12 @@ def default_docker_image(): 'has Python %d.%d interpreter.' % ( sys.version_info[0], sys.version_info[1])) - # Perhaps also test if this was built? - image = ('{user}-docker-apache.bintray.io/beam/python' - '{version_suffix}:latest'.format( - user=os.environ['USER'], - version_suffix=version_suffix)) + image = ('apachebeam/python{version_suffix}_sdk:latest'.format( Review comment: Thanks @aaltay and @tweise for comments. `--environment_config` option can be used to pass customized images when use Portable runner. The default image is used when no docker image is specified. My understanding is, from users perspective, they don't need to worry about creating docker images at local when they write pipelines, so it's ok to pull it from remote. From developers perspective, developers who are working on sdks are expected to know how portable runner is working, so as how docker is used. If they want to test their developing sdks, they should build an image at local and use it. The image name can be same as the default one, then the local image is used instead of pulling it from remote, or can pass it with `--environment_config` option if it has different name. I can add more comments here to explain how to pass customized images. What scenarios am I missing? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308285) Time Spent: 3h 20m (was: 3h 10m) > Upgrade Spark to version 2.4.0 > -- > > Key: BEAM-6185 > URL: https://issues.apache.org/jira/browse/BEAM-6185 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.12.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.
[ https://issues.apache.org/jira/browse/BEAM-5878?focusedWorklogId=308284=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308284 ] ASF GitHub Bot logged work on BEAM-5878: Author: ASF GitHub Bot Created on: 07/Sep/19 03:34 Start Date: 07/Sep/19 03:34 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9517: Revert "[BEAM-5878] support DoFns with Keyword-only arguments" URL: https://github.com/apache/beam/pull/9517#issuecomment-529068201 R: @lazylynx This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308284) Time Spent: 11.5h (was: 11h 20m) > Support DoFns with Keyword-only arguments in Python 3. > -- > > Key: BEAM-5878 > URL: https://issues.apache.org/jira/browse/BEAM-5878 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: yoshiki obata >Priority: Minor > Fix For: 2.16.0 > > Time Spent: 11.5h > Remaining Estimate: 0h > > Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to > define functions with keyword-only arguments. > Currently Beam does not handle them correctly. [~ruoyu] pointed out [one > place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118] > in our codebase that we should fix: in Python in 3.0 inspect.getargspec() > will fail on functions with keyword-only arguments, but a new method > [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec] > supports them. > There may be implications for our (best-effort) type-hints machinery. > We should also add a Py3-only unit tests that covers DoFn's with keyword-only > arguments once Beam Python 3 tests are in a good shape. -- This message was sent by Atlassian Jira (v8.3.2#803003)