[jira] [Work logged] (BEAM-9430) Migrate from ProcessContext#updateWatermark to WatermarkEstimators
[ https://issues.apache.org/jira/browse/BEAM-9430?focusedWorklogId=430515=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430515 ] ASF GitHub Bot logged work on BEAM-9430: Author: ASF GitHub Bot Created on: 05/May/20 05:15 Start Date: 05/May/20 05:15 Worklog Time Spent: 10m Work Description: ihji commented on pull request #11607: URL: https://github.com/apache/beam/pull/11607#issuecomment-623860331 Shouldn't we also disable `IllegalArgumentException`s in the constructors? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430515) Time Spent: 5h 40m (was: 5.5h) > Migrate from ProcessContext#updateWatermark to WatermarkEstimators > -- > > Key: BEAM-9430 > URL: https://issues.apache.org/jira/browse/BEAM-9430 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Major > Labels: backward-incompatible > Fix For: 2.21.0 > > Time Spent: 5h 40m > Remaining Estimate: 0h > > Current discussion underway in > [https://lists.apache.org/thread.html/r5d974b6a58bc04ff4c02682fda4ef68608121f1bf23a86e9d592ca6e%40%3Cdev.beam.apache.org%3E] > > Proposed API: [https://github.com/apache/beam/pull/10992] -- This message was sent by Atlassian Jira (v8.3.4#803005)
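The constructor-validation question in the review comment can be illustrated with a minimal, hypothetical watermark estimator. This is a sketch only, not Beam's actual `WatermarkEstimator` API; the class and method names are assumed for illustration. It shows the kind of constructor-time `ValueError`/`IllegalArgumentException` check being discussed, plus the monotonicity contract a watermark estimator typically keeps.

```python
class MonotonicWatermarkEstimator:
    """Hypothetical sketch (not Beam's API): validates its initial watermark
    in the constructor and never lets the watermark move backwards."""

    def __init__(self, initial_watermark):
        # Constructor-time validation: reject bad input here rather than
        # at observation time. This is the kind of check the comment above
        # is asking about.
        if initial_watermark is None:
            raise ValueError("initial watermark must not be None")
        self._watermark = initial_watermark

    def observe_timestamp(self, timestamp):
        # Monotonic: the watermark only advances, never regresses.
        if timestamp > self._watermark:
            self._watermark = timestamp

    def current_watermark(self):
        return self._watermark
```

Timestamps are plain integers here for simplicity; a real estimator would use SDK timestamp types.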
[jira] [Commented] (BEAM-9779) HL7v2IOWriteIT is flaky
[ https://issues.apache.org/jira/browse/BEAM-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099525#comment-17099525 ] Jacob Ferriero commented on BEAM-9779: -- [~chamikara] Have the changes in https://github.com/apache/beam/pull/11450 stabilized this test? Should we close this? > HL7v2IOWriteIT is flaky > --- > > Key: BEAM-9779 > URL: https://issues.apache.org/jira/browse/BEAM-9779 > Project: Beam > Issue Type: Bug > Components: io-java-gcp, test-failures >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Critical > Time Spent: 4h 10m > Remaining Estimate: 0h > > There seems to be a race condition somewhere in HL7v2IOWriteIT that causes > flakiness. > https://builds.apache.org/job/beam_PostCommit_Java/5947/ > https://builds.apache.org/job/beam_PostCommit_Java/5943/ > https://builds.apache.org/job/beam_PostCommit_Java/5942/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9449) Consider passing pipeline options for expansion service.
[ https://issues.apache.org/jira/browse/BEAM-9449?focusedWorklogId=430502=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430502 ] ASF GitHub Bot logged work on BEAM-9449: Author: ASF GitHub Bot Created on: 05/May/20 03:36 Start Date: 05/May/20 03:36 Worklog Time Spent: 10m Work Description: ihji commented on pull request #11574: URL: https://github.com/apache/beam/pull/11574#issuecomment-623816157 Thanks. Looks good to me overall. I think we should also consider adding an optional `pipeline_options` argument to `ExternalTransform`, given that different expansion services need different pipeline options. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430502) Time Spent: 40m (was: 0.5h) > Consider passing pipeline options for expansion service. > > > Key: BEAM-9449 > URL: https://issues.apache.org/jira/browse/BEAM-9449 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Robert Bradshaw >Assignee: Brian Hulette >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
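The suggestion above can be sketched as a hypothetical constructor signature. All names here (`ExternalTransformSketch`, its parameters) are assumptions for illustration and do not reflect the actual `ExternalTransform` API; the point is only that options would be scoped per expansion service rather than global to the pipeline.

```python
class ExternalTransformSketch:
    """Hypothetical sketch: a cross-language transform that carries its own
    pipeline options, so each expansion service can be configured
    independently."""

    def __init__(self, urn, payload, expansion_service, pipeline_options=None):
        self.urn = urn
        self.payload = payload
        self.expansion_service = expansion_service
        # Options scoped to this transform's expansion service; defensive
        # copy so later mutation by the caller does not leak in.
        self.pipeline_options = dict(pipeline_options or {})
```

A caller would then pass different `pipeline_options` to transforms backed by different expansion services.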
[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment
[ https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=430499=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430499 ] ASF GitHub Bot logged work on BEAM-9383: Author: ASF GitHub Bot Created on: 05/May/20 03:11 Start Date: 05/May/20 03:11 Worklog Time Spent: 10m Work Description: ihji commented on a change in pull request #11039: URL: https://github.com/apache/beam/pull/11039#discussion_r419841344 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -462,7 +462,8 @@ def run_pipeline(self, pipeline, options): use_fnapi = apiclient._use_fnapi(options) from apache_beam.transforms import environments default_environment = environments.DockerEnvironment.from_container_image( -apiclient.get_container_image_from_options(options)) +apiclient.get_container_image_from_options(options), +artifacts=environments.python_sdk_dependencies(options)) Review comment: We need the pipeline options to populate artifacts. I think we could either use `from_options` instead and override the container image, or just leave it as is. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430499) Time Spent: 8h (was: 7h 50m) > Staging Dataflow artifacts from environment > --- > > Key: BEAM-9383 > URL: https://issues.apache.org/jira/browse/BEAM-9383 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 8h > Remaining Estimate: 0h > > Staging Dataflow artifacts from environment -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9679) Core Transforms | Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099501#comment-17099501 ] Damon Douglas commented on BEAM-9679: - PR [https://github.com/apache/beam/pull/11564] approved > Core Transforms | Go SDK Code Katas > --- > > Key: BEAM-9679 > URL: https://issues.apache.org/jira/browse/BEAM-9679 > Project: Beam > Issue Type: Sub-task > Components: katas, sdk-go >Reporter: Damon Douglas >Assignee: Damon Douglas >Priority: Major > > A kata devoted to core beam transforms patterns after > [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms] > where the takeaway is an individual's ability to master the following in > an Apache Beam pipeline written with the Go SDK. > * Branching > * CoGroupByKey > * Combine > * Composite Transform > * DoFn Additional Parameters > * Flatten > * GroupByKey > * [Map|https://github.com/apache/beam/pull/11564] > * Partition > * Side Input -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9679) Core Transforms | Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damon Douglas updated BEAM-9679: Description: A kata devoted to core beam transforms patterns after [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms] where the take away is an individual's ability to master the following using an Apache Beam pipeline using the Golang SDK. * Branching * CoGroupByKey * Combine * Composite Transform * DoFn Additional Parameters * Flatten * GroupByKey * [Map|[https://github.com/apache/beam/pull/11564]] * Partition * Side Input was: A kata devoted to core beam transforms patterns after [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms] where the take away is an individual's ability to master the following using an Apache Beam pipeline using the Golang SDK. * Branching * CoGroupByKey * Combine * Composite Transform * DoFn Additional Parameters * Flatten * GroupByKey * [Map|[https://github.com/damondouglas/beam/tree/BEAM-9679-core-transforms-map-kata-go]] * Partition * Side Input > Core Transforms | Go SDK Code Katas > --- > > Key: BEAM-9679 > URL: https://issues.apache.org/jira/browse/BEAM-9679 > Project: Beam > Issue Type: Sub-task > Components: katas, sdk-go >Reporter: Damon Douglas >Assignee: Damon Douglas >Priority: Major > > A kata devoted to core beam transforms patterns after > [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms] > where the take away is an individual's ability to master the following using > an Apache Beam pipeline using the Golang SDK. > * Branching > * CoGroupByKey > * Combine > * Composite Transform > * DoFn Additional Parameters > * Flatten > * GroupByKey > * [Map|[https://github.com/apache/beam/pull/11564]] > * Partition > * Side Input -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment
[ https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=430498=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430498 ] ASF GitHub Bot logged work on BEAM-9383: Author: ASF GitHub Bot Created on: 05/May/20 03:04 Start Date: 05/May/20 03:04 Worklog Time Spent: 10m Work Description: ihji commented on a change in pull request #11039: URL: https://github.com/apache/beam/pull/11039#discussion_r419839927 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py ## @@ -102,7 +102,8 @@ def setUp(self): '--staging_location=ignored', '--temp_location=/dev/null', '--no_auth', -'--dry_run=True' +'--dry_run=True', +'--sdk_location=container' Review comment: The test tries to download a dev version of the apache-beam dependency (which indeed does not exist in PyPI) when it constructs the environment. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430498) Time Spent: 7h 50m (was: 7h 40m) > Staging Dataflow artifacts from environment > --- > > Key: BEAM-9383 > URL: https://issues.apache.org/jira/browse/BEAM-9383 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 7h 50m > Remaining Estimate: 0h > > Staging Dataflow artifacts from environment -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment
[ https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=430497=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430497 ] ASF GitHub Bot logged work on BEAM-9383: Author: ASF GitHub Bot Created on: 05/May/20 02:57 Start Date: 05/May/20 02:57 Worklog Time Spent: 10m Work Description: ihji commented on a change in pull request #11039: URL: https://github.com/apache/beam/pull/11039#discussion_r419838456 ## File path: runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java ## @@ -336,25 +323,26 @@ public DataflowPackage stageToFile( final AtomicInteger numCached = new AtomicInteger(0); List> destinationPackages = new ArrayList<>(); -for (String classpathElement : classpathElements) { - DataflowPackage sourcePackage = new DataflowPackage(); - if (classpathElement.contains("=")) { -String[] components = classpathElement.split("=", 2); -sourcePackage.setName(components[0]); -sourcePackage.setLocation(components[1]); - } else { -sourcePackage.setName(null); -sourcePackage.setLocation(classpathElement); +for (StagedFile classpathElement : classpathElements) { + DataflowPackage targetPackage = classpathElement.getStagedPackage(); + String source = classpathElement.getSource(); + if (source.contains("=")) { Review comment: removed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430497) Time Spent: 7h 40m (was: 7.5h) > Staging Dataflow artifacts from environment > --- > > Key: BEAM-9383 > URL: https://issues.apache.org/jira/browse/BEAM-9383 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 7h 40m > Remaining Estimate: 0h > > Staging Dataflow artifacts from environment -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment
[ https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=430496=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430496 ] ASF GitHub Bot logged work on BEAM-9383: Author: ASF GitHub Bot Created on: 05/May/20 02:56 Start Date: 05/May/20 02:56 Worklog Time Spent: 10m Work Description: ihji commented on a change in pull request #11039: URL: https://github.com/apache/beam/pull/11039#discussion_r419838363 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1271,6 +1271,11 @@ message DeferredArtifactPayload { message ArtifactStagingToRolePayload { // A generated staged name (relative path under staging directory). string staged_name = 1; + + // (Optional) An artifact name when a runner supports it. + // For example, DataflowRunner requires predefined names for some artifacts + // such as "dataflow-worker.jar", "windmill_main". + string alias_name = 2; Review comment: This is a Dataflow-specific requirement. The `DataflowPackage` model has two separate fields for `location` and `name`. `staged_name` and `alias_name` correspond to `location` and `name`, respectively. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430496) Time Spent: 7.5h (was: 7h 20m) > Staging Dataflow artifacts from environment > --- > > Key: BEAM-9383 > URL: https://issues.apache.org/jira/browse/BEAM-9383 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 7.5h > Remaining Estimate: 0h > > Staging Dataflow artifacts from environment -- This message was sent by Atlassian Jira (v8.3.4#803005)
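The field mapping described in that review comment can be sketched in Python, with a plain dict standing in for the `DataflowPackage` model (the helper name is invented for illustration; the field correspondence follows the comment: `staged_name` → `location`, `alias_name` → `name`):

```python
def to_dataflow_package(staged_name, alias_name=None):
    """Sketch of mapping an ArtifactStagingToRolePayload onto a
    DataflowPackage-like dict."""
    package = {"location": staged_name}
    # The alias is optional: only runners that need predefined names
    # (e.g. "dataflow-worker.jar") set it.
    if alias_name:
        package["name"] = alias_name
    return package
```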
[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment
[ https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=430494=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430494 ] ASF GitHub Bot logged work on BEAM-9383: Author: ASF GitHub Bot Created on: 05/May/20 02:27 Start Date: 05/May/20 02:27 Worklog Time Spent: 10m Work Description: ihji commented on a change in pull request #11039: URL: https://github.com/apache/beam/pull/11039#discussion_r419832339 ## File path: runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java ## @@ -772,6 +783,88 @@ private Debuggee registerDebuggee(CloudDebugger debuggerClient, String uniquifie } } + private List stageArtifacts(RunnerApi.Pipeline pipeline) { +ImmutableList.Builder filesToStageBuilder = ImmutableList.builder(); +for (Map.Entry entry : +pipeline.getComponents().getEnvironmentsMap().entrySet()) { + for (RunnerApi.ArtifactInformation info : entry.getValue().getDependenciesList()) { +if (!BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn())) { + throw new RuntimeException( + String.format("unsupported artifact type %s", info.getTypeUrn())); +} +RunnerApi.ArtifactFilePayload filePayload; +try { + filePayload = RunnerApi.ArtifactFilePayload.parseFrom(info.getTypePayload()); +} catch (InvalidProtocolBufferException e) { + throw new RuntimeException("Error parsing artifact file payload.", e); +} +if (!BeamUrns.getUrn(RunnerApi.StandardArtifacts.Roles.STAGING_TO) +.equals(info.getRoleUrn())) { + throw new RuntimeException( + String.format("unsupported artifact role %s", info.getRoleUrn())); +} +RunnerApi.ArtifactStagingToRolePayload stagingPayload; +try { + stagingPayload = RunnerApi.ArtifactStagingToRolePayload.parseFrom(info.getRolePayload()); +} catch (InvalidProtocolBufferException e) { + throw new RuntimeException("Error parsing artifact staging_to role payload.", e); +} +DataflowPackage target = new DataflowPackage(); 
+target.setLocation(stagingPayload.getStagedName()); +if (!Strings.isNullOrEmpty(stagingPayload.getAliasName())) { + target.setName(stagingPayload.getAliasName()); +} +filesToStageBuilder.add(StagedFile.of(filePayload.getPath(), target)); + } +} +return options.getStager().stageFiles(filesToStageBuilder.build()); + } + + private List getDefaultArtifacts() { +ImmutableList.Builder pathsToStageBuilder = ImmutableList.builder(); +ImmutableMap.Builder aliasMapBuilder = ImmutableMap.builder(); +String windmillBinary = + options.as(DataflowPipelineDebugOptions.class).getOverrideWindmillBinary(); +String dataflowWorkerJar = options.getDataflowWorkerJar(); +if (dataflowWorkerJar != null && !dataflowWorkerJar.isEmpty()) { + // Put the user specified worker jar at the start of the classpath, to be consistent with the + // built in worker order. + pathsToStageBuilder.add(dataflowWorkerJar); + aliasMapBuilder.put(dataflowWorkerJar, "dataflow-worker.jar"); +} +for (String path : options.getFilesToStage()) { + if (path.contains("=")) { Review comment: Yes. This syntax is only supported in the Dataflow runner. `DataflowPackage` has a separate field `name` in addition to `location`, and the "=" separator allows prefixing the location of the source with a `name`, e.g. "dataflow.jar=/tmp/foo.jar". I could remove this special syntax, but I decided to keep it since it's already exposed to users via the `--filesToStage` option, so removing it may cause backward-compatibility issues. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430494) Time Spent: 7h 20m (was: 7h 10m) > Staging Dataflow artifacts from environment > --- > > Key: BEAM-9383 > URL: https://issues.apache.org/jira/browse/BEAM-9383 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 7h 20m > Remaining Estimate: 0h > > Staging Dataflow artifacts from environment -- This message was sent by Atlassian Jira (v8.3.4#803005)
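The "name=path" convention discussed above splits on the first "=" only (the diff does this with Java's `split("=", 2)`); a Python sketch of the same parsing, with an invented helper name, for illustration:

```python
def parse_staged_file(spec):
    """Split a --filesToStage entry into (name, location).

    "dataflow.jar=/tmp/foo.jar" -> ("dataflow.jar", "/tmp/foo.jar")
    "/tmp/foo.jar"              -> (None, "/tmp/foo.jar")
    """
    if "=" in spec:
        # maxsplit=1 mirrors Java's split("=", 2): only the first "="
        # separates the name from the location.
        name, location = spec.split("=", 1)
        return name, location
    return None, spec
```

Splitting only once matters because the location itself may contain "=".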
[jira] [Work logged] (BEAM-8949) Add Spanner IO Integration Test for Python
[ https://issues.apache.org/jira/browse/BEAM-8949?focusedWorklogId=430493=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430493 ] ASF GitHub Bot logged work on BEAM-8949: Author: ASF GitHub Bot Created on: 05/May/20 02:23 Start Date: 05/May/20 02:23 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11210: URL: https://github.com/apache/beam/pull/11210#issuecomment-623801499 @mszb - Do you know why the test is failing? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430493) Time Spent: 8.5h (was: 8h 20m) > Add Spanner IO Integration Test for Python > -- > > Key: BEAM-8949 > URL: https://issues.apache.org/jira/browse/BEAM-8949 > Project: Beam > Issue Type: Test > Components: io-py-gcp >Reporter: Shoaib Zafar >Assignee: Shoaib Zafar >Priority: Major > Time Spent: 8.5h > Remaining Estimate: 0h > > Spanner IO (Python SDK) contains PTransform which uses the BatchAPI to read > from the spanner. Currently, it only contains direct runner unit tests. In > order to make this functionality available for the users, integration tests > also need to be added. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8910) Use AVRO instead of JSON in BigQuery bounded source.
[ https://issues.apache.org/jira/browse/BEAM-8910?focusedWorklogId=430492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430492 ] ASF GitHub Bot logged work on BEAM-8910: Author: ASF GitHub Bot Created on: 05/May/20 02:18 Start Date: 05/May/20 02:18 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11086: URL: https://github.com/apache/beam/pull/11086#issuecomment-623800613 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430492) Time Spent: 9h 50m (was: 9h 40m) > Use AVRO instead of JSON in BigQuery bounded source. > > > Key: BEAM-8910 > URL: https://issues.apache.org/jira/browse/BEAM-8910 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Kamil Wasilewski >Assignee: Pablo Estrada >Priority: Minor > Time Spent: 9h 50m > Remaining Estimate: 0h > > The proposed BigQuery bounded source in Python SDK (see PR: > [https://github.com/apache/beam/pull/9772)] uses a BigQuery export job to > take a snapshot of the table and read from each produced JSON file. A > performance improvement can be gain by switching to AVRO instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment
[ https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=430491=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430491 ] ASF GitHub Bot logged work on BEAM-9383: Author: ASF GitHub Bot Created on: 05/May/20 02:16 Start Date: 05/May/20 02:16 Worklog Time Spent: 10m Work Description: ihji commented on a change in pull request #11039: URL: https://github.com/apache/beam/pull/11039#discussion_r419829971 ## File path: runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java ## @@ -784,7 +877,25 @@ public DataflowPipelineJob run(Pipeline pipeline) { "Executing pipeline on the Dataflow Service, which will have billing implications " + "related to Google Compute Engine usage and other Google Cloud Services."); -List packages = options.getStager().stageDefaultFiles(); +// Capture the sdkComponents for look up during step translations +SdkComponents sdkComponents = SdkComponents.create(); + +DataflowPipelineOptions dataflowOptions = options.as(DataflowPipelineOptions.class); +String workerHarnessContainerImageURL = DataflowRunner.getContainerImageForJob(dataflowOptions); +RunnerApi.Environment defaultEnvironmentForDataflow = +Environments.createDockerEnvironment(workerHarnessContainerImageURL); + +sdkComponents.registerEnvironment( +defaultEnvironmentForDataflow +.toBuilder() +.addAllDependencies(getDefaultArtifacts()) +.build()); + +RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, sdkComponents, true); + +LOG.debug("Portable pipeline proto:\n{}", TextFormat.printToString(pipelineProto)); Review comment: This debug log is not new. It's just relocated. Do you think it would be better to remove this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430491) Time Spent: 7h 10m (was: 7h) > Staging Dataflow artifacts from environment > --- > > Key: BEAM-9383 > URL: https://issues.apache.org/jira/browse/BEAM-9383 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 7h 10m > Remaining Estimate: 0h > > Staging Dataflow artifacts from environment -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment
[ https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=430490=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430490 ] ASF GitHub Bot logged work on BEAM-9383: Author: ASF GitHub Bot Created on: 05/May/20 02:14 Start Date: 05/May/20 02:14 Worklog Time Spent: 10m Work Description: ihji commented on a change in pull request #11039: URL: https://github.com/apache/beam/pull/11039#discussion_r419829659 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java ## @@ -210,56 +209,55 @@ public static Environment createProcessEnvironment( } } - private static List getArtifacts(List stagingFiles) { -Set pathsToStage = Sets.newHashSet(stagingFiles); + public static List getArtifacts( + List stagingFiles, StagingFileNameGenerator generator) { ImmutableList.Builder artifactsBuilder = ImmutableList.builder(); -for (String path : pathsToStage) { +for (String path : ImmutableSet.copyOf(stagingFiles)) { Review comment: `ImmutableSet` preserves the order but I think we don't need to make a copy here. Will use `LinkedHashSet` instead. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430490) Time Spent: 7h (was: 6h 50m) > Staging Dataflow artifacts from environment > --- > > Key: BEAM-9383 > URL: https://issues.apache.org/jira/browse/BEAM-9383 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 7h > Remaining Estimate: 0h > > Staging Dataflow artifacts from environment -- This message was sent by Atlassian Jira (v8.3.4#803005)
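The ordering point in that comment (LinkedHashSet instead of an extra ImmutableSet copy) comes down to deduplicating while preserving first-seen order. A common Python sketch of the same behavior, for illustration:

```python
def dedup_preserving_order(paths):
    # dict preserves insertion order (Python 3.7+), so this keeps the
    # first occurrence of each path, like Java's LinkedHashSet.
    return list(dict.fromkeys(paths))
```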
[jira] [Created] (BEAM-9886) ReadFromBigQuery should auto-infer which project to bill for BQ exports
Pablo Estrada created BEAM-9886: --- Summary: ReadFromBigQuery should auto-infer which project to bill for BQ exports Key: BEAM-9886 URL: https://issues.apache.org/jira/browse/BEAM-9886 Project: Beam Issue Type: Bug Components: io-py-gcp Reporter: Pablo Estrada Assignee: Pablo Estrada -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9840) Support for Parameterized Types when converting from HCatRecords to Rows in HCatalogIO
[ https://issues.apache.org/jira/browse/BEAM-9840?focusedWorklogId=430488=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430488 ] ASF GitHub Bot logged work on BEAM-9840: Author: ASF GitHub Bot Created on: 05/May/20 02:05 Start Date: 05/May/20 02:05 Worklog Time Spent: 10m Work Description: rahul8383 commented on a change in pull request #11569: URL: https://github.com/apache/beam/pull/11569#discussion_r419827790 ## File path: sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/SchemaUtilsTest.java ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.hcatalog; + +import java.util.ArrayList; +import java.util.List; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.hadoop.hive.metastore.api.FieldSchema; +import org.junit.Assert; +import org.junit.Test; + +public class SchemaUtilsTest { + @Test + public void testParameterizedTypesToBeamTypes() { +List listOfFieldSchema = new ArrayList<>(); +listOfFieldSchema.add(new FieldSchema("parameterizedChar", "char(10)", null)); +listOfFieldSchema.add(new FieldSchema("parameterizedVarchar", "varchar(100)", null)); +listOfFieldSchema.add(new FieldSchema("parameterizedDecimal", "decimal(30,16)", null)); + +Schema expectedSchema = +Schema.builder() +.addNullableField("parameterizedChar", Schema.FieldType.STRING) +.addNullableField("parameterizedVarchar", Schema.FieldType.STRING) +.addNullableField("parameterizedDecimal", Schema.FieldType.DECIMAL) Review comment: Thanks @TheNeuralBit for the review. I am adding logical types in `schemas.logicaltypes` called `VariableLengthBytes`, `FixedLengthString`, `VariableLengthString`, `LogicalDecimal` as part of #11581. I will take up the task of mapping these to logical types once my other PR gets merged. I also hope that #11272 gets merged by then, so that I can use the `SqlTypes.DATE` logical type. Can you please create a JIRA ticket and assign it to me? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430488) Time Spent: 1h (was: 50m) > Support for Parameterized Types when converting from HCatRecords to Rows in > HCatalogIO > -- > > Key: BEAM-9840 > URL: https://issues.apache.org/jira/browse/BEAM-9840 > Project: Beam > Issue Type: Improvement > Components: io-java-hcatalog >Reporter: Rahul Patwari >Assignee: Rahul Patwari >Priority: Major > Fix For: 2.22.0 > > Time Spent: 1h > Remaining Estimate: 0h > > In Hive, to use CHAR and VARCHAR as the data type for a column, the types > have to be parameterized with the length of the character sequence. Refer > [https://github.com/apache/hive/blob/f37c5de6c32b9395d1b34fa3c02ed06d1bfbf6eb/serde/src/test/org/apache/hadoop/hive/serde2/typeinfo/TestTypeInfoUtils.java#L68] > In addition, for the DECIMAL data type, custom precision and scale can be > provided as parameters. > A user has faced an Exception while reading data from a table created in Hive > with a parameterized DECIMAL data type. Refer > [https://lists.apache.org/thread.html/r159012fbefce24d734096e3ec24ecd112de5f89b8029e57147d233b0%40%3Cuser.beam.apache.org%3E]. > This ticket is created to support converting from HcatRecords to Rows when > the HCatRecords have parameterized types. > To Support Parameterized data types, we can make use of HcatFieldSchema. > Refer >
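The test in the diff above expects parameterized Hive types such as "char(10)" and "decimal(30,16)" to map to the same Beam field types as their unparameterized forms. A hedged Python sketch of that normalization (helper name, regex, and the partial mapping table are all assumptions for illustration, covering only the three types the test exercises):

```python
import re

# Illustrative subset of the Hive-to-Beam type mapping exercised by the test.
_BASE_TYPE_TO_BEAM = {
    "char": "STRING",
    "varchar": "STRING",
    "decimal": "DECIMAL",
}

def beam_type_for_hive(type_name):
    """Map e.g. 'decimal(30,16)' -> 'DECIMAL' by dropping the parameter list."""
    match = re.match(r"([a-zA-Z]+)\s*(\([^)]*\))?\s*$", type_name.strip())
    if not match:
        raise ValueError("unrecognized Hive type: %s" % type_name)
    base = match.group(1).lower()
    if base not in _BASE_TYPE_TO_BEAM:
        raise ValueError("unsupported Hive type: %s" % base)
    return _BASE_TYPE_TO_BEAM[base]
```

The follow-up discussed in the comment would instead map these to logical types that retain the length/precision parameters rather than discarding them.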
[jira] [Updated] (BEAM-9886) ReadFromBigQuery should auto-infer which project to bill for BQ exports
[ https://issues.apache.org/jira/browse/BEAM-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-9886: -- Status: Open (was: Triage Needed) > ReadFromBigQuery should auto-infer which project to bill for BQ exports > --- > > Key: BEAM-9886 > URL: https://issues.apache.org/jira/browse/BEAM-9886 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9876) Migrate the Beam website from Jekyll to Hugo to enable localization of the site content
[ https://issues.apache.org/jira/browse/BEAM-9876?focusedWorklogId=430487=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430487 ] ASF GitHub Bot logged work on BEAM-9876: Author: ASF GitHub Bot Created on: 05/May/20 02:02 Start Date: 05/May/20 02:02 Worklog Time Spent: 10m Work Description: aaltay commented on a change in pull request #11554: URL: https://github.com/apache/beam/pull/11554#discussion_r419826968 ## File path: website/www/site/content/en/community/contact-us.md ## @@ -0,0 +1,47 @@ +--- +title: "Contact Us" +aliases: + - /community/ + - /use/issue-tracking/ + - /use/mailing-lists/ + - /get-started/support/ +--- + + +# Contact Us + +There are many ways to reach the Beam user and developer communities - use +whichever one seems best. + + Review comment: What is the problem here? What is markdownify and what is superscripts ? ## File path: website/www/site/content/en/documentation/transforms/java/elementwise/regex.md ## @@ -0,0 +1,34 @@ +--- +title: "Regex" +--- + +# Regex + +https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Regex.html;> + https://beam.apache.org/images/logos/sdks/java.png; width="20px" height="20px" + alt="Javadoc" /> + Javadoc + + + Review comment: Why do we multiple breaks here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430487) Time Spent: 50m (was: 40m) > Migrate the Beam website from Jekyll to Hugo to enable localization of the > site content > --- > > Key: BEAM-9876 > URL: https://issues.apache.org/jira/browse/BEAM-9876 > Project: Beam > Issue Type: Task > Components: website >Reporter: Aizhamal Nurmamat kyzy >Assignee: Aizhamal Nurmamat kyzy >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Enable internationalization of the Apache Beam website to increase the reach > of the project, and facilitate adoption and growth of its community. > The proposal was to do this by migrating the current Apache Beam website from > Jekyll to Hugo [1]. Hugo supports internationalization out-of-the-box, making > it easier for both contributors and maintainers to support the > internationalization effort. > The further discussion on implementation can be viewed here [2]. > [1] > [https://lists.apache.org/thread.html/rfab4cc1411318c3f4667bee051df68f37be11846ada877f3576c41a9%40%3Cdev.beam.apache.org%3E] > [2] > [https://lists.apache.org/thread.html/r6b999b6d7d1f6cbb94e16bb2deed2b65098a6b14c4ac98707fe0c36a%40%3Cdev.beam.apache.org%3E] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9876) Migrate the Beam website from Jekyll to Hugo to enable localization of the site content
[ https://issues.apache.org/jira/browse/BEAM-9876?focusedWorklogId=430485=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430485 ] ASF GitHub Bot logged work on BEAM-9876: Author: ASF GitHub Bot Created on: 05/May/20 01:40 Start Date: 05/May/20 01:40 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #11554: URL: https://github.com/apache/beam/pull/11554#discussion_r419822575 ## File path: website/www/site/content/en/documentation/transforms/python/elementwise/pardo.md ## @@ -0,0 +1,142 @@ +--- +title: "Partition" Review comment: Another copy of "Partiton." (There may be others, we should verify we haven't lost content in the move.) ## File path: website/www/site/content/en/documentation/transforms/python/elementwise/map.md ## @@ -1,8 +1,5 @@ --- -layout: section -title: "Map" -permalink: /documentation/transforms/python/elementwise/map/ -section_menu: section-menu/documentation.html +title: "Partition" Review comment: Is this Partition or Map? ## File path: website/www/site/content/en/documentation/transforms/python/elementwise/kvswap.md ## @@ -0,0 +1,142 @@ +--- +title: "Partition" Review comment: This seems to be the wrong page. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430485) Time Spent: 40m (was: 0.5h) > Migrate the Beam website from Jekyll to Hugo to enable localization of the > site content > --- > > Key: BEAM-9876 > URL: https://issues.apache.org/jira/browse/BEAM-9876 > Project: Beam > Issue Type: Task > Components: website >Reporter: Aizhamal Nurmamat kyzy >Assignee: Aizhamal Nurmamat kyzy >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Enable internationalization of the Apache Beam website to increase the reach > of the project, and facilitate adoption and growth of its community. > The proposal was to do this by migrating the current Apache Beam website from > Jekyll do Hugo [1]. Hugo supports internationalization out-of-the-box, making > it easier both for contributors and maintainers support the > internationalization effort. > The further discussion on implementation can be viewed here [2] > [1] > [https://lists.apache.org/thread.html/rfab4cc1411318c3f4667bee051df68f37be11846ada877f3576c41a9%40%3Cdev.beam.apache.org%3E] > [2] > [https://lists.apache.org/thread.html/r6b999b6d7d1f6cbb94e16bb2deed2b65098a6b14c4ac98707fe0c36a%40%3Cdev.beam.apache.org%3E] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
[ https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=430483=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430483 ] ASF GitHub Bot logged work on BEAM-9856: Author: ASF GitHub Bot Created on: 05/May/20 01:36 Start Date: 05/May/20 01:36 Worklog Time Spent: 10m Work Description: jaketf commented on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623793025 @chamikaramj @lukecwik Apologies, I believe the NPE was a user error on my part. I've been able to revert my changes to OffsetRangeTracker without reintroducing the NPE. To help future users as foolish as me get a less confusing NPE, I suggest we do something like I put in 8891292. But this is definitely not a core issue in OffsetRangeTracker. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430483) Time Spent: 2h 20m (was: 2h 10m) > HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization > -- > > Key: BEAM-9856 > URL: https://issues.apache.org/jira/browse/BEAM-9856 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > Currently the List Messages API paginates through in a single ProcessElement > call. > However we could get a restriction based on createTime using Messages.List > filter and orderBy. > > This is in line with the future roadmap, where the HL7v2 bulk export API becomes > available, which should allow splitting (e.g. on the createTime dimension). > Leveraging this bulk export might be a future optimization to explore. > > This could take one of two forms: > 1.
dynamically splittable via a splittable DoFn (sexy, beam idiomatic: make > optimization the runner's problem, potentially unnecessarily complex for this > use case) > 2. static splitting on some time partition, e.g. finding the earliest > createTime and emitting a PCollection of 1 hour partitions and paginating > through each hour of data w/in the time frame that the store spans, in a > separate ProcessElement. (easy to implement but will likely have hot keys / > stragglers based on "busy hours") > -- This message was sent by Atlassian Jira (v8.3.4#803005)
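The static time-partitioning option described above can be sketched independently of Beam. `hourly_partitions` below is a hypothetical helper, a minimal sketch of carving the store's createTime span into one-hour windows that a downstream ProcessElement could then list in parallel:

```python
from datetime import datetime, timedelta

def hourly_partitions(earliest, latest):
    """Carve [earliest, latest) into hour-aligned windows. Each window can
    then be emitted as one element and paginated through independently."""
    start = earliest.replace(minute=0, second=0, microsecond=0)
    while start < latest:
        end = start + timedelta(hours=1)
        # Clamp the first and last windows to the actual data span.
        yield (max(start, earliest), min(end, latest))
        start = end

windows = list(hourly_partitions(datetime(2020, 5, 4, 10, 30),
                                 datetime(2020, 5, 4, 13, 0)))
print(len(windows))  # 3 windows: 10:30-11:00, 11:00-12:00, 12:00-13:00
```

As the issue notes, fixed windows like these distribute unevenly when message volume is skewed toward "busy hours", which is the hot-key risk of the static approach.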
[jira] [Resolved] (BEAM-9881) refactor KafkaIOIT for reusing pipeline
[ https://issues.apache.org/jira/browse/BEAM-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee resolved BEAM-9881. --- Fix Version/s: Not applicable Resolution: Won't Do > refactor KafkaIOIT for reusing pipeline > --- > > Key: BEAM-9881 > URL: https://issues.apache.org/jira/browse/BEAM-9881 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Fix For: Not applicable > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
[ https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=430474=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430474 ] ASF GitHub Bot logged work on BEAM-9856: Author: ASF GitHub Bot Created on: 05/May/20 00:46 Start Date: 05/May/20 00:46 Worklog Time Spent: 10m Work Description: jaketf edited a comment on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796 @chamikaramj thanks for the suggestion. I will look into using the BoundedSource API in a separate PR and we can compare. Unfortunately, regular DoFns don't cut it because a single element's outputs are committed atomically (see this [conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)). Basically we have one input element (HL7v2 store) exploding to many, many output elements (all the messages in that store) in a single ProcessElement call. I'm trying to explore strategies for splitting up this listing (because, due to the paginated nature of messages.list, it is single-threaded). I originally chose splittable DoFn over BoundedSource based on the sentiment of this statement: > **Coding against the Source API involves a lot of boilerplate and is error-prone**, and it does not compose well with the rest of the Beam model because a Source can appear only at the root of a pipeline. - https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html Using splittable DoFn also made reading from multiple HL7v2 stores come for free (e.g. if you had several regional HL7v2 stores and your use case was to read from them all and write to a single multi-regional store). This is admittedly a rather contrived use case. The blog also mentions: - A Source can not emit an additional output (for example, records that failed to parse). - Healthcare customers feeding requirements for this plugin want DLQ on all sinks and sources. To be consistent with the streaming API provided in `HL7v2IO.Read` I wanted to provide DLQ in `HL7v2IO.ListMessages`. 
However, I believe this is more of a nice to have for batch use cases (because there's no room for passing ListMessages bad messages IDs like there is in HL7v2IO.Read). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430474) Time Spent: 2h 10m (was: 2h) > HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization > -- > > Key: BEAM-9856 > URL: https://issues.apache.org/jira/browse/BEAM-9856 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently the List Messages API paginates through in a single ProcessElement > Call. > However we could get a restriction based on createTime using Messages.List > filter and orderby. > > This is inline with the future roadmap of HL7v2 bulk export API becomes > available that should allow splitting on (e.g. create time dimension). > Leveraging this bulk export might be a future optimization to explore. > > This could take one of two forms: > 1. dyanmically splitable via splitable DoFn (sexy, beam idiomatic: make > optimization the runner's problem, potentially unnecessarily complex for this > use case ) > 2. static splitting on some time partition e.g. finding the earliest > createTime and emitting a PCollection of 1 hour partitions and paginating > through each hour of data w/ in the time frame that the store spans, in a > separate ProcessElement. (easy to implement but will likely have hot keys / > stragglers based on "busy hours") > -- This message was sent by Atlassian Jira (v8.3.4#803005)
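The claim above that the paginated listing is single-threaded follows from the pagination contract: each request needs the page token returned by the previous response, so the pages form a chain that cannot be fetched out of order. A minimal sketch of that contract (plain Python; `fetch_page` is a hypothetical callable standing in for the API, not the actual Healthcare client):

```python
def list_all(fetch_page):
    """Drain a paginated listing API. Each request requires the page token
    returned by the previous response, so iteration is inherently serial --
    the bottleneck that restriction-based splitting aims to remove."""
    token = None
    while True:
        items, token = fetch_page(token)
        yield from items
        if token is None:
            return

# Toy backend: three pages of message IDs chained by page tokens.
pages = {None: (["m1", "m2"], "t1"), "t1": (["m3"], "t2"), "t2": (["m4"], None)}
print(list(list_all(pages.__getitem__)))  # ['m1', 'm2', 'm3', 'm4']
```

Splitting by a filterable dimension such as createTime sidesteps this by giving each worker its own independent page chain.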
[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support
[ https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=430473=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430473 ] ASF GitHub Bot logged work on BEAM-9650: Author: ASF GitHub Bot Created on: 05/May/20 00:42 Start Date: 05/May/20 00:42 Worklog Time Spent: 10m Work Description: aaltay commented on a change in pull request #11582: URL: https://github.com/apache/beam/pull/11582#discussion_r419808099 ## File path: sdks/python/apache_beam/io/gcp/bigquery.py ## @@ -283,6 +284,8 @@ def compute_table_name(row): 'BigQuerySink', 'WriteToBigQuery', 'ReadFromBigQuery', +'ReadAllFromBigQueryRequest', Review comment: Does the java api use a similar concept? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430473) Time Spent: 10.5h (was: 10h 20m) > Add consistent slowly changing side inputs support > -- > > Key: BEAM-9650 > URL: https://issues.apache.org/jira/browse/BEAM-9650 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 10.5h > Remaining Estimate: 0h > > Add implementation for slowly changing dimentions based on [design > doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9875) PR11585 breaks python2PostCommit cross-lang suite
[ https://issues.apache.org/jira/browse/BEAM-9875?focusedWorklogId=430472=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430472 ] ASF GitHub Bot logged work on BEAM-9875: Author: ASF GitHub Bot Created on: 05/May/20 00:12 Start Date: 05/May/20 00:12 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11603: URL: https://github.com/apache/beam/pull/11603#issuecomment-623772817 R: @ihji This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430472) Time Spent: 1h (was: 50m) > PR11585 breaks python2PostCommit cross-lang suite > - > > Key: BEAM-9875 > URL: https://issues.apache.org/jira/browse/BEAM-9875 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Build scan: https://scans.gradle.com/s/6s6vse2zqr5x4 > Typical error msg: > Traceback (most recent call last): > File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main > "__main__", fname, loader, pkg_name) > File "/usr/lib/python2.7/runpy.py", line 72, in _run_code > exec code in run_globals > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2/src/sdks/python/apache_beam/examples/wordcount_xlang.py", > line 137, in > main() > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2/src/sdks/python/apache_beam/examples/wordcount_xlang.py", > line 128, in main > p.runner.create_job_service(pipeline_options) > File "apache_beam/runners/portability/portable_runner.py", line 328, in > create_job_service > server = self.default_job_server(options) > File "apache_beam/runners/portability/portable_runner.py", line 307, in > default_job_server > 'You must specify a --job_endpoint 
when using --runner=PortableRunner. ' > NotImplementedError: You must specify a --job_endpoint when using > --runner=PortableRunner. Alternatively, you may specify which portable runner > you intend to use, such as --runner=FlinkRunner or --runner=SparkRunner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
[ https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=430468=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430468 ] ASF GitHub Bot logged work on BEAM-9856: Author: ASF GitHub Bot Created on: 04/May/20 23:57 Start Date: 04/May/20 23:57 Worklog Time Spent: 10m Work Description: jaketf edited a comment on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796 @chamikaramj thanks for the suggestion. I will look into using BoundedSource API in a separate PR and we can compare. Unfortunately, regular DoFns don't cut it because a single elements outputs are committed atomically (see this [conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)). Basically we have one input element (HL7v2 store) exploding to many, many output elements (all the messages in that store) in a single ProcessElement call. I'm trying to explore strategies for splitting up this listing. I originally chose splittable DoFn over BoundedSource based off the sentiment of this statement: > **Coding against the Source API involves a lot of boilerplate and is error-prone**, and it does not compose well with the rest of the Beam model because a Source can appear only at the root of a pipeline. - https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html Using splittable DoFn also made reading from multiple HL7v2 stores come for free (e.g. if you had several regional HL7v2 stores and your use case was to read from them all and write to a single multi-regional store). This admittedly a rather contrived use case. The blog also mentions - A Source can not emit an additional output (for example, records that failed to parse). - Healthcare customers feeding requirements for this plugin want DLQ on all sinks and sources. To be consistent with the streaming API provided in `HL7v2IO.Read` I wanted to provide DLQ in `HLv2IO.ListMessages`. 
However, I believe this is more of a nice to have for batch use cases (because there's no room for passing ListMessages bad messages IDs like there is in HL7v2IO.Read). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430468) Time Spent: 2h (was: 1h 50m) > HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization > -- > > Key: BEAM-9856 > URL: https://issues.apache.org/jira/browse/BEAM-9856 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > Currently the List Messages API paginates through in a single ProcessElement > Call. > However we could get a restriction based on createTime using Messages.List > filter and orderby. > > This is inline with the future roadmap of HL7v2 bulk export API becomes > available that should allow splitting on (e.g. create time dimension). > Leveraging this bulk export might be a future optimization to explore. > > This could take one of two forms: > 1. dyanmically splitable via splitable DoFn (sexy, beam idiomatic: make > optimization the runner's problem, potentially unnecessarily complex for this > use case ) > 2. static splitting on some time partition e.g. finding the earliest > createTime and emitting a PCollection of 1 hour partitions and paginating > through each hour of data w/ in the time frame that the store spans, in a > separate ProcessElement. (easy to implement but will likely have hot keys / > stragglers based on "busy hours") > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
[ https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=430467=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430467 ] ASF GitHub Bot logged work on BEAM-9856: Author: ASF GitHub Bot Created on: 04/May/20 23:54 Start Date: 04/May/20 23:54 Worklog Time Spent: 10m Work Description: jaketf edited a comment on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796 @chamikaramj thanks for the suggestion. I will look into using BoundedSource API in a separate PR and we can compare. Unfortunately, regular DoFns don't cut it because a single elements outputs are committed atomically (see this [conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)). Basically we have one input element (HL7v2 store) exploding to many, many output elements (all the messages in that store) in a single ProcessElement call. I'm trying to explore strategies for splitting up this listing. I originally chose splittable DoFn over BoundedSource based off the sentiment of this statement: > **Coding against the Source API involves a lot of boilerplate and is error-prone**, and it does not compose well with the rest of the Beam model because a Source can appear only at the root of a pipeline. - https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html The blog also mentions - A Source can not emit an additional output (for example, records that failed to parse). - Healthcare customers feeding requirements for this plugin want DLQ on all sinks and sources. To be consistent with the streaming API provided in `HL7v2IO.Read` I wanted to provide DLQ in `HLv2IO.ListMessages`. However, I believe this is more of a nice to have for batch use cases (because there's no room for passing ListMessages bad messages IDs like there is in HL7v2IO.Read). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430467) Time Spent: 1h 50m (was: 1h 40m) > HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization > -- > > Key: BEAM-9856 > URL: https://issues.apache.org/jira/browse/BEAM-9856 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently the List Messages API paginates through in a single ProcessElement > Call. > However we could get a restriction based on createTime using Messages.List > filter and orderby. > > This is inline with the future roadmap of HL7v2 bulk export API becomes > available that should allow splitting on (e.g. create time dimension). > Leveraging this bulk export might be a future optimization to explore. > > This could take one of two forms: > 1. dyanmically splitable via splitable DoFn (sexy, beam idiomatic: make > optimization the runner's problem, potentially unnecessarily complex for this > use case ) > 2. static splitting on some time partition e.g. finding the earliest > createTime and emitting a PCollection of 1 hour partitions and paginating > through each hour of data w/ in the time frame that the store spans, in a > separate ProcessElement. (easy to implement but will likely have hot keys / > stragglers based on "busy hours") > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
[ https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=430466=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430466 ] ASF GitHub Bot logged work on BEAM-9856: Author: ASF GitHub Bot Created on: 04/May/20 23:52 Start Date: 04/May/20 23:52 Worklog Time Spent: 10m Work Description: jaketf edited a comment on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796 @chamikaramj thanks for the suggestion. I will look into using BoundedSource API. Unfortunately, regular DoFns don't cut it because a single elements outputs are committed atomically (see this [conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)). Basically we have one input element (HL7v2 store) exploding to many, many output elements (all the messages in that store) in a single ProcessElement call. I'm trying to explore strategies for splitting up this listing. I originally chose splittable DoFn over BoundedSource based off the sentiment of this statement: > **Coding against the Source API involves a lot of boilerplate and is error-prone**, and it does not compose well with the rest of the Beam model because a Source can appear only at the root of a pipeline. - https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html The blog also mentions - A Source can not emit an additional output (for example, records that failed to parse). - Healthcare customers feeding requirements for this plugin want DLQ on all sinks and sources. To be consistent with the streaming API provided in `HL7v2IO.Read` I wanted to provide DLQ in `HLv2IO.ListMessages`. However, I believe this is more of a nice to have for batch use cases (because there's no room for passing ListMessages bad messages IDs like there is in HL7v2IO.Read). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430466) Time Spent: 1h 40m (was: 1.5h) > HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization > -- > > Key: BEAM-9856 > URL: https://issues.apache.org/jira/browse/BEAM-9856 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > Currently the List Messages API paginates through in a single ProcessElement > Call. > However we could get a restriction based on createTime using Messages.List > filter and orderby. > > This is inline with the future roadmap of HL7v2 bulk export API becomes > available that should allow splitting on (e.g. create time dimension). > Leveraging this bulk export might be a future optimization to explore. > > This could take one of two forms: > 1. dyanmically splitable via splitable DoFn (sexy, beam idiomatic: make > optimization the runner's problem, potentially unnecessarily complex for this > use case ) > 2. static splitting on some time partition e.g. finding the earliest > createTime and emitting a PCollection of 1 hour partitions and paginating > through each hour of data w/ in the time frame that the store spans, in a > separate ProcessElement. (easy to implement but will likely have hot keys / > stragglers based on "busy hours") > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
[ https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=430465=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430465 ] ASF GitHub Bot logged work on BEAM-9856: Author: ASF GitHub Bot Created on: 04/May/20 23:49 Start Date: 04/May/20 23:49 Worklog Time Spent: 10m Work Description: jaketf commented on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796 @chamikaramj thanks for the suggestion. I will look into using BoundedSource API. Unfortunately, regular DoFns don't cut it because a single elements outputs are committed atomically (see this [conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)). Basically we have one input element (HL7v2 store) exploding to many, many output elements (all the messages in that store) in a single ProcessElement call. I'm trying to explore strategies for splitting up this listing. I originally chose splittable DoFn over BoundedSource based off the sentiment of this statement: > **Coding against the Source API involves a lot of boilerplate and is error-prone**, and it does not compose well with the rest of the Beam model because a Source can appear only at the root of a pipeline. - https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html The blog also mentions - A Source can not emit an additional output (for example, records that failed to parse). - Healthcare customers feeding requirements for this plugin want DLQ on all sinks and sources. To be consistent with the streaming API provided in `HL7v2IO.Read` I wanted to provide DLQ in `HLv2IO.ListMessages`. However, I believe this is more of a nice to have for batch use cases. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430465) Time Spent: 1.5h (was: 1h 20m) > HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization > -- > > Key: BEAM-9856 > URL: https://issues.apache.org/jira/browse/BEAM-9856 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > > Currently the List Messages API paginates through in a single ProcessElement > Call. > However we could get a restriction based on createTime using Messages.List > filter and orderby. > > This is inline with the future roadmap of HL7v2 bulk export API becomes > available that should allow splitting on (e.g. create time dimension). > Leveraging this bulk export might be a future optimization to explore. > > This could take one of two forms: > 1. dyanmically splitable via splitable DoFn (sexy, beam idiomatic: make > optimization the runner's problem, potentially unnecessarily complex for this > use case ) > 2. static splitting on some time partition e.g. finding the earliest > createTime and emitting a PCollection of 1 hour partitions and paginating > through each hour of data w/ in the time frame that the store spans, in a > separate ProcessElement. (easy to implement but will likely have hot keys / > stragglers based on "busy hours") > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9154) Move Chicago Taxi Example to Python 3
[ https://issues.apache.org/jira/browse/BEAM-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099414#comment-17099414 ] Valentyn Tymofieiev commented on BEAM-9154: --- I see, once you have time feel free to let me know if this is still an issue and if it is reproducible in newer versions of TFT. Thanks. > Move Chicago Taxi Example to Python 3 > - > > Key: BEAM-9154 > URL: https://issues.apache.org/jira/browse/BEAM-9154 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Kamil Wasilewski >Priority: Major > > The Chicago Taxi Example[1] should be moved to the latest version of Python > supported by Beam (currently it's Python 3.7). > At the moment, the following error occurs when running the benchmark on > Python 3.7 (requires futher investigation): > {code:java} > Traceback (most recent call last): > File "preprocess.py", line 259, in > main() > File "preprocess.py", line 254, in main > project=known_args.metric_reporting_project > File "preprocess.py", line 155, in transform_data > ('Analyze' >> tft_beam.AnalyzeDataset(preprocessing_fn))) > File > "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", > line 987, in __ror__ > return self.transform.__ror__(pvalueish, self.label) > File > "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", > line 547, in __ror__ > result = p.apply(self, pvalueish, label) > File > "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line > 532, in apply > return self.apply(transform, pvalueish) > File > "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line > 573, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File > "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File > "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", > line 
223, in apply_PTransform > return transform.expand(input) > File > "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", > line 825, in expand > input_metadata)) > File > "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", > line 716, in expand > output_signature = self._preprocessing_fn(copied_inputs) > File "preprocess.py", line 102, in preprocessing_fn > _fill_in_missing(inputs[key]), > KeyError: 'company' > {code} > [1] sdks/python/apache_beam/testing/benchmarks/chicago_taxi -- This message was sent by Atlassian Jira (v8.3.4#803005)
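The `KeyError: 'company'` above indicates the preprocessing function indexes a feature column that is absent from the inputs. A guard along these lines would tolerate a missing optional column; this is a plain-Python illustration of the idea, not the TFT fix (`OPTIONAL_FEATURES` and `safe_inputs` are invented names):

```python
# Sketch: tolerate optional feature columns that may be absent from the
# dataset schema, instead of raising KeyError during preprocessing.
OPTIONAL_FEATURES = ["company"]  # hypothetical list of optional columns

def safe_inputs(inputs, optional=OPTIONAL_FEATURES, default=None):
    """Return a copy of `inputs` with missing optional keys filled in."""
    out = dict(inputs)
    for key in optional:
        out.setdefault(key, default)
    return out
```

A preprocessing function would then call `safe_inputs(inputs)` before indexing, so `inputs['company']` resolves to the default rather than raising.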
[jira] [Work logged] (BEAM-9875) PR11585 breaks python2PostCommit cross-lang suite
[ https://issues.apache.org/jira/browse/BEAM-9875?focusedWorklogId=430449=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430449 ] ASF GitHub Bot logged work on BEAM-9875: Author: ASF GitHub Bot Created on: 04/May/20 23:00 Start Date: 04/May/20 23:00 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11603: URL: https://github.com/apache/beam/pull/11603#issuecomment-623752060 Run Python 2 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430449) Time Spent: 40m (was: 0.5h) > PR11585 breaks python2PostCommit cross-lang suite > - > > Key: BEAM-9875 > URL: https://issues.apache.org/jira/browse/BEAM-9875 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Kyle Weaver >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Build scan: https://scans.gradle.com/s/6s6vse2zqr5x4 > Typical error msg: > Traceback (most recent call last): > File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main > "__main__", fname, loader, pkg_name) > File "/usr/lib/python2.7/runpy.py", line 72, in _run_code > exec code in run_globals > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2/src/sdks/python/apache_beam/examples/wordcount_xlang.py", > line 137, in > main() > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2/src/sdks/python/apache_beam/examples/wordcount_xlang.py", > line 128, in main > p.runner.create_job_service(pipeline_options) > File "apache_beam/runners/portability/portable_runner.py", line 328, in > create_job_service > server = self.default_job_server(options) > File "apache_beam/runners/portability/portable_runner.py", line 307, in > default_job_server > 'You must specify a 
--job_endpoint when using --runner=PortableRunner. ' > NotImplementedError: You must specify a --job_endpoint when using > --runner=PortableRunner. Alternatively, you may specify which portable runner > you intend to use, such as --runner=FlinkRunner or --runner=SparkRunner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
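The dispatch implied by this error can be sketched as follows. This is a minimal illustration of the control flow the message describes, not Beam's actual `portable_runner.py`; the function name and return values are invented:

```python
def default_job_server(runner_name, job_endpoint=None):
    """Sketch of the job-server selection logic implied by the error.

    PortableRunner cannot pick a job server by itself, so it requires an
    explicit --job_endpoint; named portable runners (FlinkRunner,
    SparkRunner) can start an embedded job server on their own.
    """
    if runner_name == "PortableRunner":
        if job_endpoint is None:
            raise NotImplementedError(
                "You must specify a --job_endpoint when using "
                "--runner=PortableRunner.")
        return ("external", job_endpoint)
    if runner_name in ("FlinkRunner", "SparkRunner"):
        return ("embedded", runner_name)
    raise ValueError("unknown runner: " + runner_name)
```

The failing test selected `PortableRunner` without an endpoint, hitting the first branch.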
[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
[ https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=430443=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430443 ] ASF GitHub Bot logged work on BEAM-9856: Author: ASF GitHub Bot Created on: 04/May/20 22:29 Start Date: 04/May/20 22:29 Worklog Time Spent: 10m Work Description: chamikaramj edited a comment on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623741333 Is using SplittableDoFn API here intentional ? I think this API is being updated. cc: @lukecwik If you need dynamic work rebalancing, consider using the BoundedSource interface. Otherwise we can just implement the source using regular DoFns and wait for SplittableDoFn API to stabilize before adding support for dynamic work rebalancing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430443) Time Spent: 1h 20m (was: 1h 10m) > HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization > -- > > Key: BEAM-9856 > URL: https://issues.apache.org/jira/browse/BEAM-9856 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently the List Messages API paginates through in a single ProcessElement > Call. > However we could get a restriction based on createTime using Messages.List > filter and orderby. > > This is inline with the future roadmap of HL7v2 bulk export API becomes > available that should allow splitting on (e.g. create time dimension). > Leveraging this bulk export might be a future optimization to explore. > > This could take one of two forms: > 1. 
dynamically splittable via a splittable DoFn (elegant, Beam-idiomatic: make > optimization the runner's problem; potentially unnecessarily complex for this > use case) > 2. static splitting on some time partition, e.g. finding the earliest > createTime and emitting a PCollection of one-hour partitions, then paginating > through each hour of data within the time frame that the store spans, in a > separate ProcessElement. (easy to implement, but will likely have hot keys / > stragglers based on "busy hours") > -- This message was sent by Atlassian Jira (v8.3.4#803005)
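Option 2 above can be sketched as follows. This is plain Python with no Beam dependency, and `hourly_partitions` is a hypothetical helper: each returned window would become one element of a PCollection, and a downstream ProcessElement would page through the Messages.List API with a createTime filter for its window.

```python
from datetime import datetime, timedelta

def hourly_partitions(earliest, latest):
    """Split [earliest, latest) into one-hour [start, end) windows.

    `earliest` would come from the store's oldest message createTime;
    each window becomes an independent unit of pagination work.
    """
    # Align the first window to the top of the hour containing `earliest`.
    start = earliest.replace(minute=0, second=0, microsecond=0)
    windows = []
    while start < latest:
        end = start + timedelta(hours=1)
        windows.append((start, min(end, latest)))
        start = end
    return windows
```

As the issue notes, fixed hour-wide windows are easy to implement but give no protection against hot partitions during "busy hours"; the splittable-DoFn variant would let the runner subdivide those dynamically.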
[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
[ https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=430442=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430442 ] ASF GitHub Bot logged work on BEAM-9856: Author: ASF GitHub Bot Created on: 04/May/20 22:28 Start Date: 04/May/20 22:28 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623741333 Is using SplittableDoFn API here intentional ? I think this API is being updated. cc: @lukecwik If you need dynamic work rebalancing, consider using the BoundedSource interface. Otherwise we can just implement the source as a regular and wait for SplittableDoFn API to stabilize before adding support for dynamic work rebalancing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430442) Time Spent: 1h 10m (was: 1h) > HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization > -- > > Key: BEAM-9856 > URL: https://issues.apache.org/jira/browse/BEAM-9856 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > Currently the List Messages API paginates through in a single ProcessElement > Call. > However we could get a restriction based on createTime using Messages.List > filter and orderby. > > This is inline with the future roadmap of HL7v2 bulk export API becomes > available that should allow splitting on (e.g. create time dimension). > Leveraging this bulk export might be a future optimization to explore. > > This could take one of two forms: > 1. 
dynamically splittable via a splittable DoFn (elegant, Beam-idiomatic: make > optimization the runner's problem; potentially unnecessarily complex for this > use case) > 2. static splitting on some time partition, e.g. finding the earliest > createTime and emitting a PCollection of one-hour partitions, then paginating > through each hour of data within the time frame that the store spans, in a > separate ProcessElement. (easy to implement, but will likely have hot keys / > stragglers based on "busy hours") > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9885) Use artifact staging for SqlTransform rather than staging jar experiment
[ https://issues.apache.org/jira/browse/BEAM-9885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-9885: -- Status: Open (was: Triage Needed) > Use artifact staging for SqlTransform rather than staging jar experiment > > > Key: BEAM-9885 > URL: https://issues.apache.org/jira/browse/BEAM-9885 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9885) Use artifact staging for SqlTransform rather than staging jar experiment
Brian Hulette created BEAM-9885: --- Summary: Use artifact staging for SqlTransform rather than staging jar experiment Key: BEAM-9885 URL: https://issues.apache.org/jira/browse/BEAM-9885 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Brian Hulette Assignee: Brian Hulette -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
[ https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=430441=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430441 ] ASF GitHub Bot logged work on BEAM-9856: Author: ASF GitHub Bot Created on: 04/May/20 22:19 Start Date: 04/May/20 22:19 Worklog Time Spent: 10m Work Description: jaketf commented on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623737955 > The NPE is caused by `OffsetRangeTracker.lastAttemptedOffset` being unboxed in `OffsetRangeTracker::checkDone` [here](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L98). > > [all](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L53) [other](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L77) [uses](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L119) check that `lastAttemptedOffset` is not null. > > I'm not sure if this was intentional in the implementation of OffsetRangeTracker. @chamikaramj pablo said you might know about this. should check done have some conditional on `lastAttemptedOffset != null` e.g. 
```java
@Override
public void checkDone() throws IllegalStateException {
  if (range.getFrom() == range.getTo()) {
    return;
  }
  if (lastAttemptedOffset != null) {
    checkState(
        lastAttemptedOffset >= range.getTo() - 1,
        "Last attempted offset was %s in range %s, claiming work in [%s, %s) was not attempted",
        lastAttemptedOffset,
        range,
        lastAttemptedOffset + 1,
        range.getTo());
  }
}
```
I'm not really familiar with what checkDone should do in the case that lastAttemptedOffset is null. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430441) Time Spent: 1h (was: 50m) > HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization > -- > > Key: BEAM-9856 > URL: https://issues.apache.org/jira/browse/BEAM-9856 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp > Reporter: Jacob Ferriero > Assignee: Jacob Ferriero > Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > Currently the List Messages API paginates through results in a single ProcessElement > call. > However, we could derive a restriction based on createTime using the Messages.List > filter and orderBy parameters. > > This is in line with the future roadmap: once the HL7v2 bulk export API becomes > available, it should allow splitting (e.g. on the create-time dimension). > Leveraging this bulk export might be a future optimization to explore. > > This could take one of two forms: > 1. dynamically splittable via a splittable DoFn (elegant, Beam-idiomatic: make > optimization the runner's problem; potentially unnecessarily complex for this > use case) > 2. static splitting on some time partition, e.g. finding the earliest > createTime and emitting a PCollection of one-hour partitions, then paginating > through each hour of data within the time frame that the store spans, in a > separate ProcessElement. (easy to implement, but will likely have hot keys / > stragglers based on "busy hours") > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9875) PR11585 breaks python2PostCommit cross-lang suite
[ https://issues.apache.org/jira/browse/BEAM-9875?focusedWorklogId=430439=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430439 ] ASF GitHub Bot logged work on BEAM-9875: Author: ASF GitHub Bot Created on: 04/May/20 22:14 Start Date: 04/May/20 22:14 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11603: URL: https://github.com/apache/beam/pull/11603#issuecomment-623736044 Run Python 2 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430439) Time Spent: 0.5h (was: 20m) > PR11585 breaks python2PostCommit cross-lang suite > - > > Key: BEAM-9875 > URL: https://issues.apache.org/jira/browse/BEAM-9875 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Kyle Weaver >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Build scan: https://scans.gradle.com/s/6s6vse2zqr5x4 > Typical error msg: > Traceback (most recent call last): > File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main > "__main__", fname, loader, pkg_name) > File "/usr/lib/python2.7/runpy.py", line 72, in _run_code > exec code in run_globals > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2/src/sdks/python/apache_beam/examples/wordcount_xlang.py", > line 137, in > main() > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2/src/sdks/python/apache_beam/examples/wordcount_xlang.py", > line 128, in main > p.runner.create_job_service(pipeline_options) > File "apache_beam/runners/portability/portable_runner.py", line 328, in > create_job_service > server = self.default_job_server(options) > File "apache_beam/runners/portability/portable_runner.py", line 307, in > default_job_server > 'You must specify 
a --job_endpoint when using --runner=PortableRunner. ' > NotImplementedError: You must specify a --job_endpoint when using > --runner=PortableRunner. Alternatively, you may specify which portable runner > you intend to use, such as --runner=FlinkRunner or --runner=SparkRunner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
[ https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=430438&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430438 ] ASF GitHub Bot logged work on BEAM-9856: Author: ASF GitHub Bot Created on: 04/May/20 22:06 Start Date: 04/May/20 22:06 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623733179 @chamikaramj can you take a look at this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430438) Time Spent: 50m (was: 40m) > HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization > -- > > Key: BEAM-9856 > URL: https://issues.apache.org/jira/browse/BEAM-9856 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp > Reporter: Jacob Ferriero > Assignee: Jacob Ferriero > Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > Currently the List Messages API paginates through results in a single ProcessElement > call. > However, we could derive a restriction based on createTime using the Messages.List > filter and orderBy parameters. > > This is in line with the future roadmap: once the HL7v2 bulk export API becomes > available, it should allow splitting (e.g. on the create-time dimension). > Leveraging this bulk export might be a future optimization to explore. > > This could take one of two forms: > 1. dynamically splittable via a splittable DoFn (elegant, Beam-idiomatic: make > optimization the runner's problem; potentially unnecessarily complex for this > use case) > 2. static splitting on some time partition, e.g. finding the earliest > createTime and emitting a PCollection of one-hour partitions, then paginating > through each hour of data within the time frame that the store spans, in a > separate ProcessElement. (easy to implement, but will likely have hot keys / > stragglers based on "busy hours") > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9861) BigQueryStorageStreamSource fails with split fractions of 0.0 or 1.0
[ https://issues.apache.org/jira/browse/BEAM-9861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9861: --- Status: Open (was: Triage Needed) > BigQueryStorageStreamSource fails with split fractions of 0.0 or 1.0 > > > Key: BEAM-9861 > URL: https://issues.apache.org/jira/browse/BEAM-9861 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Kenneth Jung >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9862) test_multimap_multiside_input breaks Spark python VR test
[ https://issues.apache.org/jira/browse/BEAM-9862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9862: --- Status: Open (was: Triage Needed) > test_multimap_multiside_input breaks Spark python VR test > - > > Key: BEAM-9862 > URL: https://issues.apache.org/jira/browse/BEAM-9862 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Pablo Estrada >Priority: Major > > == > ERROR: test_multimap_multiside_input (__main__.SparkRunnerTest) > -- > Traceback (most recent call last): > File "apache_beam/runners/portability/fn_api_runner/fn_runner_test.py", > line 265, in test_multimap_multiside_input > equal_to([('a', [1, 3], [1, 2, 3]), ('b', [2], [1, 2, 3])])) > File "apache_beam/pipeline.py", line 543, in __exit__ > self.run().wait_until_finish() > File "apache_beam/runners/portability/portable_runner.py", line 571, in > wait_until_finish > (self._job_id, self._state, self._last_error_message())) > RuntimeError: Pipeline > test_multimap_multiside_input_1588248535.23_9fee5456-d12d-451e-b112-a022d1c6ce1f > failed in state FAILED: java.lang.IllegalArgumentException: Multiple entries > with same key: > ref_PCollection_PCollection_21=(Broadcast(37),WindowedValue$FullWindowedValueCoder(KvCoder(ByteArrayCoder,VarLongCoder),GlobalWindow$Coder)) > and > ref_PCollection_PCollection_21=(Broadcast(36),WindowedValue$FullWindowedValueCoder(KvCoder(ByteArrayCoder,VarLongCoder),GlobalWindow$Coder)) > Link: https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/3099/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
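The `IllegalArgumentException: Multiple entries with same key` above comes from building an immutable map of side-input broadcasts that rejects duplicate PCollection ids. A minimal Python analogue of that invariant (the function and its inputs are invented for illustration; the Spark runner's actual map-building code is Java):

```python
def build_broadcast_map(pairs):
    """Build an id -> broadcast map, failing fast on duplicate ids.

    Mirrors the invariant the Spark runner error reports: registering the
    same PCollection id twice with different broadcasts is a bug.
    """
    out = {}
    for key, value in pairs:
        if key in out:
            raise ValueError("Multiple entries with same key: %s" % key)
        out[key] = value
    return out
```

In the failing test, `ref_PCollection_PCollection_21` was registered twice (once per multimap side input over the same PCollection), tripping exactly this check.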
[jira] [Updated] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
[ https://issues.apache.org/jira/browse/BEAM-9856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9856: --- Status: Open (was: Triage Needed) > HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization > -- > > Key: BEAM-9856 > URL: https://issues.apache.org/jira/browse/BEAM-9856 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp > Reporter: Jacob Ferriero > Assignee: Jacob Ferriero > Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > Currently the List Messages API paginates through results in a single ProcessElement > call. > However, we could derive a restriction based on createTime using the Messages.List > filter and orderBy parameters. > > This is in line with the future roadmap: once the HL7v2 bulk export API becomes > available, it should allow splitting (e.g. on the create-time dimension). > Leveraging this bulk export might be a future optimization to explore. > > This could take one of two forms: > 1. dynamically splittable via a splittable DoFn (elegant, Beam-idiomatic: make > optimization the runner's problem; potentially unnecessarily complex for this > use case) > 2. static splitting on some time partition, e.g. finding the earliest > createTime and emitting a PCollection of one-hour partitions, then paginating > through each hour of data within the time frame that the store spans, in a > separate ProcessElement. (easy to implement, but will likely have hot keys / > stragglers based on "busy hours") > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8458) BigQueryIO.Read needs permissions to create datasets to be able to run queries
[ https://issues.apache.org/jira/browse/BEAM-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Israel Herraiz updated BEAM-8458: - Fix Version/s: (was: 2.20.0) 2.21.0 > BigQueryIO.Read needs permissions to create datasets to be able to run queries > -- > > Key: BEAM-8458 > URL: https://issues.apache.org/jira/browse/BEAM-8458 > Project: Beam > Issue Type: Bug > Components: io-java-gcp > Reporter: Israel Herraiz > Assignee: Israel Herraiz > Priority: Major > Fix For: 2.21.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > When using {{fromQuery}}, BigQueryIO creates a temp dataset to store the > results of the query. > Therefore, Beam requires permissions to create datasets just to be able to > run a query. In practice, this means that Beam requires the role > bigQuery.User just to run queries, whereas if you use {{from}} (to read from > a table), the role bigQuery.jobUser suffices. > BigQueryIO.Read should have an option to set an existing dataset to which to write > the temp results of > a query, so the role bigQuery.jobUser would suffice. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9880) touch: build/target/third_party_licenses/skip: No such file or directory
[ https://issues.apache.org/jira/browse/BEAM-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099365#comment-17099365 ] Hannah Jiang commented on BEAM-9880: Created a PR that should solve the issue: [https://github.com/apache/beam/pull/11606] > touch: build/target/third_party_licenses/skip: No such file or directory > > > Key: BEAM-9880 > URL: https://issues.apache.org/jira/browse/BEAM-9880 > Project: Beam > Issue Type: Bug > Components: sdk-java-harness > Reporter: Kyle Weaver > Assignee: Hannah Jiang > Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > When I run ./gradlew > :sdks:python:test-suites:portable:py2:crossLanguageTests, I get the following > error: > > Task :sdks:java:container:createFile FAILED > touch: build/target/third_party_licenses/skip: No such file or directory > When I do `ls build`, the only thing it outputs is `gradleenv`. So it looks > like it's assuming the directory exists, when it might not. -- This message was sent by Atlassian Jira (v8.3.4#803005)
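The usual direction for this kind of failure is to create the parent directories before touching the file, so the equivalent of `touch` cannot fail on a missing path. A plain-Python stand-in for what the Gradle task needs to do (`create_skip_file` is an invented name; the actual fix is in the PR above):

```python
from pathlib import Path

def create_skip_file(build_dir="build"):
    """Create build/target/third_party_licenses/skip, making parents first.

    mkdir(parents=True, exist_ok=True) is the Python analogue of
    `mkdir -p`, which avoids the "No such file or directory" failure.
    """
    target = Path(build_dir) / "target" / "third_party_licenses" / "skip"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.touch()
    return target
```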
[jira] [Work started] (BEAM-9880) touch: build/target/third_party_licenses/skip: No such file or directory
[ https://issues.apache.org/jira/browse/BEAM-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-9880 started by Hannah Jiang. -- > touch: build/target/third_party_licenses/skip: No such file or directory > > > Key: BEAM-9880 > URL: https://issues.apache.org/jira/browse/BEAM-9880 > Project: Beam > Issue Type: Bug > Components: sdk-java-harness >Reporter: Kyle Weaver >Assignee: Hannah Jiang >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > When I run ./gradlew > :sdks:python:test-suites:portable:py2:crossLanguageTests, I get the following > error: > > Task :sdks:java:container:createFile FAILED > touch: build/target/third_party_licenses/skip: No such file or directory > When I do `ls build`, the only thing it outputs is `gradleenv`. So it looks > like it's assuming the directory exists, when it might not. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9880) touch: build/target/third_party_licenses/skip: No such file or directory
[ https://issues.apache.org/jira/browse/BEAM-9880?focusedWorklogId=430430=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430430 ] ASF GitHub Bot logged work on BEAM-9880: Author: ASF GitHub Bot Created on: 04/May/20 21:35 Start Date: 04/May/20 21:35 Worklog Time Spent: 10m Work Description: Hannah-Jiang opened a new pull request #11606: URL: https://github.com/apache/beam/pull/11606 R: @ibzib Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
Post-Commit Tests Status (on master branch): table of Go and Java post-commit build-status badges per runner (builds.apache.org).
[jira] [Work logged] (BEAM-9799) Add automated RTracker validation to Go SDF
[ https://issues.apache.org/jira/browse/BEAM-9799?focusedWorklogId=430429&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430429 ] ASF GitHub Bot logged work on BEAM-9799: Author: ASF GitHub Bot Created on: 04/May/20 21:34 Start Date: 04/May/20 21:34 Worklog Time Spent: 10m Work Description: youngoli commented on a change in pull request #11553: URL: https://github.com/apache/beam/pull/11553#discussion_r419743023
## File path: sdks/go/pkg/beam/core/runtime/exec/pardo.go
## @@ -162,6 +168,14 @@ func (n *ParDo) processMainInput(mainIn *MainInput) error {
 	return nil
 }
+func rtErrHelper(err error) error {
+	if err != nil {
+		return err
+	} else {
+		return errors.New("RTracker IsDone failed for unspecified reason")
+	}
Review comment: Yeah good point. I'll rephrase it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430429) Time Spent: 40m (was: 0.5h) > Add automated RTracker validation to Go SDF > --- > > Key: BEAM-9799 > URL: https://issues.apache.org/jira/browse/BEAM-9799 > Project: Beam > Issue Type: Improvement > Components: sdk-go > Reporter: Daniel Oliveira > Assignee: Daniel Oliveira > Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > After finishing executing an SDF we should be validating the restriction > trackers to make sure they succeeded without errors. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8603) Add Python SqlTransform MVP
[ https://issues.apache.org/jira/browse/BEAM-8603?focusedWorklogId=430423=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430423 ] ASF GitHub Bot logged work on BEAM-8603: Author: ASF GitHub Bot Created on: 04/May/20 21:25 Start Date: 04/May/20 21:25 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10055: URL: https://github.com/apache/beam/pull/10055#issuecomment-623714745 yoohooo This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430423) Time Spent: 12h 40m (was: 12.5h) > Add Python SqlTransform MVP > --- > > Key: BEAM-8603 > URL: https://issues.apache.org/jira/browse/BEAM-8603 > Project: Beam > Issue Type: Improvement > Components: dsl-sql, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 2.22.0 > > Time Spent: 12h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9449) Consider passing pipeline options for expansion service.
[ https://issues.apache.org/jira/browse/BEAM-9449?focusedWorklogId=430420&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430420 ] ASF GitHub Bot logged work on BEAM-9449: Author: ASF GitHub Bot Created on: 04/May/20 21:23 Start Date: 04/May/20 21:23 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #11574: URL: https://github.com/apache/beam/pull/11574#issuecomment-623713457 Synced this up now that #11571 is merged. This is ready for review now. @ihji do you have time to review? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430420) Time Spent: 0.5h (was: 20m) > Consider passing pipeline options for expansion service. > > > Key: BEAM-9449 > URL: https://issues.apache.org/jira/browse/BEAM-9449 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Robert Bradshaw >Assignee: Brian Hulette >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9883) Generalize SDF-validating restrictions
[ https://issues.apache.org/jira/browse/BEAM-9883?focusedWorklogId=430421&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430421 ] ASF GitHub Bot logged work on BEAM-9883: Author: ASF GitHub Bot Created on: 04/May/20 21:23 Start Date: 04/May/20 21:23 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #11605: URL: https://github.com/apache/beam/pull/11605#issuecomment-623713567 R: @lostluck This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430421) Time Spent: 20m (was: 10m) > Generalize SDF-validating restrictions > -- > > Key: BEAM-9883 > URL: https://issues.apache.org/jira/browse/BEAM-9883 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > We have some restrictions written for the purpose of validating that SDFs > work in sdf_invokers_test.go, but they can be improved and generalized. The > main improvement is changing the validation approach so that the restriction > keeps track of each method it's had called on it. Then this can be > generalized so that it can be used in upcoming integration tests to validate > which methods are being called. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9883) Generalize SDF-validating restrictions
[ https://issues.apache.org/jira/browse/BEAM-9883?focusedWorklogId=430418=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430418 ] ASF GitHub Bot logged work on BEAM-9883: Author: ASF GitHub Bot Created on: 04/May/20 21:12 Start Date: 04/May/20 21:12 Worklog Time Spent: 10m Work Description: youngoli opened a new pull request #11605: URL: https://github.com/apache/beam/pull/11605 Refactoring the restriction used for testing SDFs. Instead of having some obtuse behavior that we can validate, it now just contains a bunch of flags we can flip to track that it was used in each method. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
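The flag-based approach this PR describes — a restriction that simply records each lifecycle method invoked on it — can be sketched roughly as follows. The type and field names here are illustrative, not the ones actually used in #11605:

```go
package main

import "fmt"

// VRestriction is a sketch of a validating restriction: instead of
// encoding behavior to check indirectly, it records which SDF lifecycle
// methods have touched it, so a test can assert exactly what was called.
type VRestriction struct {
	ID int // distinguishes restrictions produced by a split

	CreateTrackerCalled bool
	ProcessCalled       bool
	SplitCalled         bool
}

// VTracker is a trivial tracker over a VRestriction that flips the
// matching flag whenever one of its methods runs.
type VTracker struct {
	Rest *VRestriction
}

// NewVTracker stands in for the SDF's CreateTracker method.
func NewVTracker(r *VRestriction) *VTracker {
	r.CreateTrackerCalled = true
	return &VTracker{Rest: r}
}

// TryClaim records that the restriction was processed; a real tracker
// would also decide whether the claim is within bounds.
func (t *VTracker) TryClaim(pos interface{}) bool {
	t.Rest.ProcessCalled = true
	return true
}

// TrySplit records that a split was requested. A real tracker would
// compute primary/residual restrictions; this sketch returns the
// original plus a fresh residual with a new ID.
func (t *VTracker) TrySplit(frac float64) (*VRestriction, *VRestriction) {
	t.Rest.SplitCalled = true
	return t.Rest, &VRestriction{ID: t.Rest.ID + 1}
}

func main() {
	r := &VRestriction{ID: 1}
	tr := NewVTracker(r)
	tr.TryClaim(0)
	fmt.Printf("created=%v processed=%v split=%v\n",
		r.CreateTrackerCalled, r.ProcessCalled, r.SplitCalled)
}
```

A test then only has to inspect the flags after running an SDF invoker, rather than reason about obtuse restriction behavior — which is what makes the same type reusable in the planned integration tests.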
[jira] [Commented] (BEAM-9880) touch: build/target/third_party_licenses/skip: No such file or directory
[ https://issues.apache.org/jira/browse/BEAM-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099353#comment-17099353 ] Kyle Weaver commented on BEAM-9880: --- I am running all these commands from the Beam root directory, by the way. > touch: build/target/third_party_licenses/skip: No such file or directory > > > Key: BEAM-9880 > URL: https://issues.apache.org/jira/browse/BEAM-9880 > Project: Beam > Issue Type: Bug > Components: sdk-java-harness >Reporter: Kyle Weaver >Assignee: Hannah Jiang >Priority: Major > > When I run ./gradlew > :sdks:python:test-suites:portable:py2:crossLanguageTests, I get the following > error: > > Task :sdks:java:container:createFile FAILED > touch: build/target/third_party_licenses/skip: No such file or directory > When I do `ls build`, the only thing it outputs is `gradleenv`. So it looks > like it's assuming the directory exists, when it might not. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9880) touch: build/target/third_party_licenses/skip: No such file or directory
[ https://issues.apache.org/jira/browse/BEAM-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-9880: -- Status: Open (was: Triage Needed) > touch: build/target/third_party_licenses/skip: No such file or directory > > > Key: BEAM-9880 > URL: https://issues.apache.org/jira/browse/BEAM-9880 > Project: Beam > Issue Type: Bug > Components: sdk-java-harness >Reporter: Kyle Weaver >Assignee: Hannah Jiang >Priority: Major > > When I run ./gradlew > :sdks:python:test-suites:portable:py2:crossLanguageTests, I get the following > error: > > Task :sdks:java:container:createFile FAILED > touch: build/target/third_party_licenses/skip: No such file or directory > When I do `ls build`, the only thing it outputs is `gradleenv`. So it looks > like it's assuming the directory exists, when it might not. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9880) touch: build/target/third_party_licenses/skip: No such file or directory
[ https://issues.apache.org/jira/browse/BEAM-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099346#comment-17099346 ] Kyle Weaver commented on BEAM-9880: --- Same error happens when I do ./gradlew :sdks:java:container:docker. (I'm not using docker-pull-licenses in this case since I'm just trying to run some tests locally.) Can you try doing `./gradlew clean && ./gradlew :sdks:java:container:docker`? > touch: build/target/third_party_licenses/skip: No such file or directory > > > Key: BEAM-9880 > URL: https://issues.apache.org/jira/browse/BEAM-9880 > Project: Beam > Issue Type: Bug > Components: sdk-java-harness >Reporter: Kyle Weaver >Assignee: Hannah Jiang >Priority: Major > > When I run ./gradlew > :sdks:python:test-suites:portable:py2:crossLanguageTests, I get the following > error: > > Task :sdks:java:container:createFile FAILED > touch: build/target/third_party_licenses/skip: No such file or directory > When I do `ls build`, the only thing it outputs is `gradleenv`. So it looks > like it's assuming the directory exists, when it might not. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9884) Add local option for configuring SqlTransform planner
[ https://issues.apache.org/jira/browse/BEAM-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette updated BEAM-9884: Summary: Add local option for configuring SqlTransform planner (was: Add local option for configuraing SqlTransform planner) > Add local option for configuring SqlTransform planner > - > > Key: BEAM-9884 > URL: https://issues.apache.org/jira/browse/BEAM-9884 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Brian Hulette >Priority: Minor > > Currently it's only possible to change the SqlTransform planner globally via > pipeline options. It should be possible to configure the planner per > transform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9884) Add local option for configuraing SqlTransform planner
Brian Hulette created BEAM-9884: --- Summary: Add local option for configuraing SqlTransform planner Key: BEAM-9884 URL: https://issues.apache.org/jira/browse/BEAM-9884 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Brian Hulette Currently it's only possible to change the SqlTransform planner globally via pipeline options. It should be possible to configure the planner per transform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9880) touch: build/target/third_party_licenses/skip: No such file or directory
[ https://issues.apache.org/jira/browse/BEAM-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099344#comment-17099344 ] Hannah Jiang commented on BEAM-9880: The directory should be created as of [https://github.com/apache/beam/blob/master/sdks/java/container/build.gradle#L110] I cannot reproduce it. Can you try to create a java image only? By the way, you should use docker-pull-licenses or isRelease (hasn't been merged yet) to pull licenses. > touch: build/target/third_party_licenses/skip: No such file or directory > > > Key: BEAM-9880 > URL: https://issues.apache.org/jira/browse/BEAM-9880 > Project: Beam > Issue Type: Bug > Components: sdk-java-harness >Reporter: Kyle Weaver >Assignee: Hannah Jiang >Priority: Major > > When I run ./gradlew > :sdks:python:test-suites:portable:py2:crossLanguageTests, I get the following > error: > > Task :sdks:java:container:createFile FAILED > touch: build/target/third_party_licenses/skip: No such file or directory > When I do `ls build`, the only thing it outputs is `gradleenv`. So it looks > like it's assuming the directory exists, when it might not. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9883) Generalize SDF-validating restrictions
[ https://issues.apache.org/jira/browse/BEAM-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-9883: -- Status: Open (was: Triage Needed) > Generalize SDF-validating restrictions > -- > > Key: BEAM-9883 > URL: https://issues.apache.org/jira/browse/BEAM-9883 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > > We have some restrictions written for the purpose of validating that SDFs > work in sdf_invokers_test.go, but they can be improved and generalized. The > main improvement is changing the validation approach so that the restriction > keeps track of each method it's had called on it. Then this can be > generalized so that it can be used in upcoming integration tests to validate > which methods are being called. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9882) Go SplittableDoFn testing
[ https://issues.apache.org/jira/browse/BEAM-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-9882: -- Status: Open (was: Triage Needed) > Go SplittableDoFn testing > - > > Key: BEAM-9882 > URL: https://issues.apache.org/jira/browse/BEAM-9882 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > > This is a catch-all jira for all tasks needed to fully test SplittableDoFns > in Go. For progress on SplittableDoFns themselves, see BEAM-3301 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9882) Go SplittableDoFn testing
Daniel Oliveira created BEAM-9882: - Summary: Go SplittableDoFn testing Key: BEAM-9882 URL: https://issues.apache.org/jira/browse/BEAM-9882 Project: Beam Issue Type: Improvement Components: sdk-go Reporter: Daniel Oliveira Assignee: Daniel Oliveira This is a catch-all jira for all tasks needed to fully test SplittableDoFns in Go. For progress on SplittableDoFns themselves, see BEAM-3301 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9880) touch: build/target/third_party_licenses/skip: No such file or directory
[ https://issues.apache.org/jira/browse/BEAM-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099333#comment-17099333 ] Kyle Weaver commented on BEAM-9880: --- This happened on head. sdks/java/container/build/target/third_party_licenses does not exist. $ ls sdks/java/container/build/target/ LICENSE NOTICE > touch: build/target/third_party_licenses/skip: No such file or directory > > > Key: BEAM-9880 > URL: https://issues.apache.org/jira/browse/BEAM-9880 > Project: Beam > Issue Type: Bug > Components: sdk-java-harness >Reporter: Kyle Weaver >Assignee: Hannah Jiang >Priority: Major > > When I run ./gradlew > :sdks:python:test-suites:portable:py2:crossLanguageTests, I get the following > error: > > Task :sdks:java:container:createFile FAILED > touch: build/target/third_party_licenses/skip: No such file or directory > When I do `ls build`, the only thing it outputs is `gradleenv`. So it looks > like it's assuming the directory exists, when it might not. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9881) refactor KafkaIOIT for reusing pipeline
[ https://issues.apache.org/jira/browse/BEAM-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-9881: -- Status: Open (was: Triage Needed) > refactor KafkaIOIT for reusing pipeline > --- > > Key: BEAM-9881 > URL: https://issues.apache.org/jira/browse/BEAM-9881 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9877) Eager size estimation of large group-by-key iterables cause expensive / duplicate reads
[ https://issues.apache.org/jira/browse/BEAM-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-9877. --- Fix Version/s: 2.21.0 Resolution: Fixed > Eager size estimation of large group-by-key iterables cause expensive / > duplicate reads > --- > > Key: BEAM-9877 > URL: https://issues.apache.org/jira/browse/BEAM-9877 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Tudor Marian >Assignee: Tudor Marian >Priority: Major > Fix For: 2.21.0 > > Time Spent: 1h > Remaining Estimate: 0h > > BatchGroupAlsoByWindowViaIteratorsFn WindowReiterator can cause very > expensive duplicate reading of data from a (Co-)GroupByKey for hot keys with > many values due to PCollection size estimation. Instead, it should perform > lazy estimation like GroupingShuffleReader and GroupingShuffleEntryInterator > (or perform no estimation at all). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9877) Eager size estimation of large group-by-key iterables cause expensive / duplicate reads
[ https://issues.apache.org/jira/browse/BEAM-9877?focusedWorklogId=430399&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430399 ] ASF GitHub Bot logged work on BEAM-9877: Author: ASF GitHub Bot Created on: 04/May/20 20:20 Start Date: 04/May/20 20:20 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11601: URL: https://github.com/apache/beam/pull/11601#issuecomment-623684159 Java test failing due to BEAM-9164 (known flake). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430399) Time Spent: 1h (was: 50m) > Eager size estimation of large group-by-key iterables cause expensive / > duplicate reads > --- > > Key: BEAM-9877 > URL: https://issues.apache.org/jira/browse/BEAM-9877 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Tudor Marian >Assignee: Tudor Marian >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > BatchGroupAlsoByWindowViaIteratorsFn WindowReiterator can cause very > expensive duplicate reading of data from a (Co-)GroupByKey for hot keys with > many values due to PCollection size estimation. Instead, it should perform > lazy estimation like GroupingShuffleReader and GroupingShuffleEntryInterator > (or perform no estimation at all). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-9843) Flink UnboundedSourceWrapperTest flaky due to a timeout
[ https://issues.apache.org/jira/browse/BEAM-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver closed BEAM-9843. - Fix Version/s: Not applicable Resolution: Duplicate > Flink UnboundedSourceWrapperTest flaky due to a timeout > --- > > Key: BEAM-9843 > URL: https://issues.apache.org/jira/browse/BEAM-9843 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Chamikara Madhusanka Jayalath >Priority: Major > Fix For: Not applicable > > > For example, > [https://builds.apache.org/job/beam_PreCommit_Java_Cron/2685/] > [https://builds.apache.org/job/beam_PreCommit_Java_Cron/2684/] > [https://builds.apache.org/job/beam_PreCommit_Java_Cron/2682/] > [https://builds.apache.org/job/beam_PreCommit_Java_Cron/2680/] > [https://builds.apache.org/job/beam_PreCommit_Java_Cron/2685/testReport/junit/org.apache.beam.runners.flink.translation.wrappers.streaming.io/UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest/testWatermarkEmission_numTasks___4__numSplits_4_/] > org.junit.runners.model.TestTimedOutException: test timed out after 3 > milliseconds at sun.misc.Unsafe.park(Native Method) at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) at > org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest.testWatermarkEmission(UnboundedSourceWrapperTest.java:354) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:748) > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9880) touch: build/target/third_party_licenses/skip: No such file or directory
[ https://issues.apache.org/jira/browse/BEAM-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099324#comment-17099324 ] Hannah Jiang commented on BEAM-9880: Did you pull from the head and still see the issue? Can you check if sdks/java/container/build/target/third_party_licenses exists? > touch: build/target/third_party_licenses/skip: No such file or directory > > > Key: BEAM-9880 > URL: https://issues.apache.org/jira/browse/BEAM-9880 > Project: Beam > Issue Type: Bug > Components: sdk-java-harness >Reporter: Kyle Weaver >Assignee: Hannah Jiang >Priority: Major > > When I run ./gradlew > :sdks:python:test-suites:portable:py2:crossLanguageTests, I get the following > error: > > Task :sdks:java:container:createFile FAILED > touch: build/target/third_party_licenses/skip: No such file or directory > When I do `ls build`, the only thing it outputs is `gradleenv`. So it looks > like it's assuming the directory exists, when it might not. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9164) [PreCommit_Java] [Flake] UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest
[ https://issues.apache.org/jira/browse/BEAM-9164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-9164: -- Issue Type: Bug (was: Improvement) > [PreCommit_Java] [Flake] > UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest > --- > > Key: BEAM-9164 > URL: https://issues.apache.org/jira/browse/BEAM-9164 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kirill Kozlov >Priority: Major > Labels: flake > > Test: > org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest > >> testWatermarkEmission[numTasks = 1; numSplits=1] > Fails with the following exception: > {code:java} > org.junit.runners.model.TestTimedOutException: test timed out after 3 > milliseconds{code} > Affected Jenkins job: > [https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1665/] > Gradle build scan: [https://scans.gradle.com/s/nvgeb425fe63q] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9659) ArrayScanToJoinConverter ResolvedGetStructField cast to ResolvedColumnRef
[ https://issues.apache.org/jira/browse/BEAM-9659?focusedWorklogId=430392=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430392 ] ASF GitHub Bot logged work on BEAM-9659: Author: ASF GitHub Bot Created on: 04/May/20 20:10 Start Date: 04/May/20 20:10 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #11604: URL: https://github.com/apache/beam/pull/11604#issuecomment-623679534 R: @robinyqiu This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430392) Time Spent: 20m (was: 10m) > ArrayScanToJoinConverter ResolvedGetStructField cast to ResolvedColumnRef > - > > Key: BEAM-9659 > URL: https://issues.apache.org/jira/browse/BEAM-9659 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Andrew Pilloud >Priority: Critical > Labels: zetasql-compliance > Time Spent: 20m > Remaining Estimate: 0h > > two failures in shard 19 > {code:java} > java.lang.ClassCastException: > com.google.zetasql.resolvedast.ResolvedNodes$ResolvedGetStructField cannot be > cast to com.google.zetasql.resolvedast.ResolvedNodes$ResolvedColumnRef > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:62) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:37) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:97) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Collections$2.tryAdvance(Collections.java:4717) > at java.util.Collections$2.forEachRemaining(Collections.java:4725) > at 
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:96) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convert(QueryStatementConverter.java:84) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertRootQuery(QueryStatementConverter.java:51) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:160) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:131) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:115) > at > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl.executeQuery(ExecuteQueryServiceServer.java:242) > at > com.google.zetasql.testing.SqlComplianceServiceGrpc$MethodHandlers.invoke(SqlComplianceServiceGrpc.java:423) > at > com.google.zetasql.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171) > at > com.google.zetasql.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283) > at > com.google.zetasql.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711) > at > com.google.zetasql.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > com.google.zetasql.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > {code:java} > Apr 01, 2020 11:48:18 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > INFO: Processing Sql statement: SELECT b, x > FROM UNNEST( >[STRUCT(true AS b, [3, 5] AS arr), STRUCT(false AS b, [7, 9] AS
[jira] [Work logged] (BEAM-9659) ArrayScanToJoinConverter ResolvedGetStructField cast to ResolvedColumnRef
[ https://issues.apache.org/jira/browse/BEAM-9659?focusedWorklogId=430391=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430391 ] ASF GitHub Bot logged work on BEAM-9659: Author: ASF GitHub Bot Created on: 04/May/20 20:10 Start Date: 04/May/20 20:10 Worklog Time Spent: 10m Work Description: apilloud opened a new pull request #11604: URL: https://github.com/apache/beam/pull/11604 There is no trivial path to fixing these, so just ensure they return a sensible error for now. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Assigned] (BEAM-9664) ArrayScanToJoinConverter ResolvedSubqueryExpr cast to ResolvedColumnRef
[ https://issues.apache.org/jira/browse/BEAM-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Pilloud reassigned BEAM-9664: Assignee: Andrew Pilloud > ArrayScanToJoinConverter ResolvedSubqueryExpr cast to ResolvedColumnRef > --- > > Key: BEAM-9664 > URL: https://issues.apache.org/jira/browse/BEAM-9664 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Andrew Pilloud >Priority: Critical > Labels: zetasql-compliance > > one failure in shard 49 > {code} > java.lang.ClassCastException: > com.google.zetasql.resolvedast.ResolvedNodes$ResolvedSubqueryExpr cannot be > cast to com.google.zetasql.resolvedast.ResolvedNodes$ResolvedColumnRef > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:62) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:37) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:97) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Collections$2.tryAdvance(Collections.java:4717) > at java.util.Collections$2.forEachRemaining(Collections.java:4725) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:96) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convert(QueryStatementConverter.java:84) > at > 
org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertRootQuery(QueryStatementConverter.java:51) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:160) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:131) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:115) > at > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl.executeQuery(ExecuteQueryServiceServer.java:242) > at > com.google.zetasql.testing.SqlComplianceServiceGrpc$MethodHandlers.invoke(SqlComplianceServiceGrpc.java:423) > at > com.google.zetasql.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171) > at > com.google.zetasql.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283) > at > com.google.zetasql.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711) > at > com.google.zetasql.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > com.google.zetasql.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > {code} > Apr 01, 2020 11:48:49 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > INFO: Processing Sql statement: SELECT a, b > FROM UNNEST([1, 2, 3]) a > JOIN UNNEST(ARRAY(SELECT b FROM UNNEST([3, 2, 1]) b)) b > ON a = b; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
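The `ClassCastException` above comes from a blind downcast at `ArrayScanToJoinConverter.java:62`; apilloud's PR for the sibling issue (#11604) takes the approach of returning a sensible error instead of crashing. A minimal sketch of that defensive pattern, in Python with stand-in class names (these are illustrative, not Beam's or ZetaSQL's actual API):

```python
class ResolvedColumnRef:
    """Stand-in for ResolvedNodes$ResolvedColumnRef."""
    def __init__(self, column):
        self.column = column

class ResolvedSubqueryExpr:
    """Stand-in for the node type that actually appears in the failing queries."""

def to_column_ref(node):
    # Instead of a blind cast (the Java equivalent raises ClassCastException),
    # check the node type and fail with an actionable, user-facing error.
    if not isinstance(node, ResolvedColumnRef):
        raise NotImplementedError(
            "UNNEST join conditions on %s are not supported yet"
            % type(node).__name__)
    return node
```

With this guard, the compliance query fails with a clear "not supported yet" message naming the offending node type rather than an opaque cast failure deep in the converter.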
[jira] [Work logged] (BEAM-9875) PR11585 breaks python2PostCommit cross-lang suite
[ https://issues.apache.org/jira/browse/BEAM-9875?focusedWorklogId=430388=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430388 ] ASF GitHub Bot logged work on BEAM-9875: Author: ASF GitHub Bot Created on: 04/May/20 20:07 Start Date: 04/May/20 20:07 Worklog Time Spent: 10m Work Description: ibzib opened a new pull request #11603: URL: https://github.com/apache/beam/pull/11603 …ge tests. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-9875) PR11585 breaks python2PostCommit cross-lang suite
[ https://issues.apache.org/jira/browse/BEAM-9875?focusedWorklogId=430389=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430389 ] ASF GitHub Bot logged work on BEAM-9875: Author: ASF GitHub Bot Created on: 04/May/20 20:07 Start Date: 04/May/20 20:07 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11603: URL: https://github.com/apache/beam/pull/11603#issuecomment-623678292 Run Python 2 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430389) Time Spent: 20m (was: 10m) > PR11585 breaks python2PostCommit cross-lang suite > - > > Key: BEAM-9875 > URL: https://issues.apache.org/jira/browse/BEAM-9875 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Kyle Weaver >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Build scan: https://scans.gradle.com/s/6s6vse2zqr5x4 > Typical error msg: > Traceback (most recent call last): > File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main > "__main__", fname, loader, pkg_name) > File "/usr/lib/python2.7/runpy.py", line 72, in _run_code > exec code in run_globals > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2/src/sdks/python/apache_beam/examples/wordcount_xlang.py", > line 137, in > main() > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2/src/sdks/python/apache_beam/examples/wordcount_xlang.py", > line 128, in main > p.runner.create_job_service(pipeline_options) > File "apache_beam/runners/portability/portable_runner.py", line 328, in > create_job_service > server = self.default_job_server(options) > File "apache_beam/runners/portability/portable_runner.py", line 307, in > default_job_server > 'You must specify a 
--job_endpoint when using --runner=PortableRunner. ' > NotImplementedError: You must specify a --job_endpoint when using > --runner=PortableRunner. Alternatively, you may specify which portable runner > you intend to use, such as --runner=FlinkRunner or --runner=SparkRunner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
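The traceback ends in `portable_runner.py`'s `default_job_server` dispatch: `PortableRunner` has no default job service, so the user must either point it at one with `--job_endpoint` or pick a runner that can start its own. A condensed sketch of that decision logic (simplified for illustration, not the actual Beam code):

```python
def default_job_server(runner, job_endpoint=None):
    # An explicit --job_endpoint always wins: connect to the external service.
    if job_endpoint:
        return ("external", job_endpoint)
    # Runners like FlinkRunner/SparkRunner know how to spin up their own
    # job service, so they need no endpoint.
    if runner in ("FlinkRunner", "SparkRunner"):
        return ("embedded", runner)
    # Bare PortableRunner with no endpoint is the error case seen above.
    raise NotImplementedError(
        "You must specify a --job_endpoint when using --runner=PortableRunner.")
```

So the cross-language wordcount either needs `--job_endpoint=<host:port>` passed through to the portable runner, or a concrete runner such as `--runner=FlinkRunner`.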
[jira] [Work started] (BEAM-9659) ArrayScanToJoinConverter ResolvedGetStructField cast to ResolvedColumnRef
[ https://issues.apache.org/jira/browse/BEAM-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-9659 started by Andrew Pilloud. > ArrayScanToJoinConverter ResolvedGetStructField cast to ResolvedColumnRef > - > > Key: BEAM-9659 > URL: https://issues.apache.org/jira/browse/BEAM-9659 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Andrew Pilloud >Priority: Critical > Labels: zetasql-compliance > > two failures in shard 19 > {code:java} > java.lang.ClassCastException: > com.google.zetasql.resolvedast.ResolvedNodes$ResolvedGetStructField cannot be > cast to com.google.zetasql.resolvedast.ResolvedNodes$ResolvedColumnRef > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:62) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:37) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:97) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Collections$2.tryAdvance(Collections.java:4717) > at java.util.Collections$2.forEachRemaining(Collections.java:4725) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:96) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convert(QueryStatementConverter.java:84) > at > 
org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertRootQuery(QueryStatementConverter.java:51) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:160) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:131) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:115) > at > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl.executeQuery(ExecuteQueryServiceServer.java:242) > at > com.google.zetasql.testing.SqlComplianceServiceGrpc$MethodHandlers.invoke(SqlComplianceServiceGrpc.java:423) > at > com.google.zetasql.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171) > at > com.google.zetasql.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283) > at > com.google.zetasql.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711) > at > com.google.zetasql.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > com.google.zetasql.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > {code:java} > Apr 01, 2020 11:48:18 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > INFO: Processing Sql statement: SELECT b, x > FROM UNNEST( >[STRUCT(true AS b, [3, 5] AS arr), STRUCT(false AS b, [7, 9] AS arr)]) > t > LEFT JOIN UNNEST(t.arr) x ON b > Apr 01, 2020 11:48:18 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > INFO: Processing Sql statement: SELECT b, x, p > FROM UNNEST( 
>[STRUCT(true AS b, [3, 5] AS arr), STRUCT(false AS b, [7, 9] AS arr)]) > t > LEFT JOIN UNNEST(t.arr) x WITH OFFSET p ON b > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9659) ArrayScanToJoinConverter ResolvedGetStructField cast to ResolvedColumnRef
[ https://issues.apache.org/jira/browse/BEAM-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Pilloud reassigned BEAM-9659: Assignee: Andrew Pilloud > ArrayScanToJoinConverter ResolvedGetStructField cast to ResolvedColumnRef > - > > Key: BEAM-9659 > URL: https://issues.apache.org/jira/browse/BEAM-9659 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Andrew Pilloud >Priority: Critical > Labels: zetasql-compliance > > two failures in shard 19 > {code:java} > java.lang.ClassCastException: > com.google.zetasql.resolvedast.ResolvedNodes$ResolvedGetStructField cannot be > cast to com.google.zetasql.resolvedast.ResolvedNodes$ResolvedColumnRef > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:62) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:37) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:97) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Collections$2.tryAdvance(Collections.java:4717) > at java.util.Collections$2.forEachRemaining(Collections.java:4725) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:96) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convert(QueryStatementConverter.java:84) > at > 
org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertRootQuery(QueryStatementConverter.java:51) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:160) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:131) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:115) > at > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl.executeQuery(ExecuteQueryServiceServer.java:242) > at > com.google.zetasql.testing.SqlComplianceServiceGrpc$MethodHandlers.invoke(SqlComplianceServiceGrpc.java:423) > at > com.google.zetasql.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171) > at > com.google.zetasql.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283) > at > com.google.zetasql.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711) > at > com.google.zetasql.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > com.google.zetasql.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > {code:java} > Apr 01, 2020 11:48:18 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > INFO: Processing Sql statement: SELECT b, x > FROM UNNEST( >[STRUCT(true AS b, [3, 5] AS arr), STRUCT(false AS b, [7, 9] AS arr)]) > t > LEFT JOIN UNNEST(t.arr) x ON b > Apr 01, 2020 11:48:18 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > INFO: Processing Sql statement: SELECT b, x, p > FROM UNNEST( 
>[STRUCT(true AS b, [3, 5] AS arr), STRUCT(false AS b, [7, 9] AS arr)]) > t > LEFT JOIN UNNEST(t.arr) x WITH OFFSET p ON b > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9657) ArrayScanToJoinConverter ResolvedLiteral cast to ResolvedColumnRef
[ https://issues.apache.org/jira/browse/BEAM-9657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Pilloud reassigned BEAM-9657: Assignee: Andrew Pilloud > ArrayScanToJoinConverter ResolvedLiteral cast to ResolvedColumnRef > -- > > Key: BEAM-9657 > URL: https://issues.apache.org/jira/browse/BEAM-9657 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Andrew Pilloud >Priority: Critical > Labels: zetasql-compliance > > 5 failures in shard 2 > {code:java} > Apr 01, 2020 11:48:47 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > SEVERE: com.google.zetasql.resolvedast.ResolvedNodes$ResolvedLiteral > cannot be cast to > com.google.zetasql.resolvedast.ResolvedNodes$ResolvedColumnRef > java.lang.ClassCastException: > com.google.zetasql.resolvedast.ResolvedNodes$ResolvedLiteral cannot be cast > to com.google.zetasql.resolvedast.ResolvedNodes$ResolvedColumnRef > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:62) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:37) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:97) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Collections$2.tryAdvance(Collections.java:4717) > at java.util.Collections$2.forEachRemaining(Collections.java:4725) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > 
org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:96) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convert(QueryStatementConverter.java:84) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertRootQuery(QueryStatementConverter.java:51) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:160) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:131) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:115) > at > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl.executeQuery(ExecuteQueryServiceServer.java:242) > at > com.google.zetasql.testing.SqlComplianceServiceGrpc$MethodHandlers.invoke(SqlComplianceServiceGrpc.java:423) > at > com.google.zetasql.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171) > at > com.google.zetasql.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283) > at > com.google.zetasql.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711) > at > com.google.zetasql.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > com.google.zetasql.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) {code} > {code} > Apr 01, 2020 11:48:47 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > INFO: Processing Sql statement: SELECT a, b FROM (SELECT 1 a) JOIN UNNEST([1, > 2, 3]) b ON b > a 
> Apr 01, 2020 11:48:48 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > INFO: Processing Sql statement: SELECT a, b > FROM UNNEST([1, 1, 2, 3, 5, 8, 13, NULL]) a > JOIN UNNEST([1, 2, 3, 5, 7, 11, 13, NULL]) b > ON a = b > Apr 01, 2020 11:48:48 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > INFO: Processing Sql statement: SELECT a, b > FROM UNNEST([1, 1, 2, 3, 5, 8, 13, NULL]) a > LEFT JOIN UNNEST([1, 2, 3, 5, 7, 11, 13, NULL]) b > ON a = b > Apr 01, 2020 11:48:49 AM >
[jira] [Work started] (BEAM-9657) ArrayScanToJoinConverter ResolvedLiteral cast to ResolvedColumnRef
[ https://issues.apache.org/jira/browse/BEAM-9657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-9657 started by Andrew Pilloud. > ArrayScanToJoinConverter ResolvedLiteral cast to ResolvedColumnRef > -- > > Key: BEAM-9657 > URL: https://issues.apache.org/jira/browse/BEAM-9657 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Andrew Pilloud >Priority: Critical > Labels: zetasql-compliance > > 5 failures in shard 2 > {code:java} > Apr 01, 2020 11:48:47 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > SEVERE: com.google.zetasql.resolvedast.ResolvedNodes$ResolvedLiteral > cannot be cast to > com.google.zetasql.resolvedast.ResolvedNodes$ResolvedColumnRef > java.lang.ClassCastException: > com.google.zetasql.resolvedast.ResolvedNodes$ResolvedLiteral cannot be cast > to com.google.zetasql.resolvedast.ResolvedNodes$ResolvedColumnRef > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:62) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:37) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:97) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Collections$2.tryAdvance(Collections.java:4717) > at java.util.Collections$2.forEachRemaining(Collections.java:4725) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > 
org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:96) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convert(QueryStatementConverter.java:84) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertRootQuery(QueryStatementConverter.java:51) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:160) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:131) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:115) > at > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl.executeQuery(ExecuteQueryServiceServer.java:242) > at > com.google.zetasql.testing.SqlComplianceServiceGrpc$MethodHandlers.invoke(SqlComplianceServiceGrpc.java:423) > at > com.google.zetasql.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171) > at > com.google.zetasql.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283) > at > com.google.zetasql.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711) > at > com.google.zetasql.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > com.google.zetasql.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) {code} > {code} > Apr 01, 2020 11:48:47 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > INFO: Processing Sql statement: SELECT a, b FROM (SELECT 1 a) JOIN UNNEST([1, > 2, 3]) b ON b > a 
> Apr 01, 2020 11:48:48 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > INFO: Processing Sql statement: SELECT a, b > FROM UNNEST([1, 1, 2, 3, 5, 8, 13, NULL]) a > JOIN UNNEST([1, 2, 3, 5, 7, 11, 13, NULL]) b > ON a = b > Apr 01, 2020 11:48:48 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > INFO: Processing Sql statement: SELECT a, b > FROM UNNEST([1, 1, 2, 3, 5, 8, 13, NULL]) a > LEFT JOIN UNNEST([1, 2, 3, 5, 7, 11, 13, NULL]) b > ON a = b > Apr 01, 2020 11:48:49 AM >
[jira] [Created] (BEAM-9880) touch: build/target/third_party_licenses/skip: No such file or directory
Kyle Weaver created BEAM-9880: - Summary: touch: build/target/third_party_licenses/skip: No such file or directory Key: BEAM-9880 URL: https://issues.apache.org/jira/browse/BEAM-9880 Project: Beam Issue Type: Bug Components: sdk-java-harness Reporter: Kyle Weaver Assignee: Hannah Jiang When I run ./gradlew :sdks:python:test-suites:portable:py2:crossLanguageTests, I get the following error: > Task :sdks:java:container:createFile FAILED touch: build/target/third_party_licenses/skip: No such file or directory When I do `ls build`, the only thing it outputs is `gradleenv`. So it looks like the createFile task assumes the parent directory already exists, when it might not. -- This message was sent by Atlassian Jira (v8.3.4#803005)
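The failure mode is a plain `touch` on a path whose parent directories were never created; the usual fix is to create parents first (`mkdir -p` semantics). A sketch of the safe pattern, in Python rather than the Gradle task itself:

```python
import os

def create_skip_file(build_dir):
    # Equivalent of the createFile task: touch
    # build/target/third_party_licenses/skip, but create the parent
    # directories first so a clean checkout (where `ls build` shows only
    # `gradleenv`) does not fail with "No such file or directory".
    path = os.path.join(build_dir, "target", "third_party_licenses", "skip")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # Open in append mode and update the mtime: creates the file if missing,
    # leaves existing contents untouched, just like `touch`.
    with open(path, "a"):
        os.utime(path, None)
    return path
```

In Gradle terms this corresponds to making the task create its output directory (or depend on a task that does) before touching the file.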
[jira] [Work logged] (BEAM-9877) Eager size estimation of large group-by-key iterables cause expensive / duplicate reads
[ https://issues.apache.org/jira/browse/BEAM-9877?focusedWorklogId=430384=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430384 ] ASF GitHub Bot logged work on BEAM-9877: Author: ASF GitHub Bot Created on: 04/May/20 19:59 Start Date: 04/May/20 19:59 Worklog Time Spent: 10m Work Description: tudorm commented on pull request #11601: URL: https://github.com/apache/beam/pull/11601#issuecomment-623674559 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430384) Time Spent: 50m (was: 40m) > Eager size estimation of large group-by-key iterables cause expensive / > duplicate reads > --- > > Key: BEAM-9877 > URL: https://issues.apache.org/jira/browse/BEAM-9877 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Tudor Marian >Assignee: Tudor Marian >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > BatchGroupAlsoByWindowViaIteratorsFn WindowReiterator can cause very > expensive duplicate reading of data from a (Co-)GroupByKey for hot keys with > many values due to PCollection size estimation. Instead, it should perform > lazy estimation like GroupingShuffleReader and GroupingShuffleEntryInterator > (or perform no estimation at all). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9877) Eager size estimation of large group-by-key iterables cause expensive / duplicate reads
[ https://issues.apache.org/jira/browse/BEAM-9877?focusedWorklogId=430382=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430382 ] ASF GitHub Bot logged work on BEAM-9877: Author: ASF GitHub Bot Created on: 04/May/20 19:56 Start Date: 04/May/20 19:56 Worklog Time Spent: 10m Work Description: tudorm commented on pull request #11598: URL: https://github.com/apache/beam/pull/11598#issuecomment-623673128 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430382) Time Spent: 40m (was: 0.5h) > Eager size estimation of large group-by-key iterables cause expensive / > duplicate reads > --- > > Key: BEAM-9877 > URL: https://issues.apache.org/jira/browse/BEAM-9877 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Tudor Marian >Assignee: Tudor Marian >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > BatchGroupAlsoByWindowViaIteratorsFn WindowReiterator can cause very > expensive duplicate reading of data from a (Co-)GroupByKey for hot keys with > many values due to PCollection size estimation. Instead, it should perform > lazy estimation like GroupingShuffleReader and GroupingShuffleEntryInterator > (or perform no estimation at all). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-9661) LimitOffsetScanToOrderByLimitConverter IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/BEAM-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-9661 started by Andrew Pilloud. > LimitOffsetScanToOrderByLimitConverter IndexOutOfBoundsException > > > Key: BEAM-9661 > URL: https://issues.apache.org/jira/browse/BEAM-9661 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Andrew Pilloud >Priority: Critical > Labels: zetasql-compliance > Time Spent: 20m > Remaining Estimate: 0h > > Fourteen failures in shard 37, one failure in shard 44 > {code:java} > java.lang.IndexOutOfBoundsException: index (4) must be less than size (1) > at > org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:310) > at > org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:293) > at > org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.SingletonImmutableList.get(SingletonImmutableList.java:41) > at > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexBuilder.makeInputRef(RexBuilder.java:855) > at > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Sort.(Sort.java:103) > at > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalSort.(LogicalSort.java:37) > at > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalSort.create(LogicalSort.java:63) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.LimitOffsetScanToOrderByLimitConverter.convert(LimitOffsetScanToOrderByLimitConverter.java:74) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.LimitOffsetScanToOrderByLimitConverter.convert(LimitOffsetScanToOrderByLimitConverter.java:40) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:97) > at > 
org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convert(QueryStatementConverter.java:84) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertRootQuery(QueryStatementConverter.java:51) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:160) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:131) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:115) > at > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl.executeQuery(ExecuteQueryServiceServer.java:242) > at > com.google.zetasql.testing.SqlComplianceServiceGrpc$MethodHandlers.invoke(SqlComplianceServiceGrpc.java:423) > at > com.google.zetasql.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171) > at > com.google.zetasql.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283) > at > com.google.zetasql.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711) > at > com.google.zetasql.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > com.google.zetasql.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > {code} > Apr 01, 2020 11:50:41 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > INFO: Processing Sql statement: SELECT a FROM (SELECT 2.2 a UNION ALL SELECT > 1.1 UNION ALL SELECT 3.3) ORDER BY a LIMIT 0 > Apr 01, 2020 11:50:42 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > 
executeQuery > INFO: Processing Sql statement: SELECT a FROM (SELECT 2.2 a UNION ALL SELECT > 1.1 UNION ALL SELECT 3.3) ORDER BY a LIMIT 1 > Apr 01, 2020 11:50:43 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > INFO: Processing Sql statement: SELECT a FROM (SELECT 2.2 a UNION ALL SELECT > 1.1 UNION ALL SELECT 3.3) ORDER BY a LIMIT 2 > Apr 01, 2020 11:50:43 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > INFO: Processing Sql statement: SELECT a FROM (SELECT 3.3 a UNION ALL SELECT > 1.1 UNION ALL SELECT 2.2) ORDER BY a LIMIT 3 > Apr 01, 2020
[jira] [Assigned] (BEAM-9661) LimitOffsetScanToOrderByLimitConverter IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/BEAM-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Pilloud reassigned BEAM-9661: Assignee: Andrew Pilloud > LimitOffsetScanToOrderByLimitConverter IndexOutOfBoundsException > > > Key: BEAM-9661 > URL: https://issues.apache.org/jira/browse/BEAM-9661 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Andrew Pilloud >Priority: Critical > Labels: zetasql-compliance > Time Spent: 20m > Remaining Estimate: 0h > > Fourteen failures in shard 37, one failure in shard 44 > {code:java} > java.lang.IndexOutOfBoundsException: index (4) must be less than size (1) > at > org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:310) > at > org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:293) > at > org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.SingletonImmutableList.get(SingletonImmutableList.java:41) > at > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexBuilder.makeInputRef(RexBuilder.java:855) > at > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Sort.(Sort.java:103) > at > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalSort.(LogicalSort.java:37) > at > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalSort.create(LogicalSort.java:63) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.LimitOffsetScanToOrderByLimitConverter.convert(LimitOffsetScanToOrderByLimitConverter.java:74) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.LimitOffsetScanToOrderByLimitConverter.convert(LimitOffsetScanToOrderByLimitConverter.java:40) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:97) > at > 
org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convert(QueryStatementConverter.java:84) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertRootQuery(QueryStatementConverter.java:51) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:160) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:131) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:115) > at > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl.executeQuery(ExecuteQueryServiceServer.java:242) > at > com.google.zetasql.testing.SqlComplianceServiceGrpc$MethodHandlers.invoke(SqlComplianceServiceGrpc.java:423) > at > com.google.zetasql.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171) > at > com.google.zetasql.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283) > at > com.google.zetasql.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711) > at > com.google.zetasql.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > com.google.zetasql.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > {code} > Apr 01, 2020 11:50:41 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > INFO: Processing Sql statement: SELECT a FROM (SELECT 2.2 a UNION ALL SELECT > 1.1 UNION ALL SELECT 3.3) ORDER BY a LIMIT 0 > Apr 01, 2020 11:50:42 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > 
executeQuery > INFO: Processing Sql statement: SELECT a FROM (SELECT 2.2 a UNION ALL SELECT > 1.1 UNION ALL SELECT 3.3) ORDER BY a LIMIT 1 > Apr 01, 2020 11:50:43 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > INFO: Processing Sql statement: SELECT a FROM (SELECT 2.2 a UNION ALL SELECT > 1.1 UNION ALL SELECT 3.3) ORDER BY a LIMIT 2 > Apr 01, 2020 11:50:43 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > INFO: Processing Sql statement: SELECT a FROM (SELECT 3.3 a UNION ALL SELECT > 1.1 UNION ALL SELECT 2.2) ORDER BY a LIMIT 3 > Apr
[jira] [Work logged] (BEAM-9661) LimitOffsetScanToOrderByLimitConverter IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/BEAM-9661?focusedWorklogId=430373=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430373 ] ASF GitHub Bot logged work on BEAM-9661: Author: ASF GitHub Bot Created on: 04/May/20 19:37 Start Date: 04/May/20 19:37 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #11602: URL: https://github.com/apache/beam/pull/11602#issuecomment-623663334 R: @robinyqiu This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430373) Time Spent: 20m (was: 10m) > LimitOffsetScanToOrderByLimitConverter IndexOutOfBoundsException > > > Key: BEAM-9661 > URL: https://issues.apache.org/jira/browse/BEAM-9661 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Priority: Critical > Labels: zetasql-compliance > Time Spent: 20m > Remaining Estimate: 0h > > Fourteen failures in shard 37, one failure in shard 44 > {code:java} > java.lang.IndexOutOfBoundsException: index (4) must be less than size (1) > at > org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:310) > at > org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:293) > at > org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.SingletonImmutableList.get(SingletonImmutableList.java:41) > at > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexBuilder.makeInputRef(RexBuilder.java:855) > at > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Sort.(Sort.java:103) > at > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalSort.(LogicalSort.java:37) > at > 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalSort.create(LogicalSort.java:63) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.LimitOffsetScanToOrderByLimitConverter.convert(LimitOffsetScanToOrderByLimitConverter.java:74) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.LimitOffsetScanToOrderByLimitConverter.convert(LimitOffsetScanToOrderByLimitConverter.java:40) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:97) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convert(QueryStatementConverter.java:84) > at > org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertRootQuery(QueryStatementConverter.java:51) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:160) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:131) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:115) > at > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl.executeQuery(ExecuteQueryServiceServer.java:242) > at > com.google.zetasql.testing.SqlComplianceServiceGrpc$MethodHandlers.invoke(SqlComplianceServiceGrpc.java:423) > at > com.google.zetasql.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171) > at > com.google.zetasql.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283) > at > com.google.zetasql.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711) > at > com.google.zetasql.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > com.google.zetasql.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > {code} > Apr 01, 2020 11:50:41 AM > cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl > executeQuery > INFO: Processing Sql statement: SELECT a FROM (SELECT 2.2 a UNION ALL SELECT > 1.1 UNION ALL SELECT 3.3) ORDER BY a LIMIT 0 > Apr 01, 2020 11:50:42
[jira] [Work logged] (BEAM-9661) LimitOffsetScanToOrderByLimitConverter IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/BEAM-9661?focusedWorklogId=430372&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430372 ]

ASF GitHub Bot logged work on BEAM-9661:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/May/20 19:35
            Start Date: 04/May/20 19:35
    Worklog Time Spent: 10m
      Work Description: apilloud opened a new pull request #11602:
URL: https://github.com/apache/beam/pull/11602

   It turns out ZetaSQL column references don't always start at 0; union operators are one example. We probably have a number of bugs in this regard.

   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

   - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
   - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
   - [ ] Update `CHANGES.md` with noteworthy changes.
   - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-9877) Eager size estimation of large group-by-key iterables cause expensive / duplicate reads
[ https://issues.apache.org/jira/browse/BEAM-9877?focusedWorklogId=430371=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430371 ] ASF GitHub Bot logged work on BEAM-9877: Author: ASF GitHub Bot Created on: 04/May/20 19:30 Start Date: 04/May/20 19:30 Worklog Time Spent: 10m Work Description: ibzib opened a new pull request #11601: URL: https://github.com/apache/beam/pull/11601 …entByteSizeObservableIterable so that size estimation is lazy Cherry-pick of #11598. (I removed the commit history because I forgot to squash before merging the original PR, and one atomic commit is a lot easier to deal with.) R: @tudorm Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Work logged] (BEAM-9840) Support for Parameterized Types when converting from HCatRecords to Rows in HCatalogIO
[ https://issues.apache.org/jira/browse/BEAM-9840?focusedWorklogId=430369=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430369 ] ASF GitHub Bot logged work on BEAM-9840: Author: ASF GitHub Bot Created on: 04/May/20 19:07 Start Date: 04/May/20 19:07 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on a change in pull request #11569: URL: https://github.com/apache/beam/pull/11569#discussion_r419653421 ## File path: sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/SchemaUtilsTest.java ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.beam.sdk.io.hcatalog;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class SchemaUtilsTest {
+  @Test
+  public void testParameterizedTypesToBeamTypes() {
+    List<FieldSchema> listOfFieldSchema = new ArrayList<>();
+    listOfFieldSchema.add(new FieldSchema("parameterizedChar", "char(10)", null));
+    listOfFieldSchema.add(new FieldSchema("parameterizedVarchar", "varchar(100)", null));
+    listOfFieldSchema.add(new FieldSchema("parameterizedDecimal", "decimal(30,16)", null));
+
+    Schema expectedSchema =
+        Schema.builder()
+            .addNullableField("parameterizedChar", Schema.FieldType.STRING)
+            .addNullableField("parameterizedVarchar", Schema.FieldType.STRING)
+            .addNullableField("parameterizedDecimal", Schema.FieldType.DECIMAL)

Review comment:
   I think these should map to logical types instead of to primitives so we don't lose the information from the parameter. Unfortunately we don't (yet) have good logical types in `schemas.logicaltypes` to map them to, but maybe we will after your other PR, https://github.com/apache/beam/pull/11581 (or you could just add the relevant ones here). `char(10)` looks like it could map to a `FixedLengthString` logical type; `varchar(100)` probably deserves its own type, maybe just called `Varchar`; and I've been meaning to add a logical type for DECIMAL parameterized by precision and scale as part of [BEAM-7554](https://issues.apache.org/jira/browse/BEAM-7554) (and deprecate the primitive one).

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430369) Time Spent: 50m (was: 40m) > Support for Parameterized Types when converting from HCatRecords to Rows in > HCatalogIO > -- > > Key: BEAM-9840 > URL: https://issues.apache.org/jira/browse/BEAM-9840 > Project: Beam > Issue Type: Improvement > Components: io-java-hcatalog >Reporter: Rahul Patwari >Assignee: Rahul Patwari >Priority: Major > Fix For: 2.22.0 > > Time Spent: 50m > Remaining Estimate: 0h > > In Hive, to use CHAR and VARCHAR as the data type for a column, the types > have to be parameterized with the length of the character sequence. Refer > [https://github.com/apache/hive/blob/f37c5de6c32b9395d1b34fa3c02ed06d1bfbf6eb/serde/src/test/org/apache/hadoop/hive/serde2/typeinfo/TestTypeInfoUtils.java#L68] > In addition, for the DECIMAL data type, custom precision and scale can be > provided as parameters. > A user has faced an Exception while reading data from a table created in Hive > with a parameterized DECIMAL data type. Refer > [https://lists.apache.org/thread.html/r159012fbefce24d734096e3ec24ecd112de5f89b8029e57147d233b0%40%3Cuser.beam.apache.org%3E]. > This ticket is created to support converting from HcatRecords to Rows when > the HCatRecords
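The review thread above turns on keeping the type parameters (precision, scale, length) rather than collapsing to bare primitives. A hypothetical helper, not Beam code, showing the parameter extraction a logical type for DECIMAL would need:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Beam: pulls precision and scale out of
// a Hive type string such as "decimal(30,16)" so a logical type could
// carry them instead of discarding them.
final class HiveDecimalParams {
  private static final Pattern DECIMAL =
      Pattern.compile("decimal\\((\\d+)\\s*,\\s*(\\d+)\\)");

  final int precision;
  final int scale;

  private HiveDecimalParams(int precision, int scale) {
    this.precision = precision;
    this.scale = scale;
  }

  static HiveDecimalParams parse(String hiveType) {
    Matcher m = DECIMAL.matcher(hiveType.trim().toLowerCase());
    if (!m.matches()) {
      throw new IllegalArgumentException("Not a parameterized decimal: " + hiveType);
    }
    return new HiveDecimalParams(
        Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)));
  }
}
```

The same extraction pattern would apply to `char(10)` and `varchar(100)`, whose single length parameter a `FixedLengthString`-style logical type could store.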
[jira] [Work logged] (BEAM-9876) Migrate the Beam website from Jekyll to Hugo to enable localization of the site content
[ https://issues.apache.org/jira/browse/BEAM-9876?focusedWorklogId=430367=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430367 ] ASF GitHub Bot logged work on BEAM-9876: Author: ASF GitHub Bot Created on: 04/May/20 19:04 Start Date: 04/May/20 19:04 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #11554: URL: https://github.com/apache/beam/pull/11554#discussion_r419661024 ## File path: website/www/site/content/en/contribute/release-guide.md ## @@ -218,7 +214,7 @@ docker login docker.io After successful login, authorization info will be stored at ~/.docker/config.json file. For example, ``` "https://index.docker.io/v1/": { - "auth": "xx" + "auth": "aGFubmFoamlhbmc6cmtkdGpmZ2hrMTIxMw==" Review comment: Probably don't want to check this in. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430367) Time Spent: 20m (was: 10m) > Migrate the Beam website from Jekyll to Hugo to enable localization of the > site content > --- > > Key: BEAM-9876 > URL: https://issues.apache.org/jira/browse/BEAM-9876 > Project: Beam > Issue Type: Task > Components: website >Reporter: Aizhamal Nurmamat kyzy >Assignee: Aizhamal Nurmamat kyzy >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Enable internationalization of the Apache Beam website to increase the reach > of the project, and facilitate adoption and growth of its community. > The proposal was to do this by migrating the current Apache Beam website from > Jekyll do Hugo [1]. Hugo supports internationalization out-of-the-box, making > it easier both for contributors and maintainers support the > internationalization effort. 
> The further discussion on implementation can be viewed here [2] > [1] > [https://lists.apache.org/thread.html/rfab4cc1411318c3f4667bee051df68f37be11846ada877f3576c41a9%40%3Cdev.beam.apache.org%3E] > [2] > [https://lists.apache.org/thread.html/r6b999b6d7d1f6cbb94e16bb2deed2b65098a6b14c4ac98707fe0c36a%40%3Cdev.beam.apache.org%3E] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9874) Portable timers can't be cleared in batch mode
[ https://issues.apache.org/jira/browse/BEAM-9874?focusedWorklogId=430361&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430361 ]

ASF GitHub Bot logged work on BEAM-9874:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/May/20 18:59
            Start Date: 04/May/20 18:59
    Worklog Time Spent: 10m
      Work Description: ibzib commented on a change in pull request #11597:
URL: https://github.com/apache/beam/pull/11597#discussion_r419658069

## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryTimerInternals.java
## @@ -167,20 +165,17 @@ public void deleteTimer(StateNamespace namespace, String timerId, TimeDomain tim
   @Deprecated
   @Override
   public void deleteTimer(StateNamespace namespace, String timerId, String timerFamilyId) {
-    TimerData existing = existingTimers.get(namespace, timerId + '+' + timerFamilyId);
-    if (existing != null) {
-      deleteTimer(existing);
+    TimerData removedTimer = existingTimers.remove(namespace, timerId + '+' + timerFamilyId);
+    if (removedTimer != null) {
+      timersForDomain(removedTimer.getDomain()).remove(removedTimer);
     }
   }

   /** @deprecated use {@link #deleteTimer(StateNamespace, String, TimeDomain)}. */

Review comment:
   Maybe worth a JIRA/refactor? I'll leave it up to you since I am not as familiar with this code.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430361) Time Spent: 1h 20m (was: 1h 10m) > Portable timers can't be cleared in batch mode > -- > > Key: BEAM-9874 > URL: https://issues.apache.org/jira/browse/BEAM-9874 > Project: Beam > Issue Type: Bug > Components: runner-flink, runner-spark >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.21.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > After BEAM-9801, the {{test_pardo_timers_clear}} test fails. The test was > probably broken before but this was not visible because we weren't depleting > timers on shutdown. -- This message was sent by Atlassian Jira (v8.3.4#803005)
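The patched deleteTimer above collapses a get-then-delete into a single remove, then drops the timer from its per-domain set. A minimal plain-Java sketch of the same pattern follows; the class and method names here are toy stand-ins, not the actual Beam InMemoryTimerInternals API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the two indexes kept by the timer internals: timers keyed by
// "timerId+familyId", plus a per-domain set used for firing order.
class TimerIndex {
    private final Map<String, String> existingTimers = new HashMap<>();
    private final Map<String, Set<String>> timersForDomain = new HashMap<>();

    void setTimer(String timerId, String timerFamilyId, String domain) {
        String key = timerId + '+' + timerFamilyId;
        existingTimers.put(key, domain);
        timersForDomain.computeIfAbsent(domain, d -> new HashSet<>()).add(key);
    }

    // Mirrors the patched deleteTimer: a single remove() instead of
    // get() followed by delete(), so the map is probed once and both
    // indexes stay consistent; deleting a missing timer is a no-op.
    void deleteTimer(String timerId, String timerFamilyId) {
        String key = timerId + '+' + timerFamilyId;
        String domain = existingTimers.remove(key);
        if (domain != null) {
            timersForDomain.get(domain).remove(key);
        }
    }

    boolean contains(String timerId, String timerFamilyId) {
        return existingTimers.containsKey(timerId + '+' + timerFamilyId);
    }
}
```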
[jira] [Commented] (BEAM-9879) Support STRING_AGG in BeamSQL
[ https://issues.apache.org/jira/browse/BEAM-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099255#comment-17099255 ] Rui Wang commented on BEAM-9879: A good reference of implementation is what is done in https://issues.apache.org/jira/browse/BEAM-9418 (https://github.com/apache/beam/pull/11333) > Support STRING_AGG in BeamSQL > - > > Key: BEAM-9879 > URL: https://issues.apache.org/jira/browse/BEAM-9879 > Project: Beam > Issue Type: Task > Components: dsl-sql, dsl-sql-zetasql >Reporter: Rui Wang >Priority: Major > > Support STRING_AGG function in ZetaSQL dialect. (Also if Calcite has this > function built-in, support it in Calcite dialect too) > See the reference: > https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#string_agg > Examples: > {code:sql} > SELECT STRING_AGG(fruit) AS string_agg > FROM UNNEST(["apple", NULL, "pear", "banana", "pear"]) AS fruit; > ++ > | string_agg | > ++ > | apple,pear,banana,pear | > ++ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9879) Support STRING_AGG in BeamSQL
[ https://issues.apache.org/jira/browse/BEAM-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-9879: --- Status: Open (was: Triage Needed) > Support STRING_AGG in BeamSQL > - > > Key: BEAM-9879 > URL: https://issues.apache.org/jira/browse/BEAM-9879 > Project: Beam > Issue Type: Task > Components: dsl-sql, dsl-sql-zetasql >Reporter: Rui Wang >Priority: Major > > Support STRING_AGG function in ZetaSQL dialect. (Also if Calcite has this > function built-in, support it in Calcite dialect too) > See the reference: > https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#string_agg > Examples: > {code:sql} > SELECT STRING_AGG(fruit) AS string_agg > FROM UNNEST(["apple", NULL, "pear", "banana", "pear"]) AS fruit; > ++ > | string_agg | > ++ > | apple,pear,banana,pear | > ++ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9879) Support STRING_AGG in BeamSQL
Rui Wang created BEAM-9879: -- Summary: Support STRING_AGG in BeamSQL Key: BEAM-9879 URL: https://issues.apache.org/jira/browse/BEAM-9879 Project: Beam Issue Type: Task Components: dsl-sql, dsl-sql-zetasql Reporter: Rui Wang Support STRING_AGG function in ZetaSQL dialect. (Also if Calcite has this function built-in, support it in Calcite dialect too) See the reference: https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#string_agg Examples: {code:sql} SELECT STRING_AGG(fruit) AS string_agg FROM UNNEST(["apple", NULL, "pear", "banana", "pear"]) AS fruit; ++ | string_agg | ++ | apple,pear,banana,pear | ++ {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
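The documented STRING_AGG semantics (NULL inputs are skipped, remaining values joined with the default "," delimiter in input order) can be sketched in plain Java. This is only an illustration of the behavior from the BigQuery reference above, not the BeamSQL combiner implementation:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Illustrates STRING_AGG's documented default behavior: NULLs are dropped
// and the remaining values are concatenated with "," in input order.
class StringAggSketch {
    static String stringAgg(List<String> values) {
        return values.stream()
                .filter(Objects::nonNull)
                .collect(Collectors.joining(","));
    }
}
```

Applied to the example input `["apple", NULL, "pear", "banana", "pear"]`, this reproduces the `apple,pear,banana,pear` result from the issue description.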
[jira] [Resolved] (BEAM-9418) Support ANY_VALUE aggregation functions
[ https://issues.apache.org/jira/browse/BEAM-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang resolved BEAM-9418. Fix Version/s: Not applicable Resolution: Fixed > Support ANY_VALUE aggregation functions > --- > > Key: BEAM-9418 > URL: https://issues.apache.org/jira/browse/BEAM-9418 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Rui Wang >Assignee: John Mora >Priority: Major > Fix For: Not applicable > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Support the following functionality in BeamSQL: > {code:java} > "select t.key, ANY_VALUE(t.column) from t group by t.key"; > {code} > Spec link: > https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#any_value -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9418) Support ANY_VALUE aggregation functions
[ https://issues.apache.org/jira/browse/BEAM-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099247#comment-17099247 ] Rui Wang commented on BEAM-9418: Thanks John for your contribution! The PR has merged. > Support ANY_VALUE aggregation functions > --- > > Key: BEAM-9418 > URL: https://issues.apache.org/jira/browse/BEAM-9418 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Rui Wang >Assignee: John Mora >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > Support the following functionality in BeamSQL: > {code:java} > "select t.key, ANY_VALUE(t.column) from t group by t.key"; > {code} > Spec link: > https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#any_value -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9878) Update ElementByteSizeObservableIterable
Kyle Weaver created BEAM-9878: - Summary: Update ElementByteSizeObservableIterable Key: BEAM-9878 URL: https://issues.apache.org/jira/browse/BEAM-9878 Project: Beam Issue Type: Improvement Components: runner-dataflow Reporter: Kyle Weaver It's odd that ElementByteSizeObservableIterable::iterator adds observers within the method body. I assume this is for historic reasons, since it doesn't seem to do anything now, and the documenting comment references a setObserver method that doesn't exist. https://github.com/apache/beam/blob/6453e859badcb629ae2528b77d84235b7291ff89/sdks/java/core/src/main/java/org/apache/beam/sdk/util/common/ElementByteSizeObservableIterable.java#L49-L61 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9877) Eager size estimation of large group-by-key iterables cause expensive / duplicate reads
[ https://issues.apache.org/jira/browse/BEAM-9877?focusedWorklogId=430356=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430356 ] ASF GitHub Bot logged work on BEAM-9877: Author: ASF GitHub Bot Created on: 04/May/20 18:45 Start Date: 04/May/20 18:45 Worklog Time Spent: 10m Work Description: ibzib commented on a change in pull request #11598: URL: https://github.com/apache/beam/pull/11598#discussion_r419649174 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BatchGroupAlsoByWindowViaIteratorsFn.java ## @@ -165,12 +168,17 @@ public WindowReiterable( } @Override -public Reiterator iterator() { +public WindowReiterator iterator() { Review comment: It's odd that `ElementByteSizeObservableIterable::iterator` adds observers within the method body. I assume this is for historic reasons, since it doesn't seem to do anything now, and the comment documenting references a `setObserver` method that doesn't exist. Anyway, your change looks fine. But we should consider cleaning this up. https://github.com/apache/beam/blob/6453e859badcb629ae2528b77d84235b7291ff89/sdks/java/core/src/main/java/org/apache/beam/sdk/util/common/ElementByteSizeObservableIterable.java#L49-L61 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430356) Time Spent: 20m (was: 10m) > Eager size estimation of large group-by-key iterables cause expensive / > duplicate reads > --- > > Key: BEAM-9877 > URL: https://issues.apache.org/jira/browse/BEAM-9877 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Tudor Marian >Assignee: Tudor Marian >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > BatchGroupAlsoByWindowViaIteratorsFn WindowReiterator can cause very > expensive duplicate reading of data from a (Co-)GroupByKey for hot keys with > many values due to PCollection size estimation. Instead, it should perform > lazy estimation like GroupingShuffleReader and GroupingShuffleEntryInterator > (or perform no estimation at all). -- This message was sent by Atlassian Jira (v8.3.4#803005)
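The fix described above replaces eager size estimation (walking the whole iterable up front just to measure it, forcing a second, duplicate read for the real consumer) with lazy estimation (observing each element's size only as the consumer reads it). A toy sketch of the two strategies, with hypothetical names rather than the real GroupingShuffleReader classes:

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.LongConsumer;

class SizeEstimation {
    // Eager: consumes the entire input once just to measure it, so the real
    // consumer triggers a second pass over the data. Expensive for hot keys
    // with many values.
    static long eagerEstimate(List<String> values) {
        long total = 0;
        for (String v : values) {
            total += v.length();
        }
        return total;
    }

    // Lazy: wrap the iterator and report each element's size to an observer
    // only as the consumer actually reads it. No extra pass over the data.
    static Iterator<String> lazilyObserved(Iterator<String> it, LongConsumer observer) {
        return new Iterator<String>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public String next() {
                String v = it.next();
                observer.accept(v.length());
                return v;
            }
        };
    }
}
```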
[jira] [Work logged] (BEAM-9840) Support for Parameterized Types when converting from HCatRecords to Rows in HCatalogIO
[ https://issues.apache.org/jira/browse/BEAM-9840?focusedWorklogId=430353=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430353 ] ASF GitHub Bot logged work on BEAM-9840: Author: ASF GitHub Bot Created on: 04/May/20 18:41 Start Date: 04/May/20 18:41 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #11569: URL: https://github.com/apache/beam/pull/11569#issuecomment-623636289 @akedin isn't very involved with Beam anymore. I think I can help review this instead This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430353) Time Spent: 40m (was: 0.5h) > Support for Parameterized Types when converting from HCatRecords to Rows in > HCatalogIO > -- > > Key: BEAM-9840 > URL: https://issues.apache.org/jira/browse/BEAM-9840 > Project: Beam > Issue Type: Improvement > Components: io-java-hcatalog >Reporter: Rahul Patwari >Assignee: Rahul Patwari >Priority: Major > Fix For: 2.22.0 > > Time Spent: 40m > Remaining Estimate: 0h > > In Hive, to use CHAR and VARCHAR as the data type for a column, the types > have to be parameterized with the length of the character sequence. Refer > [https://github.com/apache/hive/blob/f37c5de6c32b9395d1b34fa3c02ed06d1bfbf6eb/serde/src/test/org/apache/hadoop/hive/serde2/typeinfo/TestTypeInfoUtils.java#L68] > In addition, for the DECIMAL data type, custom precision and scale can be > provided as parameters. > A user has faced an Exception while reading data from a table created in Hive > with a parameterized DECIMAL data type. Refer > [https://lists.apache.org/thread.html/r159012fbefce24d734096e3ec24ecd112de5f89b8029e57147d233b0%40%3Cuser.beam.apache.org%3E]. 
> This ticket is created to support converting from HcatRecords to Rows when > the HCatRecords have parameterized types. > To Support Parameterized data types, we can make use of HcatFieldSchema. > Refer > [https://github.com/apache/hive/blob/f37c5de6c32b9395d1b34fa3c02ed06d1bfbf6eb/hcatalog/core/src/main/java/org/apache/hive/hcatalog/data/schema/HCatFieldSchema.java#L34] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9824) Multiple reshuffles are ignored in some cases on Flink batch runner.
[ https://issues.apache.org/jira/browse/BEAM-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette resolved BEAM-9824. - Resolution: Fixed Please re-open if this isn't actually fixed > Multiple reshuffles are ignored in some cases on Flink batch runner. > > > Key: BEAM-9824 > URL: https://issues.apache.org/jira/browse/BEAM-9824 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.19.0, 2.20.0 >Reporter: David Morávek >Assignee: David Morávek >Priority: Major > Fix For: 2.22.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Multiple reshuffles are ignored in some cases on Flink batch runner. This may > lead to a huge performance penalty in IO connectors (when reshuffling splits). > In the Flink optimizer, when we `.rebalance()` a dataset, its output channel is > marked as `FORCED_REBALANCED`. When we chain this with another > `.rebalance()`, the latter is ignored because its source is already > `FORCED_REBALANCED`, thus the requested property is met. This is correct behaviour > because rebalance is idempotent. > When we include `flatMap` in between rebalances -> > `.rebalance().flatMap(...).rebalance()`, we need to reshuffle again, because > dataset distribution may have changed (e.g. you can possibly emit an unbounded > stream from a single element). Unfortunately `flatMap` output is still > incorrectly marked as `FORCED_REBALANCED` and the second reshuffle gets > ignored. > This especially affects IO connectors -> `FileIO.match()` returns a reshuffled > list of matched files -> we split each file into ranges -> **reshuffle** -> > read. Ignoring the second reshuffle leads to huge perf. degradation (5m -> 2h > in one of our production pipelines) -- This message was sent by Atlassian Jira (v8.3.4#803005)
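The optimizer behavior described in BEAM-9824 can be modeled in a few lines of plain Java. This is a toy planner, not Flink code: a rebalance is elided when the input is already marked as rebalanced, and the fix is that a flatMap must invalidate that mark because it can change the distribution:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the FORCED_REBALANCED channel property from the issue above.
class RebalancePlanner {
    enum Op { REBALANCE, FLAT_MAP }

    // Returns the ops actually executed, eliding redundant rebalances.
    static List<Op> plan(List<Op> requested) {
        List<Op> executed = new ArrayList<>();
        boolean balanced = false;
        for (Op op : requested) {
            if (op == Op.REBALANCE) {
                if (!balanced) {        // already-balanced input: elide, since
                    executed.add(op);   // rebalance is idempotent
                    balanced = true;
                }
            } else {
                executed.add(op);
                balanced = false;  // the fix: flatMap invalidates the property,
                                   // so a following rebalance is kept
            }
        }
        return executed;
    }
}
```

With the buggy behavior (no reset after `FLAT_MAP`), `rebalance -> flatMap -> rebalance` would drop the second rebalance, which is the 5m-to-2h regression the reporter describes.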
[jira] [Assigned] (BEAM-9707) Hardcode runner harness container image for unified worker
[ https://issues.apache.org/jira/browse/BEAM-9707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette reassigned BEAM-9707: --- Assignee: Ankur Goenka > Hardcode runner harness container image for unified worker > -- > > Key: BEAM-9707 > URL: https://issues.apache.org/jira/browse/BEAM-9707 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Fix For: 2.22.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Hardcode runner harness image temporarily to support usage of unified worker > on head. > Remove this hardcoding once 2.21 is released -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9707) Hardcode runner harness container image for unified worker
[ https://issues.apache.org/jira/browse/BEAM-9707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette updated BEAM-9707: Status: Open (was: Triage Needed) > Hardcode runner harness container image for unified worker > -- > > Key: BEAM-9707 > URL: https://issues.apache.org/jira/browse/BEAM-9707 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Fix For: 2.22.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Hardcode runner harness image temporarily to support usage of unified worker > on head. > Remove this hardcoding once 2.21 is released -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-1819) Key should be available in @OnTimer methods
[ https://issues.apache.org/jira/browse/BEAM-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099228#comment-17099228 ] Brian Hulette commented on BEAM-1819: - [~mxm] or [~reuvenlax] could you close this if PR #11154 addressed it completely? > Key should be available in @OnTimer methods > --- > > Key: BEAM-1819 > URL: https://issues.apache.org/jira/browse/BEAM-1819 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Thomas Groh >Assignee: Rehman Murad Ali >Priority: Major > Fix For: 2.22.0 > > Time Spent: 32.5h > Remaining Estimate: 0h > > Every timer firing has an associated key. This key should be available when > the timer is delivered to a user's {{DoFn}}, so they don't have to store it > in state. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9643) Add user-facing Go SDF documentation.
[ https://issues.apache.org/jira/browse/BEAM-9643?focusedWorklogId=430343=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430343 ] ASF GitHub Bot logged work on BEAM-9643: Author: ASF GitHub Bot Created on: 04/May/20 18:22 Start Date: 04/May/20 18:22 Worklog Time Spent: 10m Work Description: lostluck commented on a change in pull request #11517: URL: https://github.com/apache/beam/pull/11517#discussion_r419635469 ## File path: sdks/go/pkg/beam/core/sdf/sdf.go ## @@ -13,63 +13,64 @@ // See the License for the specific language governing permissions and // limitations under the License. -// Package sdf is experimental, incomplete, and not yet meant for general usage. +// Package contains interfaces used specifically for splittable DoFns. +// +// Warning: Splittable DoFns are still experimental, largely untested, and +// likely to have bugs. package sdf // RTracker is an interface used to interact with restrictions while processing elements in -// SplittableDoFns. Each implementation of RTracker is expected to be used for tracking a single -// restriction type, which is the type that should be used to create the RTracker, and output by -// TrySplit. +// splittable DoFns (specifically, in the ProcessElement method). Each RTracker tracks the progress +// of a single restriction. type RTracker interface { - // TryClaim attempts to claim the block of work in the current restriction located at a given - // position. This method must be used in the ProcessElement method of Splittable DoFns to claim - // work before performing it. If no work is claimed, the ProcessElement is not allowed to perform - // work or emit outputs. If the claim is successful, the DoFn must process the entire block. If - // the claim is unsuccessful the ProcessElement method of the DoFn must return without performing - // any additional work or emitting any outputs. 
- // - // TryClaim accepts an arbitrary value that can be interpreted as the position of a block, and - // returns a boolean indicating whether the claim succeeded. + // TryClaim attempts to claim the block of work located in the given position of the + // restriction. This method must be called in ProcessElement to claim work before it can be + // processed. Processing work without claiming it first can lead to incorrect output. // - // If the claim fails due to an error, that error can be retrieved with GetError. + // If the claim is successful, the DoFn must process the entire block. If the claim is + // unsuccessful ProcessElement method of the DoFn must return without performing + // any additional work or emitting any outputs. // - // For SDFs to work properly, claims must always be monotonically increasing in reference to the - // restriction's start and end points, and every block of work in a restriction must be claimed. + // If the claim fails due to an error, that error is stored and will be automatically emitted + // when the RTracker is validated, or can be manually retrieved with GetError. // // This pseudocode example illustrates the typical usage of TryClaim: // - // pos = position of first block after restriction.start + // pos = position of first block within the restriction // for TryClaim(pos) == true { // // Do all work in the claimed block and emit outputs. - // pos = position of next block + // pos = position of next block within the restriction // } // return TryClaim(pos interface{}) (ok bool) - // GetError returns the error that made this RTracker stop executing, and it returns nil if no - // error occurred. If IsDone fails while validating this RTracker, this method will be - // called to log the error. + // GetError returns the error that made this RTracker stop executing, and returns nil if no + // error occurred. This is the error that is emitted if automated validation fails. 
GetError() error - // TrySplit splits the current restriction into a primary and residual based on a fraction of the - // work remaining. The split is performed along the first valid split point located after the - // given fraction of the remainder. This method is called by the SDK harness when receiving a - // split request by the runner. + // TrySplit splits the current restriction into a primary (currently executing work) and + // residual (work to be split off) based on a fraction of work remaining. The split is performed + // at the first valid split point located after the given fraction of remaining work. + // +
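The TryClaim loop in the pseudocode above is the same contract the Beam Java SDK's RestrictionTracker uses: claims must be monotonically increasing, and processing stops at the first failed claim. A minimal self-contained Java sketch with a toy offset-range tracker (illustrative only, not the real SDK API):

```java
// Toy offset-range tracker illustrating the TryClaim contract: claims must
// be monotonically increasing, and the caller stops at the first failed claim.
class ToyOffsetTracker {
    private final long end;
    private long lastClaimed = -1;

    ToyOffsetTracker(long end) { this.end = end; }

    boolean tryClaim(long position) {
        if (position <= lastClaimed) {
            throw new IllegalArgumentException("claims must be monotonically increasing");
        }
        if (position >= end) {
            return false;  // past the restriction: caller must stop working
        }
        lastClaimed = position;
        return true;
    }

    // The canonical ProcessElement-style loop from the pseudocode above:
    // claim a block, do the work for it, advance, repeat until a claim fails.
    static long processedBlocks(long start, long end) {
        ToyOffsetTracker tracker = new ToyOffsetTracker(end);
        long processed = 0;
        for (long pos = start; tracker.tryClaim(pos); pos++) {
            processed++;  // "do all work in the claimed block and emit outputs"
        }
        return processed;
    }
}
```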
[jira] [Work logged] (BEAM-9874) Portable timers can't be cleared in batch mode
[ https://issues.apache.org/jira/browse/BEAM-9874?focusedWorklogId=430341=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430341 ] ASF GitHub Bot logged work on BEAM-9874: Author: ASF GitHub Bot Created on: 04/May/20 18:17 Start Date: 04/May/20 18:17 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #11597: URL: https://github.com/apache/beam/pull/11597#discussion_r419632632 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryTimerInternals.java ## @@ -167,20 +165,17 @@ public void deleteTimer(StateNamespace namespace, String timerId, TimeDomain tim @Deprecated @Override public void deleteTimer(StateNamespace namespace, String timerId, String timerFamilyId) { -TimerData existing = existingTimers.get(namespace, timerId + '+' + timerFamilyId); -if (existing != null) { - deleteTimer(existing); +TimerData removedTimer = existingTimers.remove(namespace, timerId + '+' + timerFamilyId); +if (removedTimer != null) { + timersForDomain(removedTimer.getDomain()).remove(removedTimer); } } /** @deprecated use {@link #deleteTimer(StateNamespace, String, TimeDomain)}. */ Review comment: yeah, actually that might have been a mistake when the timer family was added because the old way was to use `TimerData` and the new way is to use timerId/timerFamily. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430341) Time Spent: 1h 10m (was: 1h) > Portable timers can't be cleared in batch mode > -- > > Key: BEAM-9874 > URL: https://issues.apache.org/jira/browse/BEAM-9874 > Project: Beam > Issue Type: Bug > Components: runner-flink, runner-spark >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.21.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > After BEAM-9801, the {{test_pardo_timers_clear}} test fails. The test was > probably broken before but this was not visible because we weren't depleting > timers on shutdown. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9877) Eager size estimation of large group-by-key iterables cause expensive / duplicate reads
[ https://issues.apache.org/jira/browse/BEAM-9877?focusedWorklogId=430338=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430338 ] ASF GitHub Bot logged work on BEAM-9877: Author: ASF GitHub Bot Created on: 04/May/20 18:09 Start Date: 04/May/20 18:09 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11598: URL: https://github.com/apache/beam/pull/11598#issuecomment-623619474 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 430338) Remaining Estimate: 0h Time Spent: 10m > Eager size estimation of large group-by-key iterables cause expensive / > duplicate reads > --- > > Key: BEAM-9877 > URL: https://issues.apache.org/jira/browse/BEAM-9877 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Tudor Marian >Assignee: Tudor Marian >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > BatchGroupAlsoByWindowViaIteratorsFn WindowReiterator can cause very > expensive duplicate reading of data from a (Co-)GroupByKey for hot keys with > many values due to PCollection size estimation. Instead, it should perform > lazy estimation like GroupingShuffleReader and GroupingShuffleEntryInterator > (or perform no estimation at all). -- This message was sent by Atlassian Jira (v8.3.4#803005)