[jira] [Work logged] (BEAM-9430) Migrate from ProcessContext#updateWatermark to WatermarkEstimators

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9430?focusedWorklogId=430515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430515
 ]

ASF GitHub Bot logged work on BEAM-9430:


Author: ASF GitHub Bot
Created on: 05/May/20 05:15
Start Date: 05/May/20 05:15
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #11607:
URL: https://github.com/apache/beam/pull/11607#issuecomment-623860331


   Shouldn't we also disable `IllegalArgumentException`s in the constructors?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430515)
Time Spent: 5h 40m  (was: 5.5h)

> Migrate from ProcessContext#updateWatermark to WatermarkEstimators
> --
>
> Key: BEAM-9430
> URL: https://issues.apache.org/jira/browse/BEAM-9430
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Labels: backward-incompatible
> Fix For: 2.21.0
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Current discussion underway in 
> [https://lists.apache.org/thread.html/r5d974b6a58bc04ff4c02682fda4ef68608121f1bf23a86e9d592ca6e%40%3Cdev.beam.apache.org%3E]
>  
> Proposed API: [https://github.com/apache/beam/pull/10992]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9779) HL7v2IOWriteIT is flaky

2020-05-04 Thread Jacob Ferriero (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099525#comment-17099525
 ] 

Jacob Ferriero commented on BEAM-9779:
--

[~chamikara] Have the changes in https://github.com/apache/beam/pull/11450 
stabilized this test? Should we close this?

> HL7v2IOWriteIT is flaky
> ---
>
> Key: BEAM-9779
> URL: https://issues.apache.org/jira/browse/BEAM-9779
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, test-failures
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Critical
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> There seems to be a race condition somewhere in HL7v2IOWriteIT that causes 
> flakiness.
> https://builds.apache.org/job/beam_PostCommit_Java/5947/
> https://builds.apache.org/job/beam_PostCommit_Java/5943/
> https://builds.apache.org/job/beam_PostCommit_Java/5942/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9449) Consider passing pipeline options for expansion service.

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9449?focusedWorklogId=430502&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430502
 ]

ASF GitHub Bot logged work on BEAM-9449:


Author: ASF GitHub Bot
Created on: 05/May/20 03:36
Start Date: 05/May/20 03:36
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #11574:
URL: https://github.com/apache/beam/pull/11574#issuecomment-623816157


   Thanks. Looks good to me overall.
   
   I think we should also consider adding an optional `pipeline_options` argument 
to `ExternalTransform`, given that different expansion services need 
different pipeline options.
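A minimal sketch of the idea in the comment above — carrying per-service pipeline options on an external-transform wrapper so each expansion service can receive its own options. All class and method names here are illustrative stand-ins, not the actual `apache_beam` API:

```python
# Hypothetical sketch: an external-transform wrapper that forwards optional
# per-service pipeline options to its expansion service. Names are
# illustrative and not the real apache_beam API.
class ExpansionServiceClient:
    def __init__(self, address):
        self.address = address

    def expand(self, transform_urn, pipeline_options=None):
        # A real client would send an ExpansionRequest over gRPC; here we
        # just record which options would be attached to the request.
        return {"transform": transform_urn,
                "options": dict(pipeline_options or {})}


class ExternalTransformSketch:
    def __init__(self, urn, service, pipeline_options=None):
        self.urn = urn
        self.service = service
        # Each expansion service may need different options, so they are
        # carried per transform rather than taken from the main pipeline.
        self.pipeline_options = pipeline_options

    def expand(self):
        return self.service.expand(self.urn, self.pipeline_options)


svc = ExpansionServiceClient("localhost:8097")
t = ExternalTransformSketch(
    "beam:transform:example:v1", svc,
    pipeline_options={"runner_option": "x"})
print(t.expand())
```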



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430502)
Time Spent: 40m  (was: 0.5h)

> Consider passing pipeline options for expansion service.
> 
>
> Key: BEAM-9449
> URL: https://issues.apache.org/jira/browse/BEAM-9449
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=430499&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430499
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 05/May/20 03:11
Start Date: 05/May/20 03:11
Worklog Time Spent: 10m 
  Work Description: ihji commented on a change in pull request #11039:
URL: https://github.com/apache/beam/pull/11039#discussion_r419841344



##
File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
##
@@ -462,7 +462,8 @@ def run_pipeline(self, pipeline, options):
     use_fnapi = apiclient._use_fnapi(options)
     from apache_beam.transforms import environments
     default_environment = environments.DockerEnvironment.from_container_image(
-        apiclient.get_container_image_from_options(options))
+        apiclient.get_container_image_from_options(options),
+        artifacts=environments.python_sdk_dependencies(options))

Review comment:
   We need the pipeline options to populate artifacts. I think we could either 
use `from_options` instead and override the container image, or just leave it as is.
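The alternative sketched in that comment — building the environment from the pipeline options (so artifacts get populated) and then overriding only the container image — can be illustrated with plain dataclasses. This is not the real `apache_beam.transforms.environments` API; every name below is a stand-in:

```python
# Illustrative sketch (not the real apache_beam API): build an environment
# from pipeline options so artifacts are populated, then override just the
# container image.
from dataclasses import dataclass, field, replace
from typing import List


@dataclass(frozen=True)
class DockerEnvironmentSketch:
    container_image: str
    artifacts: List[str] = field(default_factory=list)


def python_sdk_dependencies(options):
    # Stand-in for dependency discovery driven by pipeline options.
    return sorted(options.get("extra_packages", []))


def environment_from_options(options):
    return DockerEnvironmentSketch(
        container_image=options.get("container_image", "beam-default"),
        artifacts=python_sdk_dependencies(options))


options = {"container_image": "beam-default",
           "extra_packages": ["dep-a.tar.gz", "dep-b.tar.gz"]}
env = environment_from_options(options)
# Override only the image; the option-derived artifacts are kept.
env = replace(env, container_image="custom-image:latest")
print(env.container_image, env.artifacts)
```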





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430499)
Time Spent: 8h  (was: 7h 50m)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9679) Core Transforms | Go SDK Code Katas

2020-05-04 Thread Damon Douglas (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099501#comment-17099501
 ] 

Damon Douglas commented on BEAM-9679:
-

PR [https://github.com/apache/beam/pull/11564] approved

> Core Transforms | Go SDK Code Katas
> ---
>
> Key: BEAM-9679
> URL: https://issues.apache.org/jira/browse/BEAM-9679
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: Major
>
> A kata devoted to core Beam transform patterns, modeled after 
> [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms],
>  where the takeaway is an individual's ability to master the following using 
> an Apache Beam pipeline written with the Go SDK.
>  * Branching
>  * CoGroupByKey
>  * Combine
>  * Composite Transform
>  * DoFn Additional Parameters
>  * Flatten
>  * GroupByKey
>  * [Map|https://github.com/apache/beam/pull/11564]
>  * Partition
>  * Side Input



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9679) Core Transforms | Go SDK Code Katas

2020-05-04 Thread Damon Douglas (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damon Douglas updated BEAM-9679:

Description: 
A kata devoted to core Beam transform patterns, modeled after 
[https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms],
 where the takeaway is an individual's ability to master the following using 
an Apache Beam pipeline written with the Go SDK.
 * Branching
 * CoGroupByKey
 * Combine
 * Composite Transform
 * DoFn Additional Parameters
 * Flatten
 * GroupByKey
 * [Map|https://github.com/apache/beam/pull/11564]
 * Partition
 * Side Input

  was:
A kata devoted to core Beam transform patterns, modeled after 
[https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms],
 where the takeaway is an individual's ability to master the following using 
an Apache Beam pipeline written with the Go SDK.
 * Branching
 * CoGroupByKey
 * Combine
 * Composite Transform
 * DoFn Additional Parameters
 * Flatten
 * GroupByKey
 * [Map|https://github.com/damondouglas/beam/tree/BEAM-9679-core-transforms-map-kata-go]
 * Partition
 * Side Input


> Core Transforms | Go SDK Code Katas
> ---
>
> Key: BEAM-9679
> URL: https://issues.apache.org/jira/browse/BEAM-9679
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: Major
>
> A kata devoted to core Beam transform patterns, modeled after 
> [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms],
>  where the takeaway is an individual's ability to master the following using 
> an Apache Beam pipeline written with the Go SDK.
>  * Branching
>  * CoGroupByKey
>  * Combine
>  * Composite Transform
>  * DoFn Additional Parameters
>  * Flatten
>  * GroupByKey
>  * [Map|https://github.com/apache/beam/pull/11564]
>  * Partition
>  * Side Input



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=430498&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430498
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 05/May/20 03:04
Start Date: 05/May/20 03:04
Worklog Time Spent: 10m 
  Work Description: ihji commented on a change in pull request #11039:
URL: https://github.com/apache/beam/pull/11039#discussion_r419839927



##
File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py
##
@@ -102,7 +102,8 @@ def setUp(self):
         '--staging_location=ignored',
         '--temp_location=/dev/null',
         '--no_auth',
-        '--dry_run=True'
+        '--dry_run=True',
+        '--sdk_location=container'

Review comment:
   The test tries to download a dev version of the apache-beam dependency 
(which indeed does not exist in PyPI) when it constructs the environment.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430498)
Time Spent: 7h 50m  (was: 7h 40m)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=430497&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430497
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 05/May/20 02:57
Start Date: 05/May/20 02:57
Worklog Time Spent: 10m 
  Work Description: ihji commented on a change in pull request #11039:
URL: https://github.com/apache/beam/pull/11039#discussion_r419838456



##
File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java
##
@@ -336,25 +323,26 @@ public DataflowPackage stageToFile(
     final AtomicInteger numCached = new AtomicInteger(0);
     List<CompletionStage<DataflowPackage>> destinationPackages = new ArrayList<>();
 
-    for (String classpathElement : classpathElements) {
-      DataflowPackage sourcePackage = new DataflowPackage();
-      if (classpathElement.contains("=")) {
-        String[] components = classpathElement.split("=", 2);
-        sourcePackage.setName(components[0]);
-        sourcePackage.setLocation(components[1]);
-      } else {
-        sourcePackage.setName(null);
-        sourcePackage.setLocation(classpathElement);
+    for (StagedFile classpathElement : classpathElements) {
+      DataflowPackage targetPackage = classpathElement.getStagedPackage();
+      String source = classpathElement.getSource();
+      if (source.contains("=")) {

Review comment:
   removed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430497)
Time Spent: 7h 40m  (was: 7.5h)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=430496&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430496
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 05/May/20 02:56
Start Date: 05/May/20 02:56
Worklog Time Spent: 10m 
  Work Description: ihji commented on a change in pull request #11039:
URL: https://github.com/apache/beam/pull/11039#discussion_r419838363



##
File path: model/pipeline/src/main/proto/beam_runner_api.proto
##
@@ -1271,6 +1271,11 @@ message DeferredArtifactPayload {
 message ArtifactStagingToRolePayload {
   // A generated staged name (relative path under staging directory).
   string staged_name = 1;
+
+  // (Optional) An artifact name when a runner supports it.
+  // For example, DataflowRunner requires predefined names for some artifacts
+  // such as "dataflow-worker.jar", "windmill_main".
+  string alias_name = 2;

Review comment:
   This is a Dataflow-specific requirement. The `DataflowPackage` model has two 
separate fields, `location` and `name`; `staged_name` and `alias_name` 
correspond to `location` and `name`, respectively.
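The mapping described above can be sketched in a few lines — `staged_name` becomes the package's `location`, and `alias_name`, when present, becomes its `name`. The helper below is a hypothetical illustration, not Beam or Dataflow code:

```python
# Sketch of the staged_name/alias_name mapping discussed above:
# staged_name -> DataflowPackage.location, and alias_name -> .name
# (only when the runner supplied one). Names are illustrative.
def to_dataflow_package(staged_name, alias_name=""):
    package = {"location": staged_name}
    if alias_name:  # optional; only some artifacts need a predefined name
        package["name"] = alias_name
    return package


print(to_dataflow_package("beam-worker-1a2b.jar"))
print(to_dataflow_package("beam-worker-1a2b.jar", "dataflow-worker.jar"))
```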





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430496)
Time Spent: 7.5h  (was: 7h 20m)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=430494&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430494
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 05/May/20 02:27
Start Date: 05/May/20 02:27
Worklog Time Spent: 10m 
  Work Description: ihji commented on a change in pull request #11039:
URL: https://github.com/apache/beam/pull/11039#discussion_r419832339



##
File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
##
@@ -772,6 +783,88 @@ private Debuggee registerDebuggee(CloudDebugger 
debuggerClient, String uniquifie
 }
   }
 
+  private List<DataflowPackage> stageArtifacts(RunnerApi.Pipeline pipeline) {
+    ImmutableList.Builder<StagedFile> filesToStageBuilder = ImmutableList.builder();
+    for (Map.Entry<String, RunnerApi.Environment> entry :
+        pipeline.getComponents().getEnvironmentsMap().entrySet()) {
+      for (RunnerApi.ArtifactInformation info : entry.getValue().getDependenciesList()) {
+        if (!BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn())) {
+          throw new RuntimeException(
+              String.format("unsupported artifact type %s", info.getTypeUrn()));
+        }
+        RunnerApi.ArtifactFilePayload filePayload;
+        try {
+          filePayload = RunnerApi.ArtifactFilePayload.parseFrom(info.getTypePayload());
+        } catch (InvalidProtocolBufferException e) {
+          throw new RuntimeException("Error parsing artifact file payload.", e);
+        }
+        if (!BeamUrns.getUrn(RunnerApi.StandardArtifacts.Roles.STAGING_TO)
+            .equals(info.getRoleUrn())) {
+          throw new RuntimeException(
+              String.format("unsupported artifact role %s", info.getRoleUrn()));
+        }
+        RunnerApi.ArtifactStagingToRolePayload stagingPayload;
+        try {
+          stagingPayload = RunnerApi.ArtifactStagingToRolePayload.parseFrom(info.getRolePayload());
+        } catch (InvalidProtocolBufferException e) {
+          throw new RuntimeException("Error parsing artifact staging_to role payload.", e);
+        }
+        DataflowPackage target = new DataflowPackage();
+        target.setLocation(stagingPayload.getStagedName());
+        if (!Strings.isNullOrEmpty(stagingPayload.getAliasName())) {
+          target.setName(stagingPayload.getAliasName());
+        }
+        filesToStageBuilder.add(StagedFile.of(filePayload.getPath(), target));
+      }
+    }
+    return options.getStager().stageFiles(filesToStageBuilder.build());
+  }
+
+  private List<RunnerApi.ArtifactInformation> getDefaultArtifacts() {
+    ImmutableList.Builder<String> pathsToStageBuilder = ImmutableList.builder();
+    ImmutableMap.Builder<String, String> aliasMapBuilder = ImmutableMap.builder();
+    String windmillBinary =
+        options.as(DataflowPipelineDebugOptions.class).getOverrideWindmillBinary();
+    String dataflowWorkerJar = options.getDataflowWorkerJar();
+    if (dataflowWorkerJar != null && !dataflowWorkerJar.isEmpty()) {
+      // Put the user specified worker jar at the start of the classpath, to be consistent with the
+      // built in worker order.
+      pathsToStageBuilder.add(dataflowWorkerJar);
+      aliasMapBuilder.put(dataflowWorkerJar, "dataflow-worker.jar");
+    }
+    for (String path : options.getFilesToStage()) {
+      if (path.contains("=")) {

Review comment:
   Yes. This syntax is only supported in the Dataflow runner. `DataflowPackage` 
has a separate `name` field in addition to `location`, and the "=" separator allows 
a `name` to be prefixed to the location of the source, e.g. 
"dataflow.jar=/tmp/foo.jar". I could remove this special syntax, but I decided 
to keep it since it is already exposed to users via the `--filesToStage` option, so 
removing it could cause a backward-compatibility issue.
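The "name=location" convention described above (as in the Java snippet's `split("=", 2)`) can be sketched as follows; the function name is illustrative:

```python
# Sketch of the "name=location" staging-file convention, e.g.
# "dataflow.jar=/tmp/foo.jar". Splitting on the FIRST "=" yields the alias
# and the source path; entries without "=" have no alias. Later "=" signs
# stay in the location, matching Java's split("=", 2).
def parse_file_to_stage(entry):
    if "=" in entry:
        name, location = entry.split("=", 1)
        return name, location
    return None, entry


assert parse_file_to_stage("dataflow.jar=/tmp/foo.jar") == (
    "dataflow.jar", "/tmp/foo.jar")
assert parse_file_to_stage("/tmp/bar.jar") == (None, "/tmp/bar.jar")
```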





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430494)
Time Spent: 7h 20m  (was: 7h 10m)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8949) Add Spanner IO Integration Test for Python

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8949?focusedWorklogId=430493&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430493
 ]

ASF GitHub Bot logged work on BEAM-8949:


Author: ASF GitHub Bot
Created on: 05/May/20 02:23
Start Date: 05/May/20 02:23
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11210:
URL: https://github.com/apache/beam/pull/11210#issuecomment-623801499


   @mszb - Do you know why the test is failing?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430493)
Time Spent: 8.5h  (was: 8h 20m)

> Add Spanner IO Integration Test for Python
> --
>
> Key: BEAM-8949
> URL: https://issues.apache.org/jira/browse/BEAM-8949
> Project: Beam
>  Issue Type: Test
>  Components: io-py-gcp
>Reporter: Shoaib Zafar
>Assignee: Shoaib Zafar
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Spanner IO (Python SDK) contains a PTransform that uses the Batch API to read 
> from Spanner. Currently, it contains only direct-runner unit tests. In 
> order to make this functionality available to users, integration tests 
> also need to be added.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8910) Use AVRO instead of JSON in BigQuery bounded source.

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8910?focusedWorklogId=430492&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430492
 ]

ASF GitHub Bot logged work on BEAM-8910:


Author: ASF GitHub Bot
Created on: 05/May/20 02:18
Start Date: 05/May/20 02:18
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11086:
URL: https://github.com/apache/beam/pull/11086#issuecomment-623800613







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430492)
Time Spent: 9h 50m  (was: 9h 40m)

> Use AVRO instead of JSON in BigQuery bounded source.
> 
>
> Key: BEAM-8910
> URL: https://issues.apache.org/jira/browse/BEAM-8910
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kamil Wasilewski
>Assignee: Pablo Estrada
>Priority: Minor
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> The proposed BigQuery bounded source in the Python SDK (see PR: 
> [https://github.com/apache/beam/pull/9772]) uses a BigQuery export job to 
> take a snapshot of the table and read from each produced JSON file. A 
> performance improvement can be gained by switching to AVRO instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=430491&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430491
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 05/May/20 02:16
Start Date: 05/May/20 02:16
Worklog Time Spent: 10m 
  Work Description: ihji commented on a change in pull request #11039:
URL: https://github.com/apache/beam/pull/11039#discussion_r419829971



##
File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
##
@@ -784,7 +877,25 @@ public DataflowPipelineJob run(Pipeline pipeline) {
         "Executing pipeline on the Dataflow Service, which will have billing implications "
             + "related to Google Compute Engine usage and other Google Cloud Services.");
 
-    List<DataflowPackage> packages = options.getStager().stageDefaultFiles();
+    // Capture the sdkComponents for look up during step translations
+    SdkComponents sdkComponents = SdkComponents.create();
+
+    DataflowPipelineOptions dataflowOptions = options.as(DataflowPipelineOptions.class);
+    String workerHarnessContainerImageURL = DataflowRunner.getContainerImageForJob(dataflowOptions);
+    RunnerApi.Environment defaultEnvironmentForDataflow =
+        Environments.createDockerEnvironment(workerHarnessContainerImageURL);
+
+    sdkComponents.registerEnvironment(
+        defaultEnvironmentForDataflow
+            .toBuilder()
+            .addAllDependencies(getDefaultArtifacts())
+            .build());
+
+    RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, sdkComponents, true);
+
+    LOG.debug("Portable pipeline proto:\n{}", TextFormat.printToString(pipelineProto));

Review comment:
   This debug log is not new. It's just relocated. Do you think it would be 
better to remove this?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430491)
Time Spent: 7h 10m  (was: 7h)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=430490&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430490
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 05/May/20 02:14
Start Date: 05/May/20 02:14
Worklog Time Spent: 10m 
  Work Description: ihji commented on a change in pull request #11039:
URL: https://github.com/apache/beam/pull/11039#discussion_r419829659



##
File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java
##
@@ -210,56 +209,55 @@ public static Environment createProcessEnvironment(
 }
   }
 
-  private static List<ArtifactInformation> getArtifacts(List<String> stagingFiles) {
-    Set<String> pathsToStage = Sets.newHashSet(stagingFiles);
+  public static List<ArtifactInformation> getArtifacts(
+      List<String> stagingFiles, StagingFileNameGenerator generator) {
     ImmutableList.Builder<ArtifactInformation> artifactsBuilder = ImmutableList.builder();
-    for (String path : pathsToStage) {
+    for (String path : ImmutableSet.copyOf(stagingFiles)) {

Review comment:
   `ImmutableSet` preserves insertion order, but I don't think we need to make a 
copy here. I will use `LinkedHashSet` instead.
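The Java discussion above is about deduplicating staging files while keeping first-seen order (a `LinkedHashSet` instead of copying into an `ImmutableSet`). The same idea expressed in Python, for illustration:

```python
# Dedup while preserving first-seen order: dict.fromkeys keeps insertion
# order (guaranteed since Python 3.7) and drops later duplicates, the same
# behavior LinkedHashSet gives in Java.
staging_files = ["a.jar", "b.jar", "a.jar", "c.jar", "b.jar"]
deduped = list(dict.fromkeys(staging_files))
print(deduped)  # first occurrence order is kept
```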





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430490)
Time Spent: 7h  (was: 6h 50m)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9886) ReadFromBigQuery should auto-infer which project to bill for BQ exports

2020-05-04 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9886:
---

 Summary: ReadFromBigQuery should auto-infer which project to bill 
for BQ exports
 Key: BEAM-9886
 URL: https://issues.apache.org/jira/browse/BEAM-9886
 Project: Beam
  Issue Type: Bug
  Components: io-py-gcp
Reporter: Pablo Estrada
Assignee: Pablo Estrada






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9840) Support for Parameterized Types when converting from HCatRecords to Rows in HCatalogIO

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9840?focusedWorklogId=430488&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430488
 ]

ASF GitHub Bot logged work on BEAM-9840:


Author: ASF GitHub Bot
Created on: 05/May/20 02:05
Start Date: 05/May/20 02:05
Worklog Time Spent: 10m 
  Work Description: rahul8383 commented on a change in pull request #11569:
URL: https://github.com/apache/beam/pull/11569#discussion_r419827790



##
File path: 
sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/SchemaUtilsTest.java
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.hcatalog;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class SchemaUtilsTest {
+  @Test
+  public void testParameterizedTypesToBeamTypes() {
+    List<FieldSchema> listOfFieldSchema = new ArrayList<>();
+    listOfFieldSchema.add(new FieldSchema("parameterizedChar", "char(10)", null));
+    listOfFieldSchema.add(new FieldSchema("parameterizedVarchar", "varchar(100)", null));
+    listOfFieldSchema.add(new FieldSchema("parameterizedDecimal", "decimal(30,16)", null));
+
+    Schema expectedSchema =
+        Schema.builder()
+            .addNullableField("parameterizedChar", Schema.FieldType.STRING)
+            .addNullableField("parameterizedVarchar", Schema.FieldType.STRING)
+            .addNullableField("parameterizedDecimal", Schema.FieldType.DECIMAL)

Review comment:
   Thanks @TheNeuralBit for the review.
   
   I am adding logical types in `schemas.logicaltypes` called 
`VariableLengthBytes`, `FixedLengthString`, `VariableLengthString`, and 
`LogicalDecimal` as part of #11581.
   
   I will take up the task of mapping these to logical types once my other PR 
gets merged. I also hope that #11272 gets merged by then, so that I can use the 
`SqlTypes.DATE` logical type. Could you please create a JIRA ticket and assign it 
to me?
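The test above maps parameterized Hive type strings such as `char(10)` and `decimal(30,16)` onto Beam field types by ignoring the type parameters. A rough sketch of that idea (not Beam's actual `SchemaUtils`; the lookup table and names are illustrative):

```python
# Illustrative sketch: strip the "(...)" parameter list from a Hive type
# string and look up a coarse Beam-style field type for the base name.
import re

BASE_TYPE_TO_FIELD_TYPE = {
    "char": "STRING",      # char(n) -> STRING
    "varchar": "STRING",   # varchar(n) -> STRING
    "decimal": "DECIMAL",  # decimal(p,s) -> DECIMAL
}


def field_type_for(hive_type):
    base = re.match(r"[a-z]+", hive_type.strip().lower())
    if not base or base.group(0) not in BASE_TYPE_TO_FIELD_TYPE:
        raise ValueError("unsupported Hive type: %r" % hive_type)
    return BASE_TYPE_TO_FIELD_TYPE[base.group(0)]


print(field_type_for("char(10)"))
print(field_type_for("decimal(30,16)"))
```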





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430488)
Time Spent: 1h  (was: 50m)

> Support for Parameterized Types when converting from HCatRecords to Rows in 
> HCatalogIO
> --
>
> Key: BEAM-9840
> URL: https://issues.apache.org/jira/browse/BEAM-9840
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-hcatalog
>Reporter: Rahul Patwari
>Assignee: Rahul Patwari
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In Hive, to use CHAR and VARCHAR as the data type for a column, the types 
> have to be parameterized with the length of the character sequence. Refer 
> [https://github.com/apache/hive/blob/f37c5de6c32b9395d1b34fa3c02ed06d1bfbf6eb/serde/src/test/org/apache/hadoop/hive/serde2/typeinfo/TestTypeInfoUtils.java#L68]
> In addition, for the DECIMAL data type, custom precision and scale can be 
> provided as parameters.
> A user has faced an Exception while reading data from a table created in Hive 
> with a parameterized DECIMAL data type. Refer 
> [https://lists.apache.org/thread.html/r159012fbefce24d734096e3ec24ecd112de5f89b8029e57147d233b0%40%3Cuser.beam.apache.org%3E].
> This ticket is created to support converting from HCatRecords to Rows when 
> the HCatRecords have parameterized types.
> To support parameterized data types, we can make use of HCatFieldSchema. 
> Refer 
> 
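To make the parameterized-type handling concrete, the parsing involved can be sketched in a few lines of plain Python (a hypothetical helper for illustration, not HCatalogIO code):

```python
import re

# Hypothetical helper: split a Hive type string such as "decimal(30,16)"
# into its base type name and integer parameters. A plain type like
# "string" yields an empty parameter list.
def parse_hive_type(type_str):
    match = re.fullmatch(r"(\w+)(?:\((\d+(?:,\d+)*)\))?", type_str.strip())
    if not match:
        raise ValueError("unrecognized type: %s" % type_str)
    base = match.group(1).lower()
    params = [int(p) for p in match.group(2).split(",")] if match.group(2) else []
    return base, params

assert parse_hive_type("char(10)") == ("char", [10])
assert parse_hive_type("decimal(30,16)") == ("decimal", [30, 16])
```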

[jira] [Updated] (BEAM-9886) ReadFromBigQuery should auto-infer which project to bill for BQ exports

2020-05-04 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9886:
--
Status: Open  (was: Triage Needed)

> ReadFromBigQuery should auto-infer which project to bill for BQ exports
> ---
>
> Key: BEAM-9886
> URL: https://issues.apache.org/jira/browse/BEAM-9886
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9876) Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9876?focusedWorklogId=430487=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430487
 ]

ASF GitHub Bot logged work on BEAM-9876:


Author: ASF GitHub Bot
Created on: 05/May/20 02:02
Start Date: 05/May/20 02:02
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #11554:
URL: https://github.com/apache/beam/pull/11554#discussion_r419826968



##
File path: website/www/site/content/en/community/contact-us.md
##
@@ -0,0 +1,47 @@
+---
+title: "Contact Us"
+aliases:
+  - /community/
+  - /use/issue-tracking/
+  - /use/mailing-lists/
+  - /get-started/support/
+---
+
+
+# Contact Us
+
+There are many ways to reach the Beam user and developer communities - use
+whichever one seems best.
+
+

Review comment:
   What is the problem here? What is markdownify and what are superscripts?

##
File path: 
website/www/site/content/en/documentation/transforms/java/elementwise/regex.md
##
@@ -0,0 +1,34 @@
+---
+title: "Regex"
+---
+
+# Regex
+
+<a href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Regex.html">
+  <img src="https://beam.apache.org/images/logos/sdks/java.png" 
width="20px" height="20px"
+   alt="Javadoc" />
+ Javadoc
+</a>
+
+

Review comment:
   Why do we have multiple breaks here?







Issue Time Tracking
---

Worklog Id: (was: 430487)
Time Spent: 50m  (was: 40m)

> Migrate the Beam website from Jekyll to Hugo to enable localization of the 
> site content
> ---
>
> Key: BEAM-9876
> URL: https://issues.apache.org/jira/browse/BEAM-9876
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Aizhamal Nurmamat kyzy
>Assignee: Aizhamal Nurmamat kyzy
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Enable internationalization of the Apache Beam website to increase the reach 
> of the project, and facilitate adoption and growth of its community.
> The proposal was to do this by migrating the current Apache Beam website from 
> Jekyll to Hugo [1]. Hugo supports internationalization out-of-the-box, making 
> it easier for both contributors and maintainers to support the 
> internationalization effort.
> Further discussion on the implementation can be viewed here [2].
> [1] 
> [https://lists.apache.org/thread.html/rfab4cc1411318c3f4667bee051df68f37be11846ada877f3576c41a9%40%3Cdev.beam.apache.org%3E]
> [2] 
> [https://lists.apache.org/thread.html/r6b999b6d7d1f6cbb94e16bb2deed2b65098a6b14c4ac98707fe0c36a%40%3Cdev.beam.apache.org%3E]
>  





[jira] [Work logged] (BEAM-9876) Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9876?focusedWorklogId=430485=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430485
 ]

ASF GitHub Bot logged work on BEAM-9876:


Author: ASF GitHub Bot
Created on: 05/May/20 01:40
Start Date: 05/May/20 01:40
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #11554:
URL: https://github.com/apache/beam/pull/11554#discussion_r419822575



##
File path: 
website/www/site/content/en/documentation/transforms/python/elementwise/pardo.md
##
@@ -0,0 +1,142 @@
+---
+title: "Partition"

Review comment:
   Another copy of "Partition." (There may be others, we should verify we 
haven't lost content in the move.)

##
File path: 
website/www/site/content/en/documentation/transforms/python/elementwise/map.md
##
@@ -1,8 +1,5 @@
 ---
-layout: section
-title: "Map"
-permalink: /documentation/transforms/python/elementwise/map/
-section_menu: section-menu/documentation.html
+title: "Partition"

Review comment:
   Is this Partition or Map?

##
File path: 
website/www/site/content/en/documentation/transforms/python/elementwise/kvswap.md
##
@@ -0,0 +1,142 @@
+---
+title: "Partition"

Review comment:
   This seems to be the wrong page. 







Issue Time Tracking
---

Worklog Id: (was: 430485)
Time Spent: 40m  (was: 0.5h)

> Migrate the Beam website from Jekyll to Hugo to enable localization of the 
> site content
> ---
>
> Key: BEAM-9876
> URL: https://issues.apache.org/jira/browse/BEAM-9876
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Aizhamal Nurmamat kyzy
>Assignee: Aizhamal Nurmamat kyzy
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Enable internationalization of the Apache Beam website to increase the reach 
> of the project, and facilitate adoption and growth of its community.
> The proposal was to do this by migrating the current Apache Beam website from 
> Jekyll to Hugo [1]. Hugo supports internationalization out-of-the-box, making 
> it easier for both contributors and maintainers to support the 
> internationalization effort.
> Further discussion on the implementation can be viewed here [2].
> [1] 
> [https://lists.apache.org/thread.html/rfab4cc1411318c3f4667bee051df68f37be11846ada877f3576c41a9%40%3Cdev.beam.apache.org%3E]
> [2] 
> [https://lists.apache.org/thread.html/r6b999b6d7d1f6cbb94e16bb2deed2b65098a6b14c4ac98707fe0c36a%40%3Cdev.beam.apache.org%3E]
>  





[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=430483=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430483
 ]

ASF GitHub Bot logged work on BEAM-9856:


Author: ASF GitHub Bot
Created on: 05/May/20 01:36
Start Date: 05/May/20 01:36
Worklog Time Spent: 10m 
  Work Description: jaketf commented on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623793025


   @chamikaramj @lukecwik 
   Apologies, I believe the NPE was a user error on my part. 
   I've been able to revert my changes to OffsetRangeTracker without 
reintroducing the NPE.
   
   To help future users as foolish as me get a less confusing NPE, I 
suggest we do something like what I put in 8891292. But this is definitely not a 
core issue in OffsetRangeTracker.





Issue Time Tracking
---

Worklog Id: (was: 430483)
Time Spent: 2h 20m  (was: 2h 10m)

> HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
> --
>
> Key: BEAM-9856
> URL: https://issues.apache.org/jira/browse/BEAM-9856
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently the List Messages API paginates through results in a single 
> ProcessElement call.
> However, we could derive a restriction based on createTime using the 
> Messages.List filter and orderBy parameters.
>  
> This is in line with the future roadmap, where an HL7v2 bulk export API 
> becomes available that should allow splitting (e.g. on the create time 
> dimension). Leveraging this bulk export might be a future optimization to 
> explore.
>  
> This could take one of two forms:
> 1. dynamically splittable via a splittable DoFn (sexy, Beam idiomatic: make 
> optimization the runner's problem; potentially unnecessarily complex for this 
> use case)
> 2. static splitting on some time partition, e.g. finding the earliest 
> createTime and emitting a PCollection of 1-hour partitions, then paginating 
> through each hour of data within the time frame that the store spans in a 
> separate ProcessElement (easy to implement, but will likely have hot keys / 
> stragglers based on "busy hours")
>  
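Option 2 above can be sketched in plain Python (illustrative only, not the HL7v2IO implementation): split the store's createTime span into 1-hour partitions, each of which a separate ProcessElement call could then paginate through:

```python
from datetime import datetime, timedelta

# Sketch of the static time-partitioning idea: given the earliest createTime
# and the current time, emit (start, end) pairs covering the span in 1-hour
# slices, clamped to the actual bounds.
def hourly_partitions(earliest, latest):
    start = earliest.replace(minute=0, second=0, microsecond=0)
    partitions = []
    while start < latest:
        end = start + timedelta(hours=1)
        partitions.append((max(start, earliest), min(end, latest)))
        start = end
    return partitions

parts = hourly_partitions(datetime(2020, 5, 4, 22, 30), datetime(2020, 5, 5, 1, 0))
assert len(parts) == 3  # 22:30-23:00, 23:00-00:00, 00:00-01:00
```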





[jira] [Resolved] (BEAM-9881) refactor KafkaIOIT for reusing pipeline

2020-05-04 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee resolved BEAM-9881.
---
Fix Version/s: Not applicable
   Resolution: Won't Do

> refactor KafkaIOIT for reusing pipeline
> ---
>
> Key: BEAM-9881
> URL: https://issues.apache.org/jira/browse/BEAM-9881
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: Not applicable
>
>






[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=430474=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430474
 ]

ASF GitHub Bot logged work on BEAM-9856:


Author: ASF GitHub Bot
Created on: 05/May/20 00:46
Start Date: 05/May/20 00:46
Worklog Time Spent: 10m 
  Work Description: jaketf edited a comment on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796


   @chamikaramj thanks for the suggestion. I will look into using the 
BoundedSource API in a separate PR and we can compare.
   
   Unfortunately, regular DoFns don't cut it because a single element's outputs 
are committed atomically (see this 
[conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)).
   Basically we have one input element (the HL7v2 store) exploding into many, many 
output elements (all the messages in that store) in a single ProcessElement 
call. I'm trying to explore strategies for splitting up this listing (which, 
due to the paginated nature of messages.list, is single-threaded).
   
   I originally chose splittable DoFn over BoundedSource based off the 
sentiment of this statement:
   > **Coding against the Source API involves a lot of boilerplate and is 
error-prone**, and it does not compose well with the rest of the Beam model 
because a Source can appear only at the root of a pipeline. - 
https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html
   
   Using splittable DoFn also made reading from multiple HL7v2 stores come for 
free (e.g. if you had several regional HL7v2 stores and your use case was to 
read from them all and write to a single multi-regional store). This is 
admittedly a rather contrived use case. 
   
   The blog also mentions:
   - A Source cannot emit an additional output (for example, records that 
failed to parse).
   - Healthcare customers feeding requirements for this plugin want DLQ on 
all sinks and sources. To be consistent with the streaming API provided in 
`HL7v2IO.Read`, I wanted to provide DLQ in `HL7v2IO.ListMessages`. However, I 
believe this is more of a nice-to-have for batch use cases (because there's no 
room for passing bad message IDs to ListMessages like there is in HL7v2IO.Read).
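For readers less familiar with splittable DoFn, the claim/split contract it relies on can be sketched in a few lines of plain Python (a toy stand-in loosely modeled on Beam's OffsetRangeTracker, not the real class):

```python
# Toy restriction tracker: a ProcessElement call must claim each position
# (e.g. each page of the listing) before emitting its output, so the runner
# can steal the unclaimed remainder and process it in parallel.
class SimpleOffsetTracker:
    def __init__(self, start, stop):
        self.start, self.stop = start, stop
        self.last_claimed = None

    def try_claim(self, position):
        if position >= self.stop:
            return False  # position is in a (possibly split-off) residual
        self.last_claimed = position
        return True

    def try_split(self, position):
        # Shrink the primary range; the caller keeps [start, position).
        if self.last_claimed is not None and position <= self.last_claimed:
            return None  # cannot split at an already-claimed position
        if position >= self.stop:
            return None  # nothing left to split off
        residual = (position, self.stop)
        self.stop = position
        return residual

tracker = SimpleOffsetTracker(0, 10)
assert tracker.try_claim(0)
assert tracker.try_split(5) == (5, 10)  # runner steals [5, 10)
assert not tracker.try_claim(7)         # now outside the primary restriction
```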





Issue Time Tracking
---

Worklog Id: (was: 430474)
Time Spent: 2h 10m  (was: 2h)

> HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
> --
>
> Key: BEAM-9856
> URL: https://issues.apache.org/jira/browse/BEAM-9856
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently the List Messages API paginates through results in a single 
> ProcessElement call.
> However, we could derive a restriction based on createTime using the 
> Messages.List filter and orderBy parameters.
>  
> This is in line with the future roadmap, where an HL7v2 bulk export API 
> becomes available that should allow splitting (e.g. on the create time 
> dimension). Leveraging this bulk export might be a future optimization to 
> explore.
>  
> This could take one of two forms:
> 1. dynamically splittable via a splittable DoFn (sexy, Beam idiomatic: make 
> optimization the runner's problem; potentially unnecessarily complex for this 
> use case)
> 2. static splitting on some time partition, e.g. finding the earliest 
> createTime and emitting a PCollection of 1-hour partitions, then paginating 
> through each hour of data within the time frame that the store spans in a 
> separate ProcessElement (easy to implement, but will likely have hot keys / 
> stragglers based on "busy hours")
>  





[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=430473=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430473
 ]

ASF GitHub Bot logged work on BEAM-9650:


Author: ASF GitHub Bot
Created on: 05/May/20 00:42
Start Date: 05/May/20 00:42
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #11582:
URL: https://github.com/apache/beam/pull/11582#discussion_r419808099



##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -283,6 +284,8 @@ def compute_table_name(row):
 'BigQuerySink',
 'WriteToBigQuery',
 'ReadFromBigQuery',
+'ReadAllFromBigQueryRequest',

Review comment:
   Does the java api use a similar concept?







Issue Time Tracking
---

Worklog Id: (was: 430473)
Time Spent: 10.5h  (was: 10h 20m)

> Add consistent slowly changing side inputs support
> --
>
> Key: BEAM-9650
> URL: https://issues.apache.org/jira/browse/BEAM-9650
> Project: Beam
>  Issue Type: Bug
>  Components: io-ideas
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> Add implementation for slowly changing dimensions based on [design 
> doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit)





[jira] [Work logged] (BEAM-9875) PR11585 breaks python2PostCommit cross-lang suite

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9875?focusedWorklogId=430472=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430472
 ]

ASF GitHub Bot logged work on BEAM-9875:


Author: ASF GitHub Bot
Created on: 05/May/20 00:12
Start Date: 05/May/20 00:12
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11603:
URL: https://github.com/apache/beam/pull/11603#issuecomment-623772817


   R: @ihji 





Issue Time Tracking
---

Worklog Id: (was: 430472)
Time Spent: 1h  (was: 50m)

> PR11585 breaks python2PostCommit cross-lang suite
> -
>
> Key: BEAM-9875
> URL: https://issues.apache.org/jira/browse/BEAM-9875
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Build scan: https://scans.gradle.com/s/6s6vse2zqr5x4
> Typical error msg:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
> "__main__", fname, loader, pkg_name)  
>   File "/usr/lib/python2.7/runpy.py", line 72, in _run_code   
> exec code in run_globals  
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2/src/sdks/python/apache_beam/examples/wordcount_xlang.py",
>  line 137, in  
> main()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2/src/sdks/python/apache_beam/examples/wordcount_xlang.py",
>  line 128, in main 
> p.runner.create_job_service(pipeline_options) 
>   File "apache_beam/runners/portability/portable_runner.py", line 328, in 
> create_job_service  
> server = self.default_job_server(options) 
>   File "apache_beam/runners/portability/portable_runner.py", line 307, in 
> default_job_server  
> 'You must specify a --job_endpoint when using --runner=PortableRunner. '  
> NotImplementedError: You must specify a --job_endpoint when using 
> --runner=PortableRunner. Alternatively, you may specify which portable runner 
> you intend to use, such as --runner=FlinkRunner or --runner=SparkRunner.
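The error comes from the job-service selection step; a simplified stand-in for that check (hypothetical, not the actual `portable_runner.py` source) behaves like:

```python
# Simplified stand-in for the PortableRunner's job-service check: without a
# --job_endpoint (or a concrete portable runner such as FlinkRunner or
# SparkRunner), there is nothing to default to, so it raises.
def default_job_server(options):
    if not options.get("job_endpoint"):
        raise NotImplementedError(
            "You must specify a --job_endpoint when using --runner=PortableRunner.")
    return options["job_endpoint"]

try:
    default_job_server({"runner": "PortableRunner"})
except NotImplementedError as exc:
    print("raised: %s" % exc)
```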





[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=430468=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430468
 ]

ASF GitHub Bot logged work on BEAM-9856:


Author: ASF GitHub Bot
Created on: 04/May/20 23:57
Start Date: 04/May/20 23:57
Worklog Time Spent: 10m 
  Work Description: jaketf edited a comment on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796


   @chamikaramj thanks for the suggestion. I will look into using BoundedSource 
API in a separate PR and we can compare.
   
   Unfortunately, regular DoFns don't cut it because a single element's outputs 
are committed atomically (see this 
[conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)).
   Basically we have one input element (HL7v2 store) exploding to many, many 
output elements (all the messages in that store) in a single ProcessElement 
call. I'm trying to explore strategies for splitting up this listing.
   
   I originally chose splittable DoFn over BoundedSource based off the 
sentiment of this statement:
   > **Coding against the Source API involves a lot of boilerplate and is 
error-prone**, and it does not compose well with the rest of the Beam model 
because a Source can appear only at the root of a pipeline. - 
https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html
   
   Using splittable DoFn also made reading from multiple HL7v2 stores come for 
free (e.g. if you had several regional HL7v2 stores and your use case was to 
read from them all and write to a single multi-regional store). This is 
admittedly a rather contrived use case. 
   
   The blog also mentions 
   - A Source can not emit an additional output (for example, records that 
failed to parse).
   - Healthcare customers feeding requirements for this plugin want DLQ on 
all sinks and sources. To be consistent with the streaming API provided in 
`HL7v2IO.Read`, I wanted to provide DLQ in `HL7v2IO.ListMessages`. However, I 
believe this is more of a nice to have for batch use cases (because there's no 
room for passing ListMessages bad messages IDs like there is in HL7v2IO.Read).





Issue Time Tracking
---

Worklog Id: (was: 430468)
Time Spent: 2h  (was: 1h 50m)

> HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
> --
>
> Key: BEAM-9856
> URL: https://issues.apache.org/jira/browse/BEAM-9856
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently the List Messages API paginates through results in a single 
> ProcessElement call.
> However, we could derive a restriction based on createTime using the 
> Messages.List filter and orderBy parameters.
>  
> This is in line with the future roadmap, where an HL7v2 bulk export API 
> becomes available that should allow splitting (e.g. on the create time 
> dimension). Leveraging this bulk export might be a future optimization to 
> explore.
>  
> This could take one of two forms:
> 1. dynamically splittable via a splittable DoFn (sexy, Beam idiomatic: make 
> optimization the runner's problem; potentially unnecessarily complex for this 
> use case)
> 2. static splitting on some time partition, e.g. finding the earliest 
> createTime and emitting a PCollection of 1-hour partitions, then paginating 
> through each hour of data within the time frame that the store spans in a 
> separate ProcessElement (easy to implement, but will likely have hot keys / 
> stragglers based on "busy hours")
>  





[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=430467=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430467
 ]

ASF GitHub Bot logged work on BEAM-9856:


Author: ASF GitHub Bot
Created on: 04/May/20 23:54
Start Date: 04/May/20 23:54
Worklog Time Spent: 10m 
  Work Description: jaketf edited a comment on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796


   @chamikaramj thanks for the suggestion. I will look into using BoundedSource 
API in a separate PR and we can compare.
   
   Unfortunately, regular DoFns don't cut it because a single element's outputs 
are committed atomically (see this 
[conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)).
   Basically we have one input element (HL7v2 store) exploding to many, many 
output elements (all the messages in that store) in a single ProcessElement 
call. I'm trying to explore strategies for splitting up this listing.
   
   I originally chose splittable DoFn over BoundedSource based off the 
sentiment of this statement:
   > **Coding against the Source API involves a lot of boilerplate and is 
error-prone**, and it does not compose well with the rest of the Beam model 
because a Source can appear only at the root of a pipeline. - 
https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html
   
   The blog also mentions 
   - A Source can not emit an additional output (for example, records that 
failed to parse).
   - Healthcare customers feeding requirements for this plugin want DLQ on 
all sinks and sources. To be consistent with the streaming API provided in 
`HL7v2IO.Read`, I wanted to provide DLQ in `HL7v2IO.ListMessages`. However, I 
believe this is more of a nice to have for batch use cases (because there's no 
room for passing ListMessages bad messages IDs like there is in HL7v2IO.Read).





Issue Time Tracking
---

Worklog Id: (was: 430467)
Time Spent: 1h 50m  (was: 1h 40m)

> HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
> --
>
> Key: BEAM-9856
> URL: https://issues.apache.org/jira/browse/BEAM-9856
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently the List Messages API paginates through results in a single 
> ProcessElement call.
> However, we could derive a restriction based on createTime using the 
> Messages.List filter and orderBy parameters.
>  
> This is in line with the future roadmap, where an HL7v2 bulk export API 
> becomes available that should allow splitting (e.g. on the create time 
> dimension). Leveraging this bulk export might be a future optimization to 
> explore.
>  
> This could take one of two forms:
> 1. dynamically splittable via a splittable DoFn (sexy, Beam idiomatic: make 
> optimization the runner's problem; potentially unnecessarily complex for this 
> use case)
> 2. static splitting on some time partition, e.g. finding the earliest 
> createTime and emitting a PCollection of 1-hour partitions, then paginating 
> through each hour of data within the time frame that the store spans in a 
> separate ProcessElement (easy to implement, but will likely have hot keys / 
> stragglers based on "busy hours")
>  





[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=430466=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430466
 ]

ASF GitHub Bot logged work on BEAM-9856:


Author: ASF GitHub Bot
Created on: 04/May/20 23:52
Start Date: 04/May/20 23:52
Worklog Time Spent: 10m 
  Work Description: jaketf edited a comment on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796


   @chamikaramj thanks for the suggestion. I will look into using BoundedSource 
API.
   
   Unfortunately, regular DoFns don't cut it because a single element's outputs 
are committed atomically (see this 
[conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)).
   Basically we have one input element (HL7v2 store) exploding to many, many 
output elements (all the messages in that store) in a single ProcessElement 
call. I'm trying to explore strategies for splitting up this listing.
   
   I originally chose splittable DoFn over BoundedSource based off the 
sentiment of this statement:
   > **Coding against the Source API involves a lot of boilerplate and is 
error-prone**, and it does not compose well with the rest of the Beam model 
because a Source can appear only at the root of a pipeline. - 
https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html
   
   The blog also mentions 
   - A Source can not emit an additional output (for example, records that 
failed to parse).
   - Healthcare customers feeding requirements for this plugin want DLQ on 
all sinks and sources. To be consistent with the streaming API provided in 
`HL7v2IO.Read`, I wanted to provide DLQ in `HL7v2IO.ListMessages`. However, I 
believe this is more of a nice to have for batch use cases (because there's no 
room for passing ListMessages bad messages IDs like there is in HL7v2IO.Read).





Issue Time Tracking
---

Worklog Id: (was: 430466)
Time Spent: 1h 40m  (was: 1.5h)

> HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
> --
>
> Key: BEAM-9856
> URL: https://issues.apache.org/jira/browse/BEAM-9856
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently the List Messages API paginates through results in a single 
> ProcessElement call.
> However, we could derive a restriction based on createTime using the 
> Messages.List filter and orderBy parameters.
>  
> This is in line with the future roadmap, where an HL7v2 bulk export API 
> becomes available that should allow splitting (e.g. on the create time 
> dimension). Leveraging this bulk export might be a future optimization to 
> explore.
>  
> This could take one of two forms:
> 1. dynamically splittable via a splittable DoFn (sexy, Beam idiomatic: make 
> optimization the runner's problem; potentially unnecessarily complex for this 
> use case)
> 2. static splitting on some time partition, e.g. finding the earliest 
> createTime and emitting a PCollection of 1-hour partitions, then paginating 
> through each hour of data within the time frame that the store spans in a 
> separate ProcessElement (easy to implement, but will likely have hot keys / 
> stragglers based on "busy hours")
>  





[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=430465=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430465
 ]

ASF GitHub Bot logged work on BEAM-9856:


Author: ASF GitHub Bot
Created on: 04/May/20 23:49
Start Date: 04/May/20 23:49
Worklog Time Spent: 10m 
  Work Description: jaketf commented on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796


   @chamikaramj thanks for the suggestion. I will look into using BoundedSource 
API.
   
   Unfortunately, regular DoFns don't cut it because a single element's outputs 
are committed atomically (see this 
[conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)).
   Basically we have one input element (HL7v2 store) exploding to many, many 
output elements (all the messages in that store) in a single ProcessElement 
call. I'm trying to explore strategies for splitting up this listing.
   
   I originally chose splittable DoFn over BoundedSource based off the 
sentiment of this statement:
   > **Coding against the Source API involves a lot of boilerplate and is 
error-prone**, and it does not compose well with the rest of the Beam model 
because a Source can appear only at the root of a pipeline. - 
https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html
   
   The blog also mentions:
   - A Source cannot emit an additional output (for example, records that 
failed to parse).
   - Healthcare customers driving requirements for this plugin want DLQ on 
all sinks and sources. To be consistent with the streaming API provided in 
`HL7v2IO.Read`, I wanted to provide DLQ in `HL7v2IO.ListMessages`. However, I 
believe this is more of a nice-to-have for batch use cases.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430465)
Time Spent: 1.5h  (was: 1h 20m)

> HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
> --
>
> Key: BEAM-9856
> URL: https://issues.apache.org/jira/browse/BEAM-9856
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently the List Messages API paginates through all messages in a single 
> ProcessElement call.
> However, we could derive a restriction based on createTime using the 
> Messages.List filter and orderBy parameters.
>  
> This is in line with the future roadmap: once the HL7v2 bulk export API 
> becomes available, it should allow splitting on a dimension such as create 
> time. Leveraging this bulk export might be a future optimization to explore.
>  
> This could take one of two forms:
> 1. dynamically splittable via splittable DoFn (elegant and Beam-idiomatic: 
> make optimization the runner's problem, but potentially unnecessarily 
> complex for this use case)
> 2. static splitting on a time partition, e.g. finding the earliest 
> createTime, emitting a PCollection of one-hour partitions, and paginating 
> through each hour of data within the time frame the store spans, in a 
> separate ProcessElement (easy to implement but likely to have hot keys / 
> stragglers based on "busy hours")
>  
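The static splitting in option 2 above can be sketched in plain Java (a minimal illustration using only `java.time`; the `Partition` type and `hourlyPartitions` helper are hypothetical names for this sketch, not part of HL7v2IO):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class TimePartitions {

  // Half-open time range [start, end); a hypothetical helper, not a Beam type.
  public static final class Partition {
    public final Instant start;
    public final Instant end;

    Partition(Instant start, Instant end) {
      this.start = start;
      this.end = end;
    }
  }

  // Split [earliestCreateTime, now) into fixed one-hour partitions. Each
  // partition would then be paginated through in its own ProcessElement call.
  public static List<Partition> hourlyPartitions(Instant earliestCreateTime, Instant now) {
    List<Partition> partitions = new ArrayList<>();
    Instant start = earliestCreateTime;
    while (start.isBefore(now)) {
      Instant end = start.plus(Duration.ofHours(1));
      // Clamp the final partition so it does not extend past "now".
      partitions.add(new Partition(start, end.isBefore(now) ? end : now));
      start = end;
    }
    return partitions;
  }
}
```

As the discussion notes, fixed hourly partitions are easy to build but inherit the store's traffic skew: "busy hours" produce oversized partitions unless the split granularity adapts to message density.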



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9154) Move Chicago Taxi Example to Python 3

2020-05-04 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099414#comment-17099414
 ] 

Valentyn Tymofieiev commented on BEAM-9154:
---

I see. Once you have time, feel free to let me know if this is still an issue 
and if it is reproducible in newer versions of TFT. Thanks.

> Move Chicago Taxi Example to Python 3
> -
>
> Key: BEAM-9154
> URL: https://issues.apache.org/jira/browse/BEAM-9154
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: Major
>
> The Chicago Taxi Example[1] should be moved to the latest version of Python 
> supported by Beam (currently it's Python 3.7).
> At the moment, the following error occurs when running the benchmark on 
> Python 3.7 (requires further investigation):
> {code:java}
> Traceback (most recent call last):
>   File "preprocess.py", line 259, in 
> main()
>   File "preprocess.py", line 254, in main
> project=known_args.metric_reporting_project
>   File "preprocess.py", line 155, in transform_data
> ('Analyze' >> tft_beam.AnalyzeDataset(preprocessing_fn)))
>   File 
> "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 987, in __ror__
> return self.transform.__ror__(pvalueish, self.label)
>   File 
> "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 547, in __ror__
> result = p.apply(self, pvalueish, label)
>   File 
> "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 
> 532, in apply
> return self.apply(transform, pvalueish)
>   File 
> "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 
> 573, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
>   File 
> "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
>   File 
> "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", 
> line 223, in apply_PTransform
> return transform.expand(input)
>   File 
> "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py",
>  line 825, in expand
> input_metadata))
>   File 
> "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py",
>  line 716, in expand
> output_signature = self._preprocessing_fn(copied_inputs)
>   File "preprocess.py", line 102, in preprocessing_fn
> _fill_in_missing(inputs[key]),
> KeyError: 'company'
> {code}
> [1] sdks/python/apache_beam/testing/benchmarks/chicago_taxi



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9875) PR11585 breaks python2PostCommit cross-lang suite

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9875?focusedWorklogId=430449=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430449
 ]

ASF GitHub Bot logged work on BEAM-9875:


Author: ASF GitHub Bot
Created on: 04/May/20 23:00
Start Date: 04/May/20 23:00
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11603:
URL: https://github.com/apache/beam/pull/11603#issuecomment-623752060


   Run Python 2 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430449)
Time Spent: 40m  (was: 0.5h)

> PR11585 breaks python2PostCommit cross-lang suite
> -
>
> Key: BEAM-9875
> URL: https://issues.apache.org/jira/browse/BEAM-9875
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Build scan: https://scans.gradle.com/s/6s6vse2zqr5x4
> Typical error msg:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
> "__main__", fname, loader, pkg_name)  
>   File "/usr/lib/python2.7/runpy.py", line 72, in _run_code   
> exec code in run_globals  
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2/src/sdks/python/apache_beam/examples/wordcount_xlang.py",
>  line 137, in  
> main()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2/src/sdks/python/apache_beam/examples/wordcount_xlang.py",
>  line 128, in main 
> p.runner.create_job_service(pipeline_options) 
>   File "apache_beam/runners/portability/portable_runner.py", line 328, in 
> create_job_service  
> server = self.default_job_server(options) 
>   File "apache_beam/runners/portability/portable_runner.py", line 307, in 
> default_job_server  
> 'You must specify a --job_endpoint when using --runner=PortableRunner. '  
> NotImplementedError: You must specify a --job_endpoint when using 
> --runner=PortableRunner. Alternatively, you may specify which portable runner 
> you intend to use, such as --runner=FlinkRunner or --runner=SparkRunner.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=430443=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430443
 ]

ASF GitHub Bot logged work on BEAM-9856:


Author: ASF GitHub Bot
Created on: 04/May/20 22:29
Start Date: 04/May/20 22:29
Worklog Time Spent: 10m 
  Work Description: chamikaramj edited a comment on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623741333


   Is using the SplittableDoFn API here intentional?
   
   I think this API is being updated. cc: @lukecwik 
   
   If you need dynamic work rebalancing, consider using the BoundedSource 
interface. Otherwise we can just implement the source using regular DoFns and 
wait for SplittableDoFn API to stabilize before adding support for dynamic work 
rebalancing.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430443)
Time Spent: 1h 20m  (was: 1h 10m)

> HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
> --
>
> Key: BEAM-9856
> URL: https://issues.apache.org/jira/browse/BEAM-9856
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently the List Messages API paginates through all messages in a single 
> ProcessElement call.
> However, we could derive a restriction based on createTime using the 
> Messages.List filter and orderBy parameters.
>  
> This is in line with the future roadmap: once the HL7v2 bulk export API 
> becomes available, it should allow splitting on a dimension such as create 
> time. Leveraging this bulk export might be a future optimization to explore.
>  
> This could take one of two forms:
> 1. dynamically splittable via splittable DoFn (elegant and Beam-idiomatic: 
> make optimization the runner's problem, but potentially unnecessarily 
> complex for this use case)
> 2. static splitting on a time partition, e.g. finding the earliest 
> createTime, emitting a PCollection of one-hour partitions, and paginating 
> through each hour of data within the time frame the store spans, in a 
> separate ProcessElement (easy to implement but likely to have hot keys / 
> stragglers based on "busy hours")
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=430442=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430442
 ]

ASF GitHub Bot logged work on BEAM-9856:


Author: ASF GitHub Bot
Created on: 04/May/20 22:28
Start Date: 04/May/20 22:28
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623741333


   Is using the SplittableDoFn API here intentional?
   
   I think this API is being updated. cc: @lukecwik 
   
   If you need dynamic work rebalancing, consider using the BoundedSource 
interface. Otherwise we can just implement the source as a regular DoFn and 
wait for the SplittableDoFn API to stabilize before adding support for dynamic work 
rebalancing.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430442)
Time Spent: 1h 10m  (was: 1h)

> HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
> --
>
> Key: BEAM-9856
> URL: https://issues.apache.org/jira/browse/BEAM-9856
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently the List Messages API paginates through all messages in a single 
> ProcessElement call.
> However, we could derive a restriction based on createTime using the 
> Messages.List filter and orderBy parameters.
>  
> This is in line with the future roadmap: once the HL7v2 bulk export API 
> becomes available, it should allow splitting on a dimension such as create 
> time. Leveraging this bulk export might be a future optimization to explore.
>  
> This could take one of two forms:
> 1. dynamically splittable via splittable DoFn (elegant and Beam-idiomatic: 
> make optimization the runner's problem, but potentially unnecessarily 
> complex for this use case)
> 2. static splitting on a time partition, e.g. finding the earliest 
> createTime, emitting a PCollection of one-hour partitions, and paginating 
> through each hour of data within the time frame the store spans, in a 
> separate ProcessElement (easy to implement but likely to have hot keys / 
> stragglers based on "busy hours")
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9885) Use artifact staging for SqlTransform rather than staging jar experiment

2020-05-04 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9885:
--
Status: Open  (was: Triage Needed)

> Use artifact staging for SqlTransform rather than staging jar experiment
> 
>
> Key: BEAM-9885
> URL: https://issues.apache.org/jira/browse/BEAM-9885
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9885) Use artifact staging for SqlTransform rather than staging jar experiment

2020-05-04 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-9885:
---

 Summary: Use artifact staging for SqlTransform rather than staging 
jar experiment
 Key: BEAM-9885
 URL: https://issues.apache.org/jira/browse/BEAM-9885
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Brian Hulette
Assignee: Brian Hulette






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=430441=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430441
 ]

ASF GitHub Bot logged work on BEAM-9856:


Author: ASF GitHub Bot
Created on: 04/May/20 22:19
Start Date: 04/May/20 22:19
Worklog Time Spent: 10m 
  Work Description: jaketf commented on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623737955


   > The NPE is caused by `OffsetRangeTracker.lastAttemptedOffset` being 
unboxed in `OffsetRangeTracker::checkDone` 
[here](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L98).
   > 
   > 
[all](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L53)
 
[other](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L77)
 
[uses](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L119)
 check that `lastAttemptedOffset` is not null.
   > 
   > I'm not sure if this was intentional in the implementation of 
OffsetRangeTracker.
   
   @chamikaramj pablo said you might know about this.
   
   Should `checkDone` have a conditional on `lastAttemptedOffset != null`?
   e.g.
   ```java
   @Override
   public void checkDone() throws IllegalStateException {
     if (range.getFrom() == range.getTo()) {
       return;
     }

     if (lastAttemptedOffset != null) {
       checkState(
           lastAttemptedOffset >= range.getTo() - 1,
           "Last attempted offset was %s in range %s, claiming work in [%s, %s) was not attempted",
           lastAttemptedOffset,
           range,
           lastAttemptedOffset + 1,
           range.getTo());
     }
   }
   ```
   
   I'm not really familiar with what checkDone should do in the case that 
lastAttemptedOffset was null.
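The unboxing failure described above can be reproduced without any Beam dependency (a minimal sketch; `lastAttemptedOffset` here is a local nullable `Long` standing in for the tracker field, and `guardedCheck` mirrors the proposed null guard):

```java
public class UnboxingNpeDemo {

  // Mirrors OffsetRangeTracker.lastAttemptedOffset, which is a nullable Long.
  static Long lastAttemptedOffset = null;

  // Comparing a null Long against a long auto-unboxes it, which throws
  // NullPointerException; this is the NPE reported against checkDone.
  static boolean unguardedCheck(long rangeTo) {
    return lastAttemptedOffset >= rangeTo - 1;
  }

  // With the guard, the check is simply skipped while no offset has been
  // attempted yet.
  static boolean guardedCheck(long rangeTo) {
    return lastAttemptedOffset == null || lastAttemptedOffset >= rangeTo - 1;
  }

  public static void main(String[] args) {
    try {
      unguardedCheck(10L);
    } catch (NullPointerException e) {
      System.out.println("unguarded check threw NullPointerException");
    }
    System.out.println("guarded check passed: " + guardedCheck(10L));
  }
}
```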



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430441)
Time Spent: 1h  (was: 50m)

> HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
> --
>
> Key: BEAM-9856
> URL: https://issues.apache.org/jira/browse/BEAM-9856
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently the List Messages API paginates through all messages in a single 
> ProcessElement call.
> However, we could derive a restriction based on createTime using the 
> Messages.List filter and orderBy parameters.
>  
> This is in line with the future roadmap: once the HL7v2 bulk export API 
> becomes available, it should allow splitting on a dimension such as create 
> time. Leveraging this bulk export might be a future optimization to explore.
>  
> This could take one of two forms:
> 1. dynamically splittable via splittable DoFn (elegant and Beam-idiomatic: 
> make optimization the runner's problem, but potentially unnecessarily 
> complex for this use case)
> 2. static splitting on a time partition, e.g. finding the earliest 
> createTime, emitting a PCollection of one-hour partitions, and paginating 
> through each hour of data within the time frame the store spans, in a 
> separate ProcessElement (easy to implement but likely to have hot keys / 
> stragglers based on "busy hours")
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9875) PR11585 breaks python2PostCommit cross-lang suite

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9875?focusedWorklogId=430439=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430439
 ]

ASF GitHub Bot logged work on BEAM-9875:


Author: ASF GitHub Bot
Created on: 04/May/20 22:14
Start Date: 04/May/20 22:14
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11603:
URL: https://github.com/apache/beam/pull/11603#issuecomment-623736044


   Run Python 2 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430439)
Time Spent: 0.5h  (was: 20m)

> PR11585 breaks python2PostCommit cross-lang suite
> -
>
> Key: BEAM-9875
> URL: https://issues.apache.org/jira/browse/BEAM-9875
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Build scan: https://scans.gradle.com/s/6s6vse2zqr5x4
> Typical error msg:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
> "__main__", fname, loader, pkg_name)  
>   File "/usr/lib/python2.7/runpy.py", line 72, in _run_code   
> exec code in run_globals  
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2/src/sdks/python/apache_beam/examples/wordcount_xlang.py",
>  line 137, in  
> main()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2/src/sdks/python/apache_beam/examples/wordcount_xlang.py",
>  line 128, in main 
> p.runner.create_job_service(pipeline_options) 
>   File "apache_beam/runners/portability/portable_runner.py", line 328, in 
> create_job_service  
> server = self.default_job_server(options) 
>   File "apache_beam/runners/portability/portable_runner.py", line 307, in 
> default_job_server  
> 'You must specify a --job_endpoint when using --runner=PortableRunner. '  
> NotImplementedError: You must specify a --job_endpoint when using 
> --runner=PortableRunner. Alternatively, you may specify which portable runner 
> you intend to use, such as --runner=FlinkRunner or --runner=SparkRunner.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=430438=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430438
 ]

ASF GitHub Bot logged work on BEAM-9856:


Author: ASF GitHub Bot
Created on: 04/May/20 22:06
Start Date: 04/May/20 22:06
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623733179


   @chamikaramj can you take a look at this?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430438)
Time Spent: 50m  (was: 40m)

> HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
> --
>
> Key: BEAM-9856
> URL: https://issues.apache.org/jira/browse/BEAM-9856
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently the List Messages API paginates through all messages in a single 
> ProcessElement call.
> However, we could derive a restriction based on createTime using the 
> Messages.List filter and orderBy parameters.
>  
> This is in line with the future roadmap: once the HL7v2 bulk export API 
> becomes available, it should allow splitting on a dimension such as create 
> time. Leveraging this bulk export might be a future optimization to explore.
>  
> This could take one of two forms:
> 1. dynamically splittable via splittable DoFn (elegant and Beam-idiomatic: 
> make optimization the runner's problem, but potentially unnecessarily 
> complex for this use case)
> 2. static splitting on a time partition, e.g. finding the earliest 
> createTime, emitting a PCollection of one-hour partitions, and paginating 
> through each hour of data within the time frame the store spans, in a 
> separate ProcessElement (easy to implement but likely to have hot keys / 
> stragglers based on "busy hours")
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9861) BigQueryStorageStreamSource fails with split fractions of 0.0 or 1.0

2020-05-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9861:
---
Status: Open  (was: Triage Needed)

> BigQueryStorageStreamSource fails with split fractions of 0.0 or 1.0
> 
>
> Key: BEAM-9861
> URL: https://issues.apache.org/jira/browse/BEAM-9861
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kenneth Jung
>Assignee: Kenneth Jung
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9862) test_multimap_multiside_input breaks Spark python VR test

2020-05-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9862:
---
Status: Open  (was: Triage Needed)

> test_multimap_multiside_input breaks Spark python VR test
> -
>
> Key: BEAM-9862
> URL: https://issues.apache.org/jira/browse/BEAM-9862
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Pablo Estrada
>Priority: Major
>
> ==
> ERROR: test_multimap_multiside_input (__main__.SparkRunnerTest)   
> --
> Traceback (most recent call last):
>   File "apache_beam/runners/portability/fn_api_runner/fn_runner_test.py", 
> line 265, in test_multimap_multiside_input  
> equal_to([('a', [1, 3], [1, 2, 3]), ('b', [2], [1, 2, 3])]))  
>   File "apache_beam/pipeline.py", line 543, in __exit__   
> self.run().wait_until_finish()
>   File "apache_beam/runners/portability/portable_runner.py", line 571, in 
> wait_until_finish   
> (self._job_id, self._state, self._last_error_message()))  
> RuntimeError: Pipeline 
> test_multimap_multiside_input_1588248535.23_9fee5456-d12d-451e-b112-a022d1c6ce1f
>  failed in state FAILED: java.lang.IllegalArgumentException: Multiple entries 
> with same key: 
> ref_PCollection_PCollection_21=(Broadcast(37),WindowedValue$FullWindowedValueCoder(KvCoder(ByteArrayCoder,VarLongCoder),GlobalWindow$Coder))
>  and 
> ref_PCollection_PCollection_21=(Broadcast(36),WindowedValue$FullWindowedValueCoder(KvCoder(ByteArrayCoder,VarLongCoder),GlobalWindow$Coder))
> Link: https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/3099/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization

2020-05-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9856:
---
Status: Open  (was: Triage Needed)

> HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
> --
>
> Key: BEAM-9856
> URL: https://issues.apache.org/jira/browse/BEAM-9856
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently the List Messages API paginates through all messages in a single 
> ProcessElement call.
> However, we could derive a restriction based on createTime using the 
> Messages.List filter and orderBy parameters.
>  
> This is in line with the future roadmap: once the HL7v2 bulk export API 
> becomes available, it should allow splitting on a dimension such as create 
> time. Leveraging this bulk export might be a future optimization to explore.
>  
> This could take one of two forms:
> 1. dynamically splittable via splittable DoFn (elegant and Beam-idiomatic: 
> make optimization the runner's problem, but potentially unnecessarily 
> complex for this use case)
> 2. static splitting on a time partition, e.g. finding the earliest 
> createTime, emitting a PCollection of one-hour partitions, and paginating 
> through each hour of data within the time frame the store spans, in a 
> separate ProcessElement (easy to implement but likely to have hot keys / 
> stragglers based on "busy hours")
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8458) BigQueryIO.Read needs permissions to create datasets to be able to run queries

2020-05-04 Thread Israel Herraiz (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Israel Herraiz updated BEAM-8458:
-
Fix Version/s: (was: 2.20.0)
   2.21.0

> BigQueryIO.Read needs permissions to create datasets to be able to run queries
> --
>
> Key: BEAM-8458
> URL: https://issues.apache.org/jira/browse/BEAM-8458
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Israel Herraiz
>Assignee: Israel Herraiz
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> When using {{fromQuery}}, BigQueryIO creates a temp dataset to store the 
> results of the query.
> Therefore, Beam requires permissions to create datasets just to be able to 
> run a query. In practice, this means that Beam requires the role 
> bigQuery.User just to run queries, whereas if you use {{from}} (to read from 
> a table), the role bigQuery.jobUser suffices.
> BigQueryIO.Read should have an option to set an existing dataset to write 
> the temp results of a query, so that the role bigQuery.jobUser would 
> suffice.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9880) touch: build/target/third_party_licenses/skip: No such file or directory

2020-05-04 Thread Hannah Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099365#comment-17099365
 ] 

Hannah Jiang commented on BEAM-9880:


Created a PR that should solve the issue: 
[https://github.com/apache/beam/pull/11606]

> touch: build/target/third_party_licenses/skip: No such file or directory
> 
>
> Key: BEAM-9880
> URL: https://issues.apache.org/jira/browse/BEAM-9880
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Kyle Weaver
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When I run ./gradlew 
> :sdks:python:test-suites:portable:py2:crossLanguageTests, I get the following 
> error:
> > Task :sdks:java:container:createFile FAILED
> touch: build/target/third_party_licenses/skip: No such file or directory
> When I do `ls build`, the only thing it outputs is `gradleenv`. So it looks 
> like it's assuming the directory exists, when it might not.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (BEAM-9880) touch: build/target/third_party_licenses/skip: No such file or directory

2020-05-04 Thread Hannah Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-9880 started by Hannah Jiang.
--
> touch: build/target/third_party_licenses/skip: No such file or directory
> 
>
> Key: BEAM-9880
> URL: https://issues.apache.org/jira/browse/BEAM-9880
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Kyle Weaver
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When I run ./gradlew 
> :sdks:python:test-suites:portable:py2:crossLanguageTests, I get the following 
> error:
> > Task :sdks:java:container:createFile FAILED
> touch: build/target/third_party_licenses/skip: No such file or directory
> When I do `ls build`, the only thing it outputs is `gradleenv`. So it looks 
> like it's assuming the directory exists, when it might not.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9880) touch: build/target/third_party_licenses/skip: No such file or directory

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9880?focusedWorklogId=430430=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430430
 ]

ASF GitHub Bot logged work on BEAM-9880:


Author: ASF GitHub Bot
Created on: 04/May/20 21:35
Start Date: 04/May/20 21:35
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang opened a new pull request #11606:
URL: https://github.com/apache/beam/pull/11606


   R: @ibzib 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
 

[jira] [Work logged] (BEAM-9799) Add automated RTracker validation to Go SDF

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9799?focusedWorklogId=430429&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430429
 ]

ASF GitHub Bot logged work on BEAM-9799:


Author: ASF GitHub Bot
Created on: 04/May/20 21:34
Start Date: 04/May/20 21:34
Worklog Time Spent: 10m 
  Work Description: youngoli commented on a change in pull request #11553:
URL: https://github.com/apache/beam/pull/11553#discussion_r419743023



##
File path: sdks/go/pkg/beam/core/runtime/exec/pardo.go
##
@@ -162,6 +168,14 @@ func (n *ParDo) processMainInput(mainIn *MainInput) error {
return nil
 }
 
+func rtErrHelper(err error) error {
+   if err != nil {
+   return err
+   } else {
+   return errors.New("RTracker IsDone failed for unspecified reason")
+   }

Review comment:
   Yeah good point. I'll rephrase it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430429)
Time Spent: 40m  (was: 0.5h)

> Add automated RTracker validation to Go SDF
> ---
>
> Key: BEAM-9799
> URL: https://issues.apache.org/jira/browse/BEAM-9799
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> After finishing executing an SDF we should be validating the restriction 
> trackers to make sure they succeeded without errors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8603) Add Python SqlTransform MVP

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8603?focusedWorklogId=430423&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430423
 ]

ASF GitHub Bot logged work on BEAM-8603:


Author: ASF GitHub Bot
Created on: 04/May/20 21:25
Start Date: 04/May/20 21:25
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10055:
URL: https://github.com/apache/beam/pull/10055#issuecomment-623714745


   yoohooo





Issue Time Tracking
---

Worklog Id: (was: 430423)
Time Spent: 12h 40m  (was: 12.5h)

> Add Python SqlTransform MVP
> ---
>
> Key: BEAM-8603
> URL: https://issues.apache.org/jira/browse/BEAM-8603
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9449) Consider passing pipeline options for expansion service.

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9449?focusedWorklogId=430420&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430420
 ]

ASF GitHub Bot logged work on BEAM-9449:


Author: ASF GitHub Bot
Created on: 04/May/20 21:23
Start Date: 04/May/20 21:23
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #11574:
URL: https://github.com/apache/beam/pull/11574#issuecomment-623713457


   Synced this up now that #11571 is merged. This is ready for review now.
   
   @ihji do you have time to review?





Issue Time Tracking
---

Worklog Id: (was: 430420)
Time Spent: 0.5h  (was: 20m)

> Consider passing pipeline options for expansion service.
> 
>
> Key: BEAM-9449
> URL: https://issues.apache.org/jira/browse/BEAM-9449
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9883) Generalize SDF-validating restrictions

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9883?focusedWorklogId=430421&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430421
 ]

ASF GitHub Bot logged work on BEAM-9883:


Author: ASF GitHub Bot
Created on: 04/May/20 21:23
Start Date: 04/May/20 21:23
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #11605:
URL: https://github.com/apache/beam/pull/11605#issuecomment-623713567


   R: @lostluck 





Issue Time Tracking
---

Worklog Id: (was: 430421)
Time Spent: 20m  (was: 10m)

> Generalize SDF-validating restrictions
> --
>
> Key: BEAM-9883
> URL: https://issues.apache.org/jira/browse/BEAM-9883
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have some restrictions written for the purpose of validating that SDFs 
> work in sdf_invokers_test.go, but they can be improved and generalized. The 
> main improvement is changing the validation approach so that the restriction 
> keeps track of each method it's had called on it. Then this can be 
> generalized so that it can be used in upcoming integration tests to validate 
> which methods are being called.
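The flag-tracking approach described above can be sketched language-neutrally (the real work is in the Go SDK's sdf_invokers_test.go; the class and method names below are illustrative only): each method on the restriction flips a flag, and a test later asserts exactly which methods were called.

```python
class TrackingRestriction:
    """Hypothetical restriction that records which SDF-related
    methods were invoked on it, instead of encoding behavior
    that tests must decode indirectly."""

    def __init__(self, start, end):
        self.start, self.end = start, end
        self.create_tracker_called = False
        self.split_called = False

    def create_tracker(self):
        # Flip a flag rather than performing observable work.
        self.create_tracker_called = True
        return self

    def split(self, fraction):
        self.split_called = True
        mid = self.start + int((self.end - self.start) * fraction)
        return (TrackingRestriction(self.start, mid),
                TrackingRestriction(mid, self.end))

r = TrackingRestriction(0, 10)
r.create_tracker()
primary, residual = r.split(0.5)
print(r.create_tracker_called, r.split_called)  # True True
```

An integration test can then assert on the flags directly, which generalizes to any invocation pattern the runner exercises.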



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9883) Generalize SDF-validating restrictions

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9883?focusedWorklogId=430418&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430418
 ]

ASF GitHub Bot logged work on BEAM-9883:


Author: ASF GitHub Bot
Created on: 04/May/20 21:12
Start Date: 04/May/20 21:12
Worklog Time Spent: 10m 
  Work Description: youngoli opened a new pull request #11605:
URL: https://github.com/apache/beam/pull/11605


   Refactoring the restriction used for testing SDFs. Instead of having
   some obtuse behavior that we can validate, it now just contains a bunch
   of flags we can flip to track that it was used in each method.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
 

[jira] [Commented] (BEAM-9880) touch: build/target/third_party_licenses/skip: No such file or directory

2020-05-04 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099353#comment-17099353
 ] 

Kyle Weaver commented on BEAM-9880:
---

I am running all these commands from the Beam root directory by the way.

> touch: build/target/third_party_licenses/skip: No such file or directory
> 
>
> Key: BEAM-9880
> URL: https://issues.apache.org/jira/browse/BEAM-9880
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Kyle Weaver
>Assignee: Hannah Jiang
>Priority: Major
>
> When I run ./gradlew 
> :sdks:python:test-suites:portable:py2:crossLanguageTests, I get the following 
> error:
> > Task :sdks:java:container:createFile FAILED
> touch: build/target/third_party_licenses/skip: No such file or directory
> When I do `ls build`, the only thing it outputs is `gradleenv`. So it looks 
> like it's assuming the directory exists, when it might not.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9880) touch: build/target/third_party_licenses/skip: No such file or directory

2020-05-04 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-9880:
--
Status: Open  (was: Triage Needed)

> touch: build/target/third_party_licenses/skip: No such file or directory
> 
>
> Key: BEAM-9880
> URL: https://issues.apache.org/jira/browse/BEAM-9880
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Kyle Weaver
>Assignee: Hannah Jiang
>Priority: Major
>
> When I run ./gradlew 
> :sdks:python:test-suites:portable:py2:crossLanguageTests, I get the following 
> error:
> > Task :sdks:java:container:createFile FAILED
> touch: build/target/third_party_licenses/skip: No such file or directory
> When I do `ls build`, the only thing it outputs is `gradleenv`. So it looks 
> like it's assuming the directory exists, when it might not.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9880) touch: build/target/third_party_licenses/skip: No such file or directory

2020-05-04 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099346#comment-17099346
 ] 

Kyle Weaver commented on BEAM-9880:
---

Same error happens when I do ./gradlew :sdks:java:container:docker. (I'm not 
using docker-pull-licenses in this case since I'm just trying to run some tests 
locally.)

Can you try doing `./gradlew clean && ./gradlew :sdks:java:container:docker`?

> touch: build/target/third_party_licenses/skip: No such file or directory
> 
>
> Key: BEAM-9880
> URL: https://issues.apache.org/jira/browse/BEAM-9880
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Kyle Weaver
>Assignee: Hannah Jiang
>Priority: Major
>
> When I run ./gradlew 
> :sdks:python:test-suites:portable:py2:crossLanguageTests, I get the following 
> error:
> > Task :sdks:java:container:createFile FAILED
> touch: build/target/third_party_licenses/skip: No such file or directory
> When I do `ls build`, the only thing it outputs is `gradleenv`. So it looks 
> like it's assuming the directory exists, when it might not.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9884) Add local option for configuring SqlTransform planner

2020-05-04 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-9884:

Summary: Add local option for configuring SqlTransform planner  (was: Add 
local option for configuraing SqlTransform planner)

> Add local option for configuring SqlTransform planner
> -
>
> Key: BEAM-9884
> URL: https://issues.apache.org/jira/browse/BEAM-9884
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Priority: Minor
>
> Currently it's only possible to change the SqlTransform planner globally via 
> pipeline options. It should be possible to configure the planner per 
> transform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9884) Add local option for configuraing SqlTransform planner

2020-05-04 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-9884:
---

 Summary: Add local option for configuraing SqlTransform planner
 Key: BEAM-9884
 URL: https://issues.apache.org/jira/browse/BEAM-9884
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Brian Hulette


Currently it's only possible to change the SqlTransform planner globally via 
pipeline options. It should be possible to configure the planner per transform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9880) touch: build/target/third_party_licenses/skip: No such file or directory

2020-05-04 Thread Hannah Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099344#comment-17099344
 ] 

Hannah Jiang commented on BEAM-9880:


The directory should be created as of 
[https://github.com/apache/beam/blob/master/sdks/java/container/build.gradle#L110]

I cannot reproduce it. Can you try building just the Java image?

By the way, you should use docker-pull-licenses or isRelease (hasn't been 
merged yet) to pull licenses.

 

> touch: build/target/third_party_licenses/skip: No such file or directory
> 
>
> Key: BEAM-9880
> URL: https://issues.apache.org/jira/browse/BEAM-9880
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Kyle Weaver
>Assignee: Hannah Jiang
>Priority: Major
>
> When I run ./gradlew 
> :sdks:python:test-suites:portable:py2:crossLanguageTests, I get the following 
> error:
> > Task :sdks:java:container:createFile FAILED
> touch: build/target/third_party_licenses/skip: No such file or directory
> When I do `ls build`, the only thing it outputs is `gradleenv`. So it looks 
> like it's assuming the directory exists, when it might not.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9883) Generalize SDF-validating restrictions

2020-05-04 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9883:
--
Status: Open  (was: Triage Needed)

> Generalize SDF-validating restrictions
> --
>
> Key: BEAM-9883
> URL: https://issues.apache.org/jira/browse/BEAM-9883
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>
> We have some restrictions written for the purpose of validating that SDFs 
> work in sdf_invokers_test.go, but they can be improved and generalized. The 
> main improvement is changing the validation approach so that the restriction 
> keeps track of each method it's had called on it. Then this can be 
> generalized so that it can be used in upcoming integration tests to validate 
> which methods are being called.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9882) Go SplittableDoFn testing

2020-05-04 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9882:
--
Status: Open  (was: Triage Needed)

> Go SplittableDoFn testing
> -
>
> Key: BEAM-9882
> URL: https://issues.apache.org/jira/browse/BEAM-9882
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>
> This is a catch-all jira for all tasks needed to fully test SplittableDoFns 
> in Go. For progress on SplittableDoFns themselves, see BEAM-3301



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9882) Go SplittableDoFn testing

2020-05-04 Thread Daniel Oliveira (Jira)
Daniel Oliveira created BEAM-9882:
-

 Summary: Go SplittableDoFn testing
 Key: BEAM-9882
 URL: https://issues.apache.org/jira/browse/BEAM-9882
 Project: Beam
  Issue Type: Improvement
  Components: sdk-go
Reporter: Daniel Oliveira
Assignee: Daniel Oliveira


This is a catch-all jira for all tasks needed to fully test SplittableDoFns in 
Go. For progress on SplittableDoFns themselves, see BEAM-3301



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9880) touch: build/target/third_party_licenses/skip: No such file or directory

2020-05-04 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099333#comment-17099333
 ] 

Kyle Weaver commented on BEAM-9880:
---

This happened on head. sdks/java/container/build/target/third_party_licenses 
does not exist.

$ ls sdks/java/container/build/target/
LICENSE NOTICE


> touch: build/target/third_party_licenses/skip: No such file or directory
> 
>
> Key: BEAM-9880
> URL: https://issues.apache.org/jira/browse/BEAM-9880
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Kyle Weaver
>Assignee: Hannah Jiang
>Priority: Major
>
> When I run ./gradlew 
> :sdks:python:test-suites:portable:py2:crossLanguageTests, I get the following 
> error:
> > Task :sdks:java:container:createFile FAILED
> touch: build/target/third_party_licenses/skip: No such file or directory
> When I do `ls build`, the only thing it outputs is `gradleenv`. So it looks 
> like it's assuming the directory exists, when it might not.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9881) refactor KafkaIOIT for reusing pipeline

2020-05-04 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9881:
--
Status: Open  (was: Triage Needed)

> refactor KafkaIOIT for reusing pipeline
> ---
>
> Key: BEAM-9881
> URL: https://issues.apache.org/jira/browse/BEAM-9881
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9877) Eager size estimation of large group-by-key iterables cause expensive / duplicate reads

2020-05-04 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver resolved BEAM-9877.
---
Fix Version/s: 2.21.0
   Resolution: Fixed

> Eager size estimation of large group-by-key iterables cause expensive / 
> duplicate reads
> ---
>
> Key: BEAM-9877
> URL: https://issues.apache.org/jira/browse/BEAM-9877
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Tudor Marian
>Assignee: Tudor Marian
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> BatchGroupAlsoByWindowViaIteratorsFn WindowReiterator can cause very 
> expensive duplicate reading of data from a (Co-)GroupByKey for hot keys with 
> many values due to PCollection size estimation.  Instead, it should perform 
> lazy estimation like GroupingShuffleReader and GroupingShuffleEntryIterator 
> (or perform no estimation at all).
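The eager-vs-lazy distinction above can be sketched with a toy model (a hedged illustration only, not the actual GroupByKey iterables): eager estimation makes a full extra pass over the data before the consumer reads it, while lazy estimation accumulates the observed size as the consumer iterates.

```python
class CountingSource:
    """Toy value source that counts how many element reads it serves."""
    def __init__(self, values):
        self.values = values
        self.reads = 0
    def __iter__(self):
        for v in self.values:
            self.reads += 1
            yield v

def eager_size(source):
    # Full pass over the data purely to estimate its size.
    return sum(len(str(v)) for v in source)

class LazySizingIterable:
    """Accumulates observed size while the consumer iterates: no extra pass."""
    def __init__(self, source):
        self.source = source
        self.observed = 0
    def __iter__(self):
        for v in self.source:
            self.observed += len(str(v))
            yield v

eager_src = CountingSource(list(range(100)))
eager_size(eager_src)        # estimation pass
list(eager_src)              # the real read: every element is read twice
print(eager_src.reads)       # 200

lazy_src = CountingSource(list(range(100)))
lazy = LazySizingIterable(lazy_src)
list(lazy)                   # single pass; size observed along the way
print(lazy_src.reads, lazy.observed)  # 100 190
```

For hot keys with many values, the duplicated read in the eager variant is the expensive part the issue describes.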



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9877) Eager size estimation of large group-by-key iterables cause expensive / duplicate reads

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9877?focusedWorklogId=430399&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430399
 ]

ASF GitHub Bot logged work on BEAM-9877:


Author: ASF GitHub Bot
Created on: 04/May/20 20:20
Start Date: 04/May/20 20:20
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11601:
URL: https://github.com/apache/beam/pull/11601#issuecomment-623684159


   Java test failing due to BEAM-9164 (known flake).





Issue Time Tracking
---

Worklog Id: (was: 430399)
Time Spent: 1h  (was: 50m)

> Eager size estimation of large group-by-key iterables cause expensive / 
> duplicate reads
> ---
>
> Key: BEAM-9877
> URL: https://issues.apache.org/jira/browse/BEAM-9877
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Tudor Marian
>Assignee: Tudor Marian
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> BatchGroupAlsoByWindowViaIteratorsFn WindowReiterator can cause very 
> expensive duplicate reading of data from a (Co-)GroupByKey for hot keys with 
> many values due to PCollection size estimation.  Instead, it should perform 
> lazy estimation like GroupingShuffleReader and GroupingShuffleEntryIterator 
> (or perform no estimation at all).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-9843) Flink UnboundedSourceWrapperTest flaky due to a timeout

2020-05-04 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver closed BEAM-9843.
-
Fix Version/s: Not applicable
   Resolution: Duplicate

> Flink UnboundedSourceWrapperTest flaky due to a timeout
> ---
>
> Key: BEAM-9843
> URL: https://issues.apache.org/jira/browse/BEAM-9843
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Chamikara Madhusanka Jayalath
>Priority: Major
> Fix For: Not applicable
>
>
> For example,
> [https://builds.apache.org/job/beam_PreCommit_Java_Cron/2685/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Cron/2684/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Cron/2682/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Cron/2680/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Cron/2685/testReport/junit/org.apache.beam.runners.flink.translation.wrappers.streaming.io/UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest/testWatermarkEmission_numTasks___4__numSplits_4_/]
> org.junit.runners.model.TestTimedOutException: test timed out after 3 
> milliseconds at sun.misc.Unsafe.park(Native Method) at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) at 
> org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest.testWatermarkEmission(UnboundedSourceWrapperTest.java:354)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.lang.Thread.run(Thread.java:748)
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9880) touch: build/target/third_party_licenses/skip: No such file or directory

2020-05-04 Thread Hannah Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099324#comment-17099324
 ] 

Hannah Jiang commented on BEAM-9880:


Did you pull from the head and still see the issue? 
Can you check if sdks/java/container/build/target/third_party_licenses exists?

> touch: build/target/third_party_licenses/skip: No such file or directory
> 
>
> Key: BEAM-9880
> URL: https://issues.apache.org/jira/browse/BEAM-9880
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Kyle Weaver
>Assignee: Hannah Jiang
>Priority: Major
>
> When I run ./gradlew 
> :sdks:python:test-suites:portable:py2:crossLanguageTests, I get the following 
> error:
> > Task :sdks:java:container:createFile FAILED
> touch: build/target/third_party_licenses/skip: No such file or directory
> When I do `ls build`, the only thing it outputs is `gradleenv`. So it looks 
> like it's assuming the directory exists, when it might not.
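
For context, a likely workaround (an assumption, not a confirmed fix) is to create the parent directory before touching the marker file, so the `touch` cannot fail. A minimal sketch, with paths taken from the error message above:

```python
from pathlib import Path

# Hypothetical sketch of the fix: ensure the parent directory exists before
# touching the marker file, so `touch` cannot fail with
# "No such file or directory".
target = Path("build/target/third_party_licenses")
target.mkdir(parents=True, exist_ok=True)  # no-op if it already exists
(target / "skip").touch()
```

The equivalent in the Gradle task would be a `mkdir -p` (or `Files.createDirectories`) before the `touch`.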





[jira] [Updated] (BEAM-9164) [PreCommit_Java] [Flake] UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest

2020-05-04 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-9164:
--
Issue Type: Bug  (was: Improvement)

> [PreCommit_Java] [Flake] 
> UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest
> ---
>
> Key: BEAM-9164
> URL: https://issues.apache.org/jira/browse/BEAM-9164
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kirill Kozlov
>Priority: Major
>  Labels: flake
>
> Test:
> org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest
>  >> testWatermarkEmission[numTasks = 1; numSplits=1]
> Fails with the following exception:
> {code:java}
> org.junit.runners.model.TestTimedOutException: test timed out after 3 
> milliseconds{code}
> Affected Jenkins job: 
> [https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1665/]
> Gradle build scan: [https://scans.gradle.com/s/nvgeb425fe63q]





[jira] [Work logged] (BEAM-9659) ArrayScanToJoinConverter ResolvedGetStructField cast to ResolvedColumnRef

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9659?focusedWorklogId=430392&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430392
 ]

ASF GitHub Bot logged work on BEAM-9659:


Author: ASF GitHub Bot
Created on: 04/May/20 20:10
Start Date: 04/May/20 20:10
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #11604:
URL: https://github.com/apache/beam/pull/11604#issuecomment-623679534


   R: @robinyqiu 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430392)
Time Spent: 20m  (was: 10m)

> ArrayScanToJoinConverter ResolvedGetStructField cast to ResolvedColumnRef
> -
>
> Key: BEAM-9659
> URL: https://issues.apache.org/jira/browse/BEAM-9659
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Critical
>  Labels: zetasql-compliance
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> two failures in shard 19
> {code:java}
> java.lang.ClassCastException: 
> com.google.zetasql.resolvedast.ResolvedNodes$ResolvedGetStructField cannot be 
> cast to com.google.zetasql.resolvedast.ResolvedNodes$ResolvedColumnRef
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:62)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:37)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:97)
>   at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at java.util.Collections$2.tryAdvance(Collections.java:4717)
>   at java.util.Collections$2.forEachRemaining(Collections.java:4725)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:96)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convert(QueryStatementConverter.java:84)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertRootQuery(QueryStatementConverter.java:51)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:160)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:131)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:115)
>   at 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl.executeQuery(ExecuteQueryServiceServer.java:242)
>   at 
> com.google.zetasql.testing.SqlComplianceServiceGrpc$MethodHandlers.invoke(SqlComplianceServiceGrpc.java:423)
>   at 
> com.google.zetasql.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
>   at 
> com.google.zetasql.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
>   at 
> com.google.zetasql.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711)
>   at 
> com.google.zetasql.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>   at 
> com.google.zetasql.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> {code:java}
> Apr 01, 2020 11:48:18 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT b, x
> FROM UNNEST(
>[STRUCT(true AS b, [3, 5] AS arr), STRUCT(false AS b, [7, 9] AS 

[jira] [Work logged] (BEAM-9659) ArrayScanToJoinConverter ResolvedGetStructField cast to ResolvedColumnRef

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9659?focusedWorklogId=430391&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430391
 ]

ASF GitHub Bot logged work on BEAM-9659:


Author: ASF GitHub Bot
Created on: 04/May/20 20:10
Start Date: 04/May/20 20:10
Worklog Time Spent: 10m 
  Work Description: apilloud opened a new pull request #11604:
URL: https://github.com/apache/beam/pull/11604


   There is no trivial path to fixing these, so just ensure they return a 
sensible error for now.
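   
   The "sensible error" approach can be sketched roughly as follows (illustrative Python with stand-in class names; the actual change is in Beam's Java `ArrayScanToJoinConverter`, so everything here is an assumption about shape, not the real code):
   
   ```python
   # Illustrative sketch only: these are stand-in classes, not the real
   # ZetaSQL resolved-AST node types.
   class ResolvedColumnRef:
       pass
   
   class ResolvedLiteral:
       pass
   
   def as_column_ref(expr):
       """Return expr as a column reference, or fail with a clear message
       instead of an opaque cast failure."""
       if not isinstance(expr, ResolvedColumnRef):
           raise ValueError(
               "Unsupported expression in array scan: %s" % type(expr).__name__)
       return expr
   ```
   
   The point is that unsupported node types surface as a deliberate, descriptive error rather than a `ClassCastException` from a blind cast.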
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 

[jira] [Assigned] (BEAM-9664) ArrayScanToJoinConverter ResolvedSubqueryExpr cast to ResolvedColumnRef

2020-05-04 Thread Andrew Pilloud (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Pilloud reassigned BEAM-9664:


Assignee: Andrew Pilloud

> ArrayScanToJoinConverter ResolvedSubqueryExpr cast to ResolvedColumnRef
> ---
>
> Key: BEAM-9664
> URL: https://issues.apache.org/jira/browse/BEAM-9664
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Critical
>  Labels: zetasql-compliance
>
> one failure in shard 49
> {code}
> java.lang.ClassCastException: 
> com.google.zetasql.resolvedast.ResolvedNodes$ResolvedSubqueryExpr cannot be 
> cast to com.google.zetasql.resolvedast.ResolvedNodes$ResolvedColumnRef
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:62)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:37)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:97)
>   at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at java.util.Collections$2.tryAdvance(Collections.java:4717)
>   at java.util.Collections$2.forEachRemaining(Collections.java:4725)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:96)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convert(QueryStatementConverter.java:84)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertRootQuery(QueryStatementConverter.java:51)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:160)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:131)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:115)
>   at 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl.executeQuery(ExecuteQueryServiceServer.java:242)
>   at 
> com.google.zetasql.testing.SqlComplianceServiceGrpc$MethodHandlers.invoke(SqlComplianceServiceGrpc.java:423)
>   at 
> com.google.zetasql.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
>   at 
> com.google.zetasql.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
>   at 
> com.google.zetasql.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711)
>   at 
> com.google.zetasql.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>   at 
> com.google.zetasql.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> {code}
> Apr 01, 2020 11:48:49 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT a, b
> FROM UNNEST([1, 2, 3]) a
>   JOIN UNNEST(ARRAY(SELECT b FROM UNNEST([3, 2, 1]) b)) b
> ON a = b;
> {code}





[jira] [Work logged] (BEAM-9875) PR11585 breaks python2PostCommit cross-lang suite

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9875?focusedWorklogId=430388&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430388
 ]

ASF GitHub Bot logged work on BEAM-9875:


Author: ASF GitHub Bot
Created on: 04/May/20 20:07
Start Date: 04/May/20 20:07
Worklog Time Spent: 10m 
  Work Description: ibzib opened a new pull request #11603:
URL: https://github.com/apache/beam/pull/11603


   …ge tests.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-9875) PR11585 breaks python2PostCommit cross-lang suite

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9875?focusedWorklogId=430389&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430389
 ]

ASF GitHub Bot logged work on BEAM-9875:


Author: ASF GitHub Bot
Created on: 04/May/20 20:07
Start Date: 04/May/20 20:07
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11603:
URL: https://github.com/apache/beam/pull/11603#issuecomment-623678292


   Run Python 2 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430389)
Time Spent: 20m  (was: 10m)

> PR11585 breaks python2PostCommit cross-lang suite
> -
>
> Key: BEAM-9875
> URL: https://issues.apache.org/jira/browse/BEAM-9875
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Build scan: https://scans.gradle.com/s/6s6vse2zqr5x4
> Typical error msg:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
> "__main__", fname, loader, pkg_name)  
>   File "/usr/lib/python2.7/runpy.py", line 72, in _run_code   
> exec code in run_globals  
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2/src/sdks/python/apache_beam/examples/wordcount_xlang.py",
>  line 137, in  
> main()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2/src/sdks/python/apache_beam/examples/wordcount_xlang.py",
>  line 128, in main 
> p.runner.create_job_service(pipeline_options) 
>   File "apache_beam/runners/portability/portable_runner.py", line 328, in 
> create_job_service  
> server = self.default_job_server(options) 
>   File "apache_beam/runners/portability/portable_runner.py", line 307, in 
> default_job_server  
> 'You must specify a --job_endpoint when using --runner=PortableRunner. '  
> NotImplementedError: You must specify a --job_endpoint when using 
> --runner=PortableRunner. Alternatively, you may specify which portable runner 
> you intend to use, such as --runner=FlinkRunner or --runner=SparkRunner.
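> The failing check can be sketched roughly like this (names simplified; the
> real logic lives in apache_beam/runners/portability/portable_runner.py, so
> treat this as an approximation, not the actual implementation):
> {code:python}
> # Rough sketch of the validation that raises the error above.
> def default_job_server(runner, job_endpoint=None):
>     if runner == "PortableRunner" and job_endpoint is None:
>         raise NotImplementedError(
>             "You must specify a --job_endpoint when using "
>             "--runner=PortableRunner. Alternatively, you may specify which "
>             "portable runner you intend to use, such as --runner=FlinkRunner "
>             "or --runner=SparkRunner.")
>     return job_endpoint
> {code}
> So the wordcount_xlang example needs either an explicit --job_endpoint or a
> concrete portable runner on the command line.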





[jira] [Work started] (BEAM-9659) ArrayScanToJoinConverter ResolvedGetStructField cast to ResolvedColumnRef

2020-05-04 Thread Andrew Pilloud (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-9659 started by Andrew Pilloud.

> ArrayScanToJoinConverter ResolvedGetStructField cast to ResolvedColumnRef
> -
>
> Key: BEAM-9659
> URL: https://issues.apache.org/jira/browse/BEAM-9659
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Critical
>  Labels: zetasql-compliance
>
> two failures in shard 19
> {code:java}
> java.lang.ClassCastException: 
> com.google.zetasql.resolvedast.ResolvedNodes$ResolvedGetStructField cannot be 
> cast to com.google.zetasql.resolvedast.ResolvedNodes$ResolvedColumnRef
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:62)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:37)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:97)
>   at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at java.util.Collections$2.tryAdvance(Collections.java:4717)
>   at java.util.Collections$2.forEachRemaining(Collections.java:4725)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:96)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convert(QueryStatementConverter.java:84)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertRootQuery(QueryStatementConverter.java:51)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:160)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:131)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:115)
>   at 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl.executeQuery(ExecuteQueryServiceServer.java:242)
>   at 
> com.google.zetasql.testing.SqlComplianceServiceGrpc$MethodHandlers.invoke(SqlComplianceServiceGrpc.java:423)
>   at 
> com.google.zetasql.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
>   at 
> com.google.zetasql.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
>   at 
> com.google.zetasql.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711)
>   at 
> com.google.zetasql.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>   at 
> com.google.zetasql.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> {code:java}
> Apr 01, 2020 11:48:18 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT b, x
> FROM UNNEST(
>[STRUCT(true AS b, [3, 5] AS arr), STRUCT(false AS b, [7, 9] AS arr)]) 
> t
> LEFT JOIN UNNEST(t.arr) x ON b
> Apr 01, 2020 11:48:18 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT b, x, p
> FROM UNNEST(
>[STRUCT(true AS b, [3, 5] AS arr), STRUCT(false AS b, [7, 9] AS arr)]) 
> t
> LEFT JOIN UNNEST(t.arr) x WITH OFFSET p ON b
> {code}





[jira] [Assigned] (BEAM-9659) ArrayScanToJoinConverter ResolvedGetStructField cast to ResolvedColumnRef

2020-05-04 Thread Andrew Pilloud (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Pilloud reassigned BEAM-9659:


Assignee: Andrew Pilloud

> ArrayScanToJoinConverter ResolvedGetStructField cast to ResolvedColumnRef
> -
>
> Key: BEAM-9659
> URL: https://issues.apache.org/jira/browse/BEAM-9659
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Critical
>  Labels: zetasql-compliance
>
> two failures in shard 19
> {code:java}
> java.lang.ClassCastException: 
> com.google.zetasql.resolvedast.ResolvedNodes$ResolvedGetStructField cannot be 
> cast to com.google.zetasql.resolvedast.ResolvedNodes$ResolvedColumnRef
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:62)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:37)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:97)
>   at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at java.util.Collections$2.tryAdvance(Collections.java:4717)
>   at java.util.Collections$2.forEachRemaining(Collections.java:4725)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:96)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convert(QueryStatementConverter.java:84)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertRootQuery(QueryStatementConverter.java:51)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:160)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:131)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:115)
>   at 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl.executeQuery(ExecuteQueryServiceServer.java:242)
>   at 
> com.google.zetasql.testing.SqlComplianceServiceGrpc$MethodHandlers.invoke(SqlComplianceServiceGrpc.java:423)
>   at 
> com.google.zetasql.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
>   at 
> com.google.zetasql.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
>   at 
> com.google.zetasql.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711)
>   at 
> com.google.zetasql.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>   at 
> com.google.zetasql.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> {code:java}
> Apr 01, 2020 11:48:18 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT b, x
> FROM UNNEST(
>[STRUCT(true AS b, [3, 5] AS arr), STRUCT(false AS b, [7, 9] AS arr)]) 
> t
> LEFT JOIN UNNEST(t.arr) x ON b
> Apr 01, 2020 11:48:18 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT b, x, p
> FROM UNNEST(
>[STRUCT(true AS b, [3, 5] AS arr), STRUCT(false AS b, [7, 9] AS arr)]) 
> t
> LEFT JOIN UNNEST(t.arr) x WITH OFFSET p ON b
> {code}





[jira] [Assigned] (BEAM-9657) ArrayScanToJoinConverter ResolvedLiteral cast to ResolvedColumnRef

2020-05-04 Thread Andrew Pilloud (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Pilloud reassigned BEAM-9657:


Assignee: Andrew Pilloud

> ArrayScanToJoinConverter ResolvedLiteral cast to ResolvedColumnRef
> --
>
> Key: BEAM-9657
> URL: https://issues.apache.org/jira/browse/BEAM-9657
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Critical
>  Labels: zetasql-compliance
>
> 5 failures in shard 2
> {code:java}
> Apr 01, 2020 11:48:47 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> SEVERE:  com.google.zetasql.resolvedast.ResolvedNodes$ResolvedLiteral 
> cannot be cast to 
> com.google.zetasql.resolvedast.ResolvedNodes$ResolvedColumnRef
> java.lang.ClassCastException: 
> com.google.zetasql.resolvedast.ResolvedNodes$ResolvedLiteral cannot be cast 
> to com.google.zetasql.resolvedast.ResolvedNodes$ResolvedColumnRef
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:62)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:37)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:97)
>   at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at java.util.Collections$2.tryAdvance(Collections.java:4717)
>   at java.util.Collections$2.forEachRemaining(Collections.java:4725)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:96)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convert(QueryStatementConverter.java:84)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertRootQuery(QueryStatementConverter.java:51)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:160)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:131)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:115)
>   at 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl.executeQuery(ExecuteQueryServiceServer.java:242)
>   at 
> com.google.zetasql.testing.SqlComplianceServiceGrpc$MethodHandlers.invoke(SqlComplianceServiceGrpc.java:423)
>   at 
> com.google.zetasql.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
>   at 
> com.google.zetasql.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
>   at 
> com.google.zetasql.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711)
>   at 
> com.google.zetasql.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>   at 
> com.google.zetasql.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748) {code}
> {code}
> Apr 01, 2020 11:48:47 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT a, b FROM (SELECT 1 a) JOIN UNNEST([1, 
> 2, 3]) b ON b > a
> Apr 01, 2020 11:48:48 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT a, b
> FROM UNNEST([1, 1, 2, 3, 5, 8, 13, NULL]) a
> JOIN UNNEST([1, 2, 3, 5, 7, 11, 13, NULL]) b
>   ON a = b
> Apr 01, 2020 11:48:48 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT a, b
> FROM UNNEST([1, 1, 2, 3, 5, 8, 13, NULL]) a
> LEFT JOIN UNNEST([1, 2, 3, 5, 7, 11, 13, NULL]) b
>   ON a = b
> Apr 01, 2020 11:48:49 AM 
> 
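The ClassCastException above comes from an unconditional downcast in 
ArrayScanToJoinConverter when the join expression resolves to a literal rather 
than a column reference. A minimal sketch of a guarded downcast, using stand-in 
classes (not ZetaSQL's actual ResolvedNodes API):

```java
// Stand-in node hierarchy for illustration only; the real classes live in
// com.google.zetasql.resolvedast.ResolvedNodes.
abstract class ResolvedExpr {}

class ResolvedLiteral extends ResolvedExpr {}

class ResolvedColumnRef extends ResolvedExpr {}

public class GuardedCast {
  // Guarded downcast: fail with a descriptive error instead of a raw
  // ClassCastException when the node is not a column reference.
  static ResolvedColumnRef asColumnRef(ResolvedExpr e) {
    if (e instanceof ResolvedColumnRef) {
      return (ResolvedColumnRef) e;
    }
    throw new UnsupportedOperationException(
        "Unsupported join expression node: " + e.getClass().getSimpleName());
  }
}
```

The converter would still need real handling for literal conditions; the sketch 
only turns the crash into an explicit, diagnosable error.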

[jira] [Work started] (BEAM-9657) ArrayScanToJoinConverter ResolvedLiteral cast to ResolvedColumnRef

2020-05-04 Thread Andrew Pilloud (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-9657 started by Andrew Pilloud.

> ArrayScanToJoinConverter ResolvedLiteral cast to ResolvedColumnRef
> --
>
> Key: BEAM-9657
> URL: https://issues.apache.org/jira/browse/BEAM-9657
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Critical
>  Labels: zetasql-compliance
>
> 5 failures in shard 2
> {code:java}
> Apr 01, 2020 11:48:47 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> SEVERE:  com.google.zetasql.resolvedast.ResolvedNodes$ResolvedLiteral 
> cannot be cast to 
> com.google.zetasql.resolvedast.ResolvedNodes$ResolvedColumnRef
> java.lang.ClassCastException: 
> com.google.zetasql.resolvedast.ResolvedNodes$ResolvedLiteral cannot be cast 
> to com.google.zetasql.resolvedast.ResolvedNodes$ResolvedColumnRef
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:62)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.ArrayScanToJoinConverter.convert(ArrayScanToJoinConverter.java:37)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:97)
>   at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at java.util.Collections$2.tryAdvance(Collections.java:4717)
>   at java.util.Collections$2.forEachRemaining(Collections.java:4725)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:96)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convert(QueryStatementConverter.java:84)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertRootQuery(QueryStatementConverter.java:51)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:160)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:131)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:115)
>   at 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl.executeQuery(ExecuteQueryServiceServer.java:242)
>   at 
> com.google.zetasql.testing.SqlComplianceServiceGrpc$MethodHandlers.invoke(SqlComplianceServiceGrpc.java:423)
>   at 
> com.google.zetasql.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
>   at 
> com.google.zetasql.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
>   at 
> com.google.zetasql.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711)
>   at 
> com.google.zetasql.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>   at 
> com.google.zetasql.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748) {code}
> {code}
> Apr 01, 2020 11:48:47 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT a, b FROM (SELECT 1 a) JOIN UNNEST([1, 
> 2, 3]) b ON b > a
> Apr 01, 2020 11:48:48 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT a, b
> FROM UNNEST([1, 1, 2, 3, 5, 8, 13, NULL]) a
> JOIN UNNEST([1, 2, 3, 5, 7, 11, 13, NULL]) b
>   ON a = b
> Apr 01, 2020 11:48:48 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT a, b
> FROM UNNEST([1, 1, 2, 3, 5, 8, 13, NULL]) a
> LEFT JOIN UNNEST([1, 2, 3, 5, 7, 11, 13, NULL]) b
>   ON a = b
> Apr 01, 2020 11:48:49 AM 
> 

[jira] [Created] (BEAM-9880) touch: build/target/third_party_licenses/skip: No such file or directory

2020-05-04 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-9880:
-

 Summary: touch: build/target/third_party_licenses/skip: No such 
file or directory
 Key: BEAM-9880
 URL: https://issues.apache.org/jira/browse/BEAM-9880
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-harness
Reporter: Kyle Weaver
Assignee: Hannah Jiang


When I run ./gradlew :sdks:python:test-suites:portable:py2:crossLanguageTests, 
I get the following error:

> Task :sdks:java:container:createFile FAILED
touch: build/target/third_party_licenses/skip: No such file or directory

When I run `ls build`, the only output is `gradleenv`, so it looks like the task 
assumes the directory already exists when it might not.
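A minimal sketch of the likely fix (directory and file names taken from the 
error message above; a throwaway temp dir stands in for the real 
sdks/java/container build dir):

```shell
# Create the parent directory before touching the marker file, so `touch`
# cannot fail when the build directory only contains `gradleenv`.
workdir="$(mktemp -d)"
mkdir -p "$workdir/build/target/third_party_licenses"
touch "$workdir/build/target/third_party_licenses/skip"
ls "$workdir/build/target/third_party_licenses"   # -> skip
rm -rf "$workdir"
```

In the Gradle task this would translate to ensuring the output directory exists 
(e.g. a `mkdirs()` or an explicit dependency on the task that creates it) 
before writing the `skip` file.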





[jira] [Work logged] (BEAM-9877) Eager size estimation of large group-by-key iterables cause expensive / duplicate reads

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9877?focusedWorklogId=430384&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430384
 ]

ASF GitHub Bot logged work on BEAM-9877:


Author: ASF GitHub Bot
Created on: 04/May/20 19:59
Start Date: 04/May/20 19:59
Worklog Time Spent: 10m 
  Work Description: tudorm commented on pull request #11601:
URL: https://github.com/apache/beam/pull/11601#issuecomment-623674559


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430384)
Time Spent: 50m  (was: 40m)

> Eager size estimation of large group-by-key iterables cause expensive / 
> duplicate reads
> ---
>
> Key: BEAM-9877
> URL: https://issues.apache.org/jira/browse/BEAM-9877
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Tudor Marian
>Assignee: Tudor Marian
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> BatchGroupAlsoByWindowViaIteratorsFn WindowReiterator can cause very 
> expensive duplicate reading of data from a (Co-)GroupByKey for hot keys with 
> many values due to PCollection size estimation.  Instead, it should perform 
> lazy estimation like GroupingShuffleReader and GroupingShuffleEntryIterator 
> (or perform no estimation at all).
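The lazy alternative described above can be sketched as follows; the class and 
field names are illustrative, not Beam's actual GroupingShuffleReader machinery:

```java
import java.util.Iterator;
import java.util.List;

// Lazy, as-you-read size estimation: no up-front pass over the values.
public class LazySizeEstimate {
  private final List<byte[]> values; // stand-in for a shuffled value stream
  private long observedBytes = 0;    // grows only as elements are consumed

  public LazySizeEstimate(List<byte[]> values) {
    this.values = values;
  }

  public Iterator<byte[]> iterator() {
    Iterator<byte[]> inner = values.iterator();
    return new Iterator<byte[]>() {
      @Override
      public boolean hasNext() {
        return inner.hasNext();
      }

      @Override
      public byte[] next() {
        byte[] v = inner.next();
        observedBytes += v.length; // estimate updated as a side effect of reading
        return v;
      }
    };
  }

  public long observedBytes() {
    return observedBytes;
  }
}
```

The key property is that `observedBytes()` reflects only what has been 
consumed, so a hot key with many values is never read twice just to size it.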





[jira] [Work logged] (BEAM-9877) Eager size estimation of large group-by-key iterables cause expensive / duplicate reads

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9877?focusedWorklogId=430382&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430382
 ]

ASF GitHub Bot logged work on BEAM-9877:


Author: ASF GitHub Bot
Created on: 04/May/20 19:56
Start Date: 04/May/20 19:56
Worklog Time Spent: 10m 
  Work Description: tudorm commented on pull request #11598:
URL: https://github.com/apache/beam/pull/11598#issuecomment-623673128


   LGTM





Issue Time Tracking
---

Worklog Id: (was: 430382)
Time Spent: 40m  (was: 0.5h)

> Eager size estimation of large group-by-key iterables cause expensive / 
> duplicate reads
> ---
>
> Key: BEAM-9877
> URL: https://issues.apache.org/jira/browse/BEAM-9877
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Tudor Marian
>Assignee: Tudor Marian
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> BatchGroupAlsoByWindowViaIteratorsFn WindowReiterator can cause very 
> expensive duplicate reading of data from a (Co-)GroupByKey for hot keys with 
> many values due to PCollection size estimation.  Instead, it should perform 
> lazy estimation like GroupingShuffleReader and GroupingShuffleEntryIterator 
> (or perform no estimation at all).





[jira] [Work started] (BEAM-9661) LimitOffsetScanToOrderByLimitConverter IndexOutOfBoundsException

2020-05-04 Thread Andrew Pilloud (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-9661 started by Andrew Pilloud.

> LimitOffsetScanToOrderByLimitConverter IndexOutOfBoundsException
> 
>
> Key: BEAM-9661
> URL: https://issues.apache.org/jira/browse/BEAM-9661
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Critical
>  Labels: zetasql-compliance
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Fourteen failures in shard 37, one failure in shard 44
> {code:java}
> java.lang.IndexOutOfBoundsException: index (4) must be less than size (1)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:310)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:293)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.SingletonImmutableList.get(SingletonImmutableList.java:41)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexBuilder.makeInputRef(RexBuilder.java:855)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Sort.<init>(Sort.java:103)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalSort.<init>(LogicalSort.java:37)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalSort.create(LogicalSort.java:63)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.LimitOffsetScanToOrderByLimitConverter.convert(LimitOffsetScanToOrderByLimitConverter.java:74)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.LimitOffsetScanToOrderByLimitConverter.convert(LimitOffsetScanToOrderByLimitConverter.java:40)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:97)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convert(QueryStatementConverter.java:84)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertRootQuery(QueryStatementConverter.java:51)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:160)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:131)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:115)
>   at 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl.executeQuery(ExecuteQueryServiceServer.java:242)
>   at 
> com.google.zetasql.testing.SqlComplianceServiceGrpc$MethodHandlers.invoke(SqlComplianceServiceGrpc.java:423)
>   at 
> com.google.zetasql.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
>   at 
> com.google.zetasql.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
>   at 
> com.google.zetasql.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711)
>   at 
> com.google.zetasql.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>   at 
> com.google.zetasql.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> {code}
> Apr 01, 2020 11:50:41 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT a FROM (SELECT 2.2 a UNION ALL SELECT 
> 1.1 UNION ALL SELECT 3.3) ORDER BY a LIMIT 0
> Apr 01, 2020 11:50:42 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT a FROM (SELECT 2.2 a UNION ALL SELECT 
> 1.1 UNION ALL SELECT 3.3) ORDER BY a LIMIT 1
> Apr 01, 2020 11:50:43 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT a FROM (SELECT 2.2 a UNION ALL SELECT 
> 1.1 UNION ALL SELECT 3.3) ORDER BY a LIMIT 2
> Apr 01, 2020 11:50:43 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT a FROM (SELECT 3.3 a UNION ALL SELECT 
> 1.1 UNION ALL SELECT 2.2) ORDER BY a LIMIT 3
> Apr 01, 2020 

[jira] [Assigned] (BEAM-9661) LimitOffsetScanToOrderByLimitConverter IndexOutOfBoundsException

2020-05-04 Thread Andrew Pilloud (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Pilloud reassigned BEAM-9661:


Assignee: Andrew Pilloud

> LimitOffsetScanToOrderByLimitConverter IndexOutOfBoundsException
> 
>
> Key: BEAM-9661
> URL: https://issues.apache.org/jira/browse/BEAM-9661
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Critical
>  Labels: zetasql-compliance
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Fourteen failures in shard 37, one failure in shard 44
> {code:java}
> java.lang.IndexOutOfBoundsException: index (4) must be less than size (1)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:310)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:293)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.SingletonImmutableList.get(SingletonImmutableList.java:41)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexBuilder.makeInputRef(RexBuilder.java:855)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Sort.<init>(Sort.java:103)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalSort.<init>(LogicalSort.java:37)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalSort.create(LogicalSort.java:63)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.LimitOffsetScanToOrderByLimitConverter.convert(LimitOffsetScanToOrderByLimitConverter.java:74)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.LimitOffsetScanToOrderByLimitConverter.convert(LimitOffsetScanToOrderByLimitConverter.java:40)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:97)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convert(QueryStatementConverter.java:84)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertRootQuery(QueryStatementConverter.java:51)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:160)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:131)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:115)
>   at 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl.executeQuery(ExecuteQueryServiceServer.java:242)
>   at 
> com.google.zetasql.testing.SqlComplianceServiceGrpc$MethodHandlers.invoke(SqlComplianceServiceGrpc.java:423)
>   at 
> com.google.zetasql.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
>   at 
> com.google.zetasql.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
>   at 
> com.google.zetasql.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711)
>   at 
> com.google.zetasql.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>   at 
> com.google.zetasql.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> {code}
> Apr 01, 2020 11:50:41 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT a FROM (SELECT 2.2 a UNION ALL SELECT 
> 1.1 UNION ALL SELECT 3.3) ORDER BY a LIMIT 0
> Apr 01, 2020 11:50:42 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT a FROM (SELECT 2.2 a UNION ALL SELECT 
> 1.1 UNION ALL SELECT 3.3) ORDER BY a LIMIT 1
> Apr 01, 2020 11:50:43 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT a FROM (SELECT 2.2 a UNION ALL SELECT 
> 1.1 UNION ALL SELECT 3.3) ORDER BY a LIMIT 2
> Apr 01, 2020 11:50:43 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT a FROM (SELECT 3.3 a UNION ALL SELECT 
> 1.1 UNION ALL SELECT 2.2) ORDER BY a LIMIT 3
> Apr 

[jira] [Work logged] (BEAM-9661) LimitOffsetScanToOrderByLimitConverter IndexOutOfBoundsException

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9661?focusedWorklogId=430373&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430373
 ]

ASF GitHub Bot logged work on BEAM-9661:


Author: ASF GitHub Bot
Created on: 04/May/20 19:37
Start Date: 04/May/20 19:37
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #11602:
URL: https://github.com/apache/beam/pull/11602#issuecomment-623663334


   R: @robinyqiu 





Issue Time Tracking
---

Worklog Id: (was: 430373)
Time Spent: 20m  (was: 10m)

> LimitOffsetScanToOrderByLimitConverter IndexOutOfBoundsException
> 
>
> Key: BEAM-9661
> URL: https://issues.apache.org/jira/browse/BEAM-9661
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Priority: Critical
>  Labels: zetasql-compliance
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Fourteen failures in shard 37, one failure in shard 44
> {code:java}
> java.lang.IndexOutOfBoundsException: index (4) must be less than size (1)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:310)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:293)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.SingletonImmutableList.get(SingletonImmutableList.java:41)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexBuilder.makeInputRef(RexBuilder.java:855)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Sort.<init>(Sort.java:103)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalSort.<init>(LogicalSort.java:37)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalSort.create(LogicalSort.java:63)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.LimitOffsetScanToOrderByLimitConverter.convert(LimitOffsetScanToOrderByLimitConverter.java:74)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.LimitOffsetScanToOrderByLimitConverter.convert(LimitOffsetScanToOrderByLimitConverter.java:40)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode(QueryStatementConverter.java:97)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convert(QueryStatementConverter.java:84)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertRootQuery(QueryStatementConverter.java:51)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:160)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:131)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:115)
>   at 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl.executeQuery(ExecuteQueryServiceServer.java:242)
>   at 
> com.google.zetasql.testing.SqlComplianceServiceGrpc$MethodHandlers.invoke(SqlComplianceServiceGrpc.java:423)
>   at 
> com.google.zetasql.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
>   at 
> com.google.zetasql.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
>   at 
> com.google.zetasql.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711)
>   at 
> com.google.zetasql.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>   at 
> com.google.zetasql.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> {code}
> Apr 01, 2020 11:50:41 AM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT a FROM (SELECT 2.2 a UNION ALL SELECT 
> 1.1 UNION ALL SELECT 3.3) ORDER BY a LIMIT 0
> Apr 01, 2020 11:50:42 

[jira] [Work logged] (BEAM-9661) LimitOffsetScanToOrderByLimitConverter IndexOutOfBoundsException

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9661?focusedWorklogId=430372&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430372
 ]

ASF GitHub Bot logged work on BEAM-9661:


Author: ASF GitHub Bot
Created on: 04/May/20 19:35
Start Date: 04/May/20 19:35
Worklog Time Spent: 10m 
  Work Description: apilloud opened a new pull request #11602:
URL: https://github.com/apache/beam/pull/11602


   It turns out ZetaSQL column references don't always start at 0; union 
operators are one example. We probably have a number of bugs in this regard.
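A minimal sketch of the position-based lookup that avoids treating a ZetaSQL 
column id as a 0-based field index (the class and method names here are 
illustrative, not the actual converter code):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColumnIndexResolver {
  // Map each ZetaSQL column id to its position in the scan's column list,
  // instead of passing the raw id to RexBuilder#makeInputRef as an index.
  public static Map<Long, Integer> indexById(List<Long> columnIds) {
    Map<Long, Integer> byId = new HashMap<>();
    for (int i = 0; i < columnIds.size(); i++) {
      byId.put(columnIds.get(i), i);
    }
    return byId;
  }
}
```

With this mapping, a column list whose ids start at 4 (as can happen under 
UNION) still yields input references 0..n-1, which is what Calcite expects.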
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-9877) Eager size estimation of large group-by-key iterables cause expensive / duplicate reads

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9877?focusedWorklogId=430371&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430371
 ]

ASF GitHub Bot logged work on BEAM-9877:


Author: ASF GitHub Bot
Created on: 04/May/20 19:30
Start Date: 04/May/20 19:30
Worklog Time Spent: 10m 
  Work Description: ibzib opened a new pull request #11601:
URL: https://github.com/apache/beam/pull/11601


   …entByteSizeObservableIterable so that size estimation is lazy
   
   Cherry-pick of #11598. (I removed the commit history because I forgot to 
squash before merging the original PR, and one atomic commit is a lot easier to 
deal with.) R: @tudorm 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-9840) Support for Parameterized Types when converting from HCatRecords to Rows in HCatalogIO

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9840?focusedWorklogId=430369&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430369
 ]

ASF GitHub Bot logged work on BEAM-9840:


Author: ASF GitHub Bot
Created on: 04/May/20 19:07
Start Date: 04/May/20 19:07
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on a change in pull request 
#11569:
URL: https://github.com/apache/beam/pull/11569#discussion_r419653421



##
File path: 
sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/SchemaUtilsTest.java
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.hcatalog;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class SchemaUtilsTest {
+  @Test
+  public void testParameterizedTypesToBeamTypes() {
+List listOfFieldSchema = new ArrayList<>();
+listOfFieldSchema.add(new FieldSchema("parameterizedChar", "char(10)", 
null));
+listOfFieldSchema.add(new FieldSchema("parameterizedVarchar", 
"varchar(100)", null));
+listOfFieldSchema.add(new FieldSchema("parameterizedDecimal", 
"decimal(30,16)", null));
+
+Schema expectedSchema =
+Schema.builder()
+.addNullableField("parameterizedChar", Schema.FieldType.STRING)
+.addNullableField("parameterizedVarchar", Schema.FieldType.STRING)
+.addNullableField("parameterizedDecimal", Schema.FieldType.DECIMAL)

Review comment:
   I think these should map to logical types instead of to primitives so we 
don't lose the information from the parameter. Unfortunately we don't (yet) 
have good logical types in `schemas.logicaltypes` to map them to, but maybe we 
will after your other PR, https://github.com/apache/beam/pull/11581 (or you 
could just add the relevant ones here).
   
   `char(10)` looks like it could map to a `FixedLengthString` logical type, 
`varchar(100)` probably deserves its own type, maybe just called `Varchar`? and 
I've been meaning to add a logical type for DECIMAL parameterized by precision 
and scale as part of 
[BEAM-7554](https://issues.apache.org/jira/browse/BEAM-7554) (and deprecate the 
primitive one).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430369)
Time Spent: 50m  (was: 40m)

> Support for Parameterized Types when converting from HCatRecords to Rows in 
> HCatalogIO
> --
>
> Key: BEAM-9840
> URL: https://issues.apache.org/jira/browse/BEAM-9840
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-hcatalog
>Reporter: Rahul Patwari
>Assignee: Rahul Patwari
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In Hive, to use CHAR and VARCHAR as the data type for a column, the types 
> have to be parameterized with the length of the character sequence. Refer 
> [https://github.com/apache/hive/blob/f37c5de6c32b9395d1b34fa3c02ed06d1bfbf6eb/serde/src/test/org/apache/hadoop/hive/serde2/typeinfo/TestTypeInfoUtils.java#L68]
> In addition, for the DECIMAL data type, custom precision and scale can be 
> provided as parameters.
> A user has faced an Exception while reading data from a table created in Hive 
> with a parameterized DECIMAL data type. Refer 
> [https://lists.apache.org/thread.html/r159012fbefce24d734096e3ec24ecd112de5f89b8029e57147d233b0%40%3Cuser.beam.apache.org%3E].
> This ticket is created to support converting from HcatRecords to Rows when 
> the HCatRecords 

[jira] [Work logged] (BEAM-9876) Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9876?focusedWorklogId=430367&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430367
 ]

ASF GitHub Bot logged work on BEAM-9876:


Author: ASF GitHub Bot
Created on: 04/May/20 19:04
Start Date: 04/May/20 19:04
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #11554:
URL: https://github.com/apache/beam/pull/11554#discussion_r419661024



##
File path: website/www/site/content/en/contribute/release-guide.md
##
@@ -218,7 +214,7 @@ docker login docker.io
 After successful login, authorization info will be stored at 
~/.docker/config.json file. For example,
 ```
 "https://index.docker.io/v1/": {
-   "auth": "xx"
+   "auth": "aGFubmFoamlhbmc6cmtkdGpmZ2hrMTIxMw=="

Review comment:
   Probably don't want to check this in.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430367)
Time Spent: 20m  (was: 10m)

> Migrate the Beam website from Jekyll to Hugo to enable localization of the 
> site content
> ---
>
> Key: BEAM-9876
> URL: https://issues.apache.org/jira/browse/BEAM-9876
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Aizhamal Nurmamat kyzy
>Assignee: Aizhamal Nurmamat kyzy
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Enable internationalization of the Apache Beam website to increase the reach 
> of the project, and facilitate adoption and growth of its community.
> The proposal was to do this by migrating the current Apache Beam website from
> Jekyll to Hugo [1]. Hugo supports internationalization out-of-the-box, making
> it easier for both contributors and maintainers to support the
> internationalization effort.
> The further discussion on implementation can be viewed here  [2]
> [1] 
> [https://lists.apache.org/thread.html/rfab4cc1411318c3f4667bee051df68f37be11846ada877f3576c41a9%40%3Cdev.beam.apache.org%3E]
> [2] 
> [https://lists.apache.org/thread.html/r6b999b6d7d1f6cbb94e16bb2deed2b65098a6b14c4ac98707fe0c36a%40%3Cdev.beam.apache.org%3E]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9874) Portable timers can't be cleared in batch mode

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9874?focusedWorklogId=430361&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430361
 ]

ASF GitHub Bot logged work on BEAM-9874:


Author: ASF GitHub Bot
Created on: 04/May/20 18:59
Start Date: 04/May/20 18:59
Worklog Time Spent: 10m 
  Work Description: ibzib commented on a change in pull request #11597:
URL: https://github.com/apache/beam/pull/11597#discussion_r419658069



##
File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryTimerInternals.java
##
@@ -167,20 +165,17 @@ public void deleteTimer(StateNamespace namespace, String 
timerId, TimeDomain tim
   @Deprecated
   @Override
   public void deleteTimer(StateNamespace namespace, String timerId, String 
timerFamilyId) {
-TimerData existing = existingTimers.get(namespace, timerId + '+' + 
timerFamilyId);
-if (existing != null) {
-  deleteTimer(existing);
+TimerData removedTimer = existingTimers.remove(namespace, timerId + '+' + 
timerFamilyId);
+if (removedTimer != null) {
+  timersForDomain(removedTimer.getDomain()).remove(removedTimer);
 }
   }
 
   /** @deprecated use {@link #deleteTimer(StateNamespace, String, 
TimeDomain)}. */

Review comment:
   Maybe worth a JIRA/refactor? I'll leave it up to you since I am not as 
familiar with this code.
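The invariant behind the change above, that a deleted timer must be dropped from both the id-keyed lookup table and its per-time-domain queue, can be sketched with a toy registry. Everything here (`TimerRegistry` and its methods) is a hypothetical illustration using plain Java collections, not Beam's `InMemoryTimerInternals` or `TimerData` types:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model of the two data structures the change keeps in sync: a lookup map
// keyed by timer id and a sorted firing queue. Deleting from only one of them
// would leave a stale timer that still fires. Simplified: assumes each timer
// has a distinct timestamp.
public class TimerRegistry {
  private final Map<String, Long> byId = new HashMap<>();
  private final NavigableSet<Long> firingOrder = new TreeSet<>();

  public void set(String id, long timestamp) {
    Long old = byId.put(id, timestamp);
    if (old != null) {
      firingOrder.remove(old);  // resetting a timer replaces its old firing time
    }
    firingOrder.add(timestamp);
  }

  /** Mirrors the fix: remove() the map entry, then drop it from the queue in one step. */
  public void delete(String id) {
    Long removed = byId.remove(id);
    if (removed != null) {
      firingOrder.remove(removed);
    }
  }

  public boolean hasPendingTimers() {
    return !firingOrder.isEmpty();
  }
}
```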





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430361)
Time Spent: 1h 20m  (was: 1h 10m)

> Portable timers can't be cleared in batch mode
> --
>
> Key: BEAM-9874
> URL: https://issues.apache.org/jira/browse/BEAM-9874
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> After BEAM-9801, the {{test_pardo_timers_clear}} test fails. The test was 
> probably broken before but this was not visible because we weren't depleting 
> timers on shutdown.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9879) Support STRING_AGG in BeamSQL

2020-05-04 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099255#comment-17099255
 ] 

Rui Wang commented on BEAM-9879:


A good reference implementation is what was done in
https://issues.apache.org/jira/browse/BEAM-9418
(https://github.com/apache/beam/pull/11333)

> Support STRING_AGG in BeamSQL
> -
>
> Key: BEAM-9879
> URL: https://issues.apache.org/jira/browse/BEAM-9879
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Rui Wang
>Priority: Major
>
> Support STRING_AGG function in ZetaSQL dialect. (Also if Calcite has this 
> function built-in, support it in Calcite dialect too)
> See the reference: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#string_agg
> Examples:
> {code:sql}
> SELECT STRING_AGG(fruit) AS string_agg
> FROM UNNEST(["apple", NULL, "pear", "banana", "pear"]) AS fruit;
> +------------------------+
> | string_agg             |
> +------------------------+
> | apple,pear,banana,pear |
> +------------------------+
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9879) Support STRING_AGG in BeamSQL

2020-05-04 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-9879:
---
Status: Open  (was: Triage Needed)

> Support STRING_AGG in BeamSQL
> -
>
> Key: BEAM-9879
> URL: https://issues.apache.org/jira/browse/BEAM-9879
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Rui Wang
>Priority: Major
>
> Support STRING_AGG function in ZetaSQL dialect. (Also if Calcite has this 
> function built-in, support it in Calcite dialect too)
> See the reference: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#string_agg
> Examples:
> {code:sql}
> SELECT STRING_AGG(fruit) AS string_agg
> FROM UNNEST(["apple", NULL, "pear", "banana", "pear"]) AS fruit;
> +------------------------+
> | string_agg             |
> +------------------------+
> | apple,pear,banana,pear |
> +------------------------+
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9879) Support STRING_AGG in BeamSQL

2020-05-04 Thread Rui Wang (Jira)
Rui Wang created BEAM-9879:
--

 Summary: Support STRING_AGG in BeamSQL
 Key: BEAM-9879
 URL: https://issues.apache.org/jira/browse/BEAM-9879
 Project: Beam
  Issue Type: Task
  Components: dsl-sql, dsl-sql-zetasql
Reporter: Rui Wang


Support STRING_AGG function in ZetaSQL dialect. (Also if Calcite has this 
function built-in, support it in Calcite dialect too)

See the reference: 
https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#string_agg


Examples:


{code:sql}
SELECT STRING_AGG(fruit) AS string_agg
FROM UNNEST(["apple", NULL, "pear", "banana", "pear"]) AS fruit;

+------------------------+
| string_agg             |
+------------------------+
| apple,pear,banana,pear |
+------------------------+
{code}
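For illustration, the semantics shown in the example above (NULL inputs are skipped, comma is the default delimiter) can be sketched as a plain Java fold. This is only a toy model of the aggregation behavior, not BeamSQL's implementation:

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Illustrative only: STRING_AGG skips NULL values rather than emitting them,
// and joins the remaining values with the given delimiter.
public class StringAggDemo {
  public static String stringAgg(List<String> values, String delimiter) {
    StringJoiner joiner = new StringJoiner(delimiter);
    for (String v : values) {
      if (v != null) {  // NULLs are ignored, matching the SQL example above
        joiner.add(v);
      }
    }
    return joiner.toString();
  }

  public static void main(String[] args) {
    List<String> fruit = Arrays.asList("apple", null, "pear", "banana", "pear");
    System.out.println(stringAgg(fruit, ","));  // apple,pear,banana,pear
  }
}
```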




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9418) Support ANY_VALUE aggregation functions

2020-05-04 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang resolved BEAM-9418.

Fix Version/s: Not applicable
   Resolution: Fixed

> Support ANY_VALUE aggregation functions
> ---
>
> Key: BEAM-9418
> URL: https://issues.apache.org/jira/browse/BEAM-9418
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: John Mora
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Support the following functionality in BeamSQL:
> {code:java}
> "select t.key, ANY_VALUE(t.column) from t group by t.key";
> {code}
> Spec link: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#any_value



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9418) Support ANY_VALUE aggregation functions

2020-05-04 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099247#comment-17099247
 ] 

Rui Wang commented on BEAM-9418:


Thanks John for your contribution! The PR has merged.

> Support ANY_VALUE aggregation functions
> ---
>
> Key: BEAM-9418
> URL: https://issues.apache.org/jira/browse/BEAM-9418
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: John Mora
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Support the following functionality in BeamSQL:
> {code:java}
> "select t.key, ANY_VALUE(t.column) from t group by t.key";
> {code}
> Spec link: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#any_value



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9878) Update ElementByteSizeObservableIterable

2020-05-04 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-9878:
-

 Summary: Update ElementByteSizeObservableIterable
 Key: BEAM-9878
 URL: https://issues.apache.org/jira/browse/BEAM-9878
 Project: Beam
  Issue Type: Improvement
  Components: runner-dataflow
Reporter: Kyle Weaver


It's odd that ElementByteSizeObservableIterable::iterator adds observers within
the method body. I assume this is for historic reasons, since it doesn't seem
to do anything now, and the documenting comment references a setObserver method
that doesn't exist.

https://github.com/apache/beam/blob/6453e859badcb629ae2528b77d84235b7291ff89/sdks/java/core/src/main/java/org/apache/beam/sdk/util/common/ElementByteSizeObservableIterable.java#L49-L61



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9877) Eager size estimation of large group-by-key iterables causes expensive / duplicate reads

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9877?focusedWorklogId=430356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430356
 ]

ASF GitHub Bot logged work on BEAM-9877:


Author: ASF GitHub Bot
Created on: 04/May/20 18:45
Start Date: 04/May/20 18:45
Worklog Time Spent: 10m 
  Work Description: ibzib commented on a change in pull request #11598:
URL: https://github.com/apache/beam/pull/11598#discussion_r419649174



##
File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BatchGroupAlsoByWindowViaIteratorsFn.java
##
@@ -165,12 +168,17 @@ public WindowReiterable(
 }
 
 @Override
-public Reiterator iterator() {
+public WindowReiterator iterator() {

Review comment:
   It's odd that `ElementByteSizeObservableIterable::iterator` adds 
observers within the method body. I assume this is for historic reasons, since 
it doesn't seem to do anything now, and the documenting comment references a 
`setObserver` method that doesn't exist. Anyway, your change looks fine. But we 
should consider cleaning this up.
   
   
https://github.com/apache/beam/blob/6453e859badcb629ae2528b77d84235b7291ff89/sdks/java/core/src/main/java/org/apache/beam/sdk/util/common/ElementByteSizeObservableIterable.java#L49-L61
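The lazy pattern under discussion can be sketched in plain Java: the observer learns element byte sizes only as the consumer actually iterates, so no extra pass over the (possibly very large, re-read-from-shuffle) data is needed up front. All names here are hypothetical illustrations, not Beam's actual classes:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of lazy size estimation: sizes are reported to the
// observer as a side effect of iteration, never via an eager pre-scan.
public class LazySizeEstimation {
  interface SizeObserver { void update(long bytes); }

  static Iterable<String> observed(List<String> data, SizeObserver observer) {
    return () -> new Iterator<String>() {
      private final Iterator<String> it = data.iterator();
      public boolean hasNext() { return it.hasNext(); }
      public String next() {
        String value = it.next();
        observer.update(value.length());  // report size only when the element is consumed
        return value;
      }
    };
  }

  public static void main(String[] args) {
    long[] total = {0};
    Iterable<String> values = observed(Arrays.asList("a", "bb", "ccc"), b -> total[0] += b);
    // Nothing has been observed yet: estimation is lazy.
    for (String v : values) { /* consume */ }
    System.out.println("estimated bytes: " + total[0]);  // estimated bytes: 6
  }
}
```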





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430356)
Time Spent: 20m  (was: 10m)

> Eager size estimation of large group-by-key iterables causes expensive / 
> duplicate reads
> ---
>
> Key: BEAM-9877
> URL: https://issues.apache.org/jira/browse/BEAM-9877
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Tudor Marian
>Assignee: Tudor Marian
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> BatchGroupAlsoByWindowViaIteratorsFn WindowReiterator can cause very 
> expensive duplicate reading of data from a (Co-)GroupByKey for hot keys with 
> many values due to PCollection size estimation.  Instead, it should perform 
> lazy estimation like GroupingShuffleReader and GroupingShuffleEntryIterator 
> (or perform no estimation at all).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9840) Support for Parameterized Types when converting from HCatRecords to Rows in HCatalogIO

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9840?focusedWorklogId=430353&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430353
 ]

ASF GitHub Bot logged work on BEAM-9840:


Author: ASF GitHub Bot
Created on: 04/May/20 18:41
Start Date: 04/May/20 18:41
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #11569:
URL: https://github.com/apache/beam/pull/11569#issuecomment-623636289


   @akedin isn't very involved with Beam anymore. I think I can help review 
this instead



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430353)
Time Spent: 40m  (was: 0.5h)

> Support for Parameterized Types when converting from HCatRecords to Rows in 
> HCatalogIO
> --
>
> Key: BEAM-9840
> URL: https://issues.apache.org/jira/browse/BEAM-9840
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-hcatalog
>Reporter: Rahul Patwari
>Assignee: Rahul Patwari
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In Hive, to use CHAR and VARCHAR as the data type for a column, the types 
> have to be parameterized with the length of the character sequence. Refer 
> [https://github.com/apache/hive/blob/f37c5de6c32b9395d1b34fa3c02ed06d1bfbf6eb/serde/src/test/org/apache/hadoop/hive/serde2/typeinfo/TestTypeInfoUtils.java#L68]
> In addition, for the DECIMAL data type, custom precision and scale can be 
> provided as parameters.
> A user has faced an Exception while reading data from a table created in Hive 
> with a parameterized DECIMAL data type. Refer 
> [https://lists.apache.org/thread.html/r159012fbefce24d734096e3ec24ecd112de5f89b8029e57147d233b0%40%3Cuser.beam.apache.org%3E].
> This ticket is created to support converting from HcatRecords to Rows when 
> the HCatRecords have parameterized types.
> To Support Parameterized data types, we can make use of HcatFieldSchema. 
> Refer 
> [https://github.com/apache/hive/blob/f37c5de6c32b9395d1b34fa3c02ed06d1bfbf6eb/hcatalog/core/src/main/java/org/apache/hive/hcatalog/data/schema/HCatFieldSchema.java#L34]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9824) Multiple reshuffles are ignored in some cases on Flink batch runner.

2020-05-04 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved BEAM-9824.
-
Resolution: Fixed

Please re-open if this isn't actually fixed

> Multiple reshuffles are ignored in some cases on Flink batch runner.
> 
>
> Key: BEAM-9824
> URL: https://issues.apache.org/jira/browse/BEAM-9824
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.19.0, 2.20.0
>Reporter: David Morávek
>Assignee: David Morávek
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Multiple reshuffles are ignored in some cases on the Flink batch runner. This may 
> lead to a huge performance penalty in IO connectors (when reshuffling splits).
> In the Flink optimizer, when we `.rebalance()` a dataset, its output channel is 
> marked as `FORCED_REBALANCED`. When we chain this with another 
> `.rebalance()`, the latter is ignored because its source is already 
> `FORCED_REBALANCED`, so the requested property is met. This is correct behaviour 
> because rebalance is idempotent.
> When we include a `flatMap` in between rebalances -> 
> `.rebalance().flatMap(...).rebalance()`, we need to reshuffle again, because 
> the dataset distribution may have changed (e.g. you can possibly emit an unbounded 
> stream from a single element). Unfortunately the `flatMap` output is still 
> incorrectly marked as `FORCED_REBALANCED` and the second reshuffle gets 
> ignored.
> This especially affects IO connectors -> `FileIO.match()` returns a reshuffled 
> list of matched files -> we split each file into ranges -> **reshuffle** -> 
> read. Ignoring the second reshuffle leads to huge performance degradation (5m -> 2h 
> in one of our production pipelines).
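The property-propagation bug described above can be modeled with a small toy sketch. The `Property` enum and `Node` class here are hypothetical, not Flink's actual optimizer types; they only illustrate why a flatMap must invalidate the rebalanced mark:

```java
// Toy model of channel-property propagation: rebalance() marks the output as
// already rebalanced; flatMap() may change the data distribution, so it must
// clear that mark or a following rebalance() is wrongly skipped.
public class RebalancePropagation {
  enum Property { NONE, FORCED_REBALANCED }

  static final class Node {
    final Property property;
    final int rebalances;  // how many shuffles actually happen

    Node(Property property, int rebalances) {
      this.property = property;
      this.rebalances = rebalances;
    }

    Node rebalance() {
      if (property == Property.FORCED_REBALANCED) {
        return this;  // requested property already met: rebalance is skipped
      }
      return new Node(Property.FORCED_REBALANCED, rebalances + 1);
    }

    /** The fix: a flatMap invalidates the distribution property. */
    Node flatMap() { return new Node(Property.NONE, rebalances); }

    /** The bug: flatMap incorrectly keeps FORCED_REBALANCED. */
    Node buggyFlatMap() { return new Node(property, rebalances); }
  }

  public static void main(String[] args) {
    Node start = new Node(Property.NONE, 0);
    System.out.println(start.rebalance().buggyFlatMap().rebalance().rebalances);  // 1: second shuffle lost
    System.out.println(start.rebalance().flatMap().rebalance().rebalances);       // 2: both shuffles happen
  }
}
```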



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9707) Hardcode runner harness container image for unified worker

2020-05-04 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned BEAM-9707:
---

Assignee: Ankur Goenka

> Hardcode runner harness container image for unified worker
> --
>
> Key: BEAM-9707
> URL: https://issues.apache.org/jira/browse/BEAM-9707
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Hardcode runner harness image temporarily to support usage of unified worker 
> on head.
> Remove this hardcoding once 2.21 is released.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9707) Hardcode runner harness container image for unified worker

2020-05-04 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-9707:

Status: Open  (was: Triage Needed)

> Hardcode runner harness container image for unified worker
> --
>
> Key: BEAM-9707
> URL: https://issues.apache.org/jira/browse/BEAM-9707
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Hardcode runner harness image temporarily to support usage of unified worker 
> on head.
> Remove this hardcoding once 2.21 is released.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-1819) Key should be available in @OnTimer methods

2020-05-04 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099228#comment-17099228
 ] 

Brian Hulette commented on BEAM-1819:
-

[~mxm] or [~reuvenlax]  could you close this if PR #11154 addressed it 
completely?

> Key should be available in @OnTimer methods
> ---
>
> Key: BEAM-1819
> URL: https://issues.apache.org/jira/browse/BEAM-1819
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Rehman Murad Ali
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 32.5h
>  Remaining Estimate: 0h
>
> Every timer firing has an associated key. This key should be available when 
> the timer is delivered to a user's {{DoFn}}, so they don't have to store it 
> in state.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9643) Add user-facing Go SDF documentation.

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9643?focusedWorklogId=430343&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430343
 ]

ASF GitHub Bot logged work on BEAM-9643:


Author: ASF GitHub Bot
Created on: 04/May/20 18:22
Start Date: 04/May/20 18:22
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #11517:
URL: https://github.com/apache/beam/pull/11517#discussion_r419635469



##
File path: sdks/go/pkg/beam/core/sdf/sdf.go
##
@@ -13,63 +13,64 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
-// Package sdf is experimental, incomplete, and not yet meant for general 
usage.
+// Package sdf contains interfaces used specifically for splittable DoFns.
+//
+// Warning: Splittable DoFns are still experimental, largely untested, and
+// likely to have bugs.
 package sdf
 
 // RTracker is an interface used to interact with restrictions while 
processing elements in
-// SplittableDoFns. Each implementation of RTracker is expected to be used for 
tracking a single
-// restriction type, which is the type that should be used to create the 
RTracker, and output by
-// TrySplit.
+// splittable DoFns (specifically, in the ProcessElement method). Each 
RTracker tracks the progress
+// of a single restriction.
 type RTracker interface {
-   // TryClaim attempts to claim the block of work in the current 
restriction located at a given
-   // position. This method must be used in the ProcessElement method of 
Splittable DoFns to claim
-   // work before performing it. If no work is claimed, the ProcessElement 
is not allowed to perform
-   // work or emit outputs. If the claim is successful, the DoFn must 
process the entire block. If
-   // the claim is unsuccessful the ProcessElement method of the DoFn must 
return without performing
-   // any additional work or emitting any outputs.
-   //
-   // TryClaim accepts an arbitrary value that can be interpreted as the 
position of a block, and
-   // returns a boolean indicating whether the claim succeeded.
+   // TryClaim attempts to claim the block of work located at the given 
position of the
+   // restriction. This method must be called in ProcessElement to claim 
work before it can be
+   // processed. Processing work without claiming it first can lead to 
incorrect output.
//
-   // If the claim fails due to an error, that error can be retrieved with 
GetError.
+   // If the claim is successful, the DoFn must process the entire block. 
If the claim is
+   // unsuccessful, the ProcessElement method of the DoFn must return 
without performing
+   // any additional work or emitting any outputs.
//
-   // For SDFs to work properly, claims must always be monotonically 
increasing in reference to the
-   // restriction's start and end points, and every block of work in a 
restriction must be claimed.
+   // If the claim fails due to an error, that error is stored and will be 
automatically emitted
+   // when the RTracker is validated, or can be manually retrieved with 
GetError.
//
// This pseudocode example illustrates the typical usage of TryClaim:
//
-   //  pos = position of first block after restriction.start
+   //  pos = position of first block within the restriction
//  for TryClaim(pos) == true {
//  // Do all work in the claimed block and emit outputs.
-   //  pos = position of next block
+   //  pos = position of next block within the restriction
//  }
//  return
TryClaim(pos interface{}) (ok bool)
 
-   // GetError returns the error that made this RTracker stop executing, 
and it returns nil if no
-   // error occurred. If IsDone fails while validating this RTracker, this 
method will be
-   // called to log the error.
+   // GetError returns the error that made this RTracker stop executing, 
and returns nil if no
+   // error occurred. This is the error that is emitted if automated 
validation fails.
GetError() error
 
-   // TrySplit splits the current restriction into a primary and residual 
based on a fraction of the
-   // work remaining. The split is performed along the first valid split 
point located after the
-   // given fraction of the remainder. This method is called by the SDK 
harness when receiving a
-   // split request by the runner.
+   // TrySplit splits the current restriction into a primary (currently 
executing work) and
+   // residual (work to be split off) based on a fraction of work 
remaining. The split is performed
+   // at the first valid split point located after the given fraction of 
remaining work.
+   //
+
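The TryClaim contract quoted above (claim each block before processing it, stop at the first failed claim) can be sketched with a toy offset-based tracker. This is an illustrative assumption only, not the Beam Go SDK's actual restriction tracker; the `offsetTracker` and `processRestriction` names below are hypothetical.

```go
package main

import "fmt"

// offsetTracker is a hypothetical, minimal RTracker over the half-open
// range [start, end). It only illustrates the TryClaim contract from the
// quoted doc comments: claims must be monotonically increasing, and a
// claim at or past end fails and ends processing.
type offsetTracker struct {
	start, end int64 // restriction bounds
	claimed    int64 // position of the last claimed block, or start-1
	done       bool
}

func newOffsetTracker(start, end int64) *offsetTracker {
	return &offsetTracker{start: start, end: end, claimed: start - 1}
}

// TryClaim claims the block at pos, or returns false if the claim fails.
func (t *offsetTracker) TryClaim(pos int64) bool {
	if t.done || pos <= t.claimed {
		return false
	}
	if pos >= t.end {
		t.done = true
		return false
	}
	t.claimed = pos
	return true
}

// processRestriction mimics the pseudocode loop from the doc comment:
// claim a block, do the block's work, advance to the next block.
func processRestriction(t *offsetTracker) []int64 {
	var processed []int64
	for pos := t.start; t.TryClaim(pos); pos++ {
		processed = append(processed, pos) // "do all work in the claimed block"
	}
	return processed
}

func main() {
	t := newOffsetTracker(0, 3)
	fmt.Println(processRestriction(t)) // [0 1 2]
}
```

Note how the loop performs no work for an unclaimed block and returns as soon as a claim fails, matching the contract that a failed claim must end ProcessElement without emitting further output.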

[jira] [Work logged] (BEAM-9874) Portable timers can't be cleared in batch mode

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9874?focusedWorklogId=430341&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430341
 ]

ASF GitHub Bot logged work on BEAM-9874:


Author: ASF GitHub Bot
Created on: 04/May/20 18:17
Start Date: 04/May/20 18:17
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #11597:
URL: https://github.com/apache/beam/pull/11597#discussion_r419632632



##
File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryTimerInternals.java
##
@@ -167,20 +165,17 @@ public void deleteTimer(StateNamespace namespace, String 
timerId, TimeDomain tim
   @Deprecated
   @Override
   public void deleteTimer(StateNamespace namespace, String timerId, String 
timerFamilyId) {
-TimerData existing = existingTimers.get(namespace, timerId + '+' + 
timerFamilyId);
-if (existing != null) {
-  deleteTimer(existing);
+TimerData removedTimer = existingTimers.remove(namespace, timerId + '+' + 
timerFamilyId);
+if (removedTimer != null) {
+  timersForDomain(removedTimer.getDomain()).remove(removedTimer);
 }
   }
 
   /** @deprecated use {@link #deleteTimer(StateNamespace, String, 
TimeDomain)}. */

Review comment:
   yeah, actually that might have been a mistake when the timer family was 
added because the old way was to use `TimerData` and the new way is to use 
timerId/timerFamily.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 430341)
Time Spent: 1h 10m  (was: 1h)

> Portable timers can't be cleared in batch mode
> --
>
> Key: BEAM-9874
> URL: https://issues.apache.org/jira/browse/BEAM-9874
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> After BEAM-9801, the {{test_pardo_timers_clear}} test fails. The test was 
> probably broken before but this was not visible because we weren't depleting 
> timers on shutdown.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9877) Eager size estimation of large group-by-key iterables cause expensive / duplicate reads

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9877?focusedWorklogId=430338&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-430338
 ]

ASF GitHub Bot logged work on BEAM-9877:


Author: ASF GitHub Bot
Created on: 04/May/20 18:09
Start Date: 04/May/20 18:09
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11598:
URL: https://github.com/apache/beam/pull/11598#issuecomment-623619474


   retest this please





Issue Time Tracking
---

Worklog Id: (was: 430338)
Remaining Estimate: 0h
Time Spent: 10m

> Eager size estimation of large group-by-key iterables cause expensive / 
> duplicate reads
> ---
>
> Key: BEAM-9877
> URL: https://issues.apache.org/jira/browse/BEAM-9877
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Tudor Marian
>Assignee: Tudor Marian
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> BatchGroupAlsoByWindowViaIteratorsFn WindowReiterator can cause very 
> expensive duplicate reading of data from a (Co-)GroupByKey for hot keys with 
> many values due to PCollection size estimation.  Instead, it should perform 
> lazy estimation like GroupingShuffleReader and GroupingShuffleEntryInterator 
> (or perform no estimation at all).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
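The eager-vs-lazy size estimation problem described in BEAM-9877 above can be illustrated with a short sketch (the names are hypothetical, not the actual Beam Java classes): an eager estimator scans every value of a hot key up front just to compute a size, so the consumer's subsequent iteration re-reads the same data, while a lazy estimator accumulates size only as elements are actually consumed.

```go
package main

import "fmt"

// source counts how many times elements are read, standing in for an
// expensive re-read of shuffled data. Hypothetical; not Beam code.
type source struct {
	data  []string
	reads int
}

func (s *source) read(i int) string {
	s.reads++
	return s.data[i]
}

// eagerSize scans every element up front just to compute a size estimate,
// doubling the reads if the caller then iterates the values anyway.
func eagerSize(s *source) int {
	total := 0
	for i := range s.data {
		total += len(s.read(i))
	}
	return total
}

// lazyIter yields elements and accumulates observed size as a side effect,
// so estimation costs no reads beyond what the consumer performs anyway.
type lazyIter struct {
	src      *source
	pos      int
	observed int // running size of elements seen so far
}

func (it *lazyIter) next() (string, bool) {
	if it.pos >= len(it.src.data) {
		return "", false
	}
	v := it.src.read(it.pos)
	it.pos++
	it.observed += len(v)
	return v, true
}

func main() {
	eagerSrc := &source{data: []string{"aa", "bbb", "c"}}
	size := eagerSize(eagerSrc)
	for i := range eagerSrc.data { // consumer iterates afterwards, paying again
		_ = eagerSrc.read(i)
	}
	fmt.Println("eager:", size, "reads:", eagerSrc.reads) // eager: 6 reads: 6

	lazySrc := &source{data: []string{"aa", "bbb", "c"}}
	it := &lazyIter{src: lazySrc}
	for _, ok := it.next(); ok; _, ok = it.next() {
	}
	fmt.Println("lazy:", it.observed, "reads:", lazySrc.reads) // lazy: 6 reads: 3
}
```

For a hot key with millions of values, the eager variant's duplicate pass is the expensive re-read the issue describes; the lazy variant matches the approach the issue attributes to GroupingShuffleReader.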

