[jira] [Resolved] (BEAM-8828) BigQueryTableProvider should allow configuration of write disposition

2020-06-09 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved BEAM-8828.
---------------------------------
Fix Version/s: 2.23.0
   Resolution: Fixed

> BigQueryTableProvider should allow configuration of write disposition
> ---------------------------------------------------------------------
>
> Key: BEAM-8828
> URL: https://issues.apache.org/jira/browse/BEAM-8828
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Assignee: Scott Lukas
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> It should be possible to set BigQueryIO's 
> [writeDisposition|https://github.com/apache/beam/blob/b446304f75078ca9c97437e685409c31ceab7503/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L2122-L2125]
>  in a Beam SQL BigQuery table.
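> A hypothetical DDL sketch of how this could be exposed via TBLPROPERTIES (the
> property name and JSON shape here are assumptions, not settled syntax):
> {code:sql}
> CREATE EXTERNAL TABLE orders (id BIGINT, amount DOUBLE)
> TYPE 'bigquery'
> LOCATION 'my-project:my_dataset.orders'
> TBLPROPERTIES '{"writeDisposition": "WRITE_APPEND"}'
> {code}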



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9043) BigQueryIO fails cryptically if gcpTempLocation is set and tempLocation is not

2020-06-09 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-9043:

Labels:   (was: stale-P2)

> BigQueryIO fails cryptically if gcpTempLocation is set and tempLocation is not
> ------------------------------------------------------------------------------
>
> Key: BEAM-9043
> URL: https://issues.apache.org/jira/browse/BEAM-9043
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Brian Hulette
>Priority: P2
>
> The following error arises when running a pipeline that uses BigQueryIO with 
> gcpTempLocation set and tempLocation not set. We should either handle this 
> case gracefully, or throw a more helpful error like "please specify 
> tempLocation".
> {code:java}
> 2019-12-24 13:06:18 WARN  UnboundedReadFromBoundedSource:152 - Exception 
> while splitting 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource@5d21202d, skips the 
> initial splits.
> java.lang.NullPointerException
> at java.util.regex.Matcher.getTextLength(Matcher.java:1283)
> at java.util.regex.Matcher.reset(Matcher.java:309)
> at java.util.regex.Matcher.<init>(Matcher.java:229)
> at java.util.regex.Pattern.matcher(Pattern.java:1093)
> at 
> org.apache.beam.sdk.io.FileSystems.parseScheme(FileSystems.java:447)
> at 
> org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:533)
> at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.resolveTempLocation(BigQueryHelpers.java:706)
> at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.extractFiles(BigQuerySourceBase.java:125)
> at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.split(BigQuerySourceBase.java:148)
> at 
> org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter.split(UnboundedReadFromBoundedSource.java:144)
> at 
> org.apache.beam.runners.dataflow.internal.CustomSources.serializeToCloudSource(CustomSources.java:87)
> at 
> org.apache.beam.runners.dataflow.ReadTranslator.translateReadHelper(ReadTranslator.java:51)
> at 
> org.apache.beam.runners.dataflow.DataflowRunner$StreamingUnboundedRead$ReadWithIdsTranslator.translate(DataflowRunner.java:1590)
> at 
> org.apache.beam.runners.dataflow.DataflowRunner$StreamingUnboundedRead$ReadWithIdsTranslator.translate(DataflowRunner.java:1587)
> at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:475)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:665)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:317)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:251)
> at 
> org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:460)
> at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:414)
> at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:173)
> at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:763)
> at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:186)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:315)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:301)
> {code}
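> As a workaround, setting both options explicitly avoids the NullPointerException;
> for example (the bucket name is a placeholder):
> {noformat}
> --tempLocation=gs://my-bucket/tmp --gcpTempLocation=gs://my-bucket/tmp
> {noformat}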





[jira] [Comment Edited] (BEAM-10203) Replace fastjson with jackson

2020-06-09 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129671#comment-17129671
 ] 

Brian Hulette edited comment on BEAM-10203 at 6/9/20, 6:06 PM:
---------------------------------------------------------------

cc: [~slukas] this is relevant to your project since fastjson is used in SQL 
table providers


was (Author: bhulette):
cc: [~slukas] this is relevant to your project

> Replace fastjson with jackson
> -----------------------------
>
> Key: BEAM-10203
> URL: https://issues.apache.org/jira/browse/BEAM-10203
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: P2
>  Labels: beam-fixit
>
> fastjson is only used by Beam SQL, we should switch to jackson to match the 
> rest of Beam and reduce our dependency update responsibilities.
> This is not hypothetical; we've already hit at least one security vulnerability 
> we weren't tracking: https://github.com/apache/beam/pull/11758
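> For reference, the call-for-call change is mostly mechanical; a sketch using each
> library's own API (illustrative only, not Beam code):
> {code:java}
> // fastjson:
> JSONObject obj = JSON.parseObject(jsonString);
> String name = obj.getString("name");
>
> // jackson equivalent:
> JsonNode node = new ObjectMapper().readTree(jsonString);
> String name2 = node.get("name").asText();
> {code}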





[jira] [Commented] (BEAM-10203) Replace fastjson with jackson

2020-06-09 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129671#comment-17129671
 ] 

Brian Hulette commented on BEAM-10203:
--------------------------------------

cc: [~slukas] this is relevant to your project

> Replace fastjson with jackson
> -----------------------------
>
> Key: BEAM-10203
> URL: https://issues.apache.org/jira/browse/BEAM-10203
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: P2
>  Labels: beam-fixit
>
> fastjson is only used by Beam SQL, we should switch to jackson to match the 
> rest of Beam and reduce our dependency update responsibilities.
> This is not hypothetical; we've already hit at least one security vulnerability 
> we weren't tracking: https://github.com/apache/beam/pull/11758





[jira] [Created] (BEAM-10224) Test using a DATE field in an aggregation

2020-06-09 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-10224:


 Summary: Test using a DATE field in an aggregation
 Key: BEAM-10224
 URL: https://issues.apache.org/jira/browse/BEAM-10224
 Project: Beam
  Issue Type: Task
  Components: dsl-sql
Reporter: Brian Hulette
Assignee: Robin Qiu


Since logical types are aggregated with their representation type, this may be 
an issue.
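A minimal query that would exercise this, as a sketch (table and column names 
are illustrative, not from an existing test):
{code:sql}
SELECT name, MAX(purchase_date) AS last_purchase
FROM orders
GROUP BY name
{code}
Here purchase_date is a DATE column, so the MAX aggregation runs on the 
representation type and the result must round-trip back to DATE correctly.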





[jira] [Assigned] (BEAM-8828) BigQueryTableProvider should allow configuration of write disposition

2020-06-04 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned BEAM-8828:
-----------------------------------

Assignee: Scott Lukas

> BigQueryTableProvider should allow configuration of write disposition
> ---------------------------------------------------------------------
>
> Key: BEAM-8828
> URL: https://issues.apache.org/jira/browse/BEAM-8828
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Assignee: Scott Lukas
>Priority: P2
>
> It should be possible to set BigQueryIO's 
> [writeDisposition|https://github.com/apache/beam/blob/b446304f75078ca9c97437e685409c31ceab7503/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L2122-L2125]
>  in a Beam SQL BigQuery table.





[jira] [Closed] (BEAM-10188) Automate Github release

2020-06-04 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette closed BEAM-10188.

Fix Version/s: Not applicable
   Resolution: Fixed

> Automate Github release
> -----------------------
>
> Key: BEAM-10188
> URL: https://issues.apache.org/jira/browse/BEAM-10188
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Kyle Weaver
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, we push the tag to GitHub and fill in the release notes in 
> separate steps. For feeds consuming these updates, it would be better to do 
> both in the same step using the GitHub API.





[jira] [Assigned] (BEAM-10188) Automate Github release

2020-06-04 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned BEAM-10188:


Assignee: (was: Brian Hulette)

> Automate Github release
> -----------------------
>
> Key: BEAM-10188
> URL: https://issues.apache.org/jira/browse/BEAM-10188
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Kyle Weaver
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, we push the tag to GitHub and fill in the release notes in 
> separate steps. For feeds consuming these updates, it would be better to do 
> both in the same step using the GitHub API.





[jira] [Assigned] (BEAM-10188) Automate Github release

2020-06-04 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned BEAM-10188:


Assignee: Brian Hulette

> Automate Github release
> -----------------------
>
> Key: BEAM-10188
> URL: https://issues.apache.org/jira/browse/BEAM-10188
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Kyle Weaver
>Assignee: Brian Hulette
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, we push the tag to GitHub and fill in the release notes in 
> separate steps. For feeds consuming these updates, it would be better to do 
> both in the same step using the GitHub API.





[jira] [Commented] (BEAM-7405) Task :sdks:python:hdfsIntegrationTest is failing in Python PostCommits - docker-credential-gcloud not installed

2020-06-04 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126056#comment-17126056
 ] 

Brian Hulette commented on BEAM-7405:
-------------------------------------

Oh ok, thanks. I filed BEAM-10193 to track the VM image update. Currently 
unassigned; not sure who I could send it to.

> Task :sdks:python:hdfsIntegrationTest is failing in Python PostCommits - 
> docker-credential-gcloud not installed
> ----------------------------------------------------------------------------------------------------
>
> Key: BEAM-7405
> URL: https://issues.apache.org/jira/browse/BEAM-7405
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: P2
>  Labels: stale-assigned
> Fix For: 2.14.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> This failure happened on apache-beam-jenkins-14.
> {noformat}
> 18:47:03 > Task :sdks:python:hdfsIntegrationTest
> 18:47:03 ++ dirname 
> ./apache_beam/io/hdfs_integration_test/hdfs_integration_test.sh
> 18:47:03 + TEST_DIR=./apache_beam/io/hdfs_integration_test
> 18:47:03 + ROOT_DIR=./apache_beam/io/hdfs_integration_test/../../../../..
> 18:47:03 + 
> CONTEXT_DIR=./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration
> 18:47:03 + rm -r 
> ./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration
> 18:47:03 rm: cannot remove 
> './apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration':
>  No such file or directory
> 18:47:03 + true
> 18:47:03 + mkdir -p 
> ./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration/sdks
> 18:47:03 + cp ./apache_beam/io/hdfs_integration_test/docker-compose.yml 
> ./apache_beam/io/hdfs_integration_test/Dockerfile 
> ./apache_beam/io/hdfs_integration_test/hdfscli.cfg 
> ./apache_beam/io/hdfs_integration_test/hdfs_integration_test.sh 
> ./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration/
> 18:47:03 + cp -r 
> ./apache_beam/io/hdfs_integration_test/../../../../../sdks/python 
> ./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration/sdks/
> 18:47:03 + cp -r ./apache_beam/io/hdfs_integration_test/../../../../../model 
> ./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration/
> 18:47:03 ++ echo hdfs_IT-jenkins-beam_PostCommit_Python_Verify_PR-714
> 18:47:03 + PROJECT_NAME=hdfs_IT-jenkins-beam_PostCommit_Python_Verify_PR-714
> 18:47:03 + '[' -z jenkins-beam_PostCommit_Python_Verify_PR-714 ']'
> 18:47:03 + COLOR_OPT=--no-ansi
> 18:47:03 + COMPOSE_OPT='-p 
> hdfs_IT-jenkins-beam_PostCommit_Python_Verify_PR-714 --no-ansi'
> 18:47:03 + cd 
> ./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration
> 18:47:03 + docker network prune --force
> 18:47:03 + trap finally EXIT
> 18:47:03 + docker-compose -p 
> hdfs_IT-jenkins-beam_PostCommit_Python_Verify_PR-714 --no-ansi build
> 18:47:03 namenode uses an image, skipping
> 18:47:03 datanode uses an image, skipping
> 18:47:03 Building test
> 18:47:03 [29234] Failed to execute script docker-compose
> 18:47:03 Traceback (most recent call last):
> 18:47:03   File "bin/docker-compose", line 6, in <module>
> 18:47:03   File "compose/cli/main.py", line 71, in main
> 18:47:03   File "compose/cli/main.py", line 127, in perform_command
> 18:47:03   File "compose/cli/main.py", line 287, in build
> 18:47:03   File "compose/project.py", line 386, in build
> 18:47:03   File "compose/project.py", line 368, in build_service
> 18:47:03   File "compose/service.py", line 1084, in build
> 18:47:03   File "site-packages/docker/api/build.py", line 260, in build
> 18:47:03   File "site-packages/docker/api/build.py", line 307, in 
> _set_auth_headers
> 18:47:03   File "site-packages/docker/auth.py", line 310, in 
> get_all_credentials
> 18:47:03   File "site-packages/docker/auth.py", line 262, in 
> _resolve_authconfig_credstore
> 18:47:03   File "site-packages/docker/auth.py", line 287, in 
> _get_store_instance
> 18:47:03   File "site-packages/dockerpycreds/store.py", line 25, in __init__
> 18:47:03 dockerpycreds.errors.InitializationError: docker-credential-gcloud 
> not installed or not available in PATH
> {noformat}





[jira] [Created] (BEAM-10193) Update Jenkins VMs with docker-credential-gcloud

2020-06-04 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-10193:


 Summary: Update Jenkins VMs with docker-credential-gcloud
 Key: BEAM-10193
 URL: https://issues.apache.org/jira/browse/BEAM-10193
 Project: Beam
  Issue Type: Task
  Components: build-system
Reporter: Brian Hulette


See BEAM-7405 (test failure currently resolved with an inelegant workaround) 
for motivation.





[jira] [Commented] (BEAM-8602) Always use shadow configuration for direct runner dependencies

2020-06-03 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125431#comment-17125431
 ] 

Brian Hulette commented on BEAM-8602:
-------------------------------------

Just put up a PR to close this out.

> Always use shadow configuration for direct runner dependencies
> --------------------------------------------------------------
>
> Key: BEAM-8602
> URL: https://issues.apache.org/jira/browse/BEAM-8602
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql, sdk-java-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-8602) Always use shadow configuration for direct runner dependencies

2020-06-03 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-8602:

Labels:   (was: stale-assigned)

> Always use shadow configuration for direct runner dependencies
> --------------------------------------------------------------
>
> Key: BEAM-8602
> URL: https://issues.apache.org/jira/browse/BEAM-8602
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql, sdk-java-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (BEAM-10187) build_release_candidate.sh does not push tag to Github

2020-06-03 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved BEAM-10187.
----------------------------------
Fix Version/s: Not applicable
   Resolution: Fixed

> build_release_candidate.sh does not push tag to Github
> ------------------------------------------------------
>
> Key: BEAM-10187
> URL: https://issues.apache.org/jira/browse/BEAM-10187
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-7405) Task :sdks:python:hdfsIntegrationTest is failing in Python PostCommits - docker-credential-gcloud not installed

2020-06-03 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125382#comment-17125382
 ] 

Brian Hulette commented on BEAM-7405:
-------------------------------------

[~udim] what is the action on this now?

> Task :sdks:python:hdfsIntegrationTest is failing in Python PostCommits - 
> docker-credential-gcloud not installed
> ----------------------------------------------------------------------------------------------------
>
> Key: BEAM-7405
> URL: https://issues.apache.org/jira/browse/BEAM-7405
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: P2
>  Labels: stale-assigned
> Fix For: 2.14.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> This failure happened on apache-beam-jenkins-14.
> {noformat}
> 18:47:03 > Task :sdks:python:hdfsIntegrationTest
> 18:47:03 ++ dirname 
> ./apache_beam/io/hdfs_integration_test/hdfs_integration_test.sh
> 18:47:03 + TEST_DIR=./apache_beam/io/hdfs_integration_test
> 18:47:03 + ROOT_DIR=./apache_beam/io/hdfs_integration_test/../../../../..
> 18:47:03 + 
> CONTEXT_DIR=./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration
> 18:47:03 + rm -r 
> ./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration
> 18:47:03 rm: cannot remove 
> './apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration':
>  No such file or directory
> 18:47:03 + true
> 18:47:03 + mkdir -p 
> ./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration/sdks
> 18:47:03 + cp ./apache_beam/io/hdfs_integration_test/docker-compose.yml 
> ./apache_beam/io/hdfs_integration_test/Dockerfile 
> ./apache_beam/io/hdfs_integration_test/hdfscli.cfg 
> ./apache_beam/io/hdfs_integration_test/hdfs_integration_test.sh 
> ./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration/
> 18:47:03 + cp -r 
> ./apache_beam/io/hdfs_integration_test/../../../../../sdks/python 
> ./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration/sdks/
> 18:47:03 + cp -r ./apache_beam/io/hdfs_integration_test/../../../../../model 
> ./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration/
> 18:47:03 ++ echo hdfs_IT-jenkins-beam_PostCommit_Python_Verify_PR-714
> 18:47:03 + PROJECT_NAME=hdfs_IT-jenkins-beam_PostCommit_Python_Verify_PR-714
> 18:47:03 + '[' -z jenkins-beam_PostCommit_Python_Verify_PR-714 ']'
> 18:47:03 + COLOR_OPT=--no-ansi
> 18:47:03 + COMPOSE_OPT='-p 
> hdfs_IT-jenkins-beam_PostCommit_Python_Verify_PR-714 --no-ansi'
> 18:47:03 + cd 
> ./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration
> 18:47:03 + docker network prune --force
> 18:47:03 + trap finally EXIT
> 18:47:03 + docker-compose -p 
> hdfs_IT-jenkins-beam_PostCommit_Python_Verify_PR-714 --no-ansi build
> 18:47:03 namenode uses an image, skipping
> 18:47:03 datanode uses an image, skipping
> 18:47:03 Building test
> 18:47:03 [29234] Failed to execute script docker-compose
> 18:47:03 Traceback (most recent call last):
> 18:47:03   File "bin/docker-compose", line 6, in <module>
> 18:47:03   File "compose/cli/main.py", line 71, in main
> 18:47:03   File "compose/cli/main.py", line 127, in perform_command
> 18:47:03   File "compose/cli/main.py", line 287, in build
> 18:47:03   File "compose/project.py", line 386, in build
> 18:47:03   File "compose/project.py", line 368, in build_service
> 18:47:03   File "compose/service.py", line 1084, in build
> 18:47:03   File "site-packages/docker/api/build.py", line 260, in build
> 18:47:03   File "site-packages/docker/api/build.py", line 307, in 
> _set_auth_headers
> 18:47:03   File "site-packages/docker/auth.py", line 310, in 
> get_all_credentials
> 18:47:03   File "site-packages/docker/auth.py", line 262, in 
> _resolve_authconfig_credstore
> 18:47:03   File "site-packages/docker/auth.py", line 287, in 
> _get_store_instance
> 18:47:03   File "site-packages/dockerpycreds/store.py", line 25, in __init__
> 18:47:03 dockerpycreds.errors.InitializationError: docker-credential-gcloud 
> not installed or not available in PATH
> {noformat}





[jira] [Commented] (BEAM-5173) org.apache.beam.runners.fnexecution.control.RemoteExecutionTest.testExecutionWithMultipleStages is flaky

2020-06-03 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125377#comment-17125377
 ] 

Brian Hulette commented on BEAM-5173:
-------------------------------------

This still hasn't flaked on PreCommit since the one I linked in Nov 2018: 
https://builds.apache.org/job/beam_PreCommit_Java_Cron/2831/testReport/junit/org.apache.beam.runners.fnexecution.control/RemoteExecutionTest/testExecutionWithMultipleStages/history/

I'm closing this.

> org.apache.beam.runners.fnexecution.control.RemoteExecutionTest.testExecutionWithMultipleStages
>  is flaky
> ----------------------------------------------------------------------------------------------------
>
> Key: BEAM-5173
> URL: https://issues.apache.org/jira/browse/BEAM-5173
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Valentyn Tymofieiev
>Assignee: Brian Hulette
>Priority: P2
>  Labels: stale-assigned
>
> Hi [~lcwik], this test failed in a [recent postcommit 
> build|https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1285/testReport/junit/org.apache.beam.runners.fnexecution.control/RemoteExecutionTest/testExecutionWithMultipleStages].
>  Could you please take a look or help triage to the right owner? Thank you.
> Stack trace: 
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: 
> org.apache.beam.vendor.grpc.v1.io.grpc.StatusRuntimeException: CANCELLED: 
> Runner closed connection
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.beam.runners.fnexecution.control.RemoteExecutionTest.tearDown(RemoteExecutionTest.java:198)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
>   at 
> 

[jira] [Resolved] (BEAM-5173) org.apache.beam.runners.fnexecution.control.RemoteExecutionTest.testExecutionWithMultipleStages is flaky

2020-06-03 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved BEAM-5173.
---------------------------------
Fix Version/s: Not applicable
   Resolution: Cannot Reproduce

> org.apache.beam.runners.fnexecution.control.RemoteExecutionTest.testExecutionWithMultipleStages
>  is flaky
> ----------------------------------------------------------------------------------------------------
>
> Key: BEAM-5173
> URL: https://issues.apache.org/jira/browse/BEAM-5173
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Valentyn Tymofieiev
>Assignee: Brian Hulette
>Priority: P2
>  Labels: stale-assigned
> Fix For: Not applicable
>
>
> Hi [~lcwik], this test failed in a [recent postcommit 
> build|https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1285/testReport/junit/org.apache.beam.runners.fnexecution.control/RemoteExecutionTest/testExecutionWithMultipleStages].
>  Could you please take a look or help triage to the right owner? Thank you.
> Stack trace: 
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: 
> org.apache.beam.vendor.grpc.v1.io.grpc.StatusRuntimeException: CANCELLED: 
> Runner closed connection
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.beam.runners.fnexecution.control.RemoteExecutionTest.tearDown(RemoteExecutionTest.java:198)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
>   at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:109)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 

[jira] [Commented] (BEAM-9208) Add support for mapping columns to pubsub message attributes in flat schemas DDL

2020-06-03 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125374#comment-17125374
 ] 

Brian Hulette commented on BEAM-9208:
-

I think this should remain P2

> Add support for mapping columns to pubsub message attributes in flat schemas 
> DDL
> 
>
> Key: BEAM-9208
> URL: https://issues.apache.org/jira/browse/BEAM-9208
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Priority: P2
>
> Context: 
> https://lists.apache.org/thread.html/bf4c37f21bda194d7f8c40f6e7b9a776262415755cc1658412af3c76%40%3Cdev.beam.apache.org%3E
> The syntax should look something like this (proposed by [~alexvanboxel]):
> {code}
> CREATE TABLE people (
> my_timestamp TIMESTAMP *OPTION(ref="pubsub:event_timestamp")*,
> my_id VARCHAR *OPTION(ref="pubsub:attributes['id_name']")*,
> name VARCHAR,
> age INTEGER
>   )
>   TYPE 'pubsub'
>   LOCATION 'projects/my-project/topics/my-topic'
> {code}
> This jira pertains specifically to the my_id field in this example.
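The proposed ref strings are simple enough to split mechanically. A minimal sketch of a parser for them (a hypothetical helper, not part of Beam) that separates the source, such as event_timestamp or attributes, from an optional attribute key:

```python
import re

def parse_pubsub_ref(ref: str):
    """Split a proposed ref option such as "pubsub:attributes['id_name']".

    Returns (source, key), where key is None for refs like
    "pubsub:event_timestamp". Hypothetical parser for the syntax proposed
    above; not Beam's implementation.
    """
    m = re.fullmatch(r"pubsub:(\w+)(?:\['([^']+)'\])?", ref)
    if m is None:
        raise ValueError(f"unrecognized ref option: {ref!r}")
    return m.group(1), m.group(2)
```

A DDL implementation would run something like this over each column's OPTION to decide whether the field maps to the message timestamp, an attribute, or the payload.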



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9208) Add support for mapping columns to pubsub message attributes in flat schemas DDL

2020-06-03 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-9208:

Labels:   (was: stale-P2)

> Add support for mapping columns to pubsub message attributes in flat schemas 
> DDL
> 
>
> Key: BEAM-9208
> URL: https://issues.apache.org/jira/browse/BEAM-9208
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Priority: P2
>
> Context: 
> https://lists.apache.org/thread.html/bf4c37f21bda194d7f8c40f6e7b9a776262415755cc1658412af3c76%40%3Cdev.beam.apache.org%3E
> The syntax should look something like this (proposed by [~alexvanboxel]):
> {code}
> CREATE TABLE people (
> my_timestamp TIMESTAMP *OPTION(ref="pubsub:event_timestamp")*,
> my_id VARCHAR *OPTION(ref="pubsub:attributes['id_name']")*,
> name VARCHAR,
> age INTEGER
>   )
>   TYPE 'pubsub'
>   LOCATION 'projects/my-project/topics/my-topic'
> {code}
> This jira pertains specifically to the my_id field in this example.





[jira] [Updated] (BEAM-8330) PubSubIO.readAvros should produce a schema'd PCollection if clazz has a schema

2020-06-03 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-8330:

Labels:   (was: stale-P2)

> PubSubIO.readAvros should produce a schema'd PCollection if clazz has a schema
> --
>
> Key: BEAM-8330
> URL: https://issues.apache.org/jira/browse/BEAM-8330
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Brian Hulette
>Priority: P3
>
> Currently {{PubsubIO.readAvros(clazz)}} *always* yields a PCollection with an 
> AvroCoder. This should only be a fallback in the event that no coder can be 
> inferred. That way if we can infer a schema for `clazz` we will produce a 
> PCollection with a schema.





[jira] [Updated] (BEAM-8330) PubSubIO.readAvros should produce a schema'd PCollection if clazz has a schema

2020-06-03 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-8330:

Priority: P3  (was: P2)

> PubSubIO.readAvros should produce a schema'd PCollection if clazz has a schema
> --
>
> Key: BEAM-8330
> URL: https://issues.apache.org/jira/browse/BEAM-8330
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Brian Hulette
>Priority: P3
>  Labels: stale-P2
>
> Currently {{PubsubIO.readAvros(clazz)}} *always* yields a PCollection with an 
> AvroCoder. This should only be a fallback in the event that no coder can be 
> inferred. That way if we can infer a schema for `clazz` we will produce a 
> PCollection with a schema.





[jira] [Resolved] (BEAM-10181) pull_licenses script should create python3 virtualenv

2020-06-03 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved BEAM-10181.
--
Resolution: Fixed

> pull_licenses script should create python3 virtualenv
> -
>
> Key: BEAM-10181
> URL: https://issues.apache.org/jira/browse/BEAM-10181
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Setting fix version as 2.22.0 since this is preventing me from building 
> release containers.





[jira] [Updated] (BEAM-10181) pull_licenses script should create python3 virtualenv

2020-06-03 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-10181:
-
Fix Version/s: (was: 2.22.0)
   2.23.0

> pull_licenses script should create python3 virtualenv
> -
>
> Key: BEAM-10181
> URL: https://issues.apache.org/jira/browse/BEAM-10181
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Setting fix version as 2.22.0 since this is preventing me from building 
> release containers.





[jira] [Created] (BEAM-10181) pull_licenses script should create python3 virtualenv

2020-06-02 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-10181:


 Summary: pull_licenses script should create python3 virtualenv
 Key: BEAM-10181
 URL: https://issues.apache.org/jira/browse/BEAM-10181
 Project: Beam
  Issue Type: Bug
  Components: build-system
Reporter: Brian Hulette
Assignee: Brian Hulette
 Fix For: 2.22.0


Setting fix version as 2.22.0 since this is preventing me from building release 
containers.





[jira] [Commented] (BEAM-9621) Python SqlTransform follow-ups

2020-06-02 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124315#comment-17124315
 ] 

Brian Hulette commented on BEAM-9621:
-

This is a tracking jira. Not sure how to make Beam Jira Bot happy with it. 
Dropped priority to P4.

> Python SqlTransform follow-ups
> --
>
> Key: BEAM-9621
> URL: https://issues.apache.org/jira/browse/BEAM-9621
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language, dsl-sql, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P4
>
> Tracking JIRA for follow-up work to improve SqlTransform in Python





[jira] [Updated] (BEAM-9621) Python SqlTransform follow-ups

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-9621:

Labels:   (was: stale-assigned)

> Python SqlTransform follow-ups
> --
>
> Key: BEAM-9621
> URL: https://issues.apache.org/jira/browse/BEAM-9621
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language, dsl-sql, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P4
>
> Tracking JIRA for follow-up work to improve SqlTransform in Python





[jira] [Updated] (BEAM-9621) Python SqlTransform follow-ups

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-9621:

Priority: P4  (was: P2)

> Python SqlTransform follow-ups
> --
>
> Key: BEAM-9621
> URL: https://issues.apache.org/jira/browse/BEAM-9621
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language, dsl-sql, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P4
>  Labels: stale-assigned
>
> Tracking JIRA for follow-up work to improve SqlTransform in Python





[jira] [Resolved] (BEAM-9390) [PostCommit_Java_PortabilityApi] [BigQuery related ITs] UnsupportedOperationException: BigQuery source must be split before being read

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved BEAM-9390.
-
Fix Version/s: Not applicable
   Resolution: Won't Fix

This is obsolete since the job in question was disabled in 
https://github.com/apache/beam/pull/11635

> [PostCommit_Java_PortabilityApi] [BigQuery related ITs] 
> UnsupportedOperationException: BigQuery source must be split before being read
> --
>
> Key: BEAM-9390
> URL: https://issues.apache.org/jira/browse/BEAM-9390
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Robin Qiu
>Assignee: Luke Cwik
>Priority: P2
>  Labels: currently-failing, stale-assigned
> Fix For: Not applicable
>
>
> Failed tests:
> org.apache.beam.examples.cookbook.BigQueryTornadoesIT.testE2EBigQueryTornadoesWithExport
>  
> org.apache.beam.examples.cookbook.BigQueryTornadoesIT.testE2eBigQueryTornadoesWithStorageApi
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryTimePartitioningClusteringIT.testE2EBigQueryClusteringTableFunction
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryTimePartitioningClusteringIT.testE2EBigQueryClusteringDynamicDestinations
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryTimePartitioningClusteringIT.testE2EBigQueryTimePartitioning
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryTimePartitioningClusteringIT.testE2EBigQueryClustering
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testNewTypesQueryWithReshuffle
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testLegacyQueryWithoutReshuffle
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testNewTypesQueryWithoutReshuffle
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testStandardQueryWithoutCustom
> ([https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4226/#showFailuresLink)]
>  
> Example failures:
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: Error received from SDK harness for instruction 
> -596: java.lang.UnsupportedOperationException: BigQuery source must be split 
> before being read at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.createReader(BigQuerySourceBase.java:173)
>  at 
> org.apache.beam.fn.harness.BoundedSourceRunner.runReadLoop(BoundedSourceRunner.java:159)
>  at 
> org.apache.beam.fn.harness.BoundedSourceRunner.start(BoundedSourceRunner.java:146)
> ...





[jira] [Resolved] (BEAM-8808) TestBigQueryOptions is never registered

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved BEAM-8808.
-
Fix Version/s: Not applicable
   Resolution: Won't Fix

I don't think this is actually a bug. Test options don't need to be registered 
since they're just set programmatically. Closing as won't fix.

> TestBigQueryOptions is never registered
> ---
>
> Key: BEAM-8808
> URL: https://issues.apache.org/jira/browse/BEAM-8808
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
>  Labels: stale-assigned
> Fix For: Not applicable
>
>
> So it's not possible to set targetDataset





[jira] [Commented] (BEAM-8030) Make VarIntCoder overflow behavior consistent

2020-06-02 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124284#comment-17124284
 ] 

Brian Hulette commented on BEAM-8030:
-

I'd still like to take a look at this.

> Make VarIntCoder overflow behavior consistent
> -
>
> Key: BEAM-8030
> URL: https://issues.apache.org/jira/browse/BEAM-8030
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
>
> The fast version of OutputStream.write_var_int_64 (and thus 
> VarIntCoder.encode) throws OverflowError for ints larger than 64 bits, but 
> the slow version does not. We should make them both throw an error.
> We may also want to add a write_var_int_32 that uses the same format, but 
> will throw an error for ints larger than 32 bits, for use in RowCoder.
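The desired behavior can be illustrated with a small pure-Python sketch of 64-bit varint encoding that range-checks its input before emitting any bytes, so both fast and slow paths would fail identically (hypothetical helper name; not Beam's actual OutputStream code):

```python
def write_var_int_64(value: int) -> bytes:
    """Encode a signed 64-bit integer as a base-128 varint.

    Raises OverflowError up front for values outside the signed 64-bit
    range, which is the consistent behavior the issue asks for.
    """
    if not -(1 << 63) <= value < (1 << 63):
        raise OverflowError("value does not fit in a signed 64-bit integer")
    # Reinterpret negative values as unsigned 64-bit, as the wire format does.
    bits = value & ((1 << 64) - 1)
    out = bytearray()
    while True:
        byte = bits & 0x7F
        bits >>= 7
        if bits:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)
```

A write_var_int_32 variant would use the same loop with the range check tightened to 32 bits.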





[jira] [Updated] (BEAM-8030) Make VarIntCoder overflow behavior consistent

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-8030:

Labels:   (was: stale-assigned)

> Make VarIntCoder overflow behavior consistent
> -
>
> Key: BEAM-8030
> URL: https://issues.apache.org/jira/browse/BEAM-8030
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
>
> The fast version of OutputStream.write_var_int_64 (and thus 
> VarIntCoder.encode) throws OverflowError for ints larger than 64 bits, but 
> the slow version does not. We should make them both throw an error.
> We may also want to add a write_var_int_32 that uses the same format, but 
> will throw an error for ints larger than 32 bits, for use in RowCoder.





[jira] [Assigned] (BEAM-10177) Remove "Review Release Notes in JIRA"

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned BEAM-10177:


Assignee: Brian Hulette

> Remove "Review Release Notes in JIRA"
> -
>
> Key: BEAM-10177
> URL: https://issues.apache.org/jira/browse/BEAM-10177
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kyle Weaver
>Assignee: Brian Hulette
>Priority: P3
>
> Release guide: "You should verify that the issues listed automatically by 
> JIRA are appropriate to appear in the Release Notes."
> I think it's safe to remove that now since a) the volume of jiras 
> (>150/release) makes that infeasible and b) we have CHANGES.md which should 
> replace the autogenerated release notes.





[jira] [Updated] (BEAM-10177) Remove "Review Release Notes in JIRA"

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-10177:
-
Status: Open  (was: Triage Needed)

> Remove "Review Release Notes in JIRA"
> -
>
> Key: BEAM-10177
> URL: https://issues.apache.org/jira/browse/BEAM-10177
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kyle Weaver
>Priority: P3
>
> Release guide: "You should verify that the issues listed automatically by 
> JIRA are appropriate to appear in the Release Notes."
> I think it's safe to remove that now since a) the volume of jiras 
> (>150/release) makes that infeasible and b) we have CHANGES.md which should 
> replace the autogenerated release notes.





[jira] [Updated] (BEAM-10058) VideoIntelligenceMlTestIT.test_label_detection_with_video_context is flaky

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-10058:
-
Fix Version/s: (was: 2.23.0)
   2.22.0

> VideoIntelligenceMlTestIT.test_label_detection_with_video_context is flaky
> --
>
> Key: BEAM-10058
> URL: https://issues.apache.org/jira/browse/BEAM-10058
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Brian Hulette
>Assignee: Kamil Wasilewski
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Example failure: https://builds.apache.org/job/beam_PostCommit_Python37/2371/
> {code}
> Dataflow pipeline failed. State: FAILED, Error:
> Traceback (most recent call last):
>   File "apache_beam/runners/common.py", line 961, in 
> apache_beam.runners.common.DoFnRunner.process
>   File "apache_beam/runners/common.py", line 554, in 
> apache_beam.runners.common.SimpleInvoker.invoke_process
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/sdks/python/apache_beam/transforms/core.py",
>  line 1511, in 
> wrapper = lambda x: [fn(x)]
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/sdks/python/apache_beam/testing/util.py",
>  line 218, in _matches
> hamcrest_assert(actual, contains_inanyorder(*expected_list))
>   File "/usr/local/lib/python3.7/site-packages/hamcrest/core/assert_that.py", 
> line 44, in assert_that
> _assert_match(actual=arg1, matcher=arg2, reason=arg3)
>   File "/usr/local/lib/python3.7/site-packages/hamcrest/core/assert_that.py", 
> line 60, in _assert_match
> raise AssertionError(description)
> AssertionError: 
> Expected: a sequence over [(a sequence containing 'bicycle' and a sequence 
> containing 'dinosaur')] in any order
>  but: not matched: <['land vehicle', 'animal']>
> {code}
> At least the error is amusing :)
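The failing assertion comes from hamcrest's contains_inanyorder, which checks that actual and expected hold the same items regardless of order. A rough pure-Python approximation of that multiset check (simplified; the real matcher in this test also nests substring matchers over the label lists):

```python
from collections import Counter

def matches_in_any_order(actual, expected) -> bool:
    """True if actual holds exactly the expected items, ignoring order."""
    return Counter(actual) == Counter(expected)
```

Here the pipeline produced ['land vehicle', 'animal'] where the test expected labels containing 'bicycle' and 'dinosaur', so no permutation can match.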





[jira] [Updated] (BEAM-10024) Spark runner failing testOutputTimestampDefault

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-10024:
-
Fix Version/s: (was: 2.22.0)

> Spark runner failing testOutputTimestampDefault
> ---
>
> Key: BEAM-10024
> URL: https://issues.apache.org/jira/browse/BEAM-10024
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: currently-failing
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is causing postcommit to fail
> java.lang.UnsupportedOperationException: Found TimerId annotations on 
> org.apache.beam.sdk.transforms.ParDoTest$TimerTests$12, but DoFn cannot yet 
> be used with timers in the SparkRunner.





[jira] [Updated] (BEAM-10050) VideoIntelligenceIT.annotateVideoFromURINoContext is flaky

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-10050:
-
Fix Version/s: (was: Not applicable)
   2.22.0

> VideoIntelligenceIT.annotateVideoFromURINoContext is flaky
> --
>
> Key: BEAM-10050
> URL: https://issues.apache.org/jira/browse/BEAM-10050
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Assignee: Michał Walenia
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> I've seen this fail a few times in precommits [Example 
> failure|https://builds.apache.org/job/beam_PreCommit_Java_Commit/11515/]
> {code}
> java.lang.AssertionError: Annotate 
> video/ParDo(AnnotateVideoFromURI)/ParMultiDo(AnnotateVideoFromURI).output: 
> expected: but was:
>   at 
> org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:169)
>   at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:411)
>   at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:403)
>   at 
> org.apache.beam.sdk.extensions.ml.VideoIntelligenceIT.annotateVideoFromURINoContext(VideoIntelligenceIT.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}





[jira] [Commented] (BEAM-10143) ClassCastException in GROUP BY with non-global window

2020-06-02 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124172#comment-17124172
 ] 

Brian Hulette commented on BEAM-10143:
--

There are some tests with Window.into and SqlTransform in Java: 
https://github.com/apache/beam/blob/9c16b898f0c90e83d74f5ac1a0d5b8853f872ebb/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslJoinTest.java#L158-L180

> ClassCastException in GROUP BY with non-global window
> -
>
> Key: BEAM-10143
> URL: https://issues.apache.org/jira/browse/BEAM-10143
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql, sdk-py-core
>Reporter: Maximilian Michels
>Priority: P1
>
> I'm using the SqlTransform as an external transform from within a Python
> pipeline. I apply windowing before a GROUP BY query as mentioned as the first 
> option in 
> https://beam.apache.org/documentation/dsls/sql/extensions/windowing-and-triggering/:
> {code:python}
>   input
>   | "Window" >> beam.WindowInto(window.FixedWindows(30))
>   | "Aggregate" >>
>   SqlTransform("""Select field, count(field) from PCOLLECTION
>   WHERE ...
> GROUP BY field
>""")
> {code}
> This results in an exception:
> {noformat}
> Caused by: java.lang.ClassCastException: 
> org.apache.beam.sdk.transforms.windowing.IntervalWindow cannot be cast to 
> org.apache.beam.sdk.transforms.windowing.GlobalWindow
>   at 
> org.apache.beam.sdk.transforms.windowing.GlobalWindow$Coder.encode(GlobalWindow.java:59)
>   at 
> org.apache.beam.sdk.coders.IterableLikeCoder.encode(IterableLikeCoder.java:98)
>   at 
> org.apache.beam.sdk.coders.IterableLikeCoder.encode(IterableLikeCoder.java:60)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:588)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.fn.harness.BeamFnDataWriteRunner.consume(BeamFnDataWriteRunner.java:154)
>   at 
> org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:216)
>   at 
> org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:179)
>   at 
> org.apache.beam.runners.fnexecution.control.FnApiControlClient$ResponseStreamObserver.onNext(FnApiControlClient.java:178)
>   at 
> org.apache.beam.runners.fnexecution.control.FnApiControlClient$ResponseStreamObserver.onNext(FnApiControlClient.java:158)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:251)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Contexts$ContextualizedServerCallListener.onMessage(Contexts.java:76)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:309)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:292)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:782)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   ... 1 more
> {noformat}





[jira] [Commented] (BEAM-9120) Deprecate onSuccessMatcher, onCreateMatcher

2020-06-02 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124102#comment-17124102
 ] 

Brian Hulette commented on BEAM-9120:
-

Lowering priority on this. I'd still like to work on it.

I've removed all usages in Beam code. I think the action now is just to mark 
these options deprecated and remove them after a release or two.

> Deprecate onSuccessMatcher, onCreateMatcher
> ---
>
> Key: BEAM-9120
> URL: https://issues.apache.org/jira/browse/BEAM-9120
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P3
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Instead of creating matchers on PipelineResult, we should just make assertions 
> with real matchers after waiting for the pipeline to finish.





[jira] [Updated] (BEAM-9120) Deprecate onSuccessMatcher, onCreateMatcher

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-9120:

Labels:   (was: stale-assigned)

> Deprecate onSuccessMatcher, onCreateMatcher
> ---
>
> Key: BEAM-9120
> URL: https://issues.apache.org/jira/browse/BEAM-9120
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P3
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Instead of creating matchers on PipelineResult, we should just make assertions 
> with real matchers after waiting for the pipeline to finish.





[jira] [Updated] (BEAM-9120) Deprecate onSuccessMatcher, onCreateMatcher

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-9120:

Priority: P3  (was: P2)

> Deprecate onSuccessMatcher, onCreateMatcher
> ---
>
> Key: BEAM-9120
> URL: https://issues.apache.org/jira/browse/BEAM-9120
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P3
>  Labels: stale-assigned
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Instead of creating matchers on PipelineResult, we should just make assertions 
> with real matchers after waiting for the pipeline to finish.





[jira] [Commented] (BEAM-4637) Flaky post-commit test org.apache.beam.examples.subprocess.ExampleEchoPipelineTest.testExampleEchoPipeline

2020-06-02 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-4637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124083#comment-17124083
 ] 

Brian Hulette commented on BEAM-4637:
-

Closing this as duplicate of BEAM-5286

> Flaky post-commit test 
> org.apache.beam.examples.subprocess.ExampleEchoPipelineTest.testExampleEchoPipeline
> --
>
> Key: BEAM-4637
> URL: https://issues.apache.org/jira/browse/BEAM-4637
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-java
>Reporter: Mikhail Gryzykhin
>Assignee: Batkhuyag Batsaikhan
>Priority: P2
>  Labels: flake, stale-assigned
> Fix For: Not applicable
>
>
> Post commit test failed with "Text file busy" exception.
> [https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/916/testReport/junit/org.apache.beam.examples.subprocess/ExampleEchoPipelineTest/testExampleEchoPipeline/]
>  





[jira] [Closed] (BEAM-4637) Flaky post-commit test org.apache.beam.examples.subprocess.ExampleEchoPipelineTest.testExampleEchoPipeline

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette closed BEAM-4637.
---
Fix Version/s: (was: 2.6.0)
   Not applicable
   Resolution: Duplicate

> Flaky post-commit test 
> org.apache.beam.examples.subprocess.ExampleEchoPipelineTest.testExampleEchoPipeline
> --
>
> Key: BEAM-4637
> URL: https://issues.apache.org/jira/browse/BEAM-4637
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-java
>Reporter: Mikhail Gryzykhin
>Assignee: Batkhuyag Batsaikhan
>Priority: P2
>  Labels: flake, stale-assigned
> Fix For: Not applicable
>
>
> Post commit test failed with "Text file busy" exception.
> [https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/916/testReport/junit/org.apache.beam.examples.subprocess/ExampleEchoPipelineTest/testExampleEchoPipeline/]
>  





[jira] [Updated] (BEAM-8741) Queries that attempt to write to pubsub publish time should fail at construction time

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-8741:

Labels:   (was: stale-P2)

> Queries that attempt to write to pubsub publish time should fail at 
> construction time
> -
>
> Key: BEAM-8741
> URL: https://issues.apache.org/jira/browse/BEAM-8741
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Priority: P2
>
> Currently it's possible to perform a query like:
> {code:sql}
> CREATE TABLE pubsub (
>   event_timestamp TIMESTAMP,
>   id VARCHAR
> ) ...
> INSERT INTO pubsub (event_timestamp, id) VALUES (...)
> {code}
> But when this is executed, the event_timestamp will be dropped, because on 
> read it will instead be populated with pubsub's publish time.
> A couple of ideas:
> - We could indicate that this is a VIRTUAL GENERATED column, and is therefore 
> read-only. Calcite seems to have some support for this concept, see 
> [ColumnStrategy.java|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/schema/ColumnStrategy.java].
> - We could just throw an exception in the Pubsub JSON Table Provider if the 
> query's output schema contains event_timestamp.
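The second option amounts to a one-line schema check at pipeline construction time. A sketch, with a hypothetical helper name and plain field-name lists rather than Beam's actual table-provider API:

```python
def validate_pubsub_write_schema(field_names):
    """Reject query output schemas that try to write the publish-time column.

    Sketch of option (2) above; the name and signature are illustrative,
    not Beam's API.
    """
    if "event_timestamp" in field_names:
        raise ValueError(
            "event_timestamp is populated from the Pub/Sub publish time on "
            "read and cannot be written; remove it from the INSERT columns")
    return list(field_names)
```

Failing here, rather than at read time, surfaces the problem before any rows are silently dropped.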





[jira] [Updated] (BEAM-8741) Queries that attempt to write to pubsub publish time should fail at construction time

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-8741:

Description: 
Currently it's possible to perform a query like:

{code:sql}
CREATE TABLE pubsub (
  event_timestamp TIMESTAMP,
  id VARCHAR
) ...

INSERT INTO pubsub (event_timestamp, id) VALUES (...)
{code}

But when this is executed, the event_timestamp will be dropped, because on read 
it will instead be populated with pubsub's publish time.

A couple of ideas:
- We could indicate that this is a VIRTUAL GENERATED column, and is therefore 
read-only. Calcite seems to have some support for this concept, see 
[ColumnStrategy.java|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/schema/ColumnStrategy.java].
- We could just throw an exception in the Pubsub JSON Table Provider if the 
query's output schema contains event_timestamp.

  was:
Currently it's possible to perform a query like:

{code:sql}
CREATE TABLE pubsub (
  event_timestamp TIMESTAMP,
  id VARCHAR
) ...

INSERT INTO pubsub (event_timestamp, id) VALUES (...)
{code}

But when this is executed, the event_timestamp will be dropped, because on read 
it will instead be populated with pubsub's publish time.

We should somehow indicate that this is a VIRTUAL GENERATED column, and is 
therefore read-only. Calcite seems to have some support for this concept, see 
[ColumnStrategy.java|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/schema/ColumnStrategy.java].


> Queries that attempt to write to pubsub publish time should fail at 
> construction time
> ------------------------------------------------------------------------------
>
> Key: BEAM-8741
> URL: https://issues.apache.org/jira/browse/BEAM-8741
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Priority: P2
>  Labels: stale-P2
>
> Currently it's possible to perform a query like:
> {code:sql}
> CREATE TABLE pubsub (
>   event_timestamp TIMESTAMP,
>   id VARCHAR
> ) ...
> INSERT INTO pubsub (event_timestamp, id) VALUES (...)
> {code}
> But when this is executed, the event_timestamp will be dropped, because on 
> read it will instead be populated with pubsub's publish time.
> A couple of ideas:
> - We could indicate that this is a VIRTUAL GENERATED column, and is therefore 
> read-only. Calcite seems to have some support for this concept, see 
> [ColumnStrategy.java|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/schema/ColumnStrategy.java].
> - We could just throw an exception in the Pubsub JSON Table Provider if the 
> query's output schema contains event_timestamp.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8741) Queries that attempt to write to pubsub publish time should fail at construction time

2020-06-02 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124076#comment-17124076
 ] 

Brian Hulette commented on BEAM-8741:
-------------------------------------

Still P2

> Queries that attempt to write to pubsub publish time should fail at 
> construction time
> ------------------------------------------------------------------------------
>
> Key: BEAM-8741
> URL: https://issues.apache.org/jira/browse/BEAM-8741
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Priority: P2
>  Labels: stale-P2
>
> Currently it's possible to perform a query like:
> {code:sql}
> CREATE TABLE pubsub (
>   event_timestamp TIMESTAMP,
>   id VARCHAR
> ) ...
> INSERT INTO pubsub (event_timestamp, id) VALUES (...)
> {code}
> But when this is executed, the event_timestamp will be dropped, because on 
> read it will instead be populated with pubsub's publish time.
> A couple of ideas:
> - We could indicate that this is a VIRTUAL GENERATED column, and is therefore 
> read-only. Calcite seems to have some support for this concept, see 
> [ColumnStrategy.java|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/schema/ColumnStrategy.java].
> - We could just throw an exception in the Pubsub JSON Table Provider if the 
> query's output schema contains event_timestamp.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8741) Queries that attempt to write to pubsub publish time should fail at construction time

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-8741:

Summary: Queries that attempt to write to pubsub publish time should fail 
at construction time  (was: Queries that attempt to write to pubsub publish 
time should fail)

> Queries that attempt to write to pubsub publish time should fail at 
> construction time
> ------------------------------------------------------------------------------
>
> Key: BEAM-8741
> URL: https://issues.apache.org/jira/browse/BEAM-8741
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Priority: P2
>  Labels: stale-P2
>
> Currently it's possible to perform a query like:
> {code:sql}
> CREATE TABLE pubsub (
>   event_timestamp TIMESTAMP,
>   id VARCHAR
> ) ...
> INSERT INTO pubsub (event_timestamp, id) VALUES (...)
> {code}
> But when this is executed, the event_timestamp will be dropped, because on 
> read it will instead be populated with pubsub's publish time.
> We should somehow indicate that this is a VIRTUAL GENERATED column, and is 
> therefore read-only. Calcite seems to have some support for this concept, see 
> [ColumnStrategy.java|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/schema/ColumnStrategy.java].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8741) Queries that attempt to write to pubsub publish time should fail

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-8741:

Status: Open  (was: Triage Needed)

> Queries that attempt to write to pubsub publish time should fail
> ----------------------------------------------------------------
>
> Key: BEAM-8741
> URL: https://issues.apache.org/jira/browse/BEAM-8741
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Priority: P2
>  Labels: stale-P2
>
> Currently it's possible to perform a query like:
> {code:sql}
> CREATE TABLE pubsub (
>   event_timestamp TIMESTAMP,
>   id VARCHAR
> ) ...
> INSERT INTO pubsub (event_timestamp, id) VALUES (...)
> {code}
> But when this is executed, the event_timestamp will be dropped, because on 
> read it will instead be populated with pubsub's publish time.
> We should somehow indicate that this is a VIRTUAL GENERATED column, and is 
> therefore read-only. Calcite seems to have some support for this concept, see 
> [ColumnStrategy.java|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/schema/ColumnStrategy.java].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-5286) [beam_PostCommit_Java_GradleBuild][org.apache.beam.examples.subprocess.ExampleEchoPipelineTest.testExampleEchoPipeline][Flake] .sh script: text file busy.

2020-06-02 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124070#comment-17124070
 ] 

Brian Hulette commented on BEAM-5286:
-------------------------------------

Raised to P1 per https://beam.apache.org/contribute/jira-priorities/

> [beam_PostCommit_Java_GradleBuild][org.apache.beam.examples.subprocess.ExampleEchoPipelineTest.testExampleEchoPipeline][Flake]
>  .sh script: text file busy.
> ------------------------------------------------------------------------------
>
> Key: BEAM-5286
> URL: https://issues.apache.org/jira/browse/BEAM-5286
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Alan Myrvold
>Priority: P1
>  Labels: flake, stale-assigned
> Fix For: Not applicable
>
>
> Sample failure: 
> [https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1375/testReport/junit/org.apache.beam.examples.subprocess/ExampleEchoPipelineTest/testExampleEchoPipeline/]
> Sample relevant log:
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.Exception: 
> java.io.IOException: Cannot run program 
> "/tmp/test-Echoo1519764280436328522/test-EchoAgain3143210610074994370.sh": 
> error=26, Text file busy



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
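For context on the failure mode: error=26 (ETXTBSY) occurs when a script is exec'd while some process still holds its file open for writing. A plain-Python sketch of the usual mitigation — close the write handle before exec, then retry briefly on ETXTBSY; this is illustrative, not the actual Beam fix:

```python
import errno
import os
import stat
import subprocess
import tempfile
import time

def write_and_run(script_body, retries=5):
    """Write an executable script, close it, then run it, retrying on ETXTBSY."""
    fd, path = tempfile.mkstemp(suffix=".sh")
    with os.fdopen(fd, "w") as f:  # ensure the write handle is closed before exec
        f.write(script_body)
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    for attempt in range(retries):
        try:
            return subprocess.check_output([path]).decode()
        except OSError as e:
            if e.errno != errno.ETXTBSY or attempt == retries - 1:
                raise
            time.sleep(0.1 * (attempt + 1))  # brief backoff before retrying

print(write_and_run("#!/bin/sh\necho hello\n"))  # prints "hello"
```

The retry is a band-aid; closing every writer (including inherited file descriptors in forked children) before exec is the real cure.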


[jira] [Updated] (BEAM-5286) [beam_PostCommit_Java_GradleBuild][org.apache.beam.examples.subprocess.ExampleEchoPipelineTest.testExampleEchoPipeline][Flake] .sh script: text file busy.

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-5286:

Priority: P1  (was: P2)

> [beam_PostCommit_Java_GradleBuild][org.apache.beam.examples.subprocess.ExampleEchoPipelineTest.testExampleEchoPipeline][Flake]
>  .sh script: text file busy.
> ------------------------------------------------------------------------------
>
> Key: BEAM-5286
> URL: https://issues.apache.org/jira/browse/BEAM-5286
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Alan Myrvold
>Priority: P1
>  Labels: flake, stale-assigned
> Fix For: Not applicable
>
>
> Sample failure: 
> [https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1375/testReport/junit/org.apache.beam.examples.subprocess/ExampleEchoPipelineTest/testExampleEchoPipeline/]
> Sample relevant log:
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.Exception: 
> java.io.IOException: Cannot run program 
> "/tmp/test-Echoo1519764280436328522/test-EchoAgain3143210610074994370.sh": 
> error=26, Text file busy



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-5286) [beam_PostCommit_Java_GradleBuild][org.apache.beam.examples.subprocess.ExampleEchoPipelineTest.testExampleEchoPipeline][Flake] .sh script: text file busy.

2020-06-02 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124068#comment-17124068
 ] 

Brian Hulette commented on BEAM-5286:
-------------------------------------

Looks like this is still happening. Here's a failure from June 1: 
https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PreCommit_Java_Commit/11635/testReport/org.apache.beam.examples.subprocess/ExampleEchoPipelineTest/testExampleEchoPipeline/

[~alanmyrvold] do you have time to look at it?

> [beam_PostCommit_Java_GradleBuild][org.apache.beam.examples.subprocess.ExampleEchoPipelineTest.testExampleEchoPipeline][Flake]
>  .sh script: text file busy.
> ------------------------------------------------------------------------------
>
> Key: BEAM-5286
> URL: https://issues.apache.org/jira/browse/BEAM-5286
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Alan Myrvold
>Priority: P2
>  Labels: flake, stale-assigned
> Fix For: Not applicable
>
>
> Sample failure: 
> [https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1375/testReport/junit/org.apache.beam.examples.subprocess/ExampleEchoPipelineTest/testExampleEchoPipeline/]
> Sample relevant log:
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.Exception: 
> java.io.IOException: Cannot run program 
> "/tmp/test-Echoo1519764280436328522/test-EchoAgain3143210610074994370.sh": 
> error=26, Text file busy



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8560) Construction time schema Validation for writing to BQ

2020-06-02 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124055#comment-17124055
 ] 

Brian Hulette commented on BEAM-8560:
-------------------------------------

Went ahead and unassigned myself since I don't have time for this. Downgrading 
to P3 as well.

> Construction time schema Validation for writing to BQ
> -----------------------------------------------------
>
> Key: BEAM-8560
> URL: https://issues.apache.org/jira/browse/BEAM-8560
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Priority: P2
>
> `BigQueryIO.write` should be able to validate the schema of the input 
> PCollection against the schema for this table (either retrieved from 
> DataCatalog, or from BQ metadata). Then we can fail at pipeline construction 
> time rather than waiting until execution.
> At the very least this should work when writing Beam Rows.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
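The proposed construction-time validation can be sketched in plain Python: compare the input PCollection's fields (name -> type) against the destination table's schema and fail before anything runs. The dict-based "schema" below is illustrative; the real check would use Beam Schemas plus the BQ or Data Catalog table metadata:

```python
# Sketch only: fail at pipeline construction when the input fields do not fit
# the destination table, instead of waiting for an execution-time error.
def validate_against_table(input_fields, table_fields):
    """input_fields/table_fields: dicts mapping field name -> type string."""
    missing = set(input_fields) - set(table_fields)
    if missing:
        raise ValueError("Fields missing from destination table: %s" % sorted(missing))
    mismatched = {
        name: (typ, table_fields[name])
        for name, typ in input_fields.items()
        if table_fields[name] != typ
    }
    if mismatched:
        raise ValueError("Type mismatches (input vs. table): %s" % mismatched)

table = {"id": "INT64", "name": "STRING"}
validate_against_table({"id": "INT64"}, table)  # ok: matching subset of columns
```

As the description notes, this is easiest when writing Beam Rows, since the input schema is then known at construction time.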


[jira] [Updated] (BEAM-8560) Construction time schema Validation for writing to BQ

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-8560:

Priority: P3  (was: P2)

> Construction time schema Validation for writing to BQ
> -----------------------------------------------------
>
> Key: BEAM-8560
> URL: https://issues.apache.org/jira/browse/BEAM-8560
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Priority: P3
>
> `BigQueryIO.write` should be able to validate the schema of the input 
> PCollection against the schema for this table (either retrieved from 
> DataCatalog, or from BQ metadata). Then we can fail at pipeline construction 
> time rather than waiting until execution.
> At the very least this should work when writing Beam Rows.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8560) Construction time schema Validation for writing to BQ

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-8560:

Description: 
`BigQueryIO.write` should be able to validate the schema of the input 
PCollection against the schema for this table (either retrieved from 
DataCatalog, or from BQ metadata). Then we can fail at pipeline construction 
time rather than waiting until execution.

At the very least this should work when writing Beam Rows.

  was:
`BigQueryIO.write` should be able to validate the schema of the input 
PCollection against the schema for this table in Data Catalog. Then we can fail 
at pipeline construction time rather than waiting until execution.

At the very least this should work when writing Beam Rows.


> Construction time schema Validation for writing to BQ
> -----------------------------------------------------
>
> Key: BEAM-8560
> URL: https://issues.apache.org/jira/browse/BEAM-8560
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
>  Labels: stale-assigned
>
> `BigQueryIO.write` should be able to validate the schema of the input 
> PCollection against the schema for this table (either retrieved from 
> DataCatalog, or from BQ metadata). Then we can fail at pipeline construction 
> time rather than waiting until execution.
> At the very least this should work when writing Beam Rows.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8560) Construction time schema Validation for writing to BQ

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-8560:

Labels:   (was: stale-assigned)

> Construction time schema Validation for writing to BQ
> -----------------------------------------------------
>
> Key: BEAM-8560
> URL: https://issues.apache.org/jira/browse/BEAM-8560
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
>
> `BigQueryIO.write` should be able to validate the schema of the input 
> PCollection against the schema for this table (either retrieved from 
> DataCatalog, or from BQ metadata). Then we can fail at pipeline construction 
> time rather than waiting until execution.
> At the very least this should work when writing Beam Rows.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8560) Construction time schema Validation for writing to BQ

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned BEAM-8560:
-----------------------------------

Assignee: (was: Brian Hulette)

> Construction time schema Validation for writing to BQ
> -----------------------------------------------------
>
> Key: BEAM-8560
> URL: https://issues.apache.org/jira/browse/BEAM-8560
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Priority: P2
>
> `BigQueryIO.write` should be able to validate the schema of the input 
> PCollection against the schema for this table (either retrieved from 
> DataCatalog, or from BQ metadata). Then we can fail at pipeline construction 
> time rather than waiting until execution.
> At the very least this should work when writing Beam Rows.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8438) Update Python/Streaming IO Documentation

2020-06-02 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124031#comment-17124031
 ] 

Brian Hulette commented on BEAM-8438:
-------------------------------------

I think this should still be P2. 

> Update Python/Streaming IO Documentation
> ----------------------------------------
>
> Key: BEAM-8438
> URL: https://issues.apache.org/jira/browse/BEAM-8438
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Brian Hulette
>Priority: P2
>  Labels: stale-P2
>
> Built-in IO documentation states that Python/Streaming only supports pubsub 
> and BQ, which is out of date.
> https://beam.apache.org/documentation/io/built-in/
> This came up on 
> [slack|https://the-asf.slack.com/archives/CBDNLQZM1/p157141041000]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8438) Update Python/Streaming IO Documentation

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-8438:

Labels:   (was: stale-P2)

> Update Python/Streaming IO Documentation
> ----------------------------------------
>
> Key: BEAM-8438
> URL: https://issues.apache.org/jira/browse/BEAM-8438
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Brian Hulette
>Priority: P2
>
> Built-in IO documentation states that Python/Streaming only supports pubsub 
> and BQ, which is out of date.
> https://beam.apache.org/documentation/io/built-in/
> This came up on 
> [slack|https://the-asf.slack.com/archives/CBDNLQZM1/p157141041000]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-7306) [SQL] Add support for distinct aggregations

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-7306:

Labels:   (was: stale-P2)

> [SQL] Add support for distinct aggregations
> -------------------------------------------
>
> Key: BEAM-7306
> URL: https://issues.apache.org/jira/browse/BEAM-7306
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Brian Hulette
>Priority: P3
>
> Currently we reject aggregations with the DISTINCT flag set: 
> [https://github.com/apache/beam/pull/8498]
> We should provide support for these aggregations in a scalable way. See the 
> ML discussion on this topic here: 
> [https://lists.apache.org/thread.html/24081b0d0b7f9709a5c0f574149fb6b9e9759cba06734200cf3810bf@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
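For reference, one scalable shape for DISTINCT aggregations (whether or not it is what the linked thread settled on) is a two-stage plan: GROUP BY (key, value) to deduplicate, then aggregate per key. In plain Python over an in-memory list:

```python
from collections import defaultdict

def count_distinct(rows):
    """rows: iterable of (key, value); returns key -> COUNT(DISTINCT value).

    Stage 1 deduplicates by grouping on (key, value); stage 2 is an ordinary
    per-key count. Both stages are plain GROUP BYs, so each can be shuffled
    and scaled independently.
    """
    deduped = set(rows)            # stage 1: GROUP BY (key, value)
    counts = defaultdict(int)
    for key, _ in deduped:         # stage 2: GROUP BY key, COUNT(*)
        counts[key] += 1
    return dict(counts)

print(sorted(count_distinct([("a", 1), ("a", 1), ("a", 2), ("b", 3)]).items()))
# [('a', 2), ('b', 1)]
```

The appeal of this shape is that no single worker ever has to hold the full distinct set for a hot key.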


[jira] [Updated] (BEAM-7306) [SQL] Add support for distinct aggregations

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-7306:

Priority: P3  (was: P2)

> [SQL] Add support for distinct aggregations
> -------------------------------------------
>
> Key: BEAM-7306
> URL: https://issues.apache.org/jira/browse/BEAM-7306
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Brian Hulette
>Priority: P3
>  Labels: stale-P2
>
> Currently we reject aggregations with the DISTINCT flag set: 
> [https://github.com/apache/beam/pull/8498]
> We should provide support for these aggregations in a scalable way. See the 
> ML discussion on this topic here: 
> [https://lists.apache.org/thread.html/24081b0d0b7f9709a5c0f574149fb6b9e9759cba06734200cf3810bf@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8732) Add support for additional structured types to Schemas/RowCoders

2020-06-02 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123985#comment-17123985
 ] 

Brian Hulette commented on BEAM-8732:
-------------------------------------

This is still a P2.

Another thing we should think about here: schemas for functions that return 
tuples: e.g. {{beam.Map(lambda row: (row.foo, row.bar+row.baz))}}. We need to 
answer some questions there though:
* Where do field names come from?
** Could be generated: f0, f1, f2, ..
** We'd want some way for users to override those names though. Perhaps 
something like {{.with_output_field_names('foo', 'bar_and_baz')}}, or we could 
leverage with_output_type somehow, or we could give users a "magic" function 
for assigning names, like {{lambda row: Row(foo=row.foo, 
bar_and_baz=row.bar+row.baz))}}
* Where do field types come from?
** Could get them from typehints, or some kind of static analysis. Throw some 
helpful error if we can't infer.

> Add support for additional structured types to Schemas/RowCoders
> ----------------------------------------------------------------
>
> Key: BEAM-8732
> URL: https://issues.apache.org/jira/browse/BEAM-8732
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Priority: P2
>
> Currently we can convert between a {{NamedTuple}} type and its {{Schema}} 
> protos using {{named_tuple_from_schema}} and {{named_tuple_to_schema}}. I'd 
> like to introduce a system to support additional types, starting with 
> structured types like {{attrs}}, {{dataclasses}}, and {{TypedDict}}.
> I've only just started digesting the code, but this task seems pretty 
> straightforward. For example, I think the type-to-schema code would look 
> roughly like this:
> {code:python}
> def typing_to_runner_api(type_):
>   # type: (Type) -> schema_pb2.FieldType
>   structured_handler = _get_structured_handler(type_)
>   if structured_handler:
>     schema = None
>     if hasattr(type_, 'id'):
>       schema = SCHEMA_REGISTRY.get_schema_by_id(type_.id)
>     if schema is None:
>       fields = structured_handler.get_fields()
>       type_id = str(uuid4())
>       schema = schema_pb2.Schema(fields=fields, id=type_id)
>       SCHEMA_REGISTRY.add(type_, schema)
>     return schema_pb2.FieldType(
>         row_type=schema_pb2.RowType(
>             schema=schema))
> {code}
> The rest of the work would be in implementing a class hierarchy for working 
> with structured types, such as getting a list of fields from an instance, and 
> instantiation from a list of fields. Eventually we can extend this behavior 
> to arbitrary, unstructured types.  
> Going in the schema-to-type direction, we have the problem of choosing which 
> type to use for a given schema. I believe that as long as 
> {{typing_to_runner_api()}} has been called on our structured type in the 
> current python session, it should be added to the registry and thus round 
> trip ok, so I think we just need a public function for registering schemas 
> for structured types.
> [~bhulette] Did you want to tackle this or are you ok with me going after it?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
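The generated-name idea in the comment above can be sketched in plain Python; {{with_output_field_names}} in the comment is hypothetical, and so is this helper:

```python
# Sketch of generating schema field names f0, f1, ... for a tuple-returning
# lambda, with an optional user override. infer_field_names is an
# illustrative helper, not a Beam API.
def infer_field_names(example_tuple, overrides=None):
    generated = ["f%d" % i for i in range(len(example_tuple))]
    if overrides is None:
        return generated
    if len(overrides) != len(example_tuple):
        raise ValueError(
            "Expected %d field names, got %d" % (len(example_tuple), len(overrides)))
    return list(overrides)

print(infer_field_names((1, "x")))                          # ['f0', 'f1']
print(infer_field_names((1, "x"), ["foo", "bar_and_baz"]))  # ['foo', 'bar_and_baz']
```

Raising on a length mismatch gives the override path the same construction-time failure behavior as the rest of the schema machinery.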


[jira] [Updated] (BEAM-8732) Add support for additional structured types to Schemas/RowCoders

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-8732:

Labels:   (was: stale-P2)

> Add support for additional structured types to Schemas/RowCoders
> ----------------------------------------------------------------
>
> Key: BEAM-8732
> URL: https://issues.apache.org/jira/browse/BEAM-8732
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Priority: P2
>
> Currently we can convert between a {{NamedTuple}} type and its {{Schema}} 
> protos using {{named_tuple_from_schema}} and {{named_tuple_to_schema}}. I'd 
> like to introduce a system to support additional types, starting with 
> structured types like {{attrs}}, {{dataclasses}}, and {{TypedDict}}.
> I've only just started digesting the code, but this task seems pretty 
> straightforward. For example, I think the type-to-schema code would look 
> roughly like this:
> {code:python}
> def typing_to_runner_api(type_):
>   # type: (Type) -> schema_pb2.FieldType
>   structured_handler = _get_structured_handler(type_)
>   if structured_handler:
>     schema = None
>     if hasattr(type_, 'id'):
>       schema = SCHEMA_REGISTRY.get_schema_by_id(type_.id)
>     if schema is None:
>       fields = structured_handler.get_fields()
>       type_id = str(uuid4())
>       schema = schema_pb2.Schema(fields=fields, id=type_id)
>       SCHEMA_REGISTRY.add(type_, schema)
>     return schema_pb2.FieldType(
>         row_type=schema_pb2.RowType(
>             schema=schema))
> {code}
> The rest of the work would be in implementing a class hierarchy for working 
> with structured types, such as getting a list of fields from an instance, and 
> instantiation from a list of fields. Eventually we can extend this behavior 
> to arbitrary, unstructured types.  
> Going in the schema-to-type direction, we have the problem of choosing which 
> type to use for a given schema. I believe that as long as 
> {{typing_to_runner_api()}} has been called on our structured type in the 
> current python session, it should be added to the registry and thus round 
> trip ok, so I think we just need a public function for registering schemas 
> for structured types.
> [~bhulette] Did you want to tackle this or are you ok with me going after it?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
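As a concrete example of the proposed class hierarchy, a sketch of a structured handler for {{dataclasses}} using only the standard library; the handler protocol shown here is illustrative, not Beam's actual interface:

```python
import dataclasses

class DataclassHandler:
    """Illustrative structured-type handler: field extraction + reconstruction."""

    def __init__(self, type_):
        assert dataclasses.is_dataclass(type_)
        self._type = type_

    def get_fields(self):
        # (name, annotated type) pairs, analogous to schema fields
        return [(f.name, f.type) for f in dataclasses.fields(self._type)]

    def from_values(self, values):
        names = [name for name, _ in self.get_fields()]
        return self._type(**dict(zip(names, values)))

@dataclasses.dataclass
class Point:
    x: int
    y: int

handler = DataclassHandler(Point)
print([name for name, _ in handler.get_fields()])  # ['x', 'y']
print(handler.from_values([1, 2]))                 # Point(x=1, y=2)
```

Analogous handlers for {{attrs}} and {{TypedDict}} would implement the same two methods against their own introspection APIs.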


[jira] [Updated] (BEAM-8732) Add support for additional structured types to Schemas/RowCoders

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-8732:

Status: Open  (was: Triage Needed)

> Add support for additional structured types to Schemas/RowCoders
> ----------------------------------------------------------------
>
> Key: BEAM-8732
> URL: https://issues.apache.org/jira/browse/BEAM-8732
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Priority: P2
>  Labels: stale-P2
>
> Currently we can convert between a {{NamedTuple}} type and its {{Schema}} 
> protos using {{named_tuple_from_schema}} and {{named_tuple_to_schema}}. I'd 
> like to introduce a system to support additional types, starting with 
> structured types like {{attrs}}, {{dataclasses}}, and {{TypedDict}}.
> I've only just started digesting the code, but this task seems pretty 
> straightforward. For example, I think the type-to-schema code would look 
> roughly like this:
> {code:python}
> def typing_to_runner_api(type_):
>   # type: (Type) -> schema_pb2.FieldType
>   structured_handler = _get_structured_handler(type_)
>   if structured_handler:
>     schema = None
>     if hasattr(type_, 'id'):
>       schema = SCHEMA_REGISTRY.get_schema_by_id(type_.id)
>     if schema is None:
>       fields = structured_handler.get_fields()
>       type_id = str(uuid4())
>       schema = schema_pb2.Schema(fields=fields, id=type_id)
>       SCHEMA_REGISTRY.add(type_, schema)
>     return schema_pb2.FieldType(
>         row_type=schema_pb2.RowType(
>             schema=schema))
> {code}
> The rest of the work would be in implementing a class hierarchy for working 
> with structured types, such as getting a list of fields from an instance, and 
> instantiation from a list of fields. Eventually we can extend this behavior 
> to arbitrary, unstructured types.  
> Going in the schema-to-type direction, we have the problem of choosing which 
> type to use for a given schema. I believe that as long as 
> {{typing_to_runner_api()}} has been called on our structured type in the 
> current python session, it should be added to the registry and thus round 
> trip ok, so I think we just need a public function for registering schemas 
> for structured types.
> [~bhulette] Did you want to tackle this or are you ok with me going after it?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10143) ClassCastException in GROUP BY with non-global window

2020-06-02 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123961#comment-17123961
 ] 

Brian Hulette commented on BEAM-10143:
--------------------------------------

[~mxm] you haven't seen this happen from Java right? I'm assuming it's a 
python-specific issue.

> ClassCastException in GROUP BY with non-global window
> -----------------------------------------------------
>
> Key: BEAM-10143
> URL: https://issues.apache.org/jira/browse/BEAM-10143
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql, sdk-py-core
>Reporter: Maximilian Michels
>Priority: P1
>
> I'm using the SqlTransform as an external transform from within a Python
pipeline. I apply windowing before a GROUP BY query, as described in the first 
> option in 
> https://beam.apache.org/documentation/dsls/sql/extensions/windowing-and-triggering/:
> {code:python}
>   input
>   | "Window" >> beam.WindowInto(window.FixedWindows(30))
>   | "Aggregate" >>
>   SqlTransform("""Select field, count(field) from PCOLLECTION
>   WHERE ...
>   GROUP BY field
>   """)
> {code}
> This results in an exception:
> {noformat}
> Caused by: java.lang.ClassCastException: 
> org.apache.beam.sdk.transforms.windowing.IntervalWindow cannot be cast to 
> org.apache.beam.sdk.transforms.windowing.GlobalWindow
>   at 
> org.apache.beam.sdk.transforms.windowing.GlobalWindow$Coder.encode(GlobalWindow.java:59)
>   at 
> org.apache.beam.sdk.coders.IterableLikeCoder.encode(IterableLikeCoder.java:98)
>   at 
> org.apache.beam.sdk.coders.IterableLikeCoder.encode(IterableLikeCoder.java:60)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:588)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.fn.harness.BeamFnDataWriteRunner.consume(BeamFnDataWriteRunner.java:154)
>   at 
> org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:216)
>   at 
> org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:179)
>   at 
> org.apache.beam.runners.fnexecution.control.FnApiControlClient$ResponseStreamObserver.onNext(FnApiControlClient.java:178)
>   at 
> org.apache.beam.runners.fnexecution.control.FnApiControlClient$ResponseStreamObserver.onNext(FnApiControlClient.java:158)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:251)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Contexts$ContextualizedServerCallListener.onMessage(Contexts.java:76)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:309)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:292)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:782)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   ... 1 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10143) ClassCastException in GROUP BY with non-global window

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-10143:
-
Component/s: sdk-py-core

> ClassCastException in GROUP BY with non-global window
> -
>
> Key: BEAM-10143
> URL: https://issues.apache.org/jira/browse/BEAM-10143
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql, sdk-py-core
>Reporter: Maximilian Michels
>Priority: P1
>
> I'm using the SqlTransform as an external transform from within a Python
> pipeline. I apply windowing before a GROUP BY query, following the first 
> option described in 
> https://beam.apache.org/documentation/dsls/sql/extensions/windowing-and-triggering/:
> {code:python}
>   input
>   | "Window" >> beam.WindowInto(window.FixedWindows(30))
>   | "Aggregate" >> SqlTransform("""Select field, count(field) from PCOLLECTION
>                                     WHERE ...
>                                     GROUP BY field""")
> {code}
> This results in an exception:
> {noformat}
> Caused by: java.lang.ClassCastException: 
> org.apache.beam.sdk.transforms.windowing.IntervalWindow cannot be cast to 
> org.apache.beam.sdk.transforms.windowing.GlobalWindow
>   at 
> org.apache.beam.sdk.transforms.windowing.GlobalWindow$Coder.encode(GlobalWindow.java:59)
>   at 
> org.apache.beam.sdk.coders.IterableLikeCoder.encode(IterableLikeCoder.java:98)
>   at 
> org.apache.beam.sdk.coders.IterableLikeCoder.encode(IterableLikeCoder.java:60)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:588)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.fn.harness.BeamFnDataWriteRunner.consume(BeamFnDataWriteRunner.java:154)
>   at 
> org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:216)
>   at 
> org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:179)
>   at 
> org.apache.beam.runners.fnexecution.control.FnApiControlClient$ResponseStreamObserver.onNext(FnApiControlClient.java:178)
>   at 
> org.apache.beam.runners.fnexecution.control.FnApiControlClient$ResponseStreamObserver.onNext(FnApiControlClient.java:158)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:251)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Contexts$ContextualizedServerCallListener.onMessage(Contexts.java:76)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:309)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:292)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:782)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>   at 
> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   ... 1 more
> {noformat}





[jira] [Updated] (BEAM-9547) Raise NotImplementedError for remaining pandas methods

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-9547:

Labels:   (was: stale-assigned)

> Raise NotImplementedError for remaining pandas methods
> --
>
> Key: BEAM-9547
> URL: https://issues.apache.org/jira/browse/BEAM-9547
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Robert Bradshaw
>Priority: P2
>
> We should have an implementation for every DataFrame, Series, and GroupBy 
> method. Everything that's not actually implemented should get a default 
> implementation that raises NotImplementedError.
> See https://github.com/apache/beam/pull/10757#discussion_r389132292
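The described pattern — every method that is not actually implemented gets a default that raises NotImplementedError — could be sketched roughly as follows. This is an illustrative sketch only: the class name, the hard-coded method list, and the `_make_stub` helper are hypothetical stand-ins, not Beam's actual DataFrame API.

```python
# Illustrative sketch (not Beam's actual code): methods on the target
# pandas-like surface that are not explicitly implemented get a default
# implementation that raises NotImplementedError.

DATAFRAME_METHODS = ['sum', 'mean', 'pivot', 'melt']  # stand-in for the pandas surface


class DeferredDataFrame:
    def sum(self):  # example of a method that *is* implemented
        return 'deferred sum'


def _make_stub(name):
    def stub(self, *args, **kwargs):
        raise NotImplementedError("'%s' is not yet supported" % name)
    stub.__name__ = name
    return stub


# Install a raising stub for every method not explicitly implemented above.
for _name in DATAFRAME_METHODS:
    if not hasattr(DeferredDataFrame, _name):
        setattr(DeferredDataFrame, _name, _make_stub(_name))
```

A real implementation would enumerate pandas's actual API (e.g. via `dir(pd.DataFrame)`) rather than a hard-coded list, so that new pandas methods fail loudly instead of silently falling through.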





[jira] [Assigned] (BEAM-9547) Raise NotImplementedError for remaining pandas methods

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned BEAM-9547:
---

Assignee: Robert Bradshaw  (was: Brian Hulette)

> Raise NotImplementedError for remaining pandas methods
> --
>
> Key: BEAM-9547
> URL: https://issues.apache.org/jira/browse/BEAM-9547
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Robert Bradshaw
>Priority: P2
>  Labels: stale-assigned
>
> We should have an implementation for every DataFrame, Series, and GroupBy 
> method. Everything that's not actually implemented should get a default 
> implementation that raises NotImplementedError.
> See https://github.com/apache/beam/pull/10757#discussion_r389132292





[jira] [Commented] (BEAM-8828) BigQueryTableProvider should allow configuration of write disposition

2020-06-02 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123940#comment-17123940
 ] 

Brian Hulette commented on BEAM-8828:
-

Still P2

> BigQueryTableProvider should allow configuration of write disposition
> -
>
> Key: BEAM-8828
> URL: https://issues.apache.org/jira/browse/BEAM-8828
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Priority: P2
>
> It should be possible to set BigQueryIO's 
> [writeDisposition|https://github.com/apache/beam/blob/b446304f75078ca9c97437e685409c31ceab7503/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L2122-L2125]
>  in a Beam SQL big query table.





[jira] [Updated] (BEAM-8828) BigQueryTableProvider should allow configuration of write disposition

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-8828:

Labels:   (was: stale-P2)

> BigQueryTableProvider should allow configuration of write disposition
> -
>
> Key: BEAM-8828
> URL: https://issues.apache.org/jira/browse/BEAM-8828
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Priority: P2
>
> It should be possible to set BigQueryIO's 
> [writeDisposition|https://github.com/apache/beam/blob/b446304f75078ca9c97437e685409c31ceab7503/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L2122-L2125]
>  in a Beam SQL big query table.





[jira] [Updated] (BEAM-7345) Add support for generics in schema inference

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-7345:

Labels:   (was: stale-P2)

> Add support for generics in schema inference
> 
>
> Key: BEAM-7345
> URL: https://issues.apache.org/jira/browse/BEAM-7345
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Priority: P3
>
> Currently schema inference doesn't work for getters that return a 
> parameterized type. Fixing this would most likely involve plumbing 
> TypeDescriptor through FieldValueTypeSupplier, FieldValueTypeInformation, 
> StaticSchemaInference, etc., rather than Class.





[jira] [Commented] (BEAM-7345) Add support for generics in schema inference

2020-06-02 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123936#comment-17123936
 ] 

Brian Hulette commented on BEAM-7345:
-

I think this is legitimately a P3 "nice-to-have"

cc: [~kenn] [~tysonjh] since this came up in some offline discussions.

> Add support for generics in schema inference
> 
>
> Key: BEAM-7345
> URL: https://issues.apache.org/jira/browse/BEAM-7345
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Priority: P2
>  Labels: stale-P2
>
> Currently schema inference doesn't work for getters that return a 
> parameterized type. Fixing this would most likely involve plumbing 
> TypeDescriptor through FieldValueTypeSupplier, FieldValueTypeInformation, 
> StaticSchemaInference, etc., rather than Class.





[jira] [Updated] (BEAM-7345) Add support for generics in schema inference

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-7345:

Priority: P3  (was: P2)

> Add support for generics in schema inference
> 
>
> Key: BEAM-7345
> URL: https://issues.apache.org/jira/browse/BEAM-7345
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Priority: P3
>  Labels: stale-P2
>
> Currently schema inference doesn't work for getters that return a 
> parameterized type. Fixing this would most likely involve plumbing 
> TypeDescriptor through FieldValueTypeSupplier, FieldValueTypeInformation, 
> StaticSchemaInference, etc., rather than Class.





[jira] [Updated] (BEAM-9189) Add Daffodil IO for Apache Beam

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-9189:

Priority: P3  (was: P2)

> Add Daffodil IO for Apache Beam
> ---
>
> Key: BEAM-9189
> URL: https://issues.apache.org/jira/browse/BEAM-9189
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Priority: P3
>  Labels: gsoc, stale-P2
>
> From https://daffodil.apache.org/:
> {quote}Daffodil is an open source implementation of the DFDL specification 
> that uses these DFDL schemas to parse fixed format data into an infoset, 
> which is most commonly represented as either XML or JSON. This allows the use 
> of well-established XML or JSON technologies and libraries to consume, 
> inspect, and manipulate fixed format data in existing solutions. Daffodil is 
> also capable of the reverse by serializing or “unparsing” an XML or JSON 
> infoset back to the original data format.
> {quote}
> We should create a Beam IO that accepts a DFDL schema as an argument and can 
> then produce and consume data in the specified format. I think it would be 
> most natural for Beam users if this IO could produce Beam Rows, but an 
> initial version that just operates with Infosets could be useful as well.





[jira] [Updated] (BEAM-9189) Add Daffodil IO for Apache Beam

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-9189:

Status: Open  (was: Triage Needed)

> Add Daffodil IO for Apache Beam
> ---
>
> Key: BEAM-9189
> URL: https://issues.apache.org/jira/browse/BEAM-9189
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Priority: P2
>  Labels: gsoc, stale-P2
>
> From https://daffodil.apache.org/:
> {quote}Daffodil is an open source implementation of the DFDL specification 
> that uses these DFDL schemas to parse fixed format data into an infoset, 
> which is most commonly represented as either XML or JSON. This allows the use 
> of well-established XML or JSON technologies and libraries to consume, 
> inspect, and manipulate fixed format data in existing solutions. Daffodil is 
> also capable of the reverse by serializing or “unparsing” an XML or JSON 
> infoset back to the original data format.
> {quote}
> We should create a Beam IO that accepts a DFDL schema as an argument and can 
> then produce and consume data in the specified format. I think it would be 
> most natural for Beam users if this IO could produce Beam Rows, but an 
> initial version that just operates with Infosets could be useful as well.





[jira] [Updated] (BEAM-9189) Add Daffodil IO for Apache Beam

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-9189:

Labels: gsoc stale-P2  (was: gsoc gsoc2020 mentor stale-P2)

> Add Daffodil IO for Apache Beam
> ---
>
> Key: BEAM-9189
> URL: https://issues.apache.org/jira/browse/BEAM-9189
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Priority: P2
>  Labels: gsoc, stale-P2
>
> From https://daffodil.apache.org/:
> {quote}Daffodil is an open source implementation of the DFDL specification 
> that uses these DFDL schemas to parse fixed format data into an infoset, 
> which is most commonly represented as either XML or JSON. This allows the use 
> of well-established XML or JSON technologies and libraries to consume, 
> inspect, and manipulate fixed format data in existing solutions. Daffodil is 
> also capable of the reverse by serializing or “unparsing” an XML or JSON 
> infoset back to the original data format.
> {quote}
> We should create a Beam IO that accepts a DFDL schema as an argument and can 
> then produce and consume data in the specified format. I think it would be 
> most natural for Beam users if this IO could produce Beam Rows, but an 
> initial version that just operates with Infosets could be useful as well.





[jira] [Updated] (BEAM-9043) BigQueryIO fails cryptically if gcpTempLocation is set and tempLocation is not

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-9043:

Status: Open  (was: Triage Needed)

> BigQueryIO fails cryptically if gcpTempLocation is set and tempLocation is not
> --
>
> Key: BEAM-9043
> URL: https://issues.apache.org/jira/browse/BEAM-9043
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Brian Hulette
>Priority: P2
>  Labels: stale-P2
>
> The following error arises when running a pipeline that uses BigQueryIO with 
> gcpTempLocation set and tempLocation not set. We should either handle this 
> case gracefully, or throw a more helpful error like "please specify 
> tempLocation".
> {code:java}
> 2019-12-24 13:06:18 WARN  UnboundedReadFromBoundedSource:152 - Exception 
> while splitting 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource@5d21202d, skips the 
> initial splits.
> java.lang.NullPointerException
> at java.util.regex.Matcher.getTextLength(Matcher.java:1283)
> at java.util.regex.Matcher.reset(Matcher.java:309)
> at java.util.regex.Matcher.<init>(Matcher.java:229)
> at java.util.regex.Pattern.matcher(Pattern.java:1093)
> at 
> org.apache.beam.sdk.io.FileSystems.parseScheme(FileSystems.java:447)
> at 
> org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:533)
> at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.resolveTempLocation(BigQueryHelpers.java:706)
> at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.extractFiles(BigQuerySourceBase.java:125)
> at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.split(BigQuerySourceBase.java:148)
> at 
> org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter.split(UnboundedReadFromBoundedSource.java:144)
> at 
> org.apache.beam.runners.dataflow.internal.CustomSources.serializeToCloudSource(CustomSources.java:87)
> at 
> org.apache.beam.runners.dataflow.ReadTranslator.translateReadHelper(ReadTranslator.java:51)
> at 
> org.apache.beam.runners.dataflow.DataflowRunner$StreamingUnboundedRead$ReadWithIdsTranslator.translate(DataflowRunner.java:1590)
> at 
> org.apache.beam.runners.dataflow.DataflowRunner$StreamingUnboundedRead$ReadWithIdsTranslator.translate(DataflowRunner.java:1587)
> at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:475)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:665)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:317)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:251)
> at 
> org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:460)
> at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:414)
> at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:173)
> at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:763)
> at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:186)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:315)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:301)
> {code}





[jira] [Commented] (BEAM-9043) BigQueryIO fails cryptically if gcpTempLocation is set and tempLocation is not

2020-06-02 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123932#comment-17123932
 ] 

Brian Hulette commented on BEAM-9043:
-

I think this should remain P2

> BigQueryIO fails cryptically if gcpTempLocation is set and tempLocation is not
> --
>
> Key: BEAM-9043
> URL: https://issues.apache.org/jira/browse/BEAM-9043
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Brian Hulette
>Priority: P2
>  Labels: stale-P2
>
> The following error arises when running a pipeline that uses BigQueryIO with 
> gcpTempLocation set and tempLocation not set. We should either handle this 
> case gracefully, or throw a more helpful error like "please specify 
> tempLocation".
> {code:java}
> 2019-12-24 13:06:18 WARN  UnboundedReadFromBoundedSource:152 - Exception 
> while splitting 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource@5d21202d, skips the 
> initial splits.
> java.lang.NullPointerException
> at java.util.regex.Matcher.getTextLength(Matcher.java:1283)
> at java.util.regex.Matcher.reset(Matcher.java:309)
> at java.util.regex.Matcher.<init>(Matcher.java:229)
> at java.util.regex.Pattern.matcher(Pattern.java:1093)
> at 
> org.apache.beam.sdk.io.FileSystems.parseScheme(FileSystems.java:447)
> at 
> org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:533)
> at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.resolveTempLocation(BigQueryHelpers.java:706)
> at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.extractFiles(BigQuerySourceBase.java:125)
> at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.split(BigQuerySourceBase.java:148)
> at 
> org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter.split(UnboundedReadFromBoundedSource.java:144)
> at 
> org.apache.beam.runners.dataflow.internal.CustomSources.serializeToCloudSource(CustomSources.java:87)
> at 
> org.apache.beam.runners.dataflow.ReadTranslator.translateReadHelper(ReadTranslator.java:51)
> at 
> org.apache.beam.runners.dataflow.DataflowRunner$StreamingUnboundedRead$ReadWithIdsTranslator.translate(DataflowRunner.java:1590)
> at 
> org.apache.beam.runners.dataflow.DataflowRunner$StreamingUnboundedRead$ReadWithIdsTranslator.translate(DataflowRunner.java:1587)
> at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:475)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:665)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:317)
> at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:251)
> at 
> org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:460)
> at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:414)
> at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:173)
> at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:763)
> at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:186)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:315)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:301)
> {code}





[jira] [Updated] (BEAM-7974) Make RowCoder @Internal

2020-06-02 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-7974:

Summary: Make RowCoder @Internal  (was: Make RowCoder package-private)

> Make RowCoder @Internal
> ---
>
> Key: BEAM-7974
> URL: https://issues.apache.org/jira/browse/BEAM-7974
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P3
>  Labels: stale-assigned
>
> RowCoder is currently public in sdk.coders, tempting people to use it 
> directly. But the Schemas API is written such that everyone should be using 
> SchemaCoder, and RowCoder should be an implementation detail.
> Unfortunately this isn't a trivial change; I tried to do it and resolve the 
> few dependencies that cropped up, but running RowCoderTest yielded the 
> following error:
> {code:java}
> tried to access class 
> org.apache.beam.sdk.schemas.RowCoderGenerator$EncodeInstruction from class 
> org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3
> java.lang.IllegalAccessError: tried to access class 
> org.apache.beam.sdk.schemas.RowCoderGenerator$EncodeInstruction from class 
> org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3
>   at org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3.encode(Unknown 
> Source)
>   at org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3.encode(Unknown 
> Source)
>   at org.apache.beam.sdk.schemas.RowCoder.encode(RowCoder.java:159)
>   at org.apache.beam.sdk.schemas.RowCoder.encode(RowCoder.java:54)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.testing.CoderProperties.encode(CoderProperties.java:334)
>   at 
> org.apache.beam.sdk.testing.CoderProperties.decodeEncode(CoderProperties.java:362)
>   at 
> org.apache.beam.sdk.testing.CoderProperties.coderDecodeEncodeEqualInContext(CoderProperties.java:104)
>   at 
> org.apache.beam.sdk.testing.CoderProperties.coderDecodeEncodeEqual(CoderProperties.java:94)
> {code}
> My attempt is available at 
> https://github.com/TheNeuralBit/beam/commit/869b8c6ba2f554bf56d8df70a754b76ef38dbc89





[jira] [Commented] (BEAM-7974) Make RowCoder package-private

2020-06-02 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123929#comment-17123929
 ] 

Brian Hulette commented on BEAM-7974:
-

I don't think this is feasible; maybe we should annotate it @Internal instead?

> Make RowCoder package-private
> -
>
> Key: BEAM-7974
> URL: https://issues.apache.org/jira/browse/BEAM-7974
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P3
>  Labels: stale-assigned
>
> RowCoder is currently public in sdk.coders, tempting people to use it 
> directly. But the Schemas API is written such that everyone should be using 
> SchemaCoder, and RowCoder should be an implementation detail.
> Unfortunately this isn't a trivial change; I tried to do it and resolve the 
> few dependencies that cropped up, but running RowCoderTest yielded the 
> following error:
> {code:java}
> tried to access class 
> org.apache.beam.sdk.schemas.RowCoderGenerator$EncodeInstruction from class 
> org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3
> java.lang.IllegalAccessError: tried to access class 
> org.apache.beam.sdk.schemas.RowCoderGenerator$EncodeInstruction from class 
> org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3
>   at org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3.encode(Unknown 
> Source)
>   at org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3.encode(Unknown 
> Source)
>   at org.apache.beam.sdk.schemas.RowCoder.encode(RowCoder.java:159)
>   at org.apache.beam.sdk.schemas.RowCoder.encode(RowCoder.java:54)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.testing.CoderProperties.encode(CoderProperties.java:334)
>   at 
> org.apache.beam.sdk.testing.CoderProperties.decodeEncode(CoderProperties.java:362)
>   at 
> org.apache.beam.sdk.testing.CoderProperties.coderDecodeEncodeEqualInContext(CoderProperties.java:104)
>   at 
> org.apache.beam.sdk.testing.CoderProperties.coderDecodeEncodeEqual(CoderProperties.java:94)
> {code}
> My attempt is available at 
> https://github.com/TheNeuralBit/beam/commit/869b8c6ba2f554bf56d8df70a754b76ef38dbc89





[jira] [Commented] (BEAM-9722) Add batch SnowflakeIO.Read to Java SDK

2020-06-01 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121246#comment-17121246
 ] 

Brian Hulette commented on BEAM-9722:
-

Updated fix version to 2.23.0 since this wasn't in the 2.22.0 cut; see 
https://github.com/apache/beam/pull/11360#issuecomment-637027656

> Add batch SnowflakeIO.Read to Java SDK
> --
>
> Key: BEAM-9722
> URL: https://issues.apache.org/jira/browse/BEAM-9722
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Kasia Kucharczyk
>Assignee: Dariusz Aniszewski
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-9722) Add batch SnowflakeIO.Read to Java SDK

2020-06-01 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-9722:

Fix Version/s: (was: 2.22.0)
   2.23.0

> Add batch SnowflakeIO.Read to Java SDK
> --
>
> Key: BEAM-9722
> URL: https://issues.apache.org/jira/browse/BEAM-9722
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Kasia Kucharczyk
>Assignee: Dariusz Aniszewski
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-10024) Spark runner failing testOutputTimestampDefault

2020-06-01 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121239#comment-17121239
 ] 

Brian Hulette commented on BEAM-10024:
--

Agree it's not great. I would definitely cherry-pick a fix if/when one is merged 
into master. I just don't want to block the release until that happens.

> Spark runner failing testOutputTimestampDefault
> ---
>
> Key: BEAM-10024
> URL: https://issues.apache.org/jira/browse/BEAM-10024
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: currently-failing
> Fix For: 2.22.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is causing postcommit to fail
> java.lang.UnsupportedOperationException: Found TimerId annotations on 
> org.apache.beam.sdk.transforms.ParDoTest$TimerTests$12, but DoFn cannot yet 
> be used with timers in the SparkRunner.





[jira] [Commented] (BEAM-10024) Spark runner failing testOutputTimestampDefault

2020-06-01 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121176#comment-17121176
 ] 

Brian Hulette commented on BEAM-10024:
--

I don't think this is worth blocking the release over since it doesn't 
represent a regression, just a newly added incorrectly annotated test. Do you 
disagree [~iemejia]?

> Spark runner failing testOutputTimestampDefault
> ---
>
> Key: BEAM-10024
> URL: https://issues.apache.org/jira/browse/BEAM-10024
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: currently-failing
> Fix For: 2.22.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is causing postcommit to fail
> java.lang.UnsupportedOperationException: Found TimerId annotations on 
> org.apache.beam.sdk.transforms.ParDoTest$TimerTests$12, but DoFn cannot yet 
> be used with timers in the SparkRunner.





[jira] [Resolved] (BEAM-10121) Python RowCoder doesn't support nested structs

2020-05-29 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved BEAM-10121.
--
Resolution: Fixed

> Python RowCoder doesn't support nested structs
> --
>
> Key: BEAM-10121
> URL: https://issues.apache.org/jira/browse/BEAM-10121
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.20.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




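For context on BEAM-10121 above, here is a stdlib-only sketch of the nested-struct row shape involved (no Beam dependency; the type names are illustrative, not from the actual fix):

```python
from typing import NamedTuple

class Address(NamedTuple):
    city: str
    zip_code: str

class Person(NamedTuple):
    # A row whose "address" field is itself a row: the nested-struct
    # shape that Python's RowCoder previously could not encode.
    name: str
    address: Address

row = Person("ada", Address("london", "EC1"))
# A row coder has to recurse into nested struct fields rather than
# treating them as opaque values:
assert row.address.city == "london"
```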


[jira] [Updated] (BEAM-10121) Python RowCoder doesn't support nested structs

2020-05-29 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-10121:
-
Fix Version/s: 2.22.0

> Python RowCoder doesn't support nested structs
> --
>
> Key: BEAM-10121
> URL: https://issues.apache.org/jira/browse/BEAM-10121
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.20.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Closed] (BEAM-10122) Python RowCoder throws NotImplementedError in DataflowRunner

2020-05-29 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette closed BEAM-10122.

Resolution: Fixed

> Python RowCoder throws NotImplementedError in DataflowRunner
> 
>
> Key: BEAM-10122
> URL: https://issues.apache.org/jira/browse/BEAM-10122
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.20.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> ... because it overrides as_cloud_object





[jira] [Resolved] (BEAM-10078) uniquify Dataflow specific jars when staging

2020-05-29 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved BEAM-10078.
--
Resolution: Fixed

> uniquify Dataflow specific jars when staging
> 
>
> Key: BEAM-10078
> URL: https://issues.apache.org/jira/browse/BEAM-10078
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language, runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> After BEAM-9383, Dataflow specific jars (dataflow-worker.jar, windmill_main) 
> could be overwritten when two or more jobs share the same staging location. 
> Since they 1) should have specific predefined names AND 2) should have unique 
> locations to avoid collisions, they need special handling when staging 
> artifacts.
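The constraint above can be sketched with the standard library only (the function name and path layout are hypothetical, not Beam's actual implementation): keep the predefined jar file name, but make the parent directory unique per job submission.

```python
import posixpath
import uuid

# Hypothetical sketch: Dataflow-specific jars must keep fixed file names
# (e.g. "dataflow-worker.jar"), so uniqueness has to come from the path.
# Prefixing each job's staging path with a random token avoids collisions
# when two jobs share the same staging location.
def unique_staged_path(staging_location, jar_name):
    token = uuid.uuid4().hex  # unique per job submission
    return posixpath.join(staging_location, token, jar_name)

p1 = unique_staged_path("gs://bucket/staging", "dataflow-worker.jar")
p2 = unique_staged_path("gs://bucket/staging", "dataflow-worker.jar")
# Same predefined file name, different parent directories:
assert p1.endswith("/dataflow-worker.jar") and p1 != p2
```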





[jira] [Updated] (BEAM-9621) Python SqlTransform follow-ups

2020-05-29 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-9621:

Component/s: cross-language

> Python SqlTransform follow-ups
> --
>
> Key: BEAM-9621
> URL: https://issues.apache.org/jira/browse/BEAM-9621
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language, dsl-sql, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
>
> Tracking JIRA for follow-up work to improve SqlTransform in Python





[jira] [Resolved] (BEAM-9989) Python Schemas: error encoding from generated user types with an "id" field

2020-05-29 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved BEAM-9989.
-
Fix Version/s: 2.22.0
   Resolution: Fixed

> Python Schemas: error encoding from generated user types with an "id" field
> ---
>
> Key: BEAM-9989
> URL: https://issues.apache.org/jira/browse/BEAM-9989
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
> Fix For: 2.22.0
>
>
> This reveals a couple of issues:
> - We probably shouldn't store the schema id in an "id" attribute on the type
> - We should just treat the named tuple instance as a tuple when reading 
> rather than calling getattr for each schema field
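The second point is easy to demonstrate with a plain `typing.NamedTuple` (stdlib only; the stored metadata value here is illustrative): a class-level "id" attribute shadows `getattr` access to a field of the same name, while tuple-style reads are unaffected.

```python
from typing import NamedTuple

class User(NamedTuple):
    id: int
    name: str

row = User(id=7, name="ada")
assert row.id == 7

# Simulating the bug: stashing schema metadata in an "id" attribute on the
# generated *type* replaces the field accessor of the same name...
User.id = "schema-1234"  # hypothetical schema id stored on the type
assert getattr(row, "id") == "schema-1234"  # the field value is lost

# ...whereas treating the instance as a plain tuple is unaffected:
assert tuple(row) == (7, "ada")
assert row[0] == 7
```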





[jira] [Work stopped] (BEAM-10010) Test Python SqlTransform on fn_api_runner

2020-05-29 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-10010 stopped by Brian Hulette.

> Test Python SqlTransform on fn_api_runner
> -
>
> Key: BEAM-10010
> URL: https://issues.apache.org/jira/browse/BEAM-10010
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
>
> It should be possible to run with the fn_api_runner.





[jira] [Work stopped] (BEAM-9989) Python Schemas: error encoding from generated user types with an "id" field

2020-05-29 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-9989 stopped by Brian Hulette.
---
> Python Schemas: error encoding from generated user types with an "id" field
> ---
>
> Key: BEAM-9989
> URL: https://issues.apache.org/jira/browse/BEAM-9989
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
>
> This reveals a couple of issues:
> - We probably shouldn't store the schema id in an "id" attribute on the type
> - We should just treat the named tuple instance as a tuple when reading 
> rather than calling getattr for each schema field





[jira] [Updated] (BEAM-10122) Python RowCoder throws NotImplementedError in DataflowRunner

2020-05-29 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-10122:
-
Fix Version/s: 2.22.0

> Python RowCoder throws NotImplementedError in DataflowRunner
> 
>
> Key: BEAM-10122
> URL: https://issues.apache.org/jira/browse/BEAM-10122
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.20.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> ... because it overrides as_cloud_object





[jira] [Updated] (BEAM-10077) using filename + hash instead of UUID for staging name

2020-05-28 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-10077:
-
Status: Open  (was: Triage Needed)

> using filename + hash instead of UUID for staging name
> --
>
> Key: BEAM-10077
> URL: https://issues.apache.org/jira/browse/BEAM-10077
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Recent change BEAM-9383 disabled the artifact caching logic for GCS by object 
> names. Changing staging name generation from a UUID to filename + hash will 
> re-enable the artifact caching so we can avoid re-uploading the same artifact.
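A minimal stdlib sketch of the proposed naming scheme (the function name, hash algorithm, and digest length are assumptions for illustration, not Beam's actual code): identical content always maps to the same staged name, so the upload can be skipped on a cache hit.

```python
import hashlib

# Derive the staged object name from the file name plus a content hash,
# so re-submitting the same artifact maps to the same GCS object name.
def staging_name(file_name, content):
    digest = hashlib.sha256(content).hexdigest()[:16]
    return "%s-%s" % (file_name, digest)

a = staging_name("pipeline.jar", b"same bytes")
b = staging_name("pipeline.jar", b"same bytes")
c = staging_name("pipeline.jar", b"different bytes")
assert a == b  # cache hit: identical content, identical name
assert a != c  # changed content produces a new name
```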





[jira] [Work started] (BEAM-10121) Python RowCoder doesn't support nested structs

2020-05-27 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-10121 started by Brian Hulette.

> Python RowCoder doesn't support nested structs
> --
>
> Key: BEAM-10121
> URL: https://issues.apache.org/jira/browse/BEAM-10121
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.20.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
>






[jira] [Created] (BEAM-10122) Python RowCoder throws NotImplementedError in DataflowRunner

2020-05-27 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-10122:


 Summary: Python RowCoder throws NotImplementedError in 
DataflowRunner
 Key: BEAM-10122
 URL: https://issues.apache.org/jira/browse/BEAM-10122
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Affects Versions: 2.20.0
Reporter: Brian Hulette
Assignee: Brian Hulette


... because it overrides as_cloud_object





[jira] [Work started] (BEAM-10122) Python RowCoder throws NotImplementedError in DataflowRunner

2020-05-27 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-10122 started by Brian Hulette.

> Python RowCoder throws NotImplementedError in DataflowRunner
> 
>
> Key: BEAM-10122
> URL: https://issues.apache.org/jira/browse/BEAM-10122
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.20.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
>
> ... because it overrides as_cloud_object





[jira] [Created] (BEAM-10121) Python RowCoder doesn't support nested structs

2020-05-27 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-10121:


 Summary: Python RowCoder doesn't support nested structs
 Key: BEAM-10121
 URL: https://issues.apache.org/jira/browse/BEAM-10121
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Affects Versions: 2.20.0
Reporter: Brian Hulette
Assignee: Brian Hulette








[jira] [Commented] (BEAM-10054) Direct Runner execution stalls with test pipeline

2020-05-27 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118058#comment-17118058
 ] 

Brian Hulette commented on BEAM-10054:
--

Should this block the 2.22 release?

> Direct Runner execution stalls with test pipeline
> -
>
> Key: BEAM-10054
> URL: https://issues.apache.org/jira/browse/BEAM-10054
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Internally, we have a test pipeline which runs with the DirectRunner. When 
> upgrading from 2.18.0 to 2.21.0 the test failed with the following exception:
> {noformat}
> tp = Exception('Monitor task detected a pipeline stall.',), value = None, tb 
> = None
> def raise_(tp, value=None, tb=None):
> """
> A function that matches the Python 2.x ``raise`` statement. This
> allows re-raising exceptions with the cls value and traceback on
> Python 2 and 3.
> """
> if value is not None and isinstance(tp, Exception):
> raise TypeError("instance exception may not have a separate 
> value")
> if value is not None:
> exc = tp(value)
> else:
> exc = tp
> if exc.__traceback__ is not tb:
> raise exc.with_traceback(tb)
> >   raise exc
> E   Exception: Monitor task detected a pipeline stall.
> {noformat}
> I was able to bisect the error. This commit introduced the failure: 
> https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731
> If the following conditions evaluates to False, the pipeline runs correctly: 
> https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731#diff-2bb845e226f3a97c0f0f737d0558c5dbR1273
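For reference, the `raise_` shim quoted in the failure output above is a Python 2/3 compatibility helper; extracted as-is from the traceback, it runs standalone and simply re-raises the given exception (the stall message is the one from the report):

```python
def raise_(tp, value=None, tb=None):
    """Py2/3-compatible re-raise, as quoted in the traceback above."""
    if value is not None and isinstance(tp, Exception):
        raise TypeError("instance exception may not have a separate value")
    if value is not None:
        exc = tp(value)
    else:
        exc = tp
    if exc.__traceback__ is not tb:
        raise exc.with_traceback(tb)
    raise exc

try:
    raise_(Exception("Monitor task detected a pipeline stall."))
except Exception as e:
    caught = str(e)
assert caught == "Monitor task detected a pipeline stall."
```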





[jira] [Resolved] (BEAM-9974) beam_PostRelease_NightlySnapshot failing

2020-05-27 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved BEAM-9974.
-
Fix Version/s: 2.23.0
   Resolution: Fixed

> beam_PostRelease_NightlySnapshot failing
> 
>
> Key: BEAM-9974
> URL: https://issues.apache.org/jira/browse/BEAM-9974
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Brian Hulette
>Priority: P1
>  Labels: currently-failing
> Fix For: 2.23.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Another failure mode:
> 07:02:29 > Task 
> :runners:google-cloud-dataflow-java:runMobileGamingJavaDataflow
> 07:02:29 [ERROR] Failed to execute goal 
> org.codehaus.mojo:exec-maven-plugin:1.6.0:java (default-cli) on project 
> word-count-beam: An exception occured while executing the Java class. No 
> filesystem found for scheme gs -> [Help 1]
> 07:02:29 [ERROR] 
> 07:02:29 [ERROR] To see the full stack trace of the errors, re-run Maven with 
> the -e switch.
> 07:02:29 [ERROR] Re-run Maven using the -X switch to enable full debug 
> logging.
> 07:02:29 [ERROR] 
> 07:02:29 [ERROR] For more information about the errors and possible 
> solutions, please read the following articles:
> 07:02:29 [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> 07:02:29 [ERROR] Failed to execute goal 
> org.codehaus.mojo:exec-maven-plugin:1.6.0:java (default-cli) on project 
> word-count-beam: An exception occured while executing the Java class. No 
> filesystem found for scheme gs -> [Help 1]
> 07:02:29 [ERROR] 
> 07:02:29 [ERROR] To see the full stack trace of the errors, re-run Maven with 
> the -e switch.
> 07:02:29 [ERROR] Re-run Maven using the -X switch to enable full debug 
> logging.
> 07:02:29 [ERROR] 
> 07:02:29 [ERROR] For more information about the errors and possible 
> solutions, please read the following articles:
> 07:02:29 [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> 07:02:29 [ERROR] Failed command
> 07:02:29 
> 07:02:29 > Task 
> :runners:google-cloud-dataflow-java:runMobileGamingJavaDataflow FAILED





[jira] [Commented] (BEAM-9974) beam_PostRelease_NightlySnapshot failing

2020-05-27 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117977#comment-17117977
 ] 

Brian Hulette commented on BEAM-9974:
-

Looks like this was resolved by #11786: 
https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/999/

> beam_PostRelease_NightlySnapshot failing
> 
>
> Key: BEAM-9974
> URL: https://issues.apache.org/jira/browse/BEAM-9974
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Brian Hulette
>Priority: P1
>  Labels: currently-failing
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Another failure mode:
> 07:02:29 > Task 
> :runners:google-cloud-dataflow-java:runMobileGamingJavaDataflow
> 07:02:29 [ERROR] Failed to execute goal 
> org.codehaus.mojo:exec-maven-plugin:1.6.0:java (default-cli) on project 
> word-count-beam: An exception occured while executing the Java class. No 
> filesystem found for scheme gs -> [Help 1]
> 07:02:29 [ERROR] 
> 07:02:29 [ERROR] To see the full stack trace of the errors, re-run Maven with 
> the -e switch.
> 07:02:29 [ERROR] Re-run Maven using the -X switch to enable full debug 
> logging.
> 07:02:29 [ERROR] 
> 07:02:29 [ERROR] For more information about the errors and possible 
> solutions, please read the following articles:
> 07:02:29 [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> 07:02:29 [ERROR] Failed to execute goal 
> org.codehaus.mojo:exec-maven-plugin:1.6.0:java (default-cli) on project 
> word-count-beam: An exception occured while executing the Java class. No 
> filesystem found for scheme gs -> [Help 1]
> 07:02:29 [ERROR] 
> 07:02:29 [ERROR] To see the full stack trace of the errors, re-run Maven with 
> the -e switch.
> 07:02:29 [ERROR] Re-run Maven using the -X switch to enable full debug 
> logging.
> 07:02:29 [ERROR] 
> 07:02:29 [ERROR] For more information about the errors and possible 
> solutions, please read the following articles:
> 07:02:29 [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> 07:02:29 [ERROR] Failed command
> 07:02:29 
> 07:02:29 > Task 
> :runners:google-cloud-dataflow-java:runMobileGamingJavaDataflow FAILED





[jira] [Commented] (BEAM-10077) using filename + hash instead of UUID for staging name

2020-05-27 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117928#comment-17117928
 ] 

Brian Hulette commented on BEAM-10077:
--

[~heejong] can we close this now that https://github.com/apache/beam/pull/11813 
has been merged into master and release-2.22.0?

> using filename + hash instead of UUID for staging name
> --
>
> Key: BEAM-10077
> URL: https://issues.apache.org/jira/browse/BEAM-10077
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Recent change BEAM-9383 disabled the artifact caching logic for GCS by object 
> names. Changing staging name generation from a UUID to filename + hash will 
> re-enable the artifact caching so we can avoid re-uploading the same artifact.




