[jira] [Commented] (BEAM-10027) Support for Kotlin-based Beam Katas

2020-06-03 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125203#comment-17125203
 ] 

Pablo Estrada commented on BEAM-10027:
--

Technically it's not part of the release, so it's not strictly necessary to 
specify, but changes merged now will be in Beam 2.23.0 - so I've added it. 
Thanks!

> Support for Kotlin-based Beam Katas
> ---
>
> Key: BEAM-10027
> URL: https://issues.apache.org/jira/browse/BEAM-10027
> Project: Beam
>  Issue Type: Improvement
>  Components: katas
>Reporter: Rion Williams
>Assignee: Rion Williams
>Priority: P2
> Fix For: 2.23.0
>
>   Original Estimate: 8h
>  Time Spent: 15h 40m
>  Remaining Estimate: 0h
>
> Currently, there are a number of examples available demonstrating the use of 
> Apache Beam with Kotlin. It would be nice for the Beam Katas that exist for 
> Python, Go, and Java to also support Kotlin. 
> The port itself shouldn't be that involved, since Kotlin can still target the 
> JVM; it would likely just require including the Kotlin dependencies and 
> converting the existing Java examples. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10027) Support for Kotlin-based Beam Katas

2020-06-03 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-10027:
-
Fix Version/s: 2.23.0

> Support for Kotlin-based Beam Katas
> ---
>
> Key: BEAM-10027
> URL: https://issues.apache.org/jira/browse/BEAM-10027
> Project: Beam
>  Issue Type: Improvement
>  Components: katas
>Reporter: Rion Williams
>Assignee: Rion Williams
>Priority: P2
> Fix For: 2.23.0
>
>   Original Estimate: 8h
>  Time Spent: 15h 40m
>  Remaining Estimate: 0h
>
> Currently, there are a number of examples available demonstrating the use of 
> Apache Beam with Kotlin. It would be nice for the Beam Katas that exist for 
> Python, Go, and Java to also support Kotlin. 
> The port itself shouldn't be that involved, since Kotlin can still target the 
> JVM; it would likely just require including the Kotlin dependencies and 
> converting the existing Java examples. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10068) Modify behavior of Dynamic Destinations

2020-06-01 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121408#comment-17121408
 ] 

Pablo Estrada commented on BEAM-10068:
--

I've made this a new-feature-type issue. [~mborkar], can you tell me whether 
this corresponds to a feature for `specifying per-destination numShards`?

> Modify behavior of Dynamic Destinations
> ---
>
> Key: BEAM-10068
> URL: https://issues.apache.org/jira/browse/BEAM-10068
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Mihir Borkar
>Assignee: Reuven Lax
>Priority: P2
>
> The writeDynamic() method, which implements Dynamic Destinations, writes files 
> per destination per window per pane. 
> This increases the number of files generated.
> The request is as follows:
> Give the user a way to modify the behavior of Dynamic Destinations so they 
> can control the number of output files being produced.
> a.) Add user-configurable parameters such as writers per bundle, or increase 
> the number of records processed per bundle,
> and/or
> b.) Introduce a method implementing Dynamic Destinations that depends more on 
> the data passing through the pipeline than on windows/panes.
> So instead of splitting output into roughly one file per destination being 
> written to, let the user configure how output files should be divided across 
> destinations.
> Links:
> [1] 
> [https://beam.apache.org/releases/javadoc/2.19.0/index.html?org/apache/beam/sdk/io/FileIO.html]
> [2] 
> [https://github.com/apache/beam/blob/da9e17288e8473925674a4691d9e86252e67d7d7/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java]
>  
>  
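The file-count blowup described in the issue is multiplicative across destinations, windows, panes, and writers. As a rough illustration in plain Python (this is a toy model, not the Beam API; the function name and parameters are invented for this sketch):

```python
def expected_file_count(destinations, windows, panes_per_window,
                        writers_per_bundle=1):
    # Toy model: dynamic-destination writes emit one output file per
    # (destination, window, pane, writer) combination.
    return len(destinations) * windows * panes_per_window * writers_per_bundle

# 10 destinations x 6 windows x 3 firings (early / on-time / late) = 180
# files, even with a single writer per bundle.
files = expected_file_count(destinations=range(10), windows=6,
                            panes_per_window=3)
```

This is why both proposed options aim at the same lever: either fewer writers per bundle/pane, or a partitioning scheme driven by the data rather than by windows and panes.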



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-10068) Modify behavior of Dynamic Destinations

2020-06-01 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada reassigned BEAM-10068:


Assignee: Reuven Lax  (was: Pablo Estrada)

> Modify behavior of Dynamic Destinations
> ---
>
> Key: BEAM-10068
> URL: https://issues.apache.org/jira/browse/BEAM-10068
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Mihir Borkar
>Assignee: Reuven Lax
>Priority: P2
>
> The writeDynamic() method, which implements Dynamic Destinations, writes files 
> per destination per window per pane. 
> This increases the number of files generated.
> The request is as follows:
> Give the user a way to modify the behavior of Dynamic Destinations so they 
> can control the number of output files being produced.
> a.) Add user-configurable parameters such as writers per bundle, or increase 
> the number of records processed per bundle,
> and/or
> b.) Introduce a method implementing Dynamic Destinations that depends more on 
> the data passing through the pipeline than on windows/panes.
> So instead of splitting output into roughly one file per destination being 
> written to, let the user configure how output files should be divided across 
> destinations.
> Links:
> [1] 
> [https://beam.apache.org/releases/javadoc/2.19.0/index.html?org/apache/beam/sdk/io/FileIO.html]
> [2] 
> [https://github.com/apache/beam/blob/da9e17288e8473925674a4691d9e86252e67d7d7/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10068) Modify behavior of Dynamic Destinations

2020-06-01 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-10068:
-
Issue Type: New Feature  (was: Improvement)

> Modify behavior of Dynamic Destinations
> ---
>
> Key: BEAM-10068
> URL: https://issues.apache.org/jira/browse/BEAM-10068
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Mihir Borkar
>Priority: P2
>
> The writeDynamic() method, which implements Dynamic Destinations, writes files 
> per destination per window per pane. 
> This increases the number of files generated.
> The request is as follows:
> Give the user a way to modify the behavior of Dynamic Destinations so they 
> can control the number of output files being produced.
> a.) Add user-configurable parameters such as writers per bundle, or increase 
> the number of records processed per bundle,
> and/or
> b.) Introduce a method implementing Dynamic Destinations that depends more on 
> the data passing through the pipeline than on windows/panes.
> So instead of splitting output into roughly one file per destination being 
> written to, let the user configure how output files should be divided across 
> destinations.
> Links:
> [1] 
> [https://beam.apache.org/releases/javadoc/2.19.0/index.html?org/apache/beam/sdk/io/FileIO.html]
> [2] 
> [https://github.com/apache/beam/blob/da9e17288e8473925674a4691d9e86252e67d7d7/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-10068) Modify behavior of Dynamic Destinations

2020-06-01 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada reassigned BEAM-10068:


Assignee: Pablo Estrada

> Modify behavior of Dynamic Destinations
> ---
>
> Key: BEAM-10068
> URL: https://issues.apache.org/jira/browse/BEAM-10068
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Mihir Borkar
>Assignee: Pablo Estrada
>Priority: P2
>
> The writeDynamic() method, which implements Dynamic Destinations, writes files 
> per destination per window per pane. 
> This increases the number of files generated.
> The request is as follows:
> Give the user a way to modify the behavior of Dynamic Destinations so they 
> can control the number of output files being produced.
> a.) Add user-configurable parameters such as writers per bundle, or increase 
> the number of records processed per bundle,
> and/or
> b.) Introduce a method implementing Dynamic Destinations that depends more on 
> the data passing through the pipeline than on windows/panes.
> So instead of splitting output into roughly one file per destination being 
> written to, let the user configure how output files should be divided across 
> destinations.
> Links:
> [1] 
> [https://beam.apache.org/releases/javadoc/2.19.0/index.html?org/apache/beam/sdk/io/FileIO.html]
> [2] 
> [https://github.com/apache/beam/blob/da9e17288e8473925674a4691d9e86252e67d7d7/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-1438) The default behavior for the Write transform doesn't work well with the Dataflow streaming runner

2020-05-28 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118985#comment-17118985
 ] 

Pablo Estrada commented on BEAM-1438:
-

Ah, it looks like it's just a matter of removing the check. 
[https://github.com/apache/beam/pull/11850] is out to fix this.

> The default behavior for the Write transform doesn't work well with the 
> Dataflow streaming runner
> -
>
> Key: BEAM-1438
> URL: https://issues.apache.org/jira/browse/BEAM-1438
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: P2
> Fix For: 2.5.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If a Write specifies 0 output shards, that implies the runner should pick an 
> appropriate sharding. The default behavior is to write one shard per input 
> bundle. This works well with the Dataflow batch runner, but not with the 
> streaming runner which produces large numbers of small bundles.
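The decision described in the issue can be sketched in plain Python (a toy model, not Dataflow's actual logic; the streaming cap of 10 is an invented placeholder):

```python
def shards_for_write(num_shards, num_bundles, streaming):
    # 0 shards means "let the runner pick an appropriate sharding".
    if num_shards > 0:
        return num_shards            # user-specified sharding wins
    if not streaming:
        return num_bundles           # default: one shard per input bundle
    # Streaming runners produce many small bundles, so one shard per
    # bundle explodes the file count; cap it instead (placeholder value).
    return min(num_bundles, 10)
```

Under this model the batch default (one shard per bundle) is harmless because batch bundles are few and large, while the same default in streaming yields large numbers of tiny shards, which is the behavior the issue reports.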



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-1438) The default behavior for the Write transform doesn't work well with the Dataflow streaming runner

2020-05-28 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-1438:

Status: Open  (was: Triage Needed)

> The default behavior for the Write transform doesn't work well with the 
> Dataflow streaming runner
> -
>
> Key: BEAM-1438
> URL: https://issues.apache.org/jira/browse/BEAM-1438
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: P2
> Fix For: 2.5.0
>
>
> If a Write specifies 0 output shards, that implies the runner should pick an 
> appropriate sharding. The default behavior is to write one shard per input 
> bundle. This works well with the Dataflow batch runner, but not with the 
> streaming runner which produces large numbers of small bundles.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-1438) The default behavior for the Write transform doesn't work well with the Dataflow streaming runner

2020-05-28 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118980#comment-17118980
 ] 

Pablo Estrada commented on BEAM-1438:
-

[~reuvenlax] are you able to take a look at this?

> The default behavior for the Write transform doesn't work well with the 
> Dataflow streaming runner
> -
>
> Key: BEAM-1438
> URL: https://issues.apache.org/jira/browse/BEAM-1438
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: P2
> Fix For: 2.5.0
>
>
> If a Write specifies 0 output shards, that implies the runner should pick an 
> appropriate sharding. The default behavior is to write one shard per input 
> bundle. This works well with the Dataflow batch runner, but not with the 
> streaming runner which produces large numbers of small bundles.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-1438) The default behavior for the Write transform doesn't work well with the Dataflow streaming runner

2020-05-28 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118979#comment-17118979
 ] 

Pablo Estrada commented on BEAM-1438:
-

Reopening this issue: as others have appropriately pointed out, this will not 
work on Dataflow.

> The default behavior for the Write transform doesn't work well with the 
> Dataflow streaming runner
> -
>
> Key: BEAM-1438
> URL: https://issues.apache.org/jira/browse/BEAM-1438
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: P2
> Fix For: 2.5.0
>
>
> If a Write specifies 0 output shards, that implies the runner should pick an 
> appropriate sharding. The default behavior is to write one shard per input 
> bundle. This works well with the Dataflow batch runner, but not with the 
> streaming runner which produces large numbers of small bundles.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (BEAM-1438) The default behavior for the Write transform doesn't work well with the Dataflow streaming runner

2020-05-28 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada reopened BEAM-1438:
-

> The default behavior for the Write transform doesn't work well with the 
> Dataflow streaming runner
> -
>
> Key: BEAM-1438
> URL: https://issues.apache.org/jira/browse/BEAM-1438
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: P2
> Fix For: 2.5.0
>
>
> If a Write specifies 0 output shards, that implies the runner should pick an 
> appropriate sharding. The default behavior is to write one shard per input 
> bundle. This works well with the Dataflow batch runner, but not with the 
> streaming runner which produces large numbers of small bundles.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-10098) Javadoc export deactivated for RabbitMqIO and KuduIO

2020-05-27 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada reassigned BEAM-10098:


Assignee: Pablo Estrada  (was: Alexey Romanenko)

> Javadoc export deactivated for RabbitMqIO and KuduIO
> 
>
> Key: BEAM-10098
> URL: https://issues.apache.org/jira/browse/BEAM-10098
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Ashwin Ramaswami
>Assignee: Pablo Estrada
>Priority: P2
>
>  Javadoc export is deactivated for RabbitMqIO and KuduIO. We should enable 
> this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10098) Javadoc export deactivated for RabbitMqIO and KuduIO

2020-05-27 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-10098:
-
Status: Open  (was: Triage Needed)

> Javadoc export deactivated for RabbitMqIO and KuduIO
> 
>
> Key: BEAM-10098
> URL: https://issues.apache.org/jira/browse/BEAM-10098
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Ashwin Ramaswami
>Assignee: Pablo Estrada
>Priority: P2
>
>  Javadoc export is deactivated for RabbitMqIO and KuduIO. We should enable 
> this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10099) Add FhirIO and HL7v2IO to I/O matrix

2020-05-27 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-10099:
-
Status: Open  (was: Triage Needed)

> Add FhirIO and HL7v2IO to I/O matrix
> 
>
> Key: BEAM-10099
> URL: https://issues.apache.org/jira/browse/BEAM-10099
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Ashwin Ramaswami
>Assignee: Jacob Ferriero
>Priority: P2
>
> We should do this after the next release is out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-3759) Add support for PaneInfo descriptor in Python SDK

2020-05-26 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116969#comment-17116969
 ] 

Pablo Estrada commented on BEAM-3759:
-

Unfortunately this is not fixed. PR 4763 only adds encoding support; as 
Charles pointed out, it does not add support for attaching the PaneInfo after 
GBK firings.

> Add support for PaneInfo descriptor in Python SDK
> -
>
> Key: BEAM-3759
> URL: https://issues.apache.org/jira/browse/BEAM-3759
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.3.0
>Reporter: Charles Chen
>Priority: P2
> Fix For: 2.19.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The PaneInfo descriptor allows a user to determine which particular 
> triggering emitted a value.  This allows the user to differentiate between 
> speculative (early), on-time (at end of window) and late value emissions 
> coming out of a GroupByKey.  We should add support for this feature in the 
> Python SDK.
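For illustration, a minimal Python model of what a PaneInfo descriptor carries (the field names loosely mirror the Java SDK's PaneInfo, but this is a conceptual sketch, not the proposed Python API):

```python
from dataclasses import dataclass
from enum import Enum

class Timing(Enum):
    EARLY = "early"        # speculative firing before the end of the window
    ON_TIME = "on_time"    # firing when the watermark reaches end of window
    LATE = "late"          # firing after the watermark passed the window

@dataclass(frozen=True)
class PaneInfo:
    is_first: bool         # first firing for this key+window
    is_last: bool          # final firing for this key+window
    timing: Timing
    index: int             # 0-based index of this firing within the window

def describe(pane):
    # A downstream DoFn could branch on pane timing like this.
    if pane.timing is Timing.EARLY:
        return "speculative result; a refinement will follow"
    if pane.timing is Timing.ON_TIME:
        return "result at end of window"
    return "late refinement after the watermark"
```

The missing piece the comments refer to is that a GroupByKey firing must actually populate such a descriptor on each emitted value, not merely have a coder able to encode it.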



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (BEAM-3759) Add support for PaneInfo descriptor in Python SDK

2020-05-26 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada reopened BEAM-3759:
-
  Assignee: (was: Tanay Tummalapalli)

> Add support for PaneInfo descriptor in Python SDK
> -
>
> Key: BEAM-3759
> URL: https://issues.apache.org/jira/browse/BEAM-3759
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.3.0
>Reporter: Charles Chen
>Priority: P2
> Fix For: 2.19.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The PaneInfo descriptor allows a user to determine which particular 
> triggering emitted a value.  This allows the user to differentiate between 
> speculative (early), on-time (at end of window) and late value emissions 
> coming out of a GroupByKey.  We should add support for this feature in the 
> Python SDK.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-3767) A Complex Event Processing (CEP) library/extension for Apache Beam

2020-05-21 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-3767:

Labels: gsoc gsoc2021  (was: )

> A Complex Event Processing (CEP) library/extension for Apache Beam
> --
>
> Key: BEAM-3767
> URL: https://issues.apache.org/jira/browse/BEAM-3767
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-ideas
>Reporter: Ismaël Mejía
>Priority: P3
>  Labels: gsoc, gsoc2021
>
> Apache Beam [1] is a unified and portable programming model for data 
> processing jobs. The Beam model [2, 3, 4] has rich mechanisms to process 
> endless streams of events.
> Complex Event Processing [5] lets you match patterns of events in streams to 
> detect important patterns in data and react to them.
> Some examples of uses of CEP are fraud detection for example by detecting 
> unusual behavior (patterns of activity), e.g. network intrusion, suspicious 
> banking transactions, etc. Also trend detection is another interesting use 
> case in the context of sensors and IoT.
> The goal of this issue is to implement an efficient pattern matching library 
> inspired by [6] and existing libraries like Apache Flink CEP [7] using the 
> Apache Beam Java SDK and the Beam style guides [8]. Because of the time 
> constraints of GSoC we will probably try to cover first simple patterns of 
> the ‘a followed by b followed by c’ kind, and then if there is still time try 
> to cover more advanced ones e.g. optional, atLeastOne, oneOrMore, etc.
> [1] [https://beam.apache.org/]
>  [2] [https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101]
>  [3] [https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102]
>  [4] 
> [https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43864.pdf]
>  [5] [https://en.wikipedia.org/wiki/Complex_event_processing]
>  [6] [https://people.cs.umass.edu/~yanlei/publications/sase-sigmod08.pdf]
>  [7] 
> [https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/cep.html]
>  [8] [https://beam.apache.org/contribute/ptransform-style-guide/]
>  
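The 'a followed by b followed by c' pattern mentioned above reduces to a non-contiguous subsequence match. A minimal plain-Python sketch (illustrative only; a real CEP library such as Flink CEP [7] also handles windows, timeouts, and quantifiers like optional or oneOrMore):

```python
def followed_by(events, pattern):
    # Each pattern element must occur in order, with arbitrary events
    # allowed in between (non-contiguous subsequence match). The shared
    # iterator guarantees the in-order constraint.
    it = iter(events)
    return all(any(ev == want for ev in it) for want in pattern)

# Fraud-style example: login, then password change, then large transfer.
stream = ["login", "browse", "pwd_change", "browse", "large_transfer"]
assert followed_by(stream, ["login", "pwd_change", "large_transfer"])
assert not followed_by(stream, ["large_transfer", "login"])
```

A Beam implementation would additionally need to express this per key and per window as a PTransform, which is where most of the design work lies.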



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9148) test flakiness: BigQueryQueryToTableIT.test_big_query_standard_sql

2020-05-18 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-9148.
-
Fix Version/s: Not applicable
   Resolution: Cannot Reproduce

Closing as obsolete.

> test flakiness: BigQueryQueryToTableIT.test_big_query_standard_sql
> --
>
> Key: BEAM-9148
> URL: https://issues.apache.org/jira/browse/BEAM-9148
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, sdk-py-core, test-failures
>Reporter: Udi Meiri
>Assignee: Pablo Estrada
>Priority: P2
> Fix For: Not applicable
>
>
> There might be other flaky test cases from the same class, but I'm focusing 
> on test_big_query_standard_sql here.
> {code}
> 19:39:12  
> ==
> 19:39:12  FAIL: test_big_query_standard_sql 
> (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT)
> 19:39:12  
> --
> 19:39:12  Traceback (most recent call last):
> 19:39:12File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py",
>  line 172, in test_big_query_standard_sql
> 19:39:12  big_query_query_to_table_pipeline.run_bq_pipeline(options)
> 19:39:12File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py",
>  line 84, in run_bq_pipeline
> 19:39:12  result = p.run()
> 19:39:12File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 112, in run
> 19:39:12  else test_runner_api))
> 19:39:12File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/sdks/python/apache_beam/pipeline.py",
>  line 461, in run
> 19:39:12  self._options).run(False)
> 19:39:12File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/sdks/python/apache_beam/pipeline.py",
>  line 474, in run
> 19:39:12  return self.runner.run_pipeline(self, self._options)
> 19:39:12File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
>  line 53, in run_pipeline
> 19:39:12  hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 19:39:12  AssertionError: 
> 19:39:12  Expected: (Test pipeline expected terminated in state: DONE and 
> Expected checksum is 158a8ea1c254fcf40d4ed3e7c0242c3ea0a29e72)
> 19:39:12   but: Expected checksum is 
> 158a8ea1c254fcf40d4ed3e7c0242c3ea0a29e72 Actual checksum is 
> da39a3ee5e6b4b0d3255bfef95601890afd80709
> 19:39:12  
> 19:39:12   >> begin captured logging << 
> 
> 19:39:12  root: DEBUG: Unhandled type_constraint: Union[]
> 19:39:12  root: DEBUG: Unhandled type_constraint: Union[]
> 19:39:12  apache_beam.runners.direct.direct_runner: INFO: Running pipeline 
> with DirectRunner.
> 19:39:12  apache_beam.io.gcp.bigquery_tools: DEBUG: Query SELECT * FROM 
> (SELECT "apple" as fruit) UNION ALL (SELECT "orange" as fruit) does not 
> reference any tables.
> 19:39:12  apache_beam.io.gcp.bigquery_tools: WARNING: Dataset 
> apache-beam-testing:temp_dataset_90f5797bdb5f4137af750399f91a8e66 does not 
> exist so we will create it as temporary with location=None
> 19:39:12  apache_beam.io.gcp.bigquery: DEBUG: Creating or getting table 
> <TableReference
> 19:39:12   datasetId: 'python_query_to_table_15792323245106'
> 19:39:12   projectId: 'apache-beam-testing'
> 19:39:12   tableId: 'output_table'> with schema {'fields': [{'name': 'fruit', 
> 'type': 'STRING', 'mode': 'NULLABLE'}]}.
> 19:39:12  apache_beam.io.gcp.bigquery_tools: DEBUG: Created the table with id 
> output_table
> 19:39:12  apache_beam.io.gcp.bigquery_tools: INFO: Created table 
> apache-beam-testing.python_query_to_table_15792323245106.output_table with 
> schema <TableSchema
> 19:39:12   fields: [<TableFieldSchema
> 19:39:12   fields: []
> 19:39:12   mode: 'NULLABLE'
> 19:39:12   name: 'fruit'
> 19:39:12   type: 'STRING'>]>. Result: <Table
> 19:39:12   creationTime: 1579232328576
> 19:39:12   etag: 'WYysl6UIvc8IWMmTiiKhbg=='
> 19:39:12   id: 
> 'apache-beam-testing:python_query_to_table_15792323245106.output_table'
> 19:39:12   kind: 'bigquery#table'
> 19:39:12   lastModifiedTime: 1579232328629
> 19:39:12   location: 'US'
> 19:39:12   numBytes: 0
> 19:39:12   numLongTermBytes: 0
> 19:39:12   numRows: 0
> 19:39:12   schema: <TableSchema
> 19:39:12   fields: [<TableFieldSchema
> 19:39:12   fields: []
> 19:39:12   mode: 'NULLABLE'
> 19:39:12   name: 'fruit'
> 19:39:12   type: 'STRING'>]>
> 19:39:12   selfLink: 
> 
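One detail worth noting in the assertion failure above: the "actual checksum" is the SHA-1 of empty input, meaning the verifier hashed zero output rows, i.e. the query-to-table pipeline produced no data before the matcher ran:

```python
import hashlib

# da39a3ee5e6b4b0d3255bfef95601890afd80709 is the well-known SHA-1
# digest of the empty byte string, so the checksum matcher saw no rows.
empty_digest = hashlib.sha1(b"").hexdigest()
```

This points at a timing or eventual-consistency issue (results not yet visible when verified) rather than wrong output.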

[jira] [Resolved] (BEAM-8197) test_big_query_write_without_schema might be flaky?

2020-05-18 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-8197.
-
Fix Version/s: Not applicable
   Resolution: Cannot Reproduce

Closing as obsolete.

> test_big_query_write_without_schema might be flaky?
> ---
>
> Key: BEAM-8197
> URL: https://issues.apache.org/jira/browse/BEAM-8197
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core, testing
>Reporter: Ahmet Altay
>Assignee: Pablo Estrada
>Priority: P1
> Fix For: Not applicable
>
>
> This failed in py 3.6 post commit test: 
> https://builds.apache.org/job/beam_PostCommit_Python36/434/console
> 18:14:58 
> ==
> 18:14:58 FAIL: test_big_query_write_without_schema 
> (apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests)
> 18:14:58 
> --
> 18:14:58 Traceback (most recent call last):
> 18:14:58   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python36/src/sdks/python/apache_beam/io/gcp/bigquery_write_it_test.py",
>  line 269, in test_big_query_write_without_schema
> 18:14:58 write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
> 18:14:58   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python36/src/sdks/python/apache_beam/pipeline.py",
>  line 427, in __exit__
> 18:14:58 self.run().wait_until_finish()
> 18:14:58   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python36/src/sdks/python/apache_beam/pipeline.py",
>  line 407, in run
> 18:14:58 self._options).run(False)
> 18:14:58   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python36/src/sdks/python/apache_beam/pipeline.py",
>  line 420, in run
> 18:14:58 return self.runner.run_pipeline(self, self._options)
> 18:14:58   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python36/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
>  line 51, in run_pipeline
> 18:14:58 hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 18:14:58 AssertionError: 
> 18:14:58 Expected: (Expected data is [(b'xyw', datetime.date(2011, 1, 1), 
> datetime.time(23, 59, 59, 99)), (b'abc', datetime.date(2000, 1, 1), 
> datetime.time(0, 0)), (b'\xe4\xbd\xa0\xe5\xa5\xbd', datetime.date(3000, 12, 
> 31), datetime.time(23, 59, 59)), (b'\xab\xac\xad', datetime.date(2000, 1, 1), 
> datetime.time(0, 0))])
> 18:14:58  but: Expected data is [(b'xyw', datetime.date(2011, 1, 1), 
> datetime.time(23, 59, 59, 99)), (b'abc', datetime.date(2000, 1, 1), 
> datetime.time(0, 0)), (b'\xe4\xbd\xa0\xe5\xa5\xbd',



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-7514) Support streaming on the Python fn_api_runner

2020-05-15 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108614#comment-17108614
 ] 

Pablo Estrada commented on BEAM-7514:
-

Hi Clint! I've been working on this for a while, albeit slowly.

The BundleBasedDirectRunner, which is automatically selected when using 
TestStream with the DirectRunner, should support this fine. What issue are you 
running into?

> Support streaming on the Python fn_api_runner
> -
>
> Key: BEAM-7514
> URL: https://issues.apache.org/jira/browse/BEAM-7514
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9983) bigquery_read_it_test.ReadNewTypesTests.test_iobase_source failing

2020-05-14 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-9983.
-
Fix Version/s: Not applicable
   Resolution: Fixed

This PR was reverted and later rolled forward with fixes.

> bigquery_read_it_test.ReadNewTypesTests.test_iobase_source failing
> --
>
> Key: BEAM-9983
> URL: https://issues.apache.org/jira/browse/BEAM-9983
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, test-failures
>Reporter: Kyle Weaver
>Assignee: Pablo Estrada
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This failure seems to afflict all Python postcommits.
> apache_beam.io.gcp.bigquery_read_it_test.ReadNewTypesTests.test_iobase_source 
> (from nosetests)
> Failing for the past 1 build (Since Failed#2429 )
> Took 9 min 57 sec.
> Error Message
> Dataflow pipeline failed. State: FAILED, Error:
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/apache_beam/utils/retry.py", 
> line 246, in wrapper
> sleep_interval = next(retry_intervals)
> StopIteration
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 
> 647, in do_work
> work_executor.execute()
>   File "/usr/local/lib/python3.6/site-packages/dataflow_worker/executor.py", 
> line 226, in execute
> self._split_task)
>   File "/usr/local/lib/python3.6/site-packages/dataflow_worker/executor.py", 
> line 234, in _perform_source_split_considering_api_limits
> desired_bundle_size)
>   File "/usr/local/lib/python3.6/site-packages/dataflow_worker/executor.py", 
> line 271, in _perform_source_split
> for split in source.split(desired_bundle_size):
>   File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/io/gcp/bigquery.py", line 
> 698, in split
> self.table_reference = self._execute_query(bq)
>   File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/options/value_provider.py",
>  line 135, in _f
> return fnc(self, *args, **kwargs)
>   File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/io/gcp/bigquery.py", line 
> 744, in _execute_query
> job_labels=self.bigquery_job_labels)
>   File "/usr/local/lib/python3.6/site-packages/apache_beam/utils/retry.py", 
> line 249, in wrapper
> raise_with_traceback(exn, exn_traceback)
>   File "/usr/local/lib/python3.6/site-packages/future/utils/__init__.py", 
> line 446, in raise_with_traceback
> raise exc.with_traceback(traceback)
>   File "/usr/local/lib/python3.6/site-packages/apache_beam/utils/retry.py", 
> line 236, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/io/gcp/bigquery_tools.py",
>  line 415, in _start_query_job
> labels=job_labels or {},
>   File 
> "/usr/local/lib/python3.6/site-packages/apitools/base/protorpclite/messages.py",
>  line 791, in __init__
> setattr(self, name, value)
>   File 
> "/usr/local/lib/python3.6/site-packages/apitools/base/protorpclite/messages.py",
>  line 973, in __setattr__
> object.__setattr__(self, name, value)
>   File 
> "/usr/local/lib/python3.6/site-packages/apitools/base/protorpclite/messages.py",
>  line 1651, in __set__
> value = t(**value)
>   File 
> "/usr/local/lib/python3.6/site-packages/apitools/base/protorpclite/messages.py",
>  line 791, in __init__
> setattr(self, name, value)
>   File 
> "/usr/local/lib/python3.6/site-packages/apitools/base/protorpclite/messages.py",
>  line 976, in __setattr__
> "to message %s" % (name, type(self).__name__))
> AttributeError: May not assign arbitrary value owner to message LabelsValue
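The traceback above ends with protorpclite rejecting an undeclared attribute. A minimal Python sketch of that validation pattern (this is a simplified stand-in, not the actual apitools code; `StrictMessage` and its fields are illustrative names) shows why passing an unexpected label key fails loudly instead of being stored:

```python
# Simplified model: protorpclite-style messages only accept attributes that
# were declared as fields, so an unexpected key raises AttributeError.
class StrictMessage:
    _declared_fields = ("key", "value")

    def __setattr__(self, name, value):
        if name not in self._declared_fields:
            raise AttributeError(
                "May not assign arbitrary value %s to message %s"
                % (name, type(self).__name__))
        object.__setattr__(self, name, value)

msg = StrictMessage()
msg.key = "beam_job"          # declared field: accepted
try:
    msg.owner = "team-data"   # undeclared field: rejected
except AttributeError as e:
    print(e)
```

This mirrors the failure mode: the fix is to only construct label entries from declared fields, not from arbitrary dict keys.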



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9967) Add support for BigQuery job labels on ReadFrom/WriteTo(BigQuery) transforms

2020-05-12 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9967:
---

 Summary: Add support for BigQuery job labels on 
ReadFrom/WriteTo(BigQuery) transforms
 Key: BEAM-9967
 URL: https://issues.apache.org/jira/browse/BEAM-9967
 Project: Beam
  Issue Type: Bug
  Components: io-py-gcp
Reporter: Pablo Estrada
Assignee: Pablo Estrada






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-6514) Dataflow Batch Job Failure is leaving Datasets/Tables behind in BigQuery

2020-05-11 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17104994#comment-17104994
 ] 

Pablo Estrada commented on BEAM-6514:
-

This seems to have been noticed by others here: 
[https://stackoverflow.com/questions/61658242/dataprep-is-leaving-datasets-tables-behind-in-bigquery]

> Dataflow Batch Job Failure is leaving Datasets/Tables behind in BigQuery
> 
>
> Key: BEAM-6514
> URL: https://issues.apache.org/jira/browse/BEAM-6514
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Rumeshkrishnan Mohan
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>
> Dataflow is leaving Datasets/Tables behind in BigQuery when the pipeline is 
> cancelled or when it fails. I cancelled a job or it failed at run time, and 
> it left behind a dataset and table in BigQuery.
>  # The `cleanupTempResource` method cleans up the tables and dataset after a 
> batch job succeeds.
>  # If a job fails midway or is cancelled explicitly, the temporary dataset 
> and tables remain. I do see a 1-day table expiration set in the 
> `getTableToExtract` function in BigQueryQuerySource.java.
>  # I can understand keeping the temp tables and dataset on failure for 
> debugging.
>  # Could we have an optional pipeline or job parameter that cleans up the 
> temporary dataset and tables on cancellation or failure?
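The cleanup-on-failure behavior being requested can be sketched in plain Python (a hypothetical illustration, not Beam's actual implementation; `run_with_cleanup`, `failing_job`, and the dataset name are made up): registering deletion in a `finally` block makes it run whether the job succeeds, fails, or is cancelled.

```python
# Hypothetical sketch: ensure temp-dataset cleanup also runs when the job
# fails or is cancelled. delete_dataset stands in for a real client call.
def run_with_cleanup(run_job, create_temp_dataset, delete_dataset):
    dataset = create_temp_dataset()
    try:
        return run_job(dataset)
    finally:
        # Runs on success, on failure, and on cancellation.
        delete_dataset(dataset)

def failing_job(dataset):
    raise RuntimeError("job failed")

deleted = []
try:
    run_with_cleanup(failing_job, lambda: "temp_dataset_123", deleted.append)
except RuntimeError:
    pass
print(deleted)  # the temp dataset was cleaned up despite the failure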



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9957) BigQueryIO does not clean up temporary tables / dataset when a pipeline fails

2020-05-11 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-9957.
-
Fix Version/s: Not applicable
   Resolution: Duplicate

BEAM-6514

> BigQueryIO does not clean up temporary tables / dataset when a pipeline fails
> -
>
> Key: BEAM-9957
> URL: https://issues.apache.org/jira/browse/BEAM-9957
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Pablo Estrada
>Priority: Major
> Fix For: Not applicable
>
>
> Some examples of this happening:
> [https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues/609] - old 
> SDK, but with a bit more info
> [https://stackoverflow.com/questions/61658242/dataprep-is-leaving-datasets-tables-behind-in-bigquery]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9957) BigQueryIO does not clean up temporary tables / dataset when a pipeline fails

2020-05-11 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9957:

Status: Open  (was: Triage Needed)

> BigQueryIO does not clean up temporary tables / dataset when a pipeline fails
> -
>
> Key: BEAM-9957
> URL: https://issues.apache.org/jira/browse/BEAM-9957
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Pablo Estrada
>Priority: Major
>
> Some examples of this happening:
> [https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues/609] - old 
> SDK, but with a bit more info
> [https://stackoverflow.com/questions/61658242/dataprep-is-leaving-datasets-tables-behind-in-bigquery]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9957) BigQueryIO does not clean up temporary tables / dataset when a pipeline fails

2020-05-11 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9957:
---

 Summary: BigQueryIO does not clean up temporary tables / dataset 
when a pipeline fails
 Key: BEAM-9957
 URL: https://issues.apache.org/jira/browse/BEAM-9957
 Project: Beam
  Issue Type: Bug
  Components: io-java-gcp
Reporter: Pablo Estrada


Some examples of this happening:

[https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues/609] - old SDK, 
but with a bit more info

[https://stackoverflow.com/questions/61658242/dataprep-is-leaving-datasets-tables-behind-in-bigquery]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8219) crossLanguagePortableWordCount seems to be flaky for beam_PostCommit_Python2

2020-05-11 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17104774#comment-17104774
 ] 

Pablo Estrada commented on BEAM-8219:
-

Thanks Kyle! I'll close this then - it is either solved or obsolete.

> crossLanguagePortableWordCount seems to be flaky for beam_PostCommit_Python2 
> -
>
> Key: BEAM-8219
> URL: https://issues.apache.org/jira/browse/BEAM-8219
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Python2/451/console]
> [https://builds.apache.org/job/beam_PostCommit_Python2/454/console]
> *10:37:22* * What went wrong:*10:37:22* Execution failed for task 
> ':sdks:python:test-suites:portable:py2:crossLanguagePortableWordCount'.*10:37:22*
>  > Process 'command 'sh'' finished with non-zero exit value 1*10:37:22* 
>  
> cc: [~heejong] [~mxm]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8219) crossLanguagePortableWordCount seems to be flaky for beam_PostCommit_Python2

2020-05-11 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-8219.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> crossLanguagePortableWordCount seems to be flaky for beam_PostCommit_Python2 
> -
>
> Key: BEAM-8219
> URL: https://issues.apache.org/jira/browse/BEAM-8219
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
> Fix For: Not applicable
>
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Python2/451/console]
> [https://builds.apache.org/job/beam_PostCommit_Python2/454/console]
> *10:37:22* * What went wrong:*10:37:22* Execution failed for task 
> ':sdks:python:test-suites:portable:py2:crossLanguagePortableWordCount'.*10:37:22*
>  > Process 'command 'sh'' finished with non-zero exit value 1*10:37:22* 
>  
> cc: [~heejong] [~mxm]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8219) crossLanguagePortableWordCount seems to be flaky for beam_PostCommit_Python2

2020-05-11 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17104753#comment-17104753
 ] 

Pablo Estrada commented on BEAM-8219:
-

This is a bit of an old issue, but I see it making the Py2 postcommit 
perma-red: [https://scans.gradle.com/s/dm24jlr4plnbk]

> crossLanguagePortableWordCount seems to be flaky for beam_PostCommit_Python2 
> -
>
> Key: BEAM-8219
> URL: https://issues.apache.org/jira/browse/BEAM-8219
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Python2/451/console]
> [https://builds.apache.org/job/beam_PostCommit_Python2/454/console]
> *10:37:22* * What went wrong:*10:37:22* Execution failed for task 
> ':sdks:python:test-suites:portable:py2:crossLanguagePortableWordCount'.*10:37:22*
>  > Process 'command 'sh'' finished with non-zero exit value 1*10:37:22* 
>  
> cc: [~heejong] [~mxm]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9886) ReadFromBigQuery should auto-infer which project to bill for BQ exports

2020-05-04 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9886:
---

 Summary: ReadFromBigQuery should auto-infer which project to bill 
for BQ exports
 Key: BEAM-9886
 URL: https://issues.apache.org/jira/browse/BEAM-9886
 Project: Beam
  Issue Type: Bug
  Components: io-py-gcp
Reporter: Pablo Estrada
Assignee: Pablo Estrada






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-7885) DoFn.setup() don't run for streaming jobs on DirectRunner.

2020-04-28 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-7885.
-
Fix Version/s: 2.22.0
   Resolution: Fixed

> DoFn.setup() don't run for streaming jobs on DirectRunner. 
> ---
>
> Key: BEAM-7885
> URL: https://issues.apache.org/jira/browse/BEAM-7885
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.14.0
> Environment: Python
>Reporter: niklas Hansson
>Assignee: Pablo Estrada
>Priority: Minor
> Fix For: 2.22.0
>
>
> Since version 2.14.0, Python has supported setup and teardown for DoFn, 
> "called to prepare an instance for processing bundles of elements. This is 
> a good place to initialize transient in-memory resources, such as network 
> connections."
> However, when trying to use it for an unbounded job (Pub/Sub source), it 
> seems DoFn.setup() is never called and the resources are never initialized. 
> [UPDATE] It works on the Dataflow runner but not on the DirectRunner. On 
> Dataflow, DoFn.setup() seems to be called multiple times, but then never 
> again once the pipeline is processing elements. [UPDATE] On the direct 
> runner I get:
>  
> AttributeError: 'NoneType' object has no attribute 'predict' [while running 
> 'transform the data']
> """
> My source code: [https://github.com/NikeNano/DataflowSklearnStreaming]
>  
> I am happy to contribute with example code for how to use setup as soon as I 
> get it running :)  
>  
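The setup/teardown contract under discussion can be modeled in plain Python (a simplified sketch of the bundle lifecycle, not the DirectRunner's actual code; `ModelDoFn` and `run_bundle` are illustrative): `setup()` should run once per DoFn instance before any bundle is processed, which is why a resource initialized there shows up as `None` when a runner skips the call.

```python
class ModelDoFn:
    """Toy stand-in for beam.DoFn showing the setup/process lifecycle."""
    def __init__(self):
        self.model = None
    def setup(self):
        self.model = object()  # e.g. load an sklearn model here
    def process(self, element):
        if self.model is None:
            # The symptom from the report: setup() was never called.
            raise AttributeError("'NoneType' object has no attribute 'predict'")
        return [element]

def run_bundle(dofn, elements, call_setup=True):
    if call_setup:  # a correct runner calls setup() before the first bundle
        dofn.setup()
    out = []
    for e in elements:
        out.extend(dofn.process(e))
    return out

print(run_bundle(ModelDoFn(), [1, 2]))  # [1, 2]
```

With `call_setup=False` the same AttributeError from the report is reproduced, which matches the DirectRunner-only failure described above.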



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-7885) DoFn.setup() don't run for streaming jobs on DirectRunner.

2020-04-28 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada reassigned BEAM-7885:
---

Assignee: Pablo Estrada

> DoFn.setup() don't run for streaming jobs on DirectRunner. 
> ---
>
> Key: BEAM-7885
> URL: https://issues.apache.org/jira/browse/BEAM-7885
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.14.0
> Environment: Python
>Reporter: niklas Hansson
>Assignee: Pablo Estrada
>Priority: Minor
>
> Since version 2.14.0, Python has supported setup and teardown for DoFn, 
> "called to prepare an instance for processing bundles of elements. This is 
> a good place to initialize transient in-memory resources, such as network 
> connections."
> However, when trying to use it for an unbounded job (Pub/Sub source), it 
> seems DoFn.setup() is never called and the resources are never initialized. 
> [UPDATE] It works on the Dataflow runner but not on the DirectRunner. On 
> Dataflow, DoFn.setup() seems to be called multiple times, but then never 
> again once the pipeline is processing elements. [UPDATE] On the direct 
> runner I get:
>  
> AttributeError: 'NoneType' object has no attribute 'predict' [while running 
> 'transform the data']
> """
> My source code: [https://github.com/NikeNano/DataflowSklearnStreaming]
>  
> I am happy to contribute with example code for how to use setup as soon as I 
> get it running :)  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-7885) DoFn.setup() don't run for streaming jobs on DirectRunner.

2020-04-28 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095009#comment-17095009
 ] 

Pablo Estrada commented on BEAM-7885:
-

This is fixed in PR 11547

> DoFn.setup() don't run for streaming jobs on DirectRunner. 
> ---
>
> Key: BEAM-7885
> URL: https://issues.apache.org/jira/browse/BEAM-7885
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.14.0
> Environment: Python
>Reporter: niklas Hansson
>Priority: Minor
>
> Since version 2.14.0, Python has supported setup and teardown for DoFn, 
> "called to prepare an instance for processing bundles of elements. This is 
> a good place to initialize transient in-memory resources, such as network 
> connections."
> However, when trying to use it for an unbounded job (Pub/Sub source), it 
> seems DoFn.setup() is never called and the resources are never initialized. 
> [UPDATE] It works on the Dataflow runner but not on the DirectRunner. On 
> Dataflow, DoFn.setup() seems to be called multiple times, but then never 
> again once the pipeline is processing elements. [UPDATE] On the direct 
> runner I get:
>  
> AttributeError: 'NoneType' object has no attribute 'predict' [while running 
> 'transform the data']
> """
> My source code: [https://github.com/NikeNano/DataflowSklearnStreaming]
>  
> I am happy to contribute with example code for how to use setup as soon as I 
> get it running :)  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9745) [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to deserialize Custom DoFns and Custom Coders.

2020-04-28 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094957#comment-17094957
 ] 

Pablo Estrada commented on BEAM-9745:
-

I am trying to figure out whether this is a regression or not. I'll post an 
update by tomorrow morning.

So far the tests aren't passing on 2.20.0, but they throw a different error, so 
I guess it is hard to just dismiss as a regression:
{code:java}
java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
java.lang.RuntimeException: Error received from SDK harness for instruction 
-177: java.lang.UnsupportedOperationException: BigQuery source must be split 
before being read
at 
org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.createReader(BigQuerySourceBase.java:173)
at 
org.apache.beam.fn.harness.BoundedSourceRunner.runReadLoop(BoundedSourceRunner.java:159)
at 
org.apache.beam.fn.harness.BoundedSourceRunner.start(BoundedSourceRunner.java:146)
at 
org.apache.beam.fn.harness.data.PTransformFunctionRegistry.lambda$register$0(PTransformFunctionRegistry.java:108)
at 
org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:282)
at 
org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:160)
at 
org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:144)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
at 
org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:332)
at 
org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
at 
org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
at 
org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:412)
at 
org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:381)
at 
org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:306)
at 
org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:230)
at 
org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:138)
{code}
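The "BigQuery source must be split before being read" error in the trace reflects a source contract: reading happens through the sub-sources produced by split(), never through the original unsplit source. A toy Python model of that contract (illustrative only; `ModelSource` is not the actual Java BigQuerySourceBase):

```python
# Simplified model of a bounded source that refuses to be read until
# split() has produced concrete sub-sources.
class ModelSource:
    def __init__(self, rows, splittable=True):
        self.rows = rows
        self.splittable = splittable

    def split(self, bundle_size):
        # Each sub-source covers one bundle and is directly readable.
        return [ModelSource(self.rows[i:i + bundle_size], splittable=False)
                for i in range(0, len(self.rows), bundle_size)]

    def create_reader(self):
        if self.splittable:
            raise RuntimeError("source must be split before being read")
        return iter(self.rows)

source = ModelSource(list(range(6)))
rows = [r for sub in source.split(2) for r in sub.create_reader()]
print(rows)  # [0, 1, 2, 3, 4, 5]
```

In the failing runs above, the harness reached create_reader() on an unsplit source, which is what triggers the UnsupportedOperationException.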

> [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to 
> deserialize Custom DoFns and Custom Coders.
> -
>
> Key: BEAM-9745
> URL: https://issues.apache.org/jira/browse/BEAM-9745
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, java-fn-execution, sdk-java-harness, 
> test-failures
>Reporter: Daniel Oliveira
>Assignee: Pablo Estrada
>Priority: Blocker
>  Labels: currently-failing
> Fix For: 2.21.0
>
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4657/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/c3izncsa4u24k/tests/by-project]
> Initial investigation:
> The bug appears to be popping up on BigQuery tests mostly, but also a 
> BigTable and a Datastore test.
> Here's an example stacktrace of the two errors, showing _only_ the error 
> messages themselves. Source: 
> [https://scans.gradle.com/s/c3izncsa4u24k/tests/efn4wciuamvqq-ccxt3jvofvqbe]
> {noformat}
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error 
> received from SDK harness for instruction -191: 
> java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With 
> Execution Info
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3
> ...
> Caused by: java.lang.RuntimeException: Error received from SDK harness for 
> instruction -191: java.lang.IllegalArgumentException: unable to deserialize 
> Custom DoFn With Execution Info
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error 
> received from SDK harness for instruction -206: 
> 

[jira] [Resolved] (BEAM-9832) KeyError: 'No such coder: ' in fn_runner_test

2020-04-28 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-9832.
-
Fix Version/s: 2.22.0
   Resolution: Fixed

> KeyError: 'No such coder: ' in fn_runner_test
> -
>
> Key: BEAM-9832
> URL: https://issues.apache.org/jira/browse/BEAM-9832
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, test-failures
>Reporter: Ning Kang
>Assignee: Pablo Estrada
>Priority: Critical
> Fix For: 2.22.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Failed test results can be found 
> [here|https://builds.apache.org/job/beam_PreCommit_Python_Commit/12525/]
>  
> A stack trace:
> {code:java}
> self = 
>   testMethod=test_read>
> def test_read(self):
>   # Can't use NamedTemporaryFile as a context
>   # due to https://bugs.python.org/issue14243
>   temp_file = tempfile.NamedTemporaryFile(delete=False)
>   try:
> temp_file.write(b'a\nb\nc')
> temp_file.close()
> with self.create_pipeline() as p:
>   assert_that(
> > p | beam.io.ReadFromText(temp_file.name), equal_to(['a', 'b', 
> > 'c']))
> apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:625: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> apache_beam/pipeline.py:529: in __exit__
> self.run().wait_until_finish()
> apache_beam/pipeline.py:502: in run
> self._options).run(False)
> apache_beam/pipeline.py:515: in run
> return self.runner.run_pipeline(self, self._options)
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:173: in 
> run_pipeline
> pipeline.to_runner_api(default_environment=self._default_environment))
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:183: in 
> run_via_runner_api
> return self.run_stages(stage_context, stages)
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:331: in run_stages
> bundle_context_manager,
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:508: in _run_stage
> bundle_manager)
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:546: in _run_bundle
> data_input, data_output, input_timers, expected_timer_output)
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:927: in 
> process_bundle
> timer_inputs)):
> /usr/lib/python3.5/concurrent/futures/_base.py:556: in result_iterator
> yield future.result()
> /usr/lib/python3.5/concurrent/futures/_base.py:405: in result
> return self.__get_result()
> /usr/lib/python3.5/concurrent/futures/_base.py:357: in __get_result
> raise self._exception
> apache_beam/utils/thread_pool_executor.py:44: in run
> self._future.set_result(self._fn(*self._fn_args, **self._fn_kwargs))
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:923: in execute
> dry_run)
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:826: in 
> process_bundle
> result_future = self._worker_handler.control_conn.push(process_bundle_req)
> apache_beam/runners/portability/fn_api_runner/worker_handlers.py:353: in push
> response = self.worker.do_instruction(request)
> apache_beam/runners/worker/sdk_worker.py:471: in do_instruction
> getattr(request, request_type), request.instruction_id)
> apache_beam/runners/worker/sdk_worker.py:500: in process_bundle
> instruction_id, request.process_bundle_descriptor_id)
> apache_beam/runners/worker/sdk_worker.py:374: in get
> self.data_channel_factory)
> apache_beam/runners/worker/bundle_processor.py:782: in __init__
> self.ops = self.create_execution_tree(self.process_bundle_descriptor)
> apache_beam/runners/worker/bundle_processor.py:837: in create_execution_tree
> descriptor.transforms, key=topological_height, reverse=True)
> apache_beam/runners/worker/bundle_processor.py:836: in 
> (transform_id, get_operation(transform_id)) for transform_id in sorted(
> apache_beam/runners/worker/bundle_processor.py:726: in wrapper
> result = cache[args] = func(*args)
> apache_beam/runners/worker/bundle_processor.py:820: in get_operation
> pcoll_id in descriptor.transforms[transform_id].outputs.items()
> apache_beam/runners/worker/bundle_processor.py:820: in 
> pcoll_id in descriptor.transforms[transform_id].outputs.items()
> apache_beam/runners/worker/bundle_processor.py:818: in 
> tag: [get_operation(op) for op in pcoll_consumers[pcoll_id]]
> apache_beam/runners/worker/bundle_processor.py:726: in wrapper
> result = cache[args] = func(*args)
> apache_beam/runners/worker/bundle_processor.py:823: in get_operation
> transform_id, transform_consumers)
> apache_beam/runners/worker/bundle_processor.py:1108: in create_operation
> return creator(self, transform_id, transform_proto, payload, consumers)
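The KeyError in the issue title suggests a lookup of an empty coder id in the worker's coder registry. A toy sketch of that failure class (hypothetical names; not the actual bundle_processor code):

```python
# Toy model: a registry keyed by coder id that raises
# KeyError('No such coder: <id>') for any id that was never registered,
# including the empty string seen in the title.
class CoderRegistry:
    def __init__(self):
        self._coders = {}

    def register(self, coder_id, coder):
        self._coders[coder_id] = coder

    def get_coder(self, coder_id):
        if coder_id not in self._coders:
            raise KeyError("No such coder: %s" % coder_id)
        return self._coders[coder_id]

registry = CoderRegistry()
registry.register("bytes_coder", bytes)
assert registry.get_coder("bytes_coder") is bytes
```

The empty coder id in the error message points at a pipeline proto referencing a coder that was never populated, rather than a registry bug per se.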
> 

[jira] [Commented] (BEAM-9832) KeyError: 'No such coder: ' in fn_runner_test

2020-04-28 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094700#comment-17094700
 ] 

Pablo Estrada commented on BEAM-9832:
-

Okay I'll mark this as resolved then. Thanks!

> KeyError: 'No such coder: ' in fn_runner_test
> -
>
> Key: BEAM-9832
> URL: https://issues.apache.org/jira/browse/BEAM-9832
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, test-failures
>Reporter: Ning Kang
>Assignee: Pablo Estrada
>Priority: Critical
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Failed test results can be found 
> [here|https://builds.apache.org/job/beam_PreCommit_Python_Commit/12525/]
>  
> A stack trace:
> {code:java}
> self = 
>   testMethod=test_read>
> def test_read(self):
>   # Can't use NamedTemporaryFile as a context
>   # due to https://bugs.python.org/issue14243
>   temp_file = tempfile.NamedTemporaryFile(delete=False)
>   try:
> temp_file.write(b'a\nb\nc')
> temp_file.close()
> with self.create_pipeline() as p:
>   assert_that(
> > p | beam.io.ReadFromText(temp_file.name), equal_to(['a', 'b', 
> > 'c']))
> apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:625: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> apache_beam/pipeline.py:529: in __exit__
> self.run().wait_until_finish()
> apache_beam/pipeline.py:502: in run
> self._options).run(False)
> apache_beam/pipeline.py:515: in run
> return self.runner.run_pipeline(self, self._options)
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:173: in 
> run_pipeline
> pipeline.to_runner_api(default_environment=self._default_environment))
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:183: in 
> run_via_runner_api
> return self.run_stages(stage_context, stages)
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:331: in run_stages
> bundle_context_manager,
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:508: in _run_stage
> bundle_manager)
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:546: in _run_bundle
> data_input, data_output, input_timers, expected_timer_output)
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:927: in 
> process_bundle
> timer_inputs)):
> /usr/lib/python3.5/concurrent/futures/_base.py:556: in result_iterator
> yield future.result()
> /usr/lib/python3.5/concurrent/futures/_base.py:405: in result
> return self.__get_result()
> /usr/lib/python3.5/concurrent/futures/_base.py:357: in __get_result
> raise self._exception
> apache_beam/utils/thread_pool_executor.py:44: in run
> self._future.set_result(self._fn(*self._fn_args, **self._fn_kwargs))
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:923: in execute
> dry_run)
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:826: in 
> process_bundle
> result_future = self._worker_handler.control_conn.push(process_bundle_req)
> apache_beam/runners/portability/fn_api_runner/worker_handlers.py:353: in push
> response = self.worker.do_instruction(request)
> apache_beam/runners/worker/sdk_worker.py:471: in do_instruction
> getattr(request, request_type), request.instruction_id)
> apache_beam/runners/worker/sdk_worker.py:500: in process_bundle
> instruction_id, request.process_bundle_descriptor_id)
> apache_beam/runners/worker/sdk_worker.py:374: in get
> self.data_channel_factory)
> apache_beam/runners/worker/bundle_processor.py:782: in __init__
> self.ops = self.create_execution_tree(self.process_bundle_descriptor)
> apache_beam/runners/worker/bundle_processor.py:837: in create_execution_tree
> descriptor.transforms, key=topological_height, reverse=True)
> apache_beam/runners/worker/bundle_processor.py:836: in 
> (transform_id, get_operation(transform_id)) for transform_id in sorted(
> apache_beam/runners/worker/bundle_processor.py:726: in wrapper
> result = cache[args] = func(*args)
> apache_beam/runners/worker/bundle_processor.py:820: in get_operation
> pcoll_id in descriptor.transforms[transform_id].outputs.items()
> apache_beam/runners/worker/bundle_processor.py:820: in 
> pcoll_id in descriptor.transforms[transform_id].outputs.items()
> apache_beam/runners/worker/bundle_processor.py:818: in 
> tag: [get_operation(op) for op in pcoll_consumers[pcoll_id]]
> apache_beam/runners/worker/bundle_processor.py:726: in wrapper
> result = cache[args] = func(*args)
> apache_beam/runners/worker/bundle_processor.py:823: in get_operation
> transform_id, transform_consumers)
> apache_beam/runners/worker/bundle_processor.py:1108: in create_operation
> return creator(self, transform_id, transform_proto, payload, consumers)

[jira] [Commented] (BEAM-9832) KeyError: 'No such coder: ' in fn_runner_test

2020-04-27 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094101#comment-17094101
 ] 

Pablo Estrada commented on BEAM-9832:
-

[https://builds.apache.org/job/beam_PreCommit_Python_Cron/] seems to indicate 
even more recent breakage

> KeyError: 'No such coder: ' in fn_runner_test
> -
>
> Key: BEAM-9832
> URL: https://issues.apache.org/jira/browse/BEAM-9832
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, test-failures
>Reporter: Ning Kang
>Assignee: Pablo Estrada
>Priority: Critical
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Failed test results can be found 
> [here|https://builds.apache.org/job/beam_PreCommit_Python_Commit/12525/]
>  
> A stack trace:
> {code:java}
> self = 
>   testMethod=test_read>
> def test_read(self):
>   # Can't use NamedTemporaryFile as a context
>   # due to https://bugs.python.org/issue14243
>   temp_file = tempfile.NamedTemporaryFile(delete=False)
>   try:
> temp_file.write(b'a\nb\nc')
> temp_file.close()
> with self.create_pipeline() as p:
>   assert_that(
> > p | beam.io.ReadFromText(temp_file.name), equal_to(['a', 'b', 'c']))
> apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:625: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> apache_beam/pipeline.py:529: in __exit__
> self.run().wait_until_finish()
> apache_beam/pipeline.py:502: in run
> self._options).run(False)
> apache_beam/pipeline.py:515: in run
> return self.runner.run_pipeline(self, self._options)
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:173: in run_pipeline
> pipeline.to_runner_api(default_environment=self._default_environment))
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:183: in run_via_runner_api
> return self.run_stages(stage_context, stages)
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:331: in run_stages
> bundle_context_manager,
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:508: in _run_stage
> bundle_manager)
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:546: in _run_bundle
> data_input, data_output, input_timers, expected_timer_output)
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:927: in process_bundle
> timer_inputs)):
> /usr/lib/python3.5/concurrent/futures/_base.py:556: in result_iterator
> yield future.result()
> /usr/lib/python3.5/concurrent/futures/_base.py:405: in result
> return self.__get_result()
> /usr/lib/python3.5/concurrent/futures/_base.py:357: in __get_result
> raise self._exception
> apache_beam/utils/thread_pool_executor.py:44: in run
> self._future.set_result(self._fn(*self._fn_args, **self._fn_kwargs))
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:923: in execute
> dry_run)
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:826: in process_bundle
> result_future = self._worker_handler.control_conn.push(process_bundle_req)
> apache_beam/runners/portability/fn_api_runner/worker_handlers.py:353: in push
> response = self.worker.do_instruction(request)
> apache_beam/runners/worker/sdk_worker.py:471: in do_instruction
> getattr(request, request_type), request.instruction_id)
> apache_beam/runners/worker/sdk_worker.py:500: in process_bundle
> instruction_id, request.process_bundle_descriptor_id)
> apache_beam/runners/worker/sdk_worker.py:374: in get
> self.data_channel_factory)
> apache_beam/runners/worker/bundle_processor.py:782: in __init__
> self.ops = self.create_execution_tree(self.process_bundle_descriptor)
> apache_beam/runners/worker/bundle_processor.py:837: in create_execution_tree
> descriptor.transforms, key=topological_height, reverse=True)
> apache_beam/runners/worker/bundle_processor.py:836: in 
> (transform_id, get_operation(transform_id)) for transform_id in sorted(
> apache_beam/runners/worker/bundle_processor.py:726: in wrapper
> result = cache[args] = func(*args)
> apache_beam/runners/worker/bundle_processor.py:820: in get_operation
> pcoll_id in descriptor.transforms[transform_id].outputs.items()
> apache_beam/runners/worker/bundle_processor.py:820: in 
> pcoll_id in descriptor.transforms[transform_id].outputs.items()
> apache_beam/runners/worker/bundle_processor.py:818: in 
> tag: [get_operation(op) for op in pcoll_consumers[pcoll_id]]
> apache_beam/runners/worker/bundle_processor.py:726: in wrapper
> result = cache[args] = func(*args)
> apache_beam/runners/worker/bundle_processor.py:823: in get_operation
> transform_id, transform_consumers)
> apache_beam/runners/worker/bundle_processor.py:1108: in create_operation
> return creator(self, transform_id, transform_proto, payload, consumers)
> {code}
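For context on the quoted test: the NamedTemporaryFile(delete=False) dance it performs is the standard workaround for https://bugs.python.org/issue14243, where a NamedTemporaryFile cannot be reopened by name (notably on Windows) while the creating handle is still open. A minimal sketch of that pattern, with a plain file read standing in for beam.io.ReadFromText:

```python
import os
import tempfile

# delete=False keeps the file on disk after close() so it can be
# reopened by name; in exchange, we must remove it ourselves.
temp_file = tempfile.NamedTemporaryFile(delete=False)
try:
    temp_file.write(b'a\nb\nc')
    temp_file.close()  # close before reopening -- see bpo-14243
    with open(temp_file.name, 'rb') as f:  # stand-in for ReadFromText
        lines = f.read().split(b'\n')
finally:
    os.unlink(temp_file.name)

print(lines)  # [b'a', b'b', b'c']
```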

[jira] [Commented] (BEAM-9832) KeyError: 'No such coder: ' in fn_runner_test

2020-04-27 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094091#comment-17094091
 ] 

Pablo Estrada commented on BEAM-9832:
-------------------------------------

Never mind, that wasn't it : )


[jira] [Commented] (BEAM-9832) KeyError: 'No such coder: ' in fn_runner_test

2020-04-27 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094084#comment-17094084
 ] 

Pablo Estrada commented on BEAM-9832:
-------------------------------------

I am not 100% sure, but this may be a fix: 
[https://github.com/apache/beam/pull/11546] : P


[jira] [Commented] (BEAM-9832) KeyError: 'No such coder: ' in fn_runner_test

2020-04-27 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094078#comment-17094078
 ] 

Pablo Estrada commented on BEAM-9832:
-------------------------------------

Hmm, PR 11514 seems to only deal with monitoring infos... so I'm not sure.


[jira] [Commented] (BEAM-9832) KeyError: 'No such coder: ' in fn_runner_test

2020-04-27 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094074#comment-17094074
 ] 

Pablo Estrada commented on BEAM-9832:
-------------------------------------

I'm suspecting this one:

[https://github.com/apache/beam/pull/11514]


[jira] [Commented] (BEAM-9832) KeyError: 'No such coder: ' in fn_runner_test

2020-04-27 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094075#comment-17094075
 ] 

Pablo Estrada commented on BEAM-9832:
-------------------------------------

(The same test failure is visible from 
[https://github.com/apache/beam/pull/11544], the revert of 
[https://github.com/apache/beam/pull/11270].)


[jira] [Commented] (BEAM-9832) KeyError: 'No such coder: ' in fn_runner_test

2020-04-27 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094073#comment-17094073
 ] 

Pablo Estrada commented on BEAM-9832:
-------------------------------------

Breakage seems to have started about 3 days ago: 
[https://builds.apache.org/job/beam_PreCommit_Python_Commit/]


[jira] [Commented] (BEAM-9832) KeyError: 'No such coder: ' in fn_runner_test

2020-04-27 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094043#comment-17094043
 ] 

Pablo Estrada commented on BEAM-9832:
-------------------------------------

Reverting PR: [https://github.com/apache/beam/pull/11544]

> KeyError: 'No such coder: ' in fn_runner_test
> -
>
> Key: BEAM-9832
> URL: https://issues.apache.org/jira/browse/BEAM-9832
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, test-failures
>Reporter: Ning Kang
>Assignee: Pablo Estrada
>Priority: Critical
>
> Failed test results can be found 
> [here|[https://builds.apache.org/job/beam_PreCommit_Python_Commit/12525/]]
>  
> A stack trace:
> {code:java}
> self = 
>   testMethod=test_read>
> def test_read(self):
>   # Can't use NamedTemporaryFile as a context
>   # due to https://bugs.python.org/issue14243
>   temp_file = tempfile.NamedTemporaryFile(delete=False)
>   try:
> temp_file.write(b'a\nb\nc')
> temp_file.close()
> with self.create_pipeline() as p:
>   assert_that(
> > p | beam.io.ReadFromText(temp_file.name), equal_to(['a', 'b', 
> > 'c']))
> apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:625: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> apache_beam/pipeline.py:529: in __exit__
> self.run().wait_until_finish()
> apache_beam/pipeline.py:502: in run
> self._options).run(False)
> apache_beam/pipeline.py:515: in run
> return self.runner.run_pipeline(self, self._options)
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:173: in 
> run_pipeline
> pipeline.to_runner_api(default_environment=self._default_environment))
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:183: in 
> run_via_runner_api
> return self.run_stages(stage_context, stages)
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:331: in run_stages
> bundle_context_manager,
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:508: in _run_stage
> bundle_manager)
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:546: in _run_bundle
> data_input, data_output, input_timers, expected_timer_output)
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:927: in 
> process_bundle
> timer_inputs)):
> /usr/lib/python3.5/concurrent/futures/_base.py:556: in result_iterator
> yield future.result()
> /usr/lib/python3.5/concurrent/futures/_base.py:405: in result
> return self.__get_result()
> /usr/lib/python3.5/concurrent/futures/_base.py:357: in __get_result
> raise self._exception
> apache_beam/utils/thread_pool_executor.py:44: in run
> self._future.set_result(self._fn(*self._fn_args, **self._fn_kwargs))
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:923: in execute
> dry_run)
> apache_beam/runners/portability/fn_api_runner/fn_runner.py:826: in 
> process_bundle
> result_future = self._worker_handler.control_conn.push(process_bundle_req)
> apache_beam/runners/portability/fn_api_runner/worker_handlers.py:353: in push
> response = self.worker.do_instruction(request)
> apache_beam/runners/worker/sdk_worker.py:471: in do_instruction
> getattr(request, request_type), request.instruction_id)
> apache_beam/runners/worker/sdk_worker.py:500: in process_bundle
> instruction_id, request.process_bundle_descriptor_id)
> apache_beam/runners/worker/sdk_worker.py:374: in get
> self.data_channel_factory)
> apache_beam/runners/worker/bundle_processor.py:782: in __init__
> self.ops = self.create_execution_tree(self.process_bundle_descriptor)
> apache_beam/runners/worker/bundle_processor.py:837: in create_execution_tree
> descriptor.transforms, key=topological_height, reverse=True)
> apache_beam/runners/worker/bundle_processor.py:836: in 
> (transform_id, get_operation(transform_id)) for transform_id in sorted(
> apache_beam/runners/worker/bundle_processor.py:726: in wrapper
> result = cache[args] = func(*args)
> apache_beam/runners/worker/bundle_processor.py:820: in get_operation
> pcoll_id in descriptor.transforms[transform_id].outputs.items()
> apache_beam/runners/worker/bundle_processor.py:820: in 
> pcoll_id in descriptor.transforms[transform_id].outputs.items()
> apache_beam/runners/worker/bundle_processor.py:818: in 
> tag: [get_operation(op) for op in pcoll_consumers[pcoll_id]]
> apache_beam/runners/worker/bundle_processor.py:726: in wrapper
> result = cache[args] = func(*args)
> apache_beam/runners/worker/bundle_processor.py:823: in get_operation
> transform_id, transform_consumers)
> apache_beam/runners/worker/bundle_processor.py:1108: in create_operation
> return creator(self, transform_id, transform_proto, payload, consumers)
> 
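The comment in the quoted test points at bpo-14243: a `NamedTemporaryFile` cannot reliably be used as a context manager and re-opened by name while still open, so the test creates it with `delete=False`, closes it, and cleans it up manually. A minimal sketch of that pattern (the helper name `write_and_read_lines` is illustrative, not from the Beam code):

```python
import os
import tempfile


def write_and_read_lines(lines):
    """Write lines to a temp file, re-open it by name, and return its contents.

    Mirrors the workaround in the quoted test: NamedTemporaryFile is not used
    as a context manager (bpo-14243), so cleanup is done manually.
    """
    temp_file = tempfile.NamedTemporaryFile(delete=False)
    try:
        temp_file.write(b"\n".join(line.encode("utf-8") for line in lines))
        temp_file.close()  # close first so the file can be re-opened by name
        with open(temp_file.name, "rb") as f:
            return f.read().decode("utf-8").split("\n")
    finally:
        os.unlink(temp_file.name)  # delete=False means we remove it ourselves
```

In the Beam test itself the re-open by name is done by `beam.io.ReadFromText(temp_file.name)` rather than a plain `open`.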

[jira] [Commented] (BEAM-9832) KeyError: 'No such coder: ' in fn_runner_test

2020-04-27 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094037#comment-17094037
 ] 

Pablo Estrada commented on BEAM-9832:
-

Okay, I guess it makes sense to roll back. Working on that...


[jira] [Commented] (BEAM-9832) KeyError: 'No such coder: ' in fn_runner_test

2020-04-27 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094030#comment-17094030
 ] 

Pablo Estrada commented on BEAM-9832:
-

Hm I haven't been able to figure out how an erroneous output is appearing 
here...


[jira] [Commented] (BEAM-9832) KeyError: 'No such coder: ' in fn_runner_test

2020-04-27 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093973#comment-17093973
 ] 

Pablo Estrada commented on BEAM-9832:
-

Working on this. Interestingly enough, it's very hard to reproduce this on my 
machine. It takes about 15 runs per failure.
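At that failure rate, one way to measure reproduction cost is simply to loop the check until it fails. A hedged sketch — `flaky_check` is a stand-in that fails roughly 1 run in 15, not the Beam test itself:

```python
import random


def runs_until_failure(check, max_runs=1000):
    """Run `check` repeatedly; return the attempt number of the first failure,
    or None if it never fails within max_runs."""
    for attempt in range(1, max_runs + 1):
        if not check():
            return attempt
    return None


random.seed(0)  # deterministic for the example


def flaky_check():
    # Stand-in for the flaky test: passes about 14 runs out of 15.
    return random.random() > 1.0 / 15


n = runs_until_failure(flaky_check)
```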


[jira] [Commented] (BEAM-9832) KeyError: 'No such coder: ' in fn_runner_test

2020-04-27 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093947#comment-17093947
 ] 

Pablo Estrada commented on BEAM-9832:
-

Hm sorry about the trouble. Starting to take a look...


[jira] [Updated] (BEAM-9812) WriteToBigQuery issue causes pipelines with multiple load jobs to work erroneously

2020-04-23 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9812:

Fix Version/s: 2.21.0

> WriteToBigQuery issue causes pipelines with multiple load jobs to work 
> erroneously
> --
>
> Key: BEAM-9812
> URL: https://issues.apache.org/jira/browse/BEAM-9812
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9812) WriteToBigQuery issue causes pipelines with multiple load jobs to work erroneously

2020-04-23 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9812:

Status: Open  (was: Triage Needed)

> WriteToBigQuery issue causes pipelines with multiple load jobs to work 
> erroneously
> --
>
> Key: BEAM-9812
> URL: https://issues.apache.org/jira/browse/BEAM-9812
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9812) WriteToBigQuery issue causes pipelines with multiple load jobs to work erroneously

2020-04-23 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9812:
---

 Summary: WriteToBigQuery issue causes pipelines with multiple load 
jobs to work erroneously
 Key: BEAM-9812
 URL: https://issues.apache.org/jira/browse/BEAM-9812
 Project: Beam
  Issue Type: Bug
  Components: io-py-gcp
Reporter: Pablo Estrada
Assignee: Pablo Estrada






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9745) [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to deserialize Custom DoFns and Custom Coders.

2020-04-22 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090157#comment-17090157
 ] 

Pablo Estrada commented on BEAM-9745:
-

Nevermind: on a723148858b421f7df08c74f11432b4aeb2f561c the tests still didn't pass, but this time they hung for a long time.

> [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to 
> deserialize Custom DoFns and Custom Coders.
> -
>
> Key: BEAM-9745
> URL: https://issues.apache.org/jira/browse/BEAM-9745
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, java-fn-execution, sdk-java-harness, 
> test-failures
>Reporter: Daniel Oliveira
>Assignee: Pablo Estrada
>Priority: Blocker
>  Labels: currently-failing
> Fix For: 2.21.0
>
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4657/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/c3izncsa4u24k/tests/by-project]
> Initial investigation:
> The bug appears to be popping up on BigQuery tests mostly, but also a 
> BigTable and a Datastore test.
> Here's an example stacktrace of the two errors, showing _only_ the error 
> messages themselves. Source: 
> [https://scans.gradle.com/s/c3izncsa4u24k/tests/efn4wciuamvqq-ccxt3jvofvqbe]
> {noformat}
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error 
> received from SDK harness for instruction -191: 
> java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With 
> Execution Info
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3
> ...
> Caused by: java.lang.RuntimeException: Error received from SDK harness for 
> instruction -191: java.lang.IllegalArgumentException: unable to deserialize 
> Custom DoFn With Execution Info
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error 
> received from SDK harness for instruction -206: 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes
> ...
> Caused by: 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes
> ...
> Caused by: java.lang.IllegalArgumentException: unable to deserialize Custom 
> Coder Bytes
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder
> ...
> Caused by: java.lang.RuntimeException: Error received from SDK harness for 
> instruction -206: 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes
> ...
> Caused by: 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes
> ...
> Caused by: java.lang.IllegalArgumentException: unable to deserialize Custom 
> Coder Bytes
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder
> ...
> {noformat}
> Update: Looks like this has been failing as far back as [Apr 
> 4|https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4566/] 
> after a long period where the test was consistently timing out since [Mar 
> 31|https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4546/]. 
> So it's hard to narrow down what commit may have caused this. Plus, the test 
> was failing due to a completely different BigQuery failure before anyway, so 
> it seems like this test will need to be completely fixed from scratch, 
> instead of tracking down a specific breaking change.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
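The `ClassNotFoundException`s in the stack traces above are a deserialization failure mode: a serialized object carries only a *reference* to its class, so loading fails unless that class is loadable on the worker. The same mechanism can be shown in a few lines of Python with a pickle that references a made-up module name (this is an analogy, not Beam code):

```python
import pickle

# A protocol-0 pickle whose GLOBAL opcode ('c') references a class in a
# module that does not exist -- analogous to a Java SDK harness missing
# org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3 on its classpath.
payload = b"cnonexistent_module\nSomeDoFn\n."

try:
    pickle.loads(payload)
    raise AssertionError("expected deserialization to fail")
except ImportError as exc:
    # Mirrors "unable to deserialize Custom DoFn": the bytes are intact,
    # but the class reference cannot be resolved on this side.
    message = str(exc)
```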

[jira] [Commented] (BEAM-9745) [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to deserialize Custom DoFns and Custom Coders.

2020-04-22 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090105#comment-17090105
 ] 

Pablo Estrada commented on BEAM-9745:
-

FWIW, this issue only seems to be affecting the Java SDK harness when running 
in portable mode, which is not a use case that we currently support, so I'm 
leaning towards "not a release blocker".

 

I'm bisecting to find the commit that causes this issue. For my records:
 * Running on a723148858b421f7df08c74f11432b4aeb2f561c, the issue does not reproduce.
 * Running on 7869455ff38ce4791c5531022ffb75e7f007e06e, the issue reproduces.
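The two commits above are the endpoints of a standard bisection: under the usual git-bisect invariant (once introduced, the issue reproduces on every later commit), the first bad commit is found in O(log n) checks. A hypothetical sketch of that search, not Beam tooling:

```python
def bisect_first_bad(commits, is_bad):
    """Return the first commit in `commits` for which is_bad(commit) is True.

    Assumes commits are ordered oldest-to-newest, the oldest is good, the
    newest is bad, and badness is monotone (the git-bisect invariant).
    """
    lo, hi = 0, len(commits) - 1
    assert not is_bad(commits[lo]) and is_bad(commits[hi])
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # first bad commit is at mid or earlier
        else:
            lo = mid  # first bad commit is after mid
    return commits[hi]
```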

> after a long period where the test was consistently timing out since [Mar 
> 31|https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4546/]. 
> So it's hard to narrow down what commit may have caused this. Plus, the test 
> was failing due to a completely different BigQuery failure before anyway, so 
> it seems like this test will need to be completely fixed from scratch, 
> instead of tracking down a specific breaking change.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  

[jira] [Created] (BEAM-9787) Send clear error to users trying to use BigQuerySource on FnApi pipelines on Python SDK

2020-04-20 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9787:
---

 Summary: Send clear error to users trying to use BigQuerySource on 
FnApi pipelines on Python SDK
 Key: BEAM-9787
 URL: https://issues.apache.org/jira/browse/BEAM-9787
 Project: Beam
  Issue Type: Bug
  Components: io-py-gcp
Reporter: Pablo Estrada
Assignee: Pablo Estrada






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9745) [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to deserialize Custom DoFns and Custom Coders.

2020-04-17 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17086076#comment-17086076
 ] 

Pablo Estrada edited comment on BEAM-9745 at 4/17/20, 9:24 PM:
---

This suite seems to have been broken since April 3, if not longer. See 
[https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/]

 


was (Author: pabloem):
This suite seems to have been broken since April 3, if not longer. See 
[https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/]

!https://screenshot.googleplex.com/94iMFxiE3X5.png!

> [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to 
> deserialize Custom DoFns and Custom Coders.
> -
>
> Key: BEAM-9745
> URL: https://issues.apache.org/jira/browse/BEAM-9745
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, java-fn-execution, sdk-java-harness, 
> test-failures
>Reporter: Daniel Oliveira
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Blocker
>  Labels: currently-failing
> Fix For: 2.21.0
>
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4657/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/c3izncsa4u24k/tests/by-project]
> Initial investigation:
> The bug appears to be popping up on BigQuery tests mostly, but also a 
> BigTable and a Datastore test.
> Here's an example stacktrace of the two errors, showing _only_ the error 
> messages themselves. Source: 
> [https://scans.gradle.com/s/c3izncsa4u24k/tests/efn4wciuamvqq-ccxt3jvofvqbe]
> {noformat}
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error 
> received from SDK harness for instruction -191: 
> java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With 
> Execution Info
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3
> ...
> Caused by: java.lang.RuntimeException: Error received from SDK harness for 
> instruction -191: java.lang.IllegalArgumentException: unable to deserialize 
> Custom DoFn With Execution Info
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error 
> received from SDK harness for instruction -206: 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes
> ...
> Caused by: 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes
> ...
> Caused by: java.lang.IllegalArgumentException: unable to deserialize Custom 
> Coder Bytes
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder
> ...
> Caused by: java.lang.RuntimeException: Error received from SDK harness for 
> instruction -206: 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes
> ...
> Caused by: 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes
> ...
> Caused by: java.lang.IllegalArgumentException: unable to deserialize Custom 
> Coder Bytes
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder
> ...
> {noformat}
> Update: Looks like this has been failing as far back as [Apr 
> 4|https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4566/] 
> after a long period where the test was consistently timing out since [Mar 
> 31|https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4546/]. 
> So it's hard to narrow down what commit may have caused this. Plus, the test 
> was failing due to a completely different BigQuery failure before anyway, so 
> it seems like this test will need to be completely fixed from scratch, 
> instead of tracking down a specific breaking change.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should 

[jira] [Updated] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python

2020-04-15 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9769:

Fix Version/s: 2.21.0

> Ensure JSON imports are the default behavior for BigQuerySink and 
> WriteToBigQuery in Python
> ---
>
> Key: BEAM-9769
> URL: https://issues.apache.org/jira/browse/BEAM-9769
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
> Fix For: 2.21.0
>
>






[jira] [Updated] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python

2020-04-15 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9769:

Status: Open  (was: Triage Needed)

> Ensure JSON imports are the default behavior for BigQuerySink and 
> WriteToBigQuery in Python
> ---
>
> Key: BEAM-9769
> URL: https://issues.apache.org/jira/browse/BEAM-9769
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>






[jira] [Created] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python

2020-04-15 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9769:
---

 Summary: Ensure JSON imports are the default behavior for 
BigQuerySink and WriteToBigQuery in Python
 Key: BEAM-9769
 URL: https://issues.apache.org/jira/browse/BEAM-9769
 Project: Beam
  Issue Type: Bug
  Components: io-py-gcp
Reporter: Pablo Estrada
Assignee: Pablo Estrada








[jira] [Updated] (BEAM-9763) Make _ReadFromBigQuery public

2020-04-15 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9763:

Description: This means removing the underscore from it, but keeping it 
tagged as experimental.
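
The "public but still experimental" idea above can be sketched with a small decorator. This is purely illustrative: the `experimental` decorator and the simplified `ReadFromBigQuery` below are hypothetical stand-ins, not Beam's actual annotations or transform.

```python
import functools
import warnings

# Hypothetical sketch: expose a formerly private API under its public name
# while still flagging it as experimental to callers.
def experimental(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{func.__name__} is experimental and may change in future releases",
            FutureWarning,
            stacklevel=2,
        )
        return func(*args, **kwargs)
    return wrapper

@experimental
def ReadFromBigQuery(query):
    # Stand-in for the real transform; just echoes its argument.
    return f"reading: {query}"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = ReadFromBigQuery("SELECT 1")

print(result)  # → reading: SELECT 1
```

Callers can use the public name immediately, while the warning keeps the "may change" contract visible until the API is stabilized.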

> Make _ReadFromBigQuery public
> -
>
> Key: BEAM-9763
> URL: https://issues.apache.org/jira/browse/BEAM-9763
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>
> This means removing the underscore from it, but keeping it tagged as 
> experimental.





[jira] [Created] (BEAM-9763) Make _ReadFromBigQuery public

2020-04-15 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9763:
---

 Summary: Make _ReadFromBigQuery public
 Key: BEAM-9763
 URL: https://issues.apache.org/jira/browse/BEAM-9763
 Project: Beam
  Issue Type: Sub-task
  Components: io-py-gcp
Reporter: Pablo Estrada
Assignee: Pablo Estrada








[jira] [Updated] (BEAM-9762) Production-ready BigQuery IOs for Python SDK

2020-04-15 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9762:

Status: Open  (was: Triage Needed)

> Production-ready BigQuery IOs for Python SDK
> 
>
> Key: BEAM-9762
> URL: https://issues.apache.org/jira/browse/BEAM-9762
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>
> This issue is to track the elements of work necessary to make the new 
> BigQuery IOs (WriteToBigQuery/FileLoads, WriteToBigQuery/StreamingInserts, 
> _ReadFromBigQuery) fully production-ready, and the main, encouraged way of 
> using BigQuery with the Beam Python SDK.





[jira] [Updated] (BEAM-9763) Make _ReadFromBigQuery public

2020-04-15 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9763:

Status: Open  (was: Triage Needed)

> Make _ReadFromBigQuery public
> -
>
> Key: BEAM-9763
> URL: https://issues.apache.org/jira/browse/BEAM-9763
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>






[jira] [Created] (BEAM-9762) Production-ready BigQuery IOs for Python SDK

2020-04-15 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9762:
---

 Summary: Production-ready BigQuery IOs for Python SDK
 Key: BEAM-9762
 URL: https://issues.apache.org/jira/browse/BEAM-9762
 Project: Beam
  Issue Type: Bug
  Components: io-py-gcp
Reporter: Pablo Estrada
Assignee: Pablo Estrada


This issue is to track the elements of work necessary to make the new BigQuery 
IOs (WriteToBigQuery/FileLoads, WriteToBigQuery/StreamingInserts, 
_ReadFromBigQuery) fully production-ready, and the main, encouraged way of 
using BigQuery with the Beam Python SDK.





[jira] [Created] (BEAM-9741) Split timers to multiple workers in ParallelBundleManager

2020-04-10 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9741:
---

 Summary: Split timers to multiple workers in ParallelBundleManager
 Key: BEAM-9741
 URL: https://issues.apache.org/jira/browse/BEAM-9741
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-py-core
Reporter: Pablo Estrada








[jira] [Commented] (BEAM-9494) Remove workaround for BQ transform for Dataflow

2020-04-08 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078776#comment-17078776
 ] 

Pablo Estrada commented on BEAM-9494:
-

Not resolved, but we can remove the fix version.

> Remove workaround for BQ transform for Dataflow
> ---
>
> Key: BEAM-9494
> URL: https://issues.apache.org/jira/browse/BEAM-9494
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Luke Cwik
>Assignee: Pablo Estrada
>Priority: Minor
> Fix For: 2.21.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Dataflow incorrectly uses the Flatten input PCollection coder when it 
> performs an optimization instead of the output PCollection coder which can 
> lead to issues if these coders differ.
>  
> The workaround was introduced in [https://github.com/apache/beam/pull/11103]





[jira] [Updated] (BEAM-9494) Remove workaround for BQ transform for Dataflow

2020-04-08 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9494:

Fix Version/s: (was: 2.21.0)

> Remove workaround for BQ transform for Dataflow
> ---
>
> Key: BEAM-9494
> URL: https://issues.apache.org/jira/browse/BEAM-9494
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Luke Cwik
>Assignee: Pablo Estrada
>Priority: Minor
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Dataflow incorrectly uses the Flatten input PCollection coder when it 
> performs an optimization instead of the output PCollection coder which can 
> lead to issues if these coders differ.
>  
> The workaround was introduced in [https://github.com/apache/beam/pull/11103]





[jira] [Resolved] (BEAM-9691) Ensure Dataflow BQ Native sink are not used on FnApi

2020-04-07 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-9691.
-
Fix Version/s: 2.21.0
   Resolution: Fixed

> Ensure Dataflow BQ Native sink are not used on FnApi
> 
>
> Key: BEAM-9691
> URL: https://issues.apache.org/jira/browse/BEAM-9691
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-9715) annotations_test fails in some environments

2020-04-06 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9715:

Priority: Minor  (was: Major)

> annotations_test fails in some environments
> --
>
> Key: BEAM-9715
> URL: https://issues.apache.org/jira/browse/BEAM-9715
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-9715) annotations_test fails in some environments

2020-04-06 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9715:

Status: Open  (was: Triage Needed)

> annotations_test fails in some environments
> --
>
> Key: BEAM-9715
> URL: https://issues.apache.org/jira/browse/BEAM-9715
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Created] (BEAM-9715) annotations_test fails in some environments

2020-04-06 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9715:
---

 Summary: annotations_test fails in some environments
 Key: BEAM-9715
 URL: https://issues.apache.org/jira/browse/BEAM-9715
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Pablo Estrada
Assignee: Pablo Estrada








[jira] [Updated] (BEAM-9691) Ensure Dataflow BQ Native sink are not used on FnApi

2020-04-03 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9691:

Status: Open  (was: Triage Needed)

> Ensure Dataflow BQ Native sink are not used on FnApi
> 
>
> Key: BEAM-9691
> URL: https://issues.apache.org/jira/browse/BEAM-9691
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-9691) Ensure Dataflow BQ Native sink are not used on FnApi

2020-04-03 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074895#comment-17074895
 ] 

Pablo Estrada commented on BEAM-9691:
-

Ah, in fact this is for the native sink, not the source.

I have a PR to add Avro export for the custom source, but it changes the 
transform somewhat, so it wouldn't be backwards compatible: 
[https://github.com/apache/beam/pull/11086]

We're still discussing what the approach for that should be.

> Ensure Dataflow BQ Native sink are not used on FnApi
> 
>
> Key: BEAM-9691
> URL: https://issues.apache.org/jira/browse/BEAM-9691
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-9691) Ensure Dataflow BQ Native sink are not used on FnApi

2020-04-03 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9691:

Summary: Ensure Dataflow BQ Native sink are not used on FnApi  (was: Ensure 
Dataflow BQ Native sources are not used on FnApi)

> Ensure Dataflow BQ Native sink are not used on FnApi
> 
>
> Key: BEAM-9691
> URL: https://issues.apache.org/jira/browse/BEAM-9691
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-9691) Ensure Dataflow BQ Native sources are not used on FnApi

2020-04-03 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074892#comment-17074892
 ] 

Pablo Estrada commented on BEAM-9691:
-

I think the title of the issue is too broad. Sorry about that. For now this is 
only meant for BQSource. For BQ IO we're using a Beam custom source, as the 
native source is not supported by the unified worker (UW). I do not know 
whether it's supported by the Java runner harness.

> Ensure Dataflow BQ Native sources are not used on FnApi
> ---
>
> Key: BEAM-9691
> URL: https://issues.apache.org/jira/browse/BEAM-9691
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-9691) Ensure Dataflow BQ Native sources are not used on FnApi

2020-04-03 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9691:

Summary: Ensure Dataflow BQ Native sources are not used on FnApi  (was: 
Ensure Dataflow Native sources are not used on FnApi)

> Ensure Dataflow BQ Native sources are not used on FnApi
> ---
>
> Key: BEAM-9691
> URL: https://issues.apache.org/jira/browse/BEAM-9691
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-9691) Ensure Dataflow Native sources are not used on FnApi

2020-04-03 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9691:

Summary: Ensure Dataflow Native sources are not used on FnApi  (was: Ensure 
Dataflow Native sources are not used on FnApi ever)

> Ensure Dataflow Native sources are not used on FnApi
> 
>
> Key: BEAM-9691
> URL: https://issues.apache.org/jira/browse/BEAM-9691
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Created] (BEAM-9691) Ensure Dataflow Native sources are not used on FnApi ever

2020-04-03 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9691:
---

 Summary: Ensure Dataflow Native sources are not used on FnApi ever
 Key: BEAM-9691
 URL: https://issues.apache.org/jira/browse/BEAM-9691
 Project: Beam
  Issue Type: Bug
  Components: io-py-gcp
Reporter: Pablo Estrada
Assignee: Pablo Estrada








[jira] [Created] (BEAM-9640) Track PCollection watermark across bundle executions

2020-03-30 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9640:
---

 Summary: Track PCollection watermark across bundle executions
 Key: BEAM-9640
 URL: https://issues.apache.org/jira/browse/BEAM-9640
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-py-core
Reporter: Pablo Estrada
Assignee: Pablo Estrada


This can be done without relying on the watermark manager for execution.





[jira] [Updated] (BEAM-9639) Abstract bundle execution logic from stage execution logic

2020-03-30 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9639:

Status: Open  (was: Triage Needed)

> Abstract bundle execution logic from stage execution logic
> --
>
> Key: BEAM-9639
> URL: https://issues.apache.org/jira/browse/BEAM-9639
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>
> The FnApiRunner currently works in a per-stage manner and does not abstract 
> single-bundle execution much. This work item is to clearly define the code 
> that executes a single bundle.





[jira] [Created] (BEAM-9639) Abstract bundle execution logic from stage execution logic

2020-03-30 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9639:
---

 Summary: Abstract bundle execution logic from stage execution logic
 Key: BEAM-9639
 URL: https://issues.apache.org/jira/browse/BEAM-9639
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-py-core
Reporter: Pablo Estrada
Assignee: Pablo Estrada


The FnApiRunner currently works in a per-stage manner and does not abstract 
single-bundle execution much. This work item is to clearly define the code 
that executes a single bundle.
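
The kind of per-bundle abstraction described above (and the context managers of BEAM-9608) can be sketched roughly as follows. This is a minimal, hypothetical illustration: `bundle_context` and `run_stage` are invented names, not the FnApiRunner's actual API.

```python
from contextlib import contextmanager

# Hypothetical sketch: isolate single-bundle execution behind a context
# manager so setup/teardown is handled in one place per bundle.
@contextmanager
def bundle_context(stage_name):
    # Per-bundle resources (in the real runner: state, data channels, ...).
    resources = {"stage": stage_name, "outputs": []}
    try:
        yield resources
    finally:
        # Teardown runs even if bundle processing raises.
        resources["outputs"].clear()

def run_stage(stage_name, elements):
    # Stage-level code now only orchestrates bundles; the bundle
    # lifecycle lives entirely in bundle_context.
    with bundle_context(stage_name) as ctx:
        for element in elements:
            ctx["outputs"].append(element.upper())
        results = list(ctx["outputs"])  # copy before teardown clears
    return results

print(run_stage("Map(upper)", ["a", "b"]))  # → ['A', 'B']
```

Separating the bundle lifecycle this way is what lets a runner re-execute bundles (e.g. for timers or retries) without duplicating stage-level plumbing.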





[jira] [Resolved] (BEAM-9608) Add context managers for FnApiRunner to manage execution of each bundle

2020-03-30 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-9608.
-
Fix Version/s: 2.21.0
   Resolution: Fixed

> Add context managers for FnApiRunner to manage execution of each bundle
> ---
>
> Key: BEAM-9608
> URL: https://issues.apache.org/jira/browse/BEAM-9608
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-9625) StateServicer should be owned by FnApiRunnerContextManager

2020-03-27 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9625:

Status: Open  (was: Triage Needed)

> StateServicer should be owned by FnApiRunnerContextManager
> --
>
> Key: BEAM-9625
> URL: https://issues.apache.org/jira/browse/BEAM-9625
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Minor
>






[jira] [Created] (BEAM-9625) StateServicer should be owned by FnApiRunnerContextManager

2020-03-27 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9625:
---

 Summary: StateServicer should be owned by FnApiRunnerContextManager
 Key: BEAM-9625
 URL: https://issues.apache.org/jira/browse/BEAM-9625
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-py-core
Reporter: Pablo Estrada
Assignee: Pablo Estrada








[jira] [Updated] (BEAM-9608) Add context managers for FnApiRunner to manage execution of each bundle

2020-03-25 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9608:

Status: Open  (was: Triage Needed)

> Add context managers for FnApiRunner to manage execution of each bundle
> ---
>
> Key: BEAM-9608
> URL: https://issues.apache.org/jira/browse/BEAM-9608
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (BEAM-9537) Refactor FnApiRunner into its own package

2020-03-25 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-9537.
-
Fix Version/s: 2.21.0
   Resolution: Fixed

> Refactor FnApiRunner into its own package
> -
>
> Key: BEAM-9537
> URL: https://issues.apache.org/jira/browse/BEAM-9537
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Minor
> Fix For: 2.21.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>






[jira] [Created] (BEAM-9608) Add context managers for FnApiRunner to manage execution of each bundle

2020-03-25 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9608:
---

 Summary: Add context managers for FnApiRunner to manage execution 
of each bundle
 Key: BEAM-9608
 URL: https://issues.apache.org/jira/browse/BEAM-9608
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-py-core
Reporter: Pablo Estrada
Assignee: Pablo Estrada








[jira] [Created] (BEAM-9601) Interactive test_streaming_wordcount failing

2020-03-24 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9601:
---

 Summary: Interactive test_streaming_wordcount failing
 Key: BEAM-9601
 URL: https://issues.apache.org/jira/browse/BEAM-9601
 Project: Beam
  Issue Type: Bug
  Components: runner-py-interactive, test-failures
Reporter: Pablo Estrada
Assignee: Sam Rohde








[jira] [Created] (BEAM-9598) _CustomBigQuerySource checks valueprovider when it's not needed

2020-03-24 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9598:
---

 Summary: _CustomBigQuerySource checks valueprovider when it's not 
needed
 Key: BEAM-9598
 URL: https://issues.apache.org/jira/browse/BEAM-9598
 Project: Beam
  Issue Type: Bug
  Components: io-py-gcp, test-failures
Reporter: Pablo Estrada
Assignee: Pablo Estrada








[jira] [Updated] (BEAM-9572) Allow RuntimeValueProviders for WriteToBigQuery transform (dataflow templates)

2020-03-24 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9572:

Priority: Minor  (was: Critical)

> Allow RuntimeValueProviders for WriteToBigQuery transform (dataflow templates)
> --
>
> Key: BEAM-9572
> URL: https://issues.apache.org/jira/browse/BEAM-9572
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.19.0
>Reporter: José Miguel Rebelo
>Assignee: Pablo Estrada
>Priority: Minor
> Attachments: error.PNG
>
>
> the *WriteToBigQuery* transform should be able to receive the *table* 
> parameter as a *ValueProvider.* 
>  
> When I try to *create and stage* the pipeline with the table parameter as a 
> ValueProvider, I get the error attached
>  





[jira] [Commented] (BEAM-9572) Allow RuntimeValueProviders for WriteToBigQuery transform (dataflow templates)

2020-03-24 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066016#comment-17066016
 ] 

Pablo Estrada commented on BEAM-9572:
-

Did you pass the pipeline option `experiments=use_beam_bq_sink`? If so, the 
pipeline should work. You shouldn't call `.get()` on the value provider; 
just pass it to the transform.

Passing `WriteToBigQuery(table=extraction_options.tableId)` should be 
enough.
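The deferred-resolution behavior this relies on can be sketched with a minimal stand-in (an illustrative class, not Beam's real `RuntimeValueProvider` implementation; the table name is a placeholder):

```python
class DeferredValue:
    """Stand-in for Beam's ValueProvider: unknown while the template is
    being constructed, resolved only when the template is launched."""

    def __init__(self):
        self._value = None

    def set(self, value):
        # Populated at template launch time.
        self._value = value

    def get(self):
        if self._value is None:
            raise RuntimeError('value not available at construction time')
        return self._value


table = DeferredValue()
try:
    table.get()  # calling .get() during pipeline construction fails
except RuntimeError as err:
    print('construction time:', err)

table.set('project:dataset.tableId')  # resolved when the template runs
print('run time:', table.get())
```

This is why the transform, not user code, should call `.get()`: only at run time has the template parameter been populated.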

> Allow RuntimeValueProviders for WriteToBigQuery transform (dataflow templates)
> --
>
> Key: BEAM-9572
> URL: https://issues.apache.org/jira/browse/BEAM-9572
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.19.0
>Reporter: José Miguel Rebelo
>Assignee: Pablo Estrada
>Priority: Critical
> Attachments: error.PNG
>
>
> The *WriteToBigQuery* transform should be able to receive the *table* 
> parameter as a *ValueProvider.* 
>  
> When I try to *create and stage* the pipeline with the table parameter as a 
> ValueProvider, I get the error attached
>  





[jira] [Resolved] (BEAM-9572) Allow RuntimeValueProviders for WriteToBigQuery transform (dataflow templates)

2020-03-24 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-9572.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> Allow RuntimeValueProviders for WriteToBigQuery transform (dataflow templates)
> --
>
> Key: BEAM-9572
> URL: https://issues.apache.org/jira/browse/BEAM-9572
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.19.0
>Reporter: José Miguel Rebelo
>Assignee: Pablo Estrada
>Priority: Minor
> Fix For: Not applicable
>
> Attachments: error.PNG
>
>
> The *WriteToBigQuery* transform should be able to receive the *table* 
> parameter as a *ValueProvider.* 
>  
> When I try to *create and stage* the pipeline with the table parameter as a 
> ValueProvider, I get the error attached
>  





[jira] [Updated] (BEAM-9572) Allow RuntimeValueProviders for WriteToBigQuery transform (dataflow templates)

2020-03-24 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9572:

Component/s: (was: beam-model)
 (was: beam-community)
 io-py-gcp

> Allow RuntimeValueProviders for WriteToBigQuery transform (dataflow templates)
> --
>
> Key: BEAM-9572
> URL: https://issues.apache.org/jira/browse/BEAM-9572
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.19.0
>Reporter: José Miguel Rebelo
>Assignee: Pablo Estrada
>Priority: Critical
> Attachments: error.PNG
>
>
> The *WriteToBigQuery* transform should be able to receive the *table* 
> parameter as a *ValueProvider.* 
>  
> When I try to *create and stage* the pipeline with the table parameter as a 
> ValueProvider, I get the error attached
>  





[jira] [Commented] (BEAM-2572) Implement an S3 filesystem for Python SDK

2020-03-18 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061974#comment-17061974
 ] 

Pablo Estrada commented on BEAM-2572:
-

Thanks for catching that Badrul! Would you file a JIRA to update them or
just update them?

On Mon, Mar 16, 2020 at 2:32 PM Badrul Chowdhury (Jira) 



> Implement an S3 filesystem for Python SDK
> -
>
> Key: BEAM-2572
> URL: https://issues.apache.org/jira/browse/BEAM-2572
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>  Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec
> Fix For: 2.19.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> There are two paths worth exploring, to my understanding:
> 1. Sticking to the HDFS-based approach (like it's done in Java).
> 2. Using boto/boto3 for accessing S3 through its common API endpoints.
> I personally prefer the second approach, for a few reasons:
> 1. In real life, HDFS and S3 have different consistency guarantees, therefore 
> their behaviors may contradict each other in some edge cases (say, we write 
> something to S3, but it's not immediately accessible for reading from another 
> end).
> 2. There are other AWS-based sources and sinks we may want to create in the 
> future: DynamoDB, Kinesis, SQS, etc.
> 3. boto3 already provides somewhat good logic for basic things like 
> reattempting.
> Whatever path we choose, there's another problem related to this: we 
> currently cannot pass any global settings (say, pipeline options, or just an 
> arbitrary kwarg) to a filesystem. Because of that, we'd have to setup the 
> runner nodes to have AWS keys set up in the environment, which is not trivial 
> to achieve and doesn't look too clean either (I'd rather see one single place 
> for configuring the runner options).
> Also, it's worth mentioning that I already have a janky S3 filesystem 
> implementation that only supports DirectRunner at the moment (because of the 
> previous paragraph). I'm perfectly fine finishing it myself, with some 
> guidance from the maintainers.
> Where should I move on from here, and whose input should I be looking for?
> Thanks!
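The configuration gap described above (no way to pass per-pipeline settings such as AWS keys into a filesystem) can be sketched with an illustrative registry. The class names, option keys, and credential values below are hypothetical, not Beam's real API:

```python
class S3FileSystem:
    """Illustrative filesystem that takes explicit options rather than
    reading credentials from the runner node's environment."""

    def __init__(self, options):
        self.access_key = options.get('aws_access_key_id')
        self.secret_key = options.get('aws_secret_access_key')


class FileSystems:
    """Illustrative registry that threads pipeline-level options through
    to the concrete filesystem at construction time."""

    _schemes = {'s3': S3FileSystem}

    @classmethod
    def for_scheme(cls, scheme, options):
        return cls._schemes[scheme](options)


options = {'aws_access_key_id': 'test-key',
           'aws_secret_access_key': 'test-secret'}
fs = FileSystems.for_scheme('s3', options)
print(fs.access_key)  # credentials came from options, not the environment
```

With this shape, credentials live in one place (the pipeline options) instead of having to be provisioned into every runner node's environment.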





[jira] [Created] (BEAM-9537) Refactor FnApiRunner into its own package

2020-03-17 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9537:
---

 Summary: Refactor FnApiRunner into its own package
 Key: BEAM-9537
 URL: https://issues.apache.org/jira/browse/BEAM-9537
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-py-core
Reporter: Pablo Estrada
Assignee: Pablo Estrada








[jira] [Created] (BEAM-9533) Replace *-gcp/*-aws tox suites with *-cloud suites to run unit tests for both

2020-03-17 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9533:
---

 Summary: Replace *-gcp/*-aws tox suites with *-cloud suites to run 
unit tests for both
 Key: BEAM-9533
 URL: https://issues.apache.org/jira/browse/BEAM-9533
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: Pablo Estrada
Assignee: Pablo Estrada


Currently there are `py37-gcp` and `py37-aws` test suites. Let's consolidate 
them into a single `py37-cloud` suite, and do the same for `py35-gcp`, 
`py27-gcp`, etc.
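As a rough sketch of the consolidation (suite names and test paths are illustrative; Beam's actual tox configuration differs), one cloud environment per Python version could install both extras:

```ini
; Hypothetical tox.ini fragment: a single py37-cloud env replacing the
; separate py37-gcp and py37-aws envs by installing both extras.
[testenv:py37-cloud]
extras =
    gcp
    aws
commands =
    pytest apache_beam/io/gcp apache_beam/io/aws
```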





[jira] [Comment Edited] (BEAM-9484) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests is flaky in DirectRunner Postcommits

2020-03-17 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061257#comment-17061257
 ] 

Pablo Estrada edited comment on BEAM-9484 at 3/17/20, 11:34 PM:


[~chamikara] lmk if you need a hand looking into this, or if you got it. Feel 
free to assign to me too.


was (Author: pabloem):
[~chamikara] lmk if you need a hand looking into this, or if you got it.

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests is 
> flaky in DirectRunner Postcommits
> 
>
> Key: BEAM-9484
> URL: https://issues.apache.org/jira/browse/BEAM-9484
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>
> From https://builds.apache.org/job/beam_PostCommit_Python37_PR/99/: 
> {noformat}
>  ==
> 04:40:28  FAIL: test_big_query_write 
> (apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests)
> 04:40:28  
> --
> 04:40:28  Traceback (most recent call last):
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/bigquery_write_it_test.py",
>  line 167, in test_big_query_write
> 04:40:28  write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY))
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 522, in __exit__
> 04:40:28  self.run().wait_until_finish()
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 495, in run
> 04:40:28  self._options).run(False)
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 508, in run
> 04:40:28  return self.runner.run_pipeline(self, self._options)
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
>  line 53, in run_pipeline
> 04:40:28  hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 04:40:28  AssertionError: 
> 04:40:28  Expected: (Expected data is [(1, 'abc'), (2, 'def'), (3, '你好'), (4, 
> 'привет')])
> 04:40:28   but: Expected data is [(1, 'abc'), (2, 'def'), (3, '你好'), (4, 
> 'привет')] Actual data is []
> 04:40:28  
> {noformat}





[jira] [Commented] (BEAM-9484) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests is flaky in DirectRunner Postcommits

2020-03-17 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061257#comment-17061257
 ] 

Pablo Estrada commented on BEAM-9484:
-

[~chamikara] lmk if you need a hand looking into this, or if you got it.

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests is 
> flaky in DirectRunner Postcommits
> 
>
> Key: BEAM-9484
> URL: https://issues.apache.org/jira/browse/BEAM-9484
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>
> From https://builds.apache.org/job/beam_PostCommit_Python37_PR/99/: 
> {noformat}
>  ==
> 04:40:28  FAIL: test_big_query_write 
> (apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests)
> 04:40:28  
> --
> 04:40:28  Traceback (most recent call last):
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/bigquery_write_it_test.py",
>  line 167, in test_big_query_write
> 04:40:28  write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY))
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 522, in __exit__
> 04:40:28  self.run().wait_until_finish()
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 495, in run
> 04:40:28  self._options).run(False)
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 508, in run
> 04:40:28  return self.runner.run_pipeline(self, self._options)
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
>  line 53, in run_pipeline
> 04:40:28  hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 04:40:28  AssertionError: 
> 04:40:28  Expected: (Expected data is [(1, 'abc'), (2, 'def'), (3, '你好'), (4, 
> 'привет')])
> 04:40:28   but: Expected data is [(1, 'abc'), (2, 'def'), (3, '你好'), (4, 
> 'привет')] Actual data is []
> 04:40:28  
> {noformat}





[jira] [Commented] (BEAM-9532) test_last_updated is failing in s3io_test

2020-03-17 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061253#comment-17061253
 ] 

Pablo Estrada commented on BEAM-9532:
-

cc: [~mattmorgis] fyi

> test_last_updated is failing in s3io_test
> -
>
> Key: BEAM-9532
> URL: https://issues.apache.org/jira/browse/BEAM-9532
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-aws
>Reporter: Pablo Estrada
>Priority: Major
>
> The timestamps are not set appropriately. For some reason they are one 
> hour apart:
>  
> ==
> FAIL: test_last_updated (apache_beam.io.aws.s3io_test.TestS3IO)
> --
> Traceback (most recent call last):
>  File "/home/pabloem/codes/meam/sdks/python/apache_beam/io/aws/s3io_test.py", 
> line 125, in test_last_updated
>  self.assertAlmostEqual(result, time.time(), delta=tolerance)
> AssertionError: 1584481946.282874 != 1584485546.2829826 within 300 delta
> --
>  
> Note that 1584485546.2829826 - 1584481946.282874 is exactly 3600, i.e. one hour





[jira] [Created] (BEAM-9532) test_last_updated is failing in s3io_test

2020-03-17 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9532:
---

 Summary: test_last_updated is failing in s3io_test
 Key: BEAM-9532
 URL: https://issues.apache.org/jira/browse/BEAM-9532
 Project: Beam
  Issue Type: Bug
  Components: io-py-aws
Reporter: Pablo Estrada


The timestamps are not set appropriately. For some reason they are one hour 
apart:

 

==
FAIL: test_last_updated (apache_beam.io.aws.s3io_test.TestS3IO)
--
Traceback (most recent call last):
 File "/home/pabloem/codes/meam/sdks/python/apache_beam/io/aws/s3io_test.py", 
line 125, in test_last_updated
 self.assertAlmostEqual(result, time.time(), delta=tolerance)
AssertionError: 1584481946.282874 != 1584485546.2829826 within 300 delta

--

 

Note that 1584485546.2829826 - 1584481946.282874 is exactly 3600, i.e. one hour
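One way a skew of exactly 3600 seconds can arise is mixing UTC and local-time conversions of the same wall-clock value. This is a hedged reproduction of that pattern (the timezone is pinned for determinism on a POSIX system; the actual cause in `s3io` may differ):

```python
import calendar
import os
import time
from datetime import datetime

# Pin the local timezone to UTC+1 (Paris on 2020-03-17, before the DST
# switch) so the skew is deterministic regardless of the machine's zone.
os.environ['TZ'] = 'Europe/Paris'
time.tzset()

dt = datetime(2020, 3, 17, 21, 52, 26)     # a UTC wall-clock time

as_utc = calendar.timegm(dt.timetuple())   # interprets dt as UTC (correct)
as_local = time.mktime(dt.timetuple())     # interprets dt as local time (the bug)

print(int(as_local - as_utc))              # exactly one hour of skew
```

If the test converts the S3 `LastModified` time through the local zone while comparing against `time.time()`, the difference will be the machine's UTC offset, matching the 3600-second delta in the failure above.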





[jira] [Updated] (BEAM-9494) Remove workaround for BQ transform for Dataflow

2020-03-17 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-9494:

Fix Version/s: (was: 2.20.0)
   2.21.0

> Remove workaround for BQ transform for Dataflow
> ---
>
> Key: BEAM-9494
> URL: https://issues.apache.org/jira/browse/BEAM-9494
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Luke Cwik
>Assignee: Pablo Estrada
>Priority: Minor
> Fix For: 2.21.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Dataflow incorrectly uses the Flatten input PCollection coder when it 
> performs an optimization instead of the output PCollection coder which can 
> lead to issues if these coders differ.
>  
> The workaround was introduced in [https://github.com/apache/beam/pull/11103]




