[jira] [Updated] (BEAM-2595) WriteToBigQuery does not work with nested json schema

2017-07-11 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-2595:
--
Fix Version/s: 2.1.0

> WriteToBigQuery does not work with nested json schema
> -
>
> Key: BEAM-2595
> URL: https://issues.apache.org/jira/browse/BEAM-2595
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 2.1.0
> Environment: mac os local runner, Python
>Reporter: Andrea Pierleoni
>Assignee: Sourabh Bajaj
>Priority: Minor
>  Labels: gcp
> Fix For: 2.1.0
>
>
> I am trying to use the new `WriteToBigQuery` PTransform added to 
> `apache_beam.io.gcp.bigquery` in version 2.1.0-RC1.
> I need to write to a BigQuery table with nested fields.
> The only way to specify a nested schema in BigQuery is with the JSON schema.
> None of the classes in `apache_beam.io.gcp.bigquery` are able to parse the 
> JSON schema, but they accept a schema as an instance of the class 
> `apache_beam.io.gcp.internal.clients.bigquery.TableFieldSchema`.
> I am composing the `TableFieldSchema` as suggested here 
> [https://stackoverflow.com/questions/36127537/json-table-schema-to-bigquery-tableschema-for-bigquerysink/45039436#45039436],
>  and it looks fine when passed to the PTransform `WriteToBigQuery`. 
> The problem is that the base class `PTransformWithSideInputs` tries to pickle 
> and unpickle the function 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/ptransform.py#L515]
> (which includes the TableFieldSchema instance), and for some reason when the 
> class is unpickled some `FieldList` instances are converted to plain lists, 
> and the pickling validation fails.
> Would it be possible to extend the test coverage to nested JSON objects for 
> BigQuery?
> They are also relatively easy to parse into a TableFieldSchema.
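For reference, the recursion the reporter alludes to is short. The sketch below is illustrative only, not the Beam API: plain dicts stand in for the apitools `TableFieldSchema` objects (which take the same keyword arguments, `name`, `type`, `mode`, `fields`), and the example field names are hypothetical.

```python
def parse_field(spec):
    """Convert one JSON field spec into the keyword dict that a
    TableFieldSchema-style constructor would accept."""
    field = {
        "name": spec["name"],
        "type": spec.get("type", "STRING"),
        "mode": spec.get("mode", "NULLABLE"),
    }
    if spec.get("fields"):  # RECORD type: recurse into the nested fields
        field["fields"] = [parse_field(f) for f in spec["fields"]]
    return field

def parse_schema(json_fields):
    """Wrap the top-level field list in the TableSchema shape."""
    return {"fields": [parse_field(f) for f in json_fields]}

nested = parse_schema([
    {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
    {"name": "address", "type": "RECORD",
     "fields": [{"name": "city", "type": "STRING"}]},
])
```

The same recursion, with `bigquery.TableFieldSchema(**field)` in place of the dict literal, would produce the object graph the real sink expects.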



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-2595) WriteToBigQuery does not work with nested json schema

2017-07-11 Thread Andrea Pierleoni (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrea Pierleoni updated BEAM-2595:
---
Description: 
I am trying to use the new `WriteToBigQuery` PTransform added to 
`apache_beam.io.gcp.bigquery` in version 2.1.0-RC1.

I need to write to a BigQuery table with nested fields.
The only way to specify a nested schema in BigQuery is with the JSON schema.
None of the classes in `apache_beam.io.gcp.bigquery` are able to parse the JSON 
schema, but they accept a schema as an instance of the class 
`apache_beam.io.gcp.internal.clients.bigquery.TableFieldSchema`.

I am composing the `TableFieldSchema` as suggested here 
[https://stackoverflow.com/questions/36127537/json-table-schema-to-bigquery-tableschema-for-bigquerysink/45039436#45039436],
 and it looks fine when passed to the PTransform `WriteToBigQuery`. 

The problem is that the base class `PTransformWithSideInputs` tries to pickle and 
unpickle the function 
[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/ptransform.py#L515]
(which includes the TableFieldSchema instance), and for some reason when the 
class is unpickled some `FieldList` instances are converted to plain lists, and 
the pickling validation fails.

Would it be possible to extend the test coverage to nested JSON objects for 
BigQuery?
They are also relatively easy to parse into a TableFieldSchema.


  was:
I am trying to use the new `WriteToBigQuery` PTransform added to 
`apache_beam.io.gcp.bigquery` in version 2.1.0-RC1.

I need to write to a BigQuery table with nested fields.
The only way to specify a nested schema in BigQuery is with the JSON schema.
None of the classes in `apache_beam.io.gcp.bigquery` are able to parse the JSON 
schema, but they accept a schema as an instance of the class 
`apache_beam.io.gcp.internal.clients.bigquery.TableFieldSchema`.

I am composing the `TableFieldSchema` as suggested 
[here](https://stackoverflow.com/questions/36127537/json-table-schema-to-bigquery-tableschema-for-bigquerysink/45039436#45039436),
 and it looks fine when passed to the PTransform `WriteToBigQuery`. 

The problem is that the base class `PTransformWithSideInputs` tries to [pickle 
and unpickle the 
function](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/ptransform.py#L515)
(which includes the TableFieldSchema instance), and for some reason when the 
class is unpickled some `FieldList` instances are converted to plain lists, and 
the pickling validation fails.

Would it be possible to extend the test coverage to nested JSON objects for 
BigQuery?
They are also relatively easy to parse into a TableFieldSchema.



> WriteToBigQuery does not work with nested json schema
> -
>
> Key: BEAM-2595
> URL: https://issues.apache.org/jira/browse/BEAM-2595
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.1.0
> Environment: mac os local runner, Python
>Reporter: Andrea Pierleoni
>Assignee: Thomas Groh
>Priority: Minor
>  Labels: gcp
>
> I am trying to use the new `WriteToBigQuery` PTransform added to 
> `apache_beam.io.gcp.bigquery` in version 2.1.0-RC1.
> I need to write to a BigQuery table with nested fields.
> The only way to specify a nested schema in BigQuery is with the JSON schema.
> None of the classes in `apache_beam.io.gcp.bigquery` are able to parse the 
> JSON schema, but they accept a schema as an instance of the class 
> `apache_beam.io.gcp.internal.clients.bigquery.TableFieldSchema`.
> I am composing the `TableFieldSchema` as suggested here 
> [https://stackoverflow.com/questions/36127537/json-table-schema-to-bigquery-tableschema-for-bigquerysink/45039436#45039436],
>  and it looks fine when passed to the PTransform `WriteToBigQuery`. 
> The problem is that the base class `PTransformWithSideInputs` tries to pickle 
> and unpickle the function 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/ptransform.py#L515]
> (which includes the TableFieldSchema instance), and for some reason when the 
> class is unpickled some `FieldList` instances are converted to plain lists, 
> and the pickling validation fails.
> Would it be possible to extend the test coverage to nested JSON objects for 
> BigQuery?
> They are also relatively easy to parse into a TableFieldSchema.


