[jira] [Commented] (ARROW-6737) Nested column branch had multiple children

2019-10-02 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942567#comment-16942567
 ] 

Joris Van den Bossche commented on ARROW-6737:
--

I noticed that reading this file on master actually gives problems, while it 
works on 0.14.1, so I opened ARROW-6762 for that.

> Nested column branch had multiple children
> --
>
> Key: ARROW-6737
> URL: https://issues.apache.org/jira/browse/ARROW-6737
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: harikrishnan
> Priority: Major
> Attachments: SampleRecord.jl
>
>
> {code}
> from pyarrow import json
> import pyarrow.parquet as pq
> r = json.read_json('example.jl')
> pq.write_table(r, 'example.parquet')
> {code}
> Running the above results in {{ArrowInvalid: Nested column branch 
> had multiple children}}.
> Posting it here as requested in 
> https://github.com/apache/arrow/issues/4045#issuecomment-535867640
> The sample schema looks like this:
> {code}
> package_version: string
> source_version: string
> uuid: string
> _type: string
> position: struct<ais_type: string, course: double, draught: double, 
> draught_raw: null, heading: double, lat: double, lon: double, nav_state: 
> int64, received_time: timestamp[s], speed: double>
>  child 0, ais_type: string
>  child 1, course: double
>  child 2, draught: double
>  child 3, draught_raw: null
>  child 4, heading: double
>  child 5, lat: double
>  child 6, lon: double
>  child 7, nav_state: int64
>  child 8, received_time: timestamp[s]
>  child 9, speed: double
> provider_name: string
> vessel: struct<beam: null, build_year: null, call_sign: string, 
> dead_weight: null, dwt: null, flag_code: null, flag_name: string, 
> gross_tonnage: null, imo: string, length: null, mmsi: string, name: string, 
> type: null, vessel_type: string>
>  child 0, beam: null
>  child 1, build_year: null
>  child 2, call_sign: string
>  child 3, dead_weight: null
>  child 4, dwt: null
>  child 5, flag_code: null
>  child 6, flag_name: string
>  child 7, gross_tonnage: null
>  child 8, imo: string
>  child 9, length: null
>  child 10, mmsi: string
>  child 11, name: string
>  child 12, type: null
>  child 13, vessel_type: string
> source_provider: string
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6737) Nested column branch had multiple children

2019-10-02 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942560#comment-16942560
 ] 

Joris Van den Bossche commented on ARROW-6737:
--

Thanks for providing the sample file. This is indeed a duplicate of ARROW-1644. 
Nested lists/structs are not yet supported in the Arrow Parquet IO 
implementation.
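
Until nested structs can be written, one possible workaround is to flatten the nested objects in the JSON Lines file before reading it, so that every column is a plain scalar type. The following is only a minimal sketch using the Python standard library. The helper names flatten_record and flatten_json_lines, the dotted-key naming, and the sample record are illustrative assumptions and are not part of pyarrow.

```python
import json


def flatten_record(record, parent_key="", sep="."):
    """Return a copy of `record` with nested dicts collapsed into dotted keys.

    Hypothetical helper: an empty nested dict contributes no keys.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into the nested object and merge its flattened keys.
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def flatten_json_lines(lines):
    """Flatten every non-empty JSON Lines entry and re-serialize it."""
    return [
        json.dumps(flatten_record(json.loads(line)))
        for line in lines
        if line.strip()
    ]


# Made-up record loosely following the schema in the issue description.
sample = ['{"uuid": "abc", "position": {"lat": 1.5, "lon": 2.5}}']
flattened = flatten_json_lines(sample)
```

Writing the flattened lines to a new file and passing that file to the snippet from the issue description should then yield only top-level columns such as position.lat and position.lon, at the cost of losing the struct grouping.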



[jira] [Commented] (ARROW-6737) Nested column branch had multiple children

2019-10-01 Thread harikrishnan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942461#comment-16942461
 ] 

harikrishnan commented on ARROW-6737:
-

I am not sure whether https://jira.apache.org/jira/browse/ARROW-6760 is also 
related to this, but I am hitting it when trying a similar operation with a 
slightly different JSON format.



[jira] [Commented] (ARROW-6737) Nested column branch had multiple children

2019-10-01 Thread harikrishnan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942452#comment-16942452
 ] 

harikrishnan commented on ARROW-6737:
-

[~jorisvandenbossche] Please use the attached file for testing. The attached 
file does not have the same schema as the one I posted above, but it results in 
the same error. Let me know if you need any more info!



[jira] [Commented] (ARROW-6737) Nested column branch had multiple children

2019-10-01 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942097#comment-16942097
 ] 

Wes McKinney commented on ARROW-6737:
-

Is this ARROW-1644?



[jira] [Commented] (ARROW-6737) Nested column branch had multiple children

2019-10-01 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941683#comment-16941683
 ] 

Joris Van den Bossche commented on ARROW-6737:
--

[~harish1792] would you be able to provide a reproducible example? (so we can 
run the code and investigate the issue)  
For example, a small json file that shows the problem? 
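
As a starting point for such an example, a small JSON Lines file with nested objects could be generated as sketched below. This is only an illustration: the field names are taken from the schema in the issue description, while the values, the helper name write_json_lines, and the temporary path are made up, and it is not confirmed that this exact file triggers the error.

```python
import json
import os
import tempfile

# Made-up records with two nested objects, mirroring the "position" and
# "vessel" struct columns from the reported schema.
records = [
    {
        "uuid": "example-1",
        "position": {"lat": 1.0, "lon": 2.0},
        "vessel": {"name": "A", "mmsi": "123"},
    },
    {
        "uuid": "example-2",
        "position": {"lat": 3.0, "lon": 4.0},
        "vessel": {"name": "B", "mmsi": "456"},
    },
]


def write_json_lines(path, rows):
    """Write one JSON object per line (hypothetical helper)."""
    with open(path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


path = os.path.join(tempfile.mkdtemp(), "example.jl")
write_json_lines(path, records)
```

The resulting file can then be passed to the snippet from the issue description. Note that the snippet imports pyarrow's json module under the same name as the standard-library module used here, so the two are best run in separate sessions or with one of them aliased.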
