[jira] [Commented] (ARROW-6737) Nested column branch had multiple children
[ https://issues.apache.org/jira/browse/ARROW-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942567#comment-16942567 ]

Joris Van den Bossche commented on ARROW-6737:
----------------------------------------------

I noticed that reading this file on master actually gives problems, while it works on 0.14.1, so I opened ARROW-6762 for that.

> Nested column branch had multiple children
> ------------------------------------------
>
>                 Key: ARROW-6737
>                 URL: https://issues.apache.org/jira/browse/ARROW-6737
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: harikrishnan
>            Priority: Major
>         Attachments: SampleRecord.jl
>
>
> {code}
> from pyarrow import json
> import pyarrow.parquet as pq
> r = json.read_json('example.jl')
> pq.write_table(r, 'example.parquet')
> {code}
> Doing the above operation results in {{ArrowInvalid: Nested column branch had multiple children}}
> Posting it here as per the request from https://github.com/apache/arrow/issues/4045#issuecomment-535867640
> The sample schema looks like this
> {code}
> package_version: string
> source_version: string
> uuid: string
> _type: string
> position: struct<ais_type: string, course: double, draught: double, draught_raw: null, heading: double, lat: double, lon: double, nav_state: int64, received_time: timestamp[s], speed: double>
>   child 0, ais_type: string
>   child 1, course: double
>   child 2, draught: double
>   child 3, draught_raw: null
>   child 4, heading: double
>   child 5, lat: double
>   child 6, lon: double
>   child 7, nav_state: int64
>   child 8, received_time: timestamp[s]
>   child 9, speed: double
> provider_name: string
> vessel: struct<beam: null, build_year: null, call_sign: string, dead_weight: null, dwt: null, flag_code: null, flag_name: string, gross_tonnage: null, imo: string, length: null, mmsi: string, name: string, type: null, vessel_type: string>
>   child 0, beam: null
>   child 1, build_year: null
>   child 2, call_sign: string
>   child 3, dead_weight: null
>   child 4, dwt: null
>   child 5, flag_code: null
>   child 6, flag_name: string
>   child 7, gross_tonnage: null
>   child 8, imo: string
>   child 9, length: null
>   child 10, mmsi: string
>   child 11, name: string
>   child 12, type: null
>   child 13, vessel_type: string
> source_provider: string
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (ARROW-6737) Nested column branch had multiple children
[ https://issues.apache.org/jira/browse/ARROW-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942560#comment-16942560 ]

Joris Van den Bossche commented on ARROW-6737:
----------------------------------------------

Thanks for providing the sample file. This is indeed a duplicate of ARROW-1644. Nested lists/structs are currently not yet supported in the Arrow parquet IO implementation.
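
Since the comment above notes that nested structs cannot yet be written to Parquet, one possible workaround is to flatten nested JSON objects into top-level keys before the records reach pyarrow. The following is a minimal sketch using only the Python standard library; the helper names `flatten_record` and `flatten_json_lines`, the dotted key naming, and the sample values are assumptions for illustration and are not part of the issue or of pyarrow.

```python
import json


def flatten_record(record, parent_key="", sep="."):
    """Return a copy of `record` with nested dicts expanded into dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Nested objects would otherwise become struct columns in Arrow.
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def flatten_json_lines(lines):
    """Flatten every non-empty JSON Lines entry and return new JSON strings."""
    return [
        json.dumps(flatten_record(json.loads(line)))
        for line in lines
        if line.strip()
    ]


# Invented record that reuses a few field names from the schema in the issue.
sample = (
    '{"uuid": "abc", "position": {"lat": 1.5, "lon": 2.5}, '
    '"vessel": {"name": "X", "beam": null}}'
)
flattened = flatten_json_lines([sample])
print(flattened[0])
```

The flattened lines could then be saved to a file and read with `json.read_json` as in the issue. Whether the resulting flat table writes cleanly on a given pyarrow version is not verified here. pyarrow tables also have a `flatten()` method that expands struct columns after reading, which may be a simpler route.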
[jira] [Commented] (ARROW-6737) Nested column branch had multiple children
[ https://issues.apache.org/jira/browse/ARROW-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942461#comment-16942461 ]

harikrishnan commented on ARROW-6737:
-------------------------------------

Not sure whether https://jira.apache.org/jira/browse/ARROW-6760 is also related to this, but I am facing it when trying to do a similar operation with a slightly different JSON format.
[jira] [Commented] (ARROW-6737) Nested column branch had multiple children
[ https://issues.apache.org/jira/browse/ARROW-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942452#comment-16942452 ]

harikrishnan commented on ARROW-6737:
-------------------------------------

[~jorisvandenbossche] Please use the attached file for testing. The attached file does not have the same schema as the one I posted above, but it results in the same error. Let me know if you need any more info!
[jira] [Commented] (ARROW-6737) Nested column branch had multiple children
[ https://issues.apache.org/jira/browse/ARROW-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942097#comment-16942097 ]

Wes McKinney commented on ARROW-6737:
-------------------------------------

Is this ARROW-1644?
[jira] [Commented] (ARROW-6737) Nested column branch had multiple children
[ https://issues.apache.org/jira/browse/ARROW-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941683#comment-16941683 ]

Joris Van den Bossche commented on ARROW-6737:
----------------------------------------------

[~harish1792] would you be able to provide a reproducible example (so we can run the code and investigate the issue)? For example, a small JSON file that shows the problem?
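
A small JSON Lines file of the kind requested above could be generated along these lines. This is only a sketch: the file name `example.jl` comes from the issue, the field names are a subset of the quoted schema, and all values as well as the helper `write_json_lines` are invented for illustration.

```python
import json
import os
import tempfile

# Invented values; field names are taken from the schema quoted in the issue.
records = [
    {
        "uuid": "0001",
        "position": {"lat": 10.0, "lon": 20.0, "draught_raw": None},
        "vessel": {"name": "EXAMPLE", "imo": "0000000", "beam": None},
    },
    {
        "uuid": "0002",
        "position": {"lat": 11.0, "lon": 21.0, "draught_raw": None},
        "vessel": {"name": "EXAMPLE 2", "imo": "0000001", "beam": None},
    },
]


def write_json_lines(path, rows):
    """Write one JSON object per line (the JSON Lines layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


# Write into a temporary directory so the sketch leaves no file behind by accident.
path = os.path.join(tempfile.mkdtemp(), "example.jl")
write_json_lines(path, records)
```

Passing the resulting path to the `json.read_json` and `pq.write_table` calls from the issue gives a two-record test case with nested `position` and `vessel` objects, including all-null child fields.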