Felipe Santos created ARROW-9020:
------------------------------------

             Summary: read_json won't respect explicit_schema in parse_options
                 Key: ARROW-9020
                 URL: https://issues.apache.org/jira/browse/ARROW-9020
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.17.1
         Environment: CPython 3.8.2, MacOS Mojave 10.14.6
            Reporter: Felipe Santos
             Fix For: 0.17.1

I am trying to read a JSON file using an explicit schema, but it looks like the schema is ignored. Moreover, if my schema contains a field that is not present in the JSON file, the output table contains all the fields in the JSON file plus the fields of my schema that were not found in the file.

A minimal example:
{code:python}
import pyarrow as pa
from pyarrow import json

# allowing for type inference
print(json.read_json('tmp.json'))
# prints:
# pyarrow.Table
# foo: string
# baz: string

# using an explicit schema that would read only "foo"
schema = pa.schema([('foo', pa.string())])
print(json.read_json('tmp.json', parse_options=json.ParseOptions(explicit_schema=schema)))
# prints:
# pyarrow.Table
# foo: string
# baz: string

# using an explicit schema that would read only "not_a_field",
# which is not present in the json file
schema = pa.schema([('not_a_field', pa.string())])
print(json.read_json('tmp.json', parse_options=json.ParseOptions(explicit_schema=schema)))
# prints:
# pyarrow.Table
# not_a_field: string
# foo: string
# baz: string
{code}

And the tmp.json file looks like:
{code:json}
{"foo": "bar", "baz": "1"}
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)