Joris Van den Bossche created ARROW-6762:
--------------------------------------------

             Summary: [C++] JSON reader segfaults on newline
                 Key: ARROW-6762
                 URL: https://issues.apache.org/jira/browse/ARROW-6762
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Joris Van den Bossche


Using the {{SampleRecord.jl}} attachment from ARROW-6737, I notice that reading
this file on master crashes with a failed check (the process aborts and dumps
core):

{code}
In [1]: from pyarrow import json 
   ...: import pyarrow.parquet as pq 
   ...:  
   ...: r = json.read_json('SampleRecord.jl') 
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1002 09:56:55.362766 13035 reader.cc:93]  Check failed: (string_view(*next_partial).find_first_not_of(" \t\n\r")) == (string_view::npos)
*** Check failure stack trace: ***
Aborted (core dumped)
{code}
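
For readers unfamiliar with the failing check: it asserts that the buffer named {{next_partial}} contains nothing but space, tab, newline and carriage-return characters. Below is a rough Python rendering of that condition. The function name {{is_only_whitespace}} is made up for illustration and is not part of Arrow, and the sketch says nothing about which input in {{SampleRecord.jl}} actually trips the check.

```python
# Illustrative only: mirrors the condition in the failed check,
# find_first_not_of(" \t\n\r") == npos, in plain Python.
WHITESPACE = " \t\n\r"


def is_only_whitespace(chunk: str) -> bool:
    """Return True when every character of chunk is in WHITESPACE."""
    return all(ch in WHITESPACE for ch in chunk)


print(is_only_whitespace("\n\r\n"))      # True
print(is_only_whitespace('\n{"a": 1}'))  # False
```

Under this reading, the check would fail whenever any non-whitespace byte remains in the buffer.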

while with 0.14.1 this works fine:

{code}
In [24]: from pyarrow import json 
    ...: import pyarrow.parquet as pq 
    ...:  
    ...: r = json.read_json('SampleRecord.jl')

In [25]: r
Out[25]: 
pyarrow.Table
_type: string
provider_name: string
arrival: timestamp[s]
berthed: timestamp[s]
berth: null
cargoes: list<item: struct<movement: string, product: string, volume: string, volume_unit: string, buyer: null, seller: null>>
  child 0, item: struct<movement: string, product: string, volume: string, volume_unit: string, buyer: null, seller: null>
      child 0, movement: string
      child 1, product: string
      child 2, volume: string
      child 3, volume_unit: string
      child 4, buyer: null
      child 5, seller: null
departure: timestamp[s]
eta: null
installation: null
port_name: string
next_zone: null
reported_date: timestamp[s]
shipping_agent: null
vessel: struct<beam: null, build_year: null, call_sign: null, dead_weight: null, dwt: null, flag_code: null, flag_name: null, gross_tonnage: null, imo: string, length: int64, mmsi: null, name: string, type: null, vessel_type: null>
  child 0, beam: null
  child 1, build_year: null
  child 2, call_sign: null
  child 3, dead_weight: null
  child 4, dwt: null
  child 5, flag_code: null
  child 6, flag_name: null
  child 7, gross_tonnage: null
  child 8, imo: string
  child 9, length: int64
  child 10, mmsi: null
  child 11, name: string
  child 12, type: null
  child 13, vessel_type: null

In [26]: pa.__version__
Out[26]: '0.14.1'
{code}

cc [~apitrou] [~bkietz]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
