[ 
https://issues.apache.org/jira/browse/ARROW-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16209315#comment-16209315
 ] 

ASF GitHub Bot commented on ARROW-1681:
---------------------------------------

Github user wesm commented on the issue:

    https://github.com/apache/arrow/issues/1208
  
    Thanks for the report. I created a JIRA issue, ARROW-1681, for this: 
https://issues.apache.org/jira/browse/ARROW-1681. If you specify the data type 
for a list array, there shouldn't be any problem with empty lists. We will 
investigate and let you know what's going on so that we can fix it, and see 
whether there is a workaround in the meantime.


> [Python] Error writing with nulls in lists
> ------------------------------------------
>
>                 Key: ARROW-1681
>                 URL: https://issues.apache.org/jira/browse/ARROW-1681
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.7.1
>            Reporter: Wes McKinney
>              Labels: pull-request-available
>             Fix For: 0.8.0
>
>
> Created from https://github.com/apache/arrow/issues/1208
> Hi,
> Not sure if this is related or the same as ARROW-1584, but I can't seem to 
> find a way to handle arrays of lists which occasionally consist of empty 
> lists only.
> To reproduce:
> {code}
> import pyarrow as pa
>
> na = [] # None, [""]
> arrays = {
>     'c1': pa.array([["test"], na, na], type=pa.list_(pa.string())),
>     'c2': pa.array([na, na, na], type=pa.list_(pa.string())),
> }
> rb = pa.RecordBatch.from_arrays(list(arrays.values()), list(arrays.keys()))
> df = rb.to_pandas()
> pa.serialize_pandas(df)
> # > ArrowNotImplementedError: Unable to convert type: null
> tbl = pa.Table.from_pandas(df)
> sink = pa.BufferOutputStream()
> writer = pa.RecordBatchFileWriter(sink, tbl.schema)
> writer.write_table(tbl)
> # > ArrowNotImplementedError: Unable to convert type: null
> {code}
> In my use case I'm processing data in batches where individual fields contain 
> lists of strings. Some of the batches may, however, contain empty lists only. 
> And there doesn't seem to be any representation in Arrow at the moment to 
> deal with this situation.
> Also, since I'm serializing the batches into a single file/stream, their 
> schemas need to be consistent, which is why I tried explicitly specifying the 
> type of the array as list_(string). The only workaround I've found is to 
> replace empty lists with [""], but that implies lots of unnecessary glue code 
> on the client side. Is there a better workaround until this is fixed in an 
> official conda release?
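
For illustration, the client-side glue code that the reporter describes
(replacing empty lists with [""] before handing a batch to Arrow) might look
roughly like the sketch below. It is plain Python and does not call pyarrow.
The names EMPTY_PLACEHOLDER, pad_empty_lists and strip_placeholders are
hypothetical and do not come from the report or from the Arrow API. The reverse
step assumes that a genuine [""] value never occurs in the data, since it would
be indistinguishable from the placeholder.

```python
# Hypothetical sentinel standing in for an empty list (assumption, not Arrow API).
EMPTY_PLACEHOLDER = [""]


def pad_empty_lists(column):
    """Return a copy of `column` with every empty list replaced by [""]."""
    return [list(EMPTY_PLACEHOLDER) if value == [] else value for value in column]


def strip_placeholders(column):
    """Undo pad_empty_lists: turn every [""] back into an empty list."""
    return [[] if value == EMPTY_PLACEHOLDER else value for value in column]


# A batch shaped like the one in the report: c2 holds empty lists only.
batch = {
    "c1": [["test"], [], []],
    "c2": [[], [], []],
}

# Pad before writing, strip again after reading.
padded = {name: pad_empty_lists(values) for name, values in batch.items()}
restored = {name: strip_placeholders(values) for name, values in padded.items()}
```

Every column has to pass through both helpers, once before writing and once
after reading, which is the overhead the reporter would like to avoid.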



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
