[ https://issues.apache.org/jira/browse/ARROW-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16209315#comment-16209315 ]
ASF GitHub Bot commented on ARROW-1681:
---------------------------------------

Github user wesm commented on the issue:

    https://github.com/apache/arrow/issues/1208

    Thanks for the report. I created a JIRA ARROW-1681 for this https://issues.apache.org/jira/browse/ARROW-1681. If you specify the data type for a list array, there shouldn't be any problem with empty lists. We can investigate and let you know what's going on so we can fix it, and see if there is a workaround in the meantime.


> [Python] Error writing with nulls in lists
> ------------------------------------------
>
>                 Key: ARROW-1681
>                 URL: https://issues.apache.org/jira/browse/ARROW-1681
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.7.1
>            Reporter: Wes McKinney
>              Labels: pull-request-available
>             Fix For: 0.8.0
>
>
> Created from https://github.com/apache/arrow/issues/1208
> Hi,
> Not sure if this is related or the same as ARROW-1584, but I can't seem to
> find a way to handle arrays of lists which occasionally consist of empty
> lists only.
> To reproduce:
> {code}
> na = []  # None, [""]
> arrays = {
>     'c1': pa.array([["test"], na, na], type=pa.list_(pa.string())),
>     'c2': pa.array([na, na, na], type=pa.list_(pa.string())),
> }
> rb = pa.RecordBatch.from_arrays(list(arrays.values()), list(arrays.keys()))
> df = rb.to_pandas()
> pa.serialize_pandas(df)
> # > ArrowNotImplementedError: Unable to convert type: null
> tbl = pa.Table.from_pandas(df)
> sink = pa.BufferOutputStream()
> writer = pa.RecordBatchFileWriter(sink, tbl.schema)
> writer.write_table(tbl)
> # > ArrowNotImplementedError: Unable to convert type: null
> {code}
> In my use case I'm processing data in batches where individual fields contain
> lists of strings. Some of the batches may, however, contain empty lists only.
> And there doesn't seem to be any representation in Arrow at the moment to
> deal with this situation.
> Also, since I'm serializing the batches into a single file/stream, their
> schemas need to be consistent, which is why I tried explicitly specifying the
> type of the array as list_(string). The only workaround I've found is to
> replace empty lists with [""], but that implies lots of unnecessary glue code
> on the client side. Is there a better workaround until this is fixed in an
> official conda release?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)