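It looks like pyarrow.Table.from_pydict() truncates binary data at the first
embedded 00 byte when it is given a NumPy fixed-width bytes array (dtype
'|S13' below), while pandas keeps the full value. Is this a known bug?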
(py3) C:\>python
Python 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pyarrow as pa
>>>
>>> data = np.array([b'', b'', b'', b'Foo!!', b'Bar!!',
...                  b'\x00Baz!', b'half\x00baked', b''], dtype='|S13')
>>> t = pa.Table.from_pydict({'data':data})
>>> t.to_pandas()
       data
0       b''
1       b''
2       b''
3  b'Foo!!'
4  b'Bar!!'
5       b''
6   b'half'
7       b''
>>> import pandas as pd
>>> pd.DataFrame(data)
                  0
0               b''
1               b''
2               b''
3          b'Foo!!'
4          b'Bar!!'
5       b'\x00Baz!'
6  b'half\x00baked'
7               b''
>>>
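
The two outputs above are consistent with the fixed-width values being read
like C strings (stopping at the first 00 byte) in one case, and having only
their trailing padding stripped in the other. That is an assumption about the
cause, not something confirmed from the pyarrow source. A pure-Python sketch
of the two conventions, using hypothetical helper names (pad,
strip_trailing_nulls, cut_at_first_null) that are not part of any library:

```python
# Sketch only: models two ways of recovering a value from a fixed-width,
# null-padded field. It does not call pyarrow or NumPy.

WIDTH = 13  # matches dtype='|S13' in the session above


def pad(value: bytes, width: int = WIDTH) -> bytes:
    """Null-pad a value to a fixed width, as a fixed-width bytes field stores it."""
    return value.ljust(width, b'\x00')


def strip_trailing_nulls(buf: bytes) -> bytes:
    """Drop only the padding at the end; embedded null bytes survive."""
    return buf.rstrip(b'\x00')


def cut_at_first_null(buf: bytes) -> bytes:
    """Treat the buffer like a C string: stop at the first null byte."""
    return buf.split(b'\x00', 1)[0]


if __name__ == "__main__":
    for value in [b'', b'Foo!!', b'\x00Baz!', b'half\x00baked']:
        buf = pad(value)
        print(repr(strip_trailing_nulls(buf)), repr(cut_at_first_null(buf)))
```

If this is what is happening, a possible workaround (untested here) would be
to pass Python bytes objects, which carry an explicit length, instead of the
fixed-width array, for example pa.Table.from_pydict({'data': data.tolist()}).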