> Seems a bit buggy

Yeah that's a bit of an understatement :/
Done. https://issues.apache.org/jira/browse/ARROW-10498

I'm trying to poke around, but it looks like it may affect all of the from_* methods. I don't grok Cython very well, so am not sure I can get to a root cause easily.

On 2020/11/04 23:09:37, Wes McKinney <[email protected]> wrote:
> Seems a bit buggy, can you open a Jira issue? Thanks
>
> On Wed, Nov 4, 2020 at 5:05 PM Jason Sachs <[email protected]> wrote:
> >
> > It looks like pyarrow.Table.from_pydict() cuts off binary data after an
> > embedded 00 byte. Is this a known bug?
> >
> > (py3) C:\>python
> > Python 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)]
> > :: Anaconda, Inc. on win32
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> import numpy as np
> > >>> import pyarrow as pa
> > >>>
> > >>> data = np.array([b'', b'', b'', b'Foo!!', b'Bar!!',
> > ...                  b'\x00Baz!', b'half\x00baked', b''], dtype='|S13')
> > >>> t = pa.Table.from_pydict({'data':data})
> > >>> t.to_pandas()
> >        data
> > 0       b''
> > 1       b''
> > 2       b''
> > 3  b'Foo!!'
> > 4  b'Bar!!'
> > 5       b''
> > 6   b'half'
> > 7       b''
> > >>> import pandas as pd
> > >>> pd.DataFrame(data)
> >                   0
> > 0               b''
> > 1               b''
> > 2               b''
> > 3          b'Foo!!'
> > 4          b'Bar!!'
> > 5       b'\x00Baz!'
> > 6  b'half\x00baked'
> > 7               b''
> > >>>
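
For anyone who wants to check their own data before converting, here is a minimal pure-Python sketch. It assumes the values are being treated like C-style NUL-terminated strings somewhere in the conversion. That is only a guess that happens to match the output above, not a confirmed root cause. The helper names truncate_at_nul and find_affected are hypothetical and are not part of pyarrow.

```python
def truncate_at_nul(value: bytes) -> bytes:
    """Model C-string handling: keep only the bytes before the first NUL.

    Assumption: this mimics the observed truncation; it is not pyarrow code.
    """
    return value.split(b"\x00", 1)[0]


def find_affected(values):
    """Return (index, original, truncated) for every value containing a NUL."""
    return [
        (i, v, truncate_at_nul(v))
        for i, v in enumerate(values)
        if b"\x00" in v
    ]


# Same values as in the report, as plain Python bytes objects.
values = [b"", b"", b"", b"Foo!!", b"Bar!!",
          b"\x00Baz!", b"half\x00baked", b""]

affected = find_affected(values)
for i, original, truncated in affected:
    print(i, original, "->", truncated)
```

Under that assumption, rows 5 and 6 are the only ones flagged, and the modelled results (b'' and b'half') match what t.to_pandas() printed in the original report.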
