> Seems a bit buggy

Yeah that's a bit of an understatement :/ 

Done. https://issues.apache.org/jira/browse/ARROW-10498

I'm trying to poke around, and it looks like it may affect all of the 
pyarrow.Table.from_* methods. I don't grok Cython very well, so I'm not sure 
I can get to a root cause easily.
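
For what it's worth, the output in the report is what you would get if each 
value were read as a NUL-terminated C string somewhere along the way. That is 
only a guess from the symptoms, not a confirmed root cause. Here is a minimal 
sketch of that behavior in plain Python, with no pyarrow involved; 
truncate_at_nul is a hypothetical helper written for illustration, not 
anything from the Arrow code base:

```python
def truncate_at_nul(value: bytes) -> bytes:
    """Model C-string semantics: keep only the bytes before the first NUL.

    Hypothetical helper for illustration; this is an assumption about the
    symptom, not code taken from pyarrow.
    """
    return value.split(b"\x00", 1)[0]


# Distinct values from the original report.
samples = [b"", b"Foo!!", b"Bar!!", b"\x00Baz!", b"half\x00baked"]

for original in samples:
    truncated = truncate_at_nul(original)
    status = "ok" if truncated == original else "TRUNCATED"
    print(f"{original!r:>16} -> {truncated!r:<10} {status}")
```

Under this model b'\x00Baz!' comes back as b'' and b'half\x00baked' comes 
back as b'half', which matches rows 5 and 6 of the to_pandas() output quoted 
below.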

On 2020/11/04 23:09:37, Wes McKinney <[email protected]> wrote: 
> Seems a bit buggy, can you open a Jira issue? Thanks
> 
> On Wed, Nov 4, 2020 at 5:05 PM Jason Sachs <[email protected]> wrote:
> >
> > It looks like pyarrow.Table.from_pydict() cuts off binary data after an 
> > embedded 00 byte. Is this a known bug?
> >
> > (py3) C:\>python
> > Python 3.8.5 (default, Sep  3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)] 
> > :: Anaconda, Inc. on win32
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> import numpy as np
> > >>> import pyarrow as pa
> > >>>
> > >>> data = np.array([b'', b'', b'', b'Foo!!', b'Bar!!',
> > ...       b'\x00Baz!', b'half\x00baked', b''], dtype='|S13')
> > >>> t = pa.Table.from_pydict({'data':data})
> > >>> t.to_pandas()
> >        data
> > 0       b''
> > 1       b''
> > 2       b''
> > 3  b'Foo!!'
> > 4  b'Bar!!'
> > 5       b''
> > 6   b'half'
> > 7       b''
> > >>> import pandas as pd
> > >>> pd.DataFrame(data)
> >                   0
> > 0               b''
> > 1               b''
> > 2               b''
> > 3          b'Foo!!'
> > 4          b'Bar!!'
> > 5       b'\x00Baz!'
> > 6  b'half\x00baked'
> > 7               b''
> > >>>
> 
