Re: Pandas Block Manager

2021-01-27 Thread Nicholas White
> > from that to the corresponding pandas
> > memory (this is hypothetical, again I don't have enough context on
> > pandas/numpy memory layouts).
> >
> > -Micah
> >
> > On Thu, Nov 12, 2020 at 3:01 PM Nicholas White wrote:
> >
> > > OK

Re: Pandas Block Manager

2020-11-12 Thread Nicholas White
2020 at 22:52, Nicholas White wrote:
> Thanks all, this has been interesting. I've made a patch that sort-of does
> what I want[1] - I hope the test case is clear! I made the batch writer use
> the `alignment` field that was already in the `IpcWriteOptions` to align
> the buff

Re: Pandas Block Manager

2020-11-11 Thread Nicholas White
Thanks all, this has been interesting. I've made a patch that sort-of does what I want[1] - I hope the test case is clear! I made the batch writer use the `alignment` field that was already in the `IpcWriteOptions` to align the buffers, instead of fixing their alignment at 8. Arrow then writes out
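
To illustrate what a configurable alignment changes, here is a minimal sketch of the padding arithmetic, in plain Python. It is not the patch itself and does not call Arrow: `padded_size` and `layout` are hypothetical helper names, and the assumption is that each buffer is padded up to a multiple of the alignment before the next one is written.

```python
def padded_size(nbytes: int, alignment: int = 8) -> int:
    """Round nbytes up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (nbytes + alignment - 1) & ~(alignment - 1)


def layout(buffer_sizes, alignment=8):
    """Return (offset, padded_length) pairs for buffers written back to back.

    Assumption: every buffer starts at the current position and is padded
    to a multiple of `alignment`, so each start offset is aligned too.
    """
    regions = []
    position = 0
    for size in buffer_sizes:
        length = padded_size(size, alignment)
        regions.append((position, length))
        position += length
    return regions
```

With an alignment of 8, two 40-byte buffers (five `float64` values each) sit back to back with no padding. With an alignment of 64, each is padded to 64 bytes, so the second buffer starts at offset 64.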

Re: Pandas Block Manager

2020-11-10 Thread Nicholas White
I've done a bit more digging. This code:

    import numpy as np
    import pandas as pd
    import pyarrow as pa

    df = pd.DataFrame(np.random.randint(10, size=(5, 5)))
    table = pa.Table.from_pandas(df)
    # Record the (address, size) of each column's data buffer
    mem = []
    for c in table.columns:
        buf = c.chunks[0].buffers()[1]
        mem.append((buf.address, buf.size))
    sorted(mem)

...prints...

    [(140262915478912,
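
A list of `(address, size)` pairs like the one collected above can be checked for contiguity mechanically. The following is a small sketch in plain Python; `are_contiguous` is a hypothetical helper, not part of pandas or pyarrow, and the addresses in the usage note are made up for illustration.

```python
def are_contiguous(regions):
    """Return True if the (address, size) regions form one gap-free run.

    Regions are sorted by address; each one must end exactly where the
    next one starts. An empty or single-element list counts as contiguous.
    """
    ordered = sorted(regions)
    return all(
        start + size == next_start
        for (start, size), (next_start, _) in zip(ordered, ordered[1:])
    )
```

For example, `are_contiguous([(1000, 40), (1040, 40), (1080, 40)])` is true, while `are_contiguous([(1000, 40), (1064, 40)])` is false because of the 24-byte gap between the two regions.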

Pandas Block Manager

2020-11-08 Thread Nicholas White
Hi - I've been looking through the Arrow specification format to look for ways to allow zero-copy creation of Pandas DataFrames (beyond `split_blocks`). Am I right in thinking that if you created an Arrow file (let's say of `m` rows and `n` columns of `float64`s for now) as a single RecordBatch