Hi Ken -- it looks like you aren't calling "writer->Close()" after writing the last record batch. I think that will fix the issue.
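
For concreteness, the write sequence should look roughly like the sketch below (untested; the schema, batches, and "myfile" path are placeholders, not the code from Ken's attachment):

    #include <arrow/api.h>
    #include <arrow/io/file.h>
    #include <arrow/ipc/writer.h>

    arrow::Status WriteBatches(
        const std::shared_ptr<arrow::Schema>& schema,
        const std::vector<std::shared_ptr<arrow::RecordBatch>>& batches) {
      ARROW_ASSIGN_OR_RAISE(auto sink,
                            arrow::io::FileOutputStream::Open("myfile"));
      ARROW_ASSIGN_OR_RAISE(auto writer,
                            arrow::ipc::MakeFileWriter(sink, schema));
      for (const auto& batch : batches) {
        ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(*batch));
      }
      // Close() writes the file footer. Without it the file is missing the
      // trailing metadata, so readers reject it as "not an Arrow file".
      ARROW_RETURN_NOT_OK(writer->Close());
      return sink->Close();
    }
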
On Mon, Jan 25, 2021 at 12:48 PM Teh, Kenneth M. <[email protected]> wrote:
>
> Hi Wes,
>
> My C++ code is attached. I tried to also read it from C++ by opening the disk
> file as a MemoryMappedFile, and I get the same error when I make a
> RecordBatchReader on the mmap'ed file, i.e., "not an Arrow file".
>
> There must be some magical sequence of writes needed to make the file kosher.
>
> Thanks for helping.
>
> Ken
>
> p.s. I read your blog about relocating to Nashville. Was my stomping grounds
> back in the 80s. Memories.
>
> ________________________________
> From: Wes McKinney <[email protected]>
> Sent: Sunday, January 24, 2021 11:41 AM
> To: [email protected] <[email protected]>
> Subject: Re: [python] not an arrow file
>
> Can you show your C++ code?
>
> On Sun, Jan 24, 2021 at 8:10 AM Teh, Kenneth M. <[email protected]> wrote:
> >
> > Just started with Arrow...
> >
> > I wrote a record batch to a file using ipc::MakeFileWriter to create a
> > writer and writer->WriteRecordBatch in a C++ program, and tried to read it
> > in Python with:
> >
> >     import pyarrow as pa
> >     reader = pa.ipc.open_file("myfile")
> >
> > It raises ArrowInvalid with the message "not an arrow file".
> >
> > If I write it out as a Table in Feather format, I can read it in Python.
> > But I want to write large files on the order of 100 GB or more and then
> > read them back into Python as pandas DataFrames or something similar.
> >
> > So, I switched to using an IPC writer.
> >
> > Can someone point me in the right direction? Thanks.
> >
> > Ken
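
Once Close() has been called, the read path Ken describes (memory-mapping the file and opening a reader on it) should work as well. A rough sketch under the same assumptions (the path is a placeholder; error handling elided):

    #include <arrow/api.h>
    #include <arrow/io/file.h>
    #include <arrow/ipc/reader.h>

    arrow::Status ReadBatches(const std::string& path) {
      // Memory-map the file rather than reading it into memory, which is
      // what you want for files on the order of 100 GB.
      ARROW_ASSIGN_OR_RAISE(
          auto file,
          arrow::io::MemoryMappedFile::Open(path, arrow::io::FileMode::READ));
      ARROW_ASSIGN_OR_RAISE(auto reader,
                            arrow::ipc::RecordBatchFileReader::Open(file));
      for (int i = 0; i < reader->num_record_batches(); ++i) {
        ARROW_ASSIGN_OR_RAISE(auto batch, reader->ReadRecordBatch(i));
        // process batch...
      }
      return arrow::Status::OK();
    }

The same file should then open from Python with pa.ipc.open_file, as in Ken's snippet above.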
