Hi Ken -- it looks like you aren't calling "writer->Close()" after writing the last record batch. I think that will fix the issue.
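
For concreteness, the write sequence should look roughly like the sketch below (untested; the schema, batches, and "myfile" path are placeholders, not the code from Ken's attachment):

    #include <arrow/api.h>
    #include <arrow/io/file.h>
    #include <arrow/ipc/writer.h>

    arrow::Status WriteBatches(
        const std::shared_ptr<arrow::Schema>& schema,
        const std::vector<std::shared_ptr<arrow::RecordBatch>>& batches) {
      ARROW_ASSIGN_OR_RAISE(auto sink,
                            arrow::io::FileOutputStream::Open("myfile"));
      ARROW_ASSIGN_OR_RAISE(auto writer,
                            arrow::ipc::MakeFileWriter(sink, schema));
      for (const auto& batch : batches) {
        ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(*batch));
      }
      // Close() writes the file footer. Without it the file is missing the
      // trailing metadata, so readers reject it as "not an Arrow file".
      ARROW_RETURN_NOT_OK(writer->Close());
      return sink->Close();
    }
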
On Mon, Jan 25, 2021 at 12:48 PM Teh, Kenneth M. <[email protected]> wrote:
>
> Hi Wes,
>
> My C++ code is attached. I tried to also read it from C++ by opening the disk
> file as a MemoryMappedFile, and I get the same error when I make a
> RecordBatchReader on the mmap'ed file, i.e., "not an Arrow file".
>
> There must be some magical sequence of writes needed to make the file kosher.
>
> Thanks for helping.
>
> Ken
>
> p.s. I read your blog about relocating to Nashville. Was my stomping grounds
> back in the 80s. Memories.
>
> ________________________________
> From: Wes McKinney <[email protected]>
> Sent: Sunday, January 24, 2021 11:41 AM
> To: [email protected] <[email protected]>
> Subject: Re: [python] not an arrow file
>
> Can you show your C++ code?
>
> On Sun, Jan 24, 2021 at 8:10 AM Teh, Kenneth M. <[email protected]> wrote:
> >
> > Just started with Arrow...
> >
> > I wrote a record batch to a file using ipc::MakeFileWriter to create a
> > writer and writer->WriteRecordBatch in a C++ program, and tried to read it
> > in Python with:
> >
> >     import pyarrow as pa
> >     reader = pa.ipc.open_file("myfile")
> >
> > It raises ArrowInvalid with the message "not an arrow file".
> >
> > If I write it out as a Table in Feather format, I can read it in Python.
> > But I want to write large files on the order of 100 GB or more and then
> > read them back into Python as pandas DataFrames or something similar.
> >
> > So, I switched to using an IPC writer.
> >
> > Can someone point me in the right direction? Thanks.
> >
> > Ken
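
Once Close() has been called, the read path Ken describes (memory-mapping the file and opening a reader on it) should work as well. A rough sketch under the same assumptions (the path is a placeholder; error handling elided):

    #include <arrow/api.h>
    #include <arrow/io/file.h>
    #include <arrow/ipc/reader.h>

    arrow::Status ReadBatches(const std::string& path) {
      // Memory-map the file rather than reading it into memory, which is
      // what you want for files on the order of 100 GB.
      ARROW_ASSIGN_OR_RAISE(
          auto file,
          arrow::io::MemoryMappedFile::Open(path, arrow::io::FileMode::READ));
      ARROW_ASSIGN_OR_RAISE(auto reader,
                            arrow::ipc::RecordBatchFileReader::Open(file));
      for (int i = 0; i < reader->num_record_batches(); ++i) {
        ARROW_ASSIGN_OR_RAISE(auto batch, reader->ReadRecordBatch(i));
        // process batch...
      }
      return arrow::Status::OK();
    }

The same file should then open from Python with pa.ipc.open_file, as in Ken's snippet above.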
