Oh! The reader and writer are on the same thread & process, but it's possible for other threads to call `write_table` simultaneously. That probably accounts for the race condition here. Thanks!
On Thu, Jan 6, 2022 at 4:03 PM Weston Pace <[email protected]> wrote:
> I'm guessing you mean write_table? Assuming you are passing a
> filename / string (and not an open output stream) to write_table I
> would expect that any files opened during the call have been closed
> before the call returns.
>
> Pedantically, this is not quite the same thing as "finished writing on
> disk" but more accurately, "finished writing to the OS". A power
> outage shortly after a call to write_table completes could lead to
> partial loss of a file.
>
> However, this should not matter for your case if I am understanding
> your problem statement in that reddit post. As long as you open that
> file handle to read after you have finished the call to write_table
> you should see all of the contents immediately.
>
> There is always the opportunity for bugs but many of our unit tests
> write files and then turn around and immediately read them and we
> don't typically have trouble here. I'm assuming your reader & writer
> are on the same thread & process? If you open a reader it's possible
> your read task is running while your write task is running and then no
> guarantees would be made.
>
> On Thu, Jan 6, 2022 at 12:47 PM Brandon Chinn <[email protected]>
> wrote:
> >
> > When `pyarrow.parquet.write_file()` returns, is the parquet file
> > finished writing on disk, or is it still writing?
> >
> > Context:
> > https://www.reddit.com/r/learnpython/comments/rxmq43/help_with_python_file_flakily_not_returning_full/hrj99tq/?context=3
> >
> > Thanks!
> > Brandon Chinn
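On the "written to the OS" vs. "written to disk" distinction above: if durability across a power loss matters, the usual POSIX pattern is to fsync the file (and its parent directory, so the directory entry itself is durable) after the write returns. A sketch using only the standard library; `flush_to_disk` is a hypothetical helper, not something pyarrow provides:

```python
import os

def flush_to_disk(path):
    # write_table returning only means the data reached the OS page
    # cache; fsync asks the OS to push it to stable storage.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)
    # Also fsync the containing directory so the new directory entry
    # survives a crash (POSIX systems; not applicable on Windows).
    dir_fd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

This only matters for crash durability; for the read-after-write visibility question in this thread, no fsync is needed.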
