Oh! The reader and writer are on the same thread & process, but it's
possible for other threads to call `write_table` simultaneously. That
probably accounts for the race condition here. Thanks!
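
For anyone landing on this thread later: a common way to make concurrent write_table calls safe is to serialize the writes with a lock and publish each file atomically via a temp-file rename. Below is a minimal sketch of that pattern — the `atomic_write` name and the plain-bytes payload are illustrative stand-ins, not pyarrow API; in real code a `pq.write_table(table, tmp_path)` call would replace the raw write:

```python
import os
import tempfile
import threading

# Shared lock so only one thread writes a given output path at a time.
_write_lock = threading.Lock()

def atomic_write(path: str, data: bytes) -> None:
    """Write `data` to `path` so readers never observe a partial file.

    Writes to a temp file in the same directory, then atomically renames
    it over the target.  With pyarrow you would call
    pq.write_table(table, tmp_path) in place of the raw write below.
    """
    with _write_lock:
        dir_name = os.path.dirname(path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # push bytes to disk, not just the OS cache
            os.replace(tmp_path, path)  # atomic on both POSIX and Windows
        except BaseException:
            os.unlink(tmp_path)
            raise
```

The fsync also addresses Weston's "written to the OS" vs. "written to disk" distinction: without it, a power outage right after the call returns could still lose the file's contents.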

On Thu, Jan 6, 2022 at 4:03 PM Weston Pace <[email protected]> wrote:

> I'm guessing you mean write_table?  Assuming you are passing a
> filename / string (and not an open output stream) to write_table I
> would expect that any files opened during the call have been closed
> before the call returns.
>
> Pedantically, this is not quite the same thing as "finished writing on
> disk" but more accurately, "finished writing to the OS".  A power
> outage shortly after a call to write_table completes could lead to
> partial loss of a file.
>
> However, this should not matter for your case if I am understanding
> your problem statement in that reddit post.  As long as you open that
> file handle to read after you have finished the call to write_table
> you should see all of the contents immediately.
>
> There is always the opportunity for bugs, but many of our unit tests
> write files and then immediately turn around and read them, and we
> don't typically have trouble here.  I'm assuming your reader & writer
> are on the same thread & process?  If you open a reader while your
> write task is still running, then no guarantees are made.
>
> On Thu, Jan 6, 2022 at 12:47 PM Brandon Chinn <[email protected]>
> wrote:
> >
> > When `pyarrow.parquet.write_file()` returns, is the parquet file
> finished writing on disk, or is it still writing?
> >
> > Context:
> https://www.reddit.com/r/learnpython/comments/rxmq43/help_with_python_file_flakily_not_returning_full/hrj99tq/?context=3
> >
> > Thanks!
> > Brandon Chinn
>

Reply via email to