On Thu, 19 Aug 2021 15:40:53 +0000
Hagai Har-Gil <[email protected]> wrote:
> Right - I have a different app that uses sockets in another context for a
> similar goal.
>
> The thing is - the Stream object is "advertised" (so to say) as a suitable
> holder for such data. E.g., looking at the docs for
> `pyarrow.ipc.open_stream()` and `pyarrow.ipc.NativeFile`, they specifically
> mention how this is the right approach when doing streaming, and I assumed
> that concurrent reading from that stream is a viable use case for such files.
>
> Perhaps I'm just completely ignorant of this topic and should've realized
> that a NativeFile can't support this use case, but I believe that a minimal
> warning against such "abuse" of the IPC protocol might be helpful in the
> future.
Well, the IPC protocol does not change the semantics of the underlying file.
If you're using a regular disk file, then by construction there's no
guarding against unsynchronised access. If you're using a socket, then
you get synchronisation by construction.

I notice it is not possible currently to create a pyarrow.OSFile from a
file descriptor:
https://issues.apache.org/jira/browse/ARROW-10906

However, you should be able to create a pyarrow.PythonFile from a Python
socket's file object (obtained using socket.makefile()). It will be less
performant, but should hopefully work.

Regards

Antoine.
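P.S. To illustrate the difference in semantics, here is a minimal sketch using only the Python standard library (no pyarrow involved). The function names and the 0.2 s delay are made up for the example. A reader on a regular disk file that gets ahead of the writer simply sees end-of-file, whereas a reader on a socket's makefile() object blocks until the writer has sent the data.

```python
import os
import socket
import tempfile
import threading
import time


def read_from_disk_file_before_write():
    """A reader on a regular file gets EOF if it runs ahead of the writer."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stream.bin")
        # buffering=0 on the reader so each read() goes straight to the OS.
        with open(path, "wb") as writer, open(path, "rb", buffering=0) as reader:
            early = reader.read(5)  # nothing written yet: returns b"" at once
            writer.write(b"hello")
            writer.flush()
            late = reader.read(5)   # the data is only visible now
    return early, late


def read_from_socket_before_write(delay=0.2):
    """A reader on a socket blocks until the writer has sent the data."""
    left, right = socket.socketpair()

    def delayed_writer():
        time.sleep(delay)
        with left.makefile("wb") as out:
            out.write(b"hello")  # flushed when the file object is closed
        left.close()

    thread = threading.Thread(target=delayed_writer)
    start = time.monotonic()
    thread.start()
    with right.makefile("rb") as src:
        data = src.read(5)  # waits for the writer instead of returning b""
    waited = time.monotonic() - start
    thread.join()
    right.close()
    return data, waited


if __name__ == "__main__":
    print(read_from_disk_file_before_write())
    print(read_from_socket_before_write())
```

A pyarrow.PythonFile wrapping the makefile() object would presumably inherit the same blocking reads, which is why the socket route should avoid the premature end-of-stream you can hit with a disk file.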
