Re: bug? pyarrow deserialize_components doesn't work in multiple processes
That works. I've tried a bunch of debugging and workarounds - as far as I
can tell this is just a problem with deserialize_components and
multiprocessing.

On Fri., 6 Jul. 2018, 5:12 pm Robert Nishihara wrote:

> Can you reproduce it without all of the multiprocessing code? E.g., just
> call *pyarrow.serialize* in one interpreter. Then copy and paste the bytes
> into another interpreter and call *pyarrow.deserialize* or
> *pyarrow.deserialize_components*?
>
> [snip: quoted reproduction script and package list, included in full in
> the earlier messages below]
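Robert's suggested experiment - serialize in one interpreter, copy the raw
bytes out, then deserialize them in a fresh interpreter - can be sketched
roughly as below. This is only an illustration of the shape of that test:
pickle.dumps/pickle.loads stand in for the pyarrow calls so the sketch stays
self-contained, and the plain-list payload stands in for the numpy array
used in the actual repro; the real experiment would run the same two halves
with pyarrow.serialize and pyarrow.deserialize (or deserialize_components).

```python
import pickle

# Stand-in payload for the repro's ['message', 123, <numpy array>].
payload = ['message', 123, [-1.5, 2.25, 99.0]]

# Interpreter 1: serialize and copy these bytes out (e.g. print repr(blob)
# and paste it into the second interpreter).
blob = pickle.dumps(payload)

# Interpreter 2: paste the bytes back in and deserialize.
restored = pickle.loads(blob)
assert restored == payload
```

If the pyarrow version of this round-trip also fails across interpreters,
the bug is in (de)serialization itself rather than in the multiprocessing
transport.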
Re: bug? pyarrow deserialize_components doesn't work in multiple processes
Attachment inline:

import pyarrow as pa
import multiprocessing as mp
import numpy as np


def make_payload():
    """Common function - make data to send"""
    return ['message', 123, np.random.uniform(-100, 100, (4, 4))]


def send_payload(payload, connection):
    """Common function - serialize & send data through a socket"""
    s = pa.serialize(payload)
    c = s.to_components()

    # Send
    data = c.pop('data')
    connection.send(c)
    for d in data:
        connection.send_bytes(d)
    connection.send_bytes(b'')


def recv_payload(connection):
    """Common function - recv data through a socket & deserialize"""
    c = connection.recv()
    c['data'] = []
    while True:
        r = connection.recv_bytes()
        if len(r) == 0:
            break
        c['data'].append(pa.py_buffer(r))

    print('...deserialize')
    return pa.deserialize_components(c)


def run_same_process():
    """Same process: send data down a socket, then read data from the
    matching socket"""
    print('run_same_process')
    recv_conn, send_conn = mp.Pipe(duplex=False)
    payload = make_payload()
    print(payload)
    send_payload(payload, send_conn)
    payload2 = recv_payload(recv_conn)
    print(payload2)


def receiver(recv_conn):
    """Separate process: runs in a different process, recv data &
    deserialize"""
    print('Receiver started')
    payload = recv_payload(recv_conn)
    print(payload)


def run_separate_process():
    """Separate process: launch the child process, then send data"""
    print('run_separate_process')
    recv_conn, send_conn = mp.Pipe(duplex=False)
    process = mp.Process(target=receiver, args=(recv_conn,))
    process.start()

    payload = make_payload()
    print(payload)
    send_payload(payload, send_conn)

    process.join()


if __name__ == '__main__':
    run_same_process()
    run_separate_process()


On Fri, Jul 6, 2018 at 2:42 PM Josh Quigley wrote:

> A reproducible program attached - it first runs serialize/deserialize
> from the same process, then it does the same work using a separate
> process for the deserialize.
>
> The behaviour seen (after the same-process code executes happily) is
> hanging / child-process crashing during the call to deserialize.
>
> Is this expected, and if not, is there a known workaround?
>
> [snip: Windows 10 / conda package list, included in full in the original
> message below]
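The chunked framing that send_payload/recv_payload use over the Pipe - a
stream of send_bytes messages terminated by an empty message as a sentinel -
can be exercised on its own, without pyarrow, to separate the transport from
the deserialization step. A minimal sketch (the function names here are
illustrative, not part of the original script):

```python
import multiprocessing as mp


def send_chunks(connection, chunks):
    """Send each chunk, then an empty message as an end-of-stream sentinel,
    mirroring the framing in send_payload."""
    for chunk in chunks:
        connection.send_bytes(chunk)
    connection.send_bytes(b'')


def recv_chunks(connection):
    """Collect chunks until the empty sentinel arrives, mirroring the
    receive loop in recv_payload."""
    chunks = []
    while True:
        data = connection.recv_bytes()
        if len(data) == 0:
            break
        chunks.append(data)
    return chunks


if __name__ == '__main__':
    # Small payloads fit in the pipe buffer, so a same-process round-trip
    # is safe here.
    recv_conn, send_conn = mp.Pipe(duplex=False)
    send_chunks(send_conn, [b'header', b'body'])
    print(recv_chunks(recv_conn))  # [b'header', b'body']
```

If this framing survives the process boundary intact, the hang points at
deserialize_components itself rather than at the Pipe transport.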
bug? pyarrow deserialize_components doesn't work in multiple processes
A reproducible program attached - it first runs serialize/deserialize from
the same process, then it does the same work using a separate process for
the deserialize.

The behaviour seen (after the same-process code executes happily) is
hanging / child-process crashing during the call to deserialize.

Is this expected, and if not, is there a known workaround?

Running Windows 10, conda distribution, with package versions listed below.
I'll also see what happens if I run on *nix.

- arrow-cpp=0.9.0=py36_vc14_7
- boost-cpp=1.66.0=vc14_1
- bzip2=1.0.6=vc14_1
- hdf5=1.10.2=vc14_0
- lzo=2.10=vc14_0
- parquet-cpp=1.4.0=vc14_0
- snappy=1.1.7=vc14_1
- zlib=1.2.11=vc14_0
- blas=1.0=mkl
- blosc=1.14.3=he51fdeb_0
- cython=0.28.3=py36hfa6e2cd_0
- icc_rt=2017.0.4=h97af966_0
- intel-openmp=2018.0.3=0
- numexpr=2.6.5=py36hcd2f87e_0
- numpy=1.14.5=py36h9fa60d3_2
- numpy-base=1.14.5=py36h5c71026_2
- pandas=0.23.1=py36h830ac7b_0
- pyarrow=0.9.0=py36hfe5e424_2
- pytables=3.4.4=py36he6f6034_0
- python=3.6.6=hea74fb7_0
- vc=14=h0510ff6_3
- vs2015_runtime=14.0.25123=3