Thanks, Will – it works great!

From: Will Jones <[email protected]>
Date: Thursday, May 11, 2023 at 1:44 PM
To: [email protected] <[email protected]>, Philip Moore 
<[email protected]>
Subject: Re: PyArrow 12 serialization
Hi Phil,

It looks like you are trying to serialize a PyArrow table to Python bytes.

This function (from [1]) will give you a PyArrow Buffer object:

def write_ipc_buffer(table: pa.Table) -> pa.Buffer:
    sink = pa.BufferOutputStream()

    with pa.ipc.new_stream(sink, table.schema) as writer:
        writer.write_table(table)

    return sink.getvalue()

Then you can call to_pybytes() on that buffer. You will later be able to read 
that with:

    reader = pa.BufferReader(buffer)
    table = pa.ipc.open_stream(reader).read_all()
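
For your Superset case, the section you quoted below could then look roughly like this (just a sketch, reusing the write_ipc_buffer helper above and the names from your snippet):

if use_msgpack:
    with stats_timing(
        "sqllab.query.results_backend_pa_serialization", stats_logger
    ):
        # Serialize the Arrow table via the IPC stream format instead of the
        # removed default_serialization_context().
        data = write_ipc_buffer(result_set.pa_table).to_pybytes()

    # expand when loading data from results backend
    all_columns, expanded_columns = (selected_columns, [])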

Best,

Will Jones

[1] 
https://github.com/wjones127/arrow-ipc-bench/blob/89d68b4d7cfcb3f5d28b6000abdb801c93198bbf/share_arrow.py#L19-L25

On Thu, May 11, 2023 at 10:35 AM Philip Moore via user <[email protected]> wrote:
Hello,

I’m attempting to use Apache Superset with PyArrow 12.0.0 – and it has this section of code:

if use_msgpack:
    with stats_timing(
        "sqllab.query.results_backend_pa_serialization", stats_logger
    ):
        data = (
            pa.default_serialization_context()
            .serialize(result_set.pa_table)
            .to_buffer()
            .to_pybytes()
        )

    # expand when loading data from results backend
    all_columns, expanded_columns = (selected_columns, [])


That code worked fine in PyArrow 11.0.0 – but it appears that “default_serialization_context()” was removed in PyArrow 12.

Can you advise on what this code should look like for use with PyArrow 12?

Thank you.

Phil
