Hello!

I've used Arrow a decent bit in Python and JS but I'm pretty new to Rust.
I'm trying to write a minimal binding of Rust's Parquet to WebAssembly in
order to decode Parquet files to Arrow on the web. I have code that works
<https://github.com/kylebarron/parquet-wasm/blob/main/src/lib.rs> but only
some of the time. For example this test data
<https://github.com/kylebarron/parquet-wasm/blob/9495a87e00ae7073966d171bdcbfa1b87c63991b/data/works.parquet>
 (created here
<https://github.com/kylebarron/parquet-wasm/blob/9495a87e00ae7073966d171bdcbfa1b87c63991b/data/generate_data.py#L40-L43>)
seems to work with the js arrow.RecordBatchReader
<https://github.com/kylebarron/parquet-wasm/blob/79580c64c698570fd1a8a48b55698ca0be630aa8/www/index.js#L50-L52>
 but other test data
<https://github.com/kylebarron/parquet-wasm/blob/9495a87e00ae7073966d171bdcbfa1b87c63991b/data/not_work.parquet>
 (created here
<https://github.com/kylebarron/parquet-wasm/blob/79580c64c698570fd1a8a48b55698ca0be630aa8/data/generate_data.py#L45-L48>)
fails with "Error: Expected to read 1249648 metadata bytes, but only read
300.".

Based on logging, it *seems* as if parsing the Parquet file goes smoothly.
It's only writing the Arrow IPC format that fails (on the JS side when
trying to verify it). I'm currently trying to create the StreamWriter
<https://github.com/kylebarron/parquet-wasm/blob/79580c64c698570fd1a8a48b55698ca0be630aa8/src/lib.rs#L122-L123>,
then write all the Arrow RecordBatches into the writer
<https://github.com/kylebarron/parquet-wasm/blob/79580c64c698570fd1a8a48b55698ca0be630aa8/src/lib.rs#L127-L128>,
then finish the writer
<https://github.com/kylebarron/parquet-wasm/blob/79580c64c698570fd1a8a48b55698ca0be630aa8/src/lib.rs#L142>,
and send the output back to JS
<https://github.com/kylebarron/parquet-wasm/blob/79580c64c698570fd1a8a48b55698ca0be630aa8/src/lib.rs#L145-L156>
.
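
One check that might help narrow this down is to inspect the start of the
output buffer on the Rust side, before it is handed to JS. The sketch below
assumes the buffer uses the Arrow IPC encapsulated message framing (a
0xFFFFFFFF continuation marker, then a little-endian 32-bit metadata length;
the older framing omits the marker). The helper name
first_message_metadata_len and the FramingError type are hypothetical and are
not part of arrow-rs. It only compares the declared metadata length of the
first message with the bytes actually present, which is similar in spirit to
what the JS error message reports.

```rust
// Sketch only: checks the framing prefix of the FIRST message in an Arrow IPC
// stream buffer. Assumed layout (hypothetical helper, not an arrow-rs API):
//   [0xFFFFFFFF continuation][u32 LE metadata length][metadata][body]
// Buffers without the continuation marker are treated as the legacy framing,
// where the first four bytes are the metadata length itself.

const CONTINUATION: u32 = 0xFFFF_FFFF;

#[derive(Debug, PartialEq)]
enum FramingError {
    /// Fewer bytes than the framing prefix itself needs.
    TooShort,
    /// The declared metadata length exceeds the bytes that follow the prefix.
    Truncated { expected: usize, available: usize },
}

fn first_message_metadata_len(buf: &[u8]) -> Result<usize, FramingError> {
    // Read a little-endian u32 at `at`, or None if the buffer is too short.
    let word = |at: usize| -> Option<u32> {
        let bytes = buf.get(at..at + 4)?;
        Some(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
    };

    let first = word(0).ok_or(FramingError::TooShort)?;
    let (len, prefix) = if first == CONTINUATION {
        (word(4).ok_or(FramingError::TooShort)? as usize, 8)
    } else {
        (first as usize, 4)
    };

    // Bytes remaining after the framing prefix.
    let available = buf.len() - prefix;
    if len > available {
        return Err(FramingError::Truncated { expected: len, available });
    }
    Ok(len)
}
```

If this returns Ok on the Rust side while the JS reader still reports a
mismatch, that would suggest the bytes change between the writer and the
reader (for example in how the buffer is copied out of WebAssembly memory)
rather than in the writer itself. That is a guess, not a diagnosis.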

Has anyone seen a similar problem before, or have any suggestions for where
to debug further? Alternatively, an end-to-end example of reading from a
Parquet file and returning an Arrow buffer would be very helpful to see.

Best,
Kyle Barron
