Thanks Stefan,

I had a quick look and liked the sound of being able to work with larger-than-memory datasets by analysing streams from data sources. However, I wasn't so impressed by DuckDB's showing (out-of-memory) on this benchmark site: https://h2oai.github.io/db-benchmark/. That said, the benchmark appears to use CSV rather than Parquet/Arrow, which may make a difference.

I currently read/write Parquet using Polars and am pretty happy with the performance. What I'm missing is the ability to read/write Parquet from J. I see there is a separate post which may help in that regard!
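For reference, a minimal Python sketch of both approaches, assuming the polars and duckdb packages are installed; the file name trades.parquet and its symbol/price columns are just placeholders:

import duckdb
import polars as pl

# Polars: eager read and write of a Parquet file.
df = pl.read_parquet("trades.parquet")
df.write_parquet("trades_copy.parquet")

# Polars: lazy scan, so only the requested columns are materialised;
# this is what makes larger-than-memory files workable.
lazy = pl.scan_parquet("trades.parquet").select(["symbol", "price"])
print(lazy.collect())

# DuckDB: query the Parquet file in place with SQL (no full load needed).
con = duckdb.connect()  # in-memory database
rows = con.execute(
    "SELECT symbol, avg(price) FROM 'trades.parquet' GROUP BY symbol"
).fetchall()
print(rows)

# DuckDB: write a query result straight back out as Parquet.
con.execute(
    "COPY (SELECT * FROM 'trades.parquet' WHERE price > 0) "
    "TO 'filtered.parquet' (FORMAT PARQUET)"
)

The lazy/streaming variants are the relevant ones for the larger-than-memory case mentioned above.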
On Wed, Feb 2, 2022 at 8:27 PM Stefan Baumann <ste...@bstr.at> wrote:
> Ric,
> You might want to check out DuckDB (https://duckdb.org/), I recently
> used it for reading and writing Parquet files.
> It's similar to SQLite but intended to be used for analytics.
> Stefan.
>
> On Wed, Feb 2, 2022 at 5:29 AM Ric Sherlock <tikk...@gmail.com> wrote:
>
> > I spend a fair bit of time wrangling data formatted as C structs, CSV and
> > am trying to move more to Parquet as a file storage format.
> > I've also had on my list to investigate what would be involved in
> > reading/writing Parquet from J. Do you know if anyone out there has looked
> > into this?
> > Ric

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------