[ https://issues.apache.org/jira/browse/ARROW-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andy Grove updated ARROW-10226:
-------------------------------
    Summary: [Rust] [Parquet] Parquet reader reading wrong columns in some batches within a parquet file  (was: [Rust] [DataFusion] TPC-H query 1 no longer completes for 100GB dataset)

> [Rust] [Parquet] Parquet reader reading wrong columns in some batches within a parquet file
> -------------------------------------------------------------------------------------------
>
>                 Key: ARROW-10226
>                 URL: https://issues.apache.org/jira/browse/ARROW-10226
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Rust, Rust - DataFusion
>            Reporter: Andy Grove
>            Assignee: Andy Grove
>            Priority: Major
>             Fix For: 2.0.0
>
>
> I re-installed my desktop a few days ago (now using Ubuntu 20.04 LTS), and
> when I try to run the TPC-H benchmark, it never completes and eventually
> uses up all 64 GB of RAM.
> I can run Spark against the same data set and the query completes in 24 seconds,
> which IIRC is how long it took before.
> It is possible that something is odd in my environment, but it is also
> possible/likely that this is a real bug.
> I am investigating this and will update the Jira once I know more.
> I also went back to old commits that were working for me before, and they show
> the same issue, so I don't think this is related to a recent code change.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)