[ https://issues.apache.org/jira/browse/ARROW-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ildar updated ARROW-4293:
-------------------------
    Description:
Hi,
I'm trying to use per-column statistics (min/max values) to filter out row groups while reading a Parquet file, but I don't see statistics for binary columns. I noticed that {{ApplicationVersion::HasCorrectStatistics()}} discards statistics that have sort order {{UNSIGNED}} and were not created by {{parquet-cpp}}. As I understand it, {{parquet-mr}} used to have some issues here. Do they still persist?

For example, I have a Parquet file created with {{parquet-mr}} version 1.10, and it seems to have correct min/max values for binary columns. {{parquet-cpp}} works fine for me if I remove this code from {{HasCorrectStatistics()}}:
{code:java}
if (SortOrder::SIGNED != sort_order && !max_equals_min) {
  return false;
}{code}

> [C++] Can't access parquet statistics on binary columns
> -------------------------------------------------------
>
>                 Key: ARROW-4293
>                 URL: https://issues.apache.org/jira/browse/ARROW-4293
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Ildar
>            Priority: Major
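
To make the sort-order concern concrete, here is a minimal C++ sketch. It rests on two assumptions that the report does not confirm: that the earlier {{parquet-mr}} issue was min/max for binary values being computed with signed byte comparison, and that the guard behaves like the simplified {{StatisticsUsable}} function below. The names {{SignedLess}}, {{UnsignedLess}}, {{StatisticsUsable}} and the {{written_by_parquet_cpp}} flag are hypothetical and are not part of the parquet-cpp API.

```cpp
#include <algorithm>
#include <string>

// Hypothetical stand-in for the library's sort-order enum.
enum class SortOrder { SIGNED, UNSIGNED, UNKNOWN };

// Lexicographic comparison that treats every byte as signed
// (the assumed legacy writer behaviour).
bool SignedLess(const std::string& a, const std::string& b) {
  return std::lexicographical_compare(
      a.begin(), a.end(), b.begin(), b.end(), [](char x, char y) {
        return static_cast<signed char>(x) < static_cast<signed char>(y);
      });
}

// Lexicographic comparison that treats every byte as unsigned,
// which is what an UNSIGNED sort order expects for binary values.
bool UnsignedLess(const std::string& a, const std::string& b) {
  return std::lexicographical_compare(
      a.begin(), a.end(), b.begin(), b.end(), [](char x, char y) {
        return static_cast<unsigned char>(x) < static_cast<unsigned char>(y);
      });
}

// Simplified model of the guard quoted in the issue (an assumption,
// not the actual HasCorrectStatistics() implementation).
bool StatisticsUsable(SortOrder sort_order, bool max_equals_min,
                      bool written_by_parquet_cpp) {
  if (written_by_parquet_cpp) {
    return true;  // statistics from parquet-cpp itself are trusted
  }
  // If min == max the ordering cannot matter, so the statistics stay usable.
  if (SortOrder::SIGNED != sort_order && !max_equals_min) {
    return false;
  }
  return true;
}
```

With the values {{"a"}} (byte 0x61) and {{"\xC3\xA9"}} (first byte 0xC3), the two comparators disagree about which value is smaller. A min/max pair produced under one ordering can therefore wrongly exclude a row group when the reader filters under the other ordering.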