[ https://issues-test.apache.org/jira/browse/PARQUET-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16265591#comment-16265591 ]
Jorge Machado commented on PARQUET-1061:
----------------------------------------

Hi guys, I'm trying to read a Parquet file in parallel outside of Hadoop. Spark uses the class ParquetInputSplit, and I would like to use it too, but I'm wondering how to get the rowGroupOffsets[]. Is this the start position of every single block? Thanks.

> parquet dictionary filter does not work
> ----------------------------------------
>
>                 Key: PARQUET-1061
>                 URL: https://issues-test.apache.org/jira/browse/PARQUET-1061
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.9.0
>        Environment: Hive 2.2.0 + Parquet-mr 1.9.0/master
>            Reporter: Junjie Chen
>            Priority: Major
>
> When performing a selective query, we observed that the dictionary filter was not applied. Please see the following code snippet:
>
>     if (rowGroupOffsets != null) {
>       // verify a row group was found for each offset
>       List<BlockMetaData> blocks = reader.getFooter().getBlocks();
>       if (blocks.size() != rowGroupOffsets.length) {
>         throw new IllegalStateException(
>             "All of the offsets in the split should be found in the file."
>             + " expected: " + Arrays.toString(rowGroupOffsets)
>             + " found: " + blocks);
>       }
>     } else {
>       // apply data filters
>       reader.filterRowGroups(getFilter(configuration));
>     }
>
> *Why is the data filter applied only when rowGroupOffsets is null?*
> I can enable the filter after moving the else-block code into the second-level if.

--
This message was sent by Atlassian JIRA
(v7.6.0#76001)
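The rowGroupOffsets[] asked about in the comment above correspond to the starting positions that the footer reports for each row group. A minimal sketch of collecting them, assuming the parquet-mr 1.9.x API (ParquetFileReader.readFooter and BlockMetaData.getStartingPos) and a placeholder file path:

    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.hadoop.ParquetFileReader;
    import org.apache.parquet.hadoop.metadata.BlockMetaData;
    import org.apache.parquet.hadoop.metadata.ParquetMetadata;

    public class RowGroupOffsets {
        public static void main(String[] args) throws Exception {
            // Placeholder path; replace with the actual file location.
            Path file = new Path("example.parquet");

            // Read only the footer, which lists one BlockMetaData per row group.
            ParquetMetadata footer =
                ParquetFileReader.readFooter(new Configuration(), file);
            List<BlockMetaData> blocks = footer.getBlocks();

            // Each row group's offset is the starting position of its
            // first column chunk in the file.
            long[] rowGroupOffsets = new long[blocks.size()];
            for (int i = 0; i < blocks.size(); i++) {
                rowGroupOffsets[i] = blocks.get(i).getStartingPos();
            }
        }
    }

These are the values a split carries so that a reader can match each offset back to a block in the footer, as the snippet in the issue description does.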