The DictionaryFilter is not enabled successfully on my machine.

2017-07-17 Thread Che, Jian
Hi

I have built parquet-mr version 1.9.0 and am using Hive version 2.2.0.

SQL: select * from users where username='nmJNrQwzvZG';
The users table is stored as a Parquet file. The username column is encoded 
by the DictionaryWriter.

The related Parquet configuration is:

SET parquet.enable.dictionary=true;
SET hive.optimize.index.filter=true;
SET parquet.filter.index.enabled=true;



When I run the SQL in Hive with the MR execution engine, the DictionaryFilter 
does not work.

Did I miss some important configuration?
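
For context, the behavior expected from dictionary filtering on an equality predicate can be sketched as below. This is only an illustration of the idea, not parquet-mr code: the function name can_drop_row_group and the sample dictionaries are hypothetical, and the sketch assumes every page of the column chunk is dictionary-encoded.

```python
from typing import List, Set


def can_drop_row_group(dictionary: Set[str], wanted: str) -> bool:
    """Return True when `column = wanted` cannot match any row in the row group.

    Assumption: the whole column chunk is dictionary-encoded, so the
    dictionary lists every distinct value that occurs in the row group.
    """
    return wanted not in dictionary


# Hypothetical per-row-group dictionaries for the username column.
row_group_dictionaries: List[Set[str]] = [
    {"alice", "bob"},
    {"nmJNrQwzvZG", "carol"},
]

# Indexes of the row groups that still have to be read for the query.
kept = [
    index
    for index, dictionary in enumerate(row_group_dictionaries)
    if not can_drop_row_group(dictionary, "nmJNrQwzvZG")
]
```

If the filter were active, only the row groups left in kept would need to be scanned for the query above.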



Best Regards,
CheJian



[jira] [Commented] (PARQUET-1059) Improve the RLE encoding for Parquet Dictionary IDs

2017-07-17 Thread Dapeng Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089519#comment-16089519
 ] 

Dapeng Sun commented on PARQUET-1059:
-

Hi [~wesmckinn], thank you for your comments. How about creating a new write 
version, such as {{PARQUET_3_0}} or {{PARQUET_2_1}}? I think this 
optimization would be easy to put into a new WRITE_VERSION.

> Improve the RLE encoding for Parquet Dictionary IDs
> ---
>
> Key: PARQUET-1059
> URL: https://issues.apache.org/jira/browse/PARQUET-1059
> Project: Parquet
> Issue Type: Improvement
> Reporter: Dapeng Sun
>
> The IDs of Parquet dictionary encoding are written using 
> {{RunLengthBitPackingHybridEncoder}}.
> RunLengthBitPackingHybridEncoder handles encoding with {{repeat}} and 
> {{bitpacking}}; we should improve it with a method like 
> {{DeltaBinaryPackingWriter}}.
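
To illustrate the motivation, here is a minimal sketch comparing the two ideas on a sequence of steadily increasing dictionary IDs. It is not the parquet-mr implementation and does not produce the Parquet byte format; the helper names run_lengths and deltas and the sample IDs are assumptions made for the example.

```python
from itertools import groupby
from typing import List, Tuple


def run_lengths(ids: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive equal IDs into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(ids)]


def deltas(ids: List[int]) -> Tuple[int, List[int]]:
    """Return the first ID and the differences between neighbouring IDs."""
    if not ids:
        raise ValueError("ids must not be empty")
    return ids[0], [b - a for a, b in zip(ids, ids[1:])]


# Hypothetical dictionary IDs: each value is new, so IDs simply count up.
ids = list(range(8))

# Repeat-style encoding finds no runs longer than one, so the IDs fall
# back to bit-packing at the width of the largest ID.
runs = run_lengths(ids)
width_bitpacked = max(ids).bit_length()

# Delta-style encoding stores the first value and a minimum delta, then
# bit-packs (delta - min_delta); here every adjusted delta is zero.
first, diffs = deltas(ids)
min_delta = min(diffs)
adjusted = [d - min_delta for d in diffs]
width_delta = max(adjusted).bit_length()
```

In this example the bit-packed IDs need 3 bits each, while the adjusted deltas need 0 bits each, which is the kind of gain the proposal is aiming for on monotonically increasing IDs.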



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)