Re: Why the filter push down does not reduce the read data record count

2018-02-24 Thread Sun, Keith
Hi, my env: Hive 1.2.1 and Parquet 1.8.1. Per my search in the Hive and Parquet source code of version 1.8.1, I did not see the parameters mentioned in those slides, but found them here:
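A minimal sketch of the Hive session settings commonly involved in getting predicates pushed to Parquet on Hive 1.2.x. These are illustrative assumptions, not necessarily the parameters the message above found, and defaults can differ between distributions.

    -- Hive-side switches for predicate pushdown (names from the Hive configuration docs)
    SET hive.optimize.ppd=true;           -- push predicates down through the query plan
    SET hive.optimize.ppd.storage=true;   -- hand pushed predicates to the storage layer
    SET hive.optimize.index.filter=true;  -- let readers such as ORC/Parquet apply the pushed filter
    -- parquet-hadoop property: read row-group metadata on the task side (assumed default in 1.8.x)
    SET parquet.task.side.metadata=true;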

Re: Why the filter push down does not reduce the read data record count

2018-02-23 Thread Furcy Pin
And if you come across comprehensive documentation of the Parquet configuration, please share it! The Parquet documentation says that it can be configured but doesn't explain how: http://parquet.apache.org/documentation/latest/ and apparently, both TAJO (

Re: Why the filter push down does not reduce the read data record count

2018-02-23 Thread Sun, Keith
I got your point, and thanks for the nice slides. So the Parquet filter is not an easy thing; I will try it according to the deck. Thanks! From: Furcy Pin Sent: Friday, February 23, 2018 3:37:52 AM To: user@hive.apache.org

Re: Why the filter push down does not reduce the read data record count

2018-02-23 Thread Furcy Pin
Hi, Unless your table is partitioned or bucketed by myid, Hive generally has to read through all the records to find the ones that match your predicate. In other words, Hive tables are generally not indexed for single-record retrieval the way you would expect of RDBMS or Vertica tables
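A hedged sketch of the layout Furcy describes, with hypothetical table and column names (only myid comes from the thread): when the data is partitioned by the filter column, Hive can prune whole partitions at planning time instead of scanning every record.

    -- Hypothetical partitioned layout: a point lookup only reads the matching partition directory
    CREATE TABLE events_partitioned (
      payload STRING
    )
    PARTITIONED BY (myid BIGINT)
    STORED AS PARQUET;

    SELECT payload FROM events_partitioned WHERE myid = 12345;

    -- Bucketing variant: rows are hashed into a fixed number of files by myid
    CREATE TABLE events_bucketed (
      myid BIGINT,
      payload STRING
    )
    CLUSTERED BY (myid) INTO 256 BUCKETS
    STORED AS PARQUET;

Whether a point filter actually prunes buckets depends on the Hive version and execution engine, so the partitioned layout is the more reliable pruning path; without either, the predicate is only applied while scanning, which is why the scan still touches most of the data.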

Why the filter push down does not reduce the read data record count

2018-02-23 Thread Sun, Keith
Hi, Why does Hive still read so many records even with filter pushdown enabled, when the returned dataset is very small (4k out of 30 billion records)? The "RECORDS_IN" counter of Hive still showed the 30 billion count, and the output in the MapReduce log looked like this:
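A hedged way to check whether the predicate reaches the scan at all, using a hypothetical table name and the myid column from the thread; when pushdown applies, the TableScan operator in the plan typically carries a filterExpr entry.

    -- Inspect the plan rather than the counters:
    EXPLAIN
    SELECT * FROM my_big_table WHERE myid = 12345;

Even when the filter is pushed down, Parquet can at best skip whole row groups whose min/max statistics exclude the value; if the id values are spread across most row groups, little or nothing is skipped, which is one plausible reason RECORDS_IN stays near the full 30 billion.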