Hi,
My environment: Hive 1.2.1 and Parquet 1.8.1.
From my search of the Hive and Parquet source code (version 1.8.1), I did not see
the parameters mentioned in those slides, but I found this here:
The Parquet documentation says that it can be configured but doesn't
explain how: http://parquet.apache.org/documentation/latest/
and apparently, both TAJO (
And if you come across comprehensive documentation of Parquet
configuration, please share it!
I got your point, and thanks for the pointer to the slides.
So the Parquet filter is not an easy thing; I will try it by following the
deck.
Thanks!
From: Furcy Pin
Sent: Friday, February 23, 2018 3:37:52 AM
To: user@hive.apache.org
Hi,
Unless your table is partitioned or bucketed by myid, Hive generally
has to read through all the records to find the ones that match
your predicate.
In other words, Hive tables are generally not indexed for single-record
retrieval the way you would expect of RDBMS or Vertica tables.
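To illustrate the point above, here is a rough sketch of the two table layouts that let Hive prune data for predicates on myid instead of scanning everything. The table name `events`, the `dt` partition column, the `payload` column, and the bucket count are all assumptions for illustration, not taken from the thread:

```sql
-- Partitioned layout: a predicate on the partition column (here dt)
-- makes Hive read only the matching partition directories.
CREATE TABLE events (
  myid   BIGINT,
  payload STRING
)
PARTITIONED BY (dt STRING)
STORED AS PARQUET;

-- Bucketed layout: rows are hashed on myid into a fixed number of files,
-- so a point lookup on myid can be narrowed to a single bucket.
CREATE TABLE events_bucketed (
  myid   BIGINT,
  payload STRING
)
CLUSTERED BY (myid) INTO 256 BUCKETS
STORED AS PARQUET;
```

Neither layout is an index in the RDBMS sense; they only reduce how much data a scan has to touch.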
Hi,
Why does Hive still read so many records even with filter pushdown enabled,
when the returned dataset is a very small fraction of the table (4k out of
30 billion records)?
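For reference, in Hive 1.2 predicate pushdown into the storage format is governed by a couple of settings; the following is a sketch to check against your build, not a guaranteed fix:

```sql
-- Logical predicate pushdown in the query planner (default: true).
SET hive.optimize.ppd=true;
-- Push filters down into the storage format reader (Parquet/ORC);
-- this is off by default in Hive 1.2 and must be enabled explicitly.
SET hive.optimize.index.filter=true;
```

Even with these enabled, Parquet pushdown prunes at row-group granularity using min/max statistics: if the values of the filtered column are spread across most row groups, little can be skipped, and counters such as RECORDS_IN can still reflect nearly the full table.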
The "RECORDS_IN" counter of Hive still shows the 30-billion count, and
the map-reduce log contains output like this: