Why the filter push down does not reduce the read data record count

2018-02-23 Thread Sun, Keith
Hi, why does Hive still read so many records even with filter pushdown enabled, when the returned dataset is a very small amount (4k out of 30 billion records)? The "RECORDS_IN" counter of Hive still shows the 30 billion count, as does the output in the MapReduce log, like this:

Re: Why the filter push down does not reduce the read data record count

2018-02-23 Thread Furcy Pin
Hi, Unless your table is partitioned or bucketed by myid, Hive generally has to read through all the records to find those that match your predicate. In other words, Hive tables are generally not indexed for single-record retrieval the way you would expect RDBMS tables or Vertica tables to
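For illustration only (the table, column, and partition names below are hypothetical, not from the thread), partitioning on a frequently filtered column is the usual way to let Hive skip data wholesale instead of scanning every record:

```sql
-- Sketch: a table partitioned by date. Hive stores each partition in its
-- own directory, so a predicate on the partition column prunes whole
-- partitions before any records are read.
CREATE TABLE events_partitioned (
  myid    BIGINT,
  payload STRING
)
PARTITIONED BY (dt STRING)
STORED AS PARQUET;

-- Only the dt='2018-02-23' partition is scanned; the row-level filter on
-- myid is still evaluated against the records inside that partition.
SELECT * FROM events_partitioned
WHERE dt = '2018-02-23' AND myid = 12345;
```

Bucketing by myid (CLUSTERED BY (myid) INTO N BUCKETS) is the analogous option when the filter column has too many distinct values to partition on.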

Hive Sum Query on Decimal Column Returns Zero When Expected Result Has Too Many Digits

2018-02-23 Thread William Garvie
Hello, I have an issue where I'm running a sum query on a decimal column and I'm getting 0 as the result whenever my expected result is around 22 digits or larger. I created a stack overflow question here: https://stackoverflow.com/questions/48836455/hive-sum-query-on-decimal-column-returns-zero-
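One plausible angle to check (a hedged suggestion, not the thread's confirmed resolution): Hive's DECIMAL type carries at most 38 digits of precision, and an unqualified DECIMAL historically defaults to decimal(10,0), so declared precision can silently cap what a column or aggregate can represent. Widening the type explicitly before summing is a cheap experiment; the table and column names here are hypothetical:

```sql
-- Sketch: cast to the maximum supported precision before aggregating,
-- so the SUM is computed over decimal(38,0) rather than a narrower type.
SELECT SUM(CAST(amount AS DECIMAL(38,0))) AS total
FROM payments;
```

If the expected result genuinely needs more than 38 digits, no DECIMAL cast will help and DOUBLE (with its own rounding trade-offs) or pre-aggregation may be needed.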

Re: Why the filter push down does not reduce the read data record count

2018-02-23 Thread Sun, Keith
I got your point, and thanks for the nice slides. So the Parquet filter is not an easy thing; I will try it according to the deck. Thanks! From: Furcy Pin Sent: Friday, February 23, 2018 3:37:52 AM To: user@hive.apache.org Subject: Re: Why the filte

Re: Why the filter push down does not reduce the read data record count

2018-02-23 Thread Furcy Pin
And if you come across a comprehensive documentation of parquet configuration, please share it!!! The Parquet documentation says that it can be configured but doesn't explain how: http://parquet.apache.org/documentation/latest/ and apparently, both TAJO ( http://tajo.apache.org/docs/0.8.0/table_ma
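For readers landing on this thread later, these are the settings commonly cited for Parquet predicate pushdown in Hive. Treat them as starting points to verify against your own Hive and parquet-mr versions, since names and defaults have shifted across releases:

```sql
-- Hive-side predicate pushdown:
SET hive.optimize.ppd=true;            -- push predicates down the plan
SET hive.optimize.index.filter=true;   -- let the storage layer use min/max stats

-- parquet-mr (parquet-hadoop) row-group filtering properties:
SET parquet.filter.statistics.enabled=true;  -- filter row groups via column stats
SET parquet.filter.dictionary.enabled=true;  -- filter via dictionary pages
```

Note that row-group pruning reduces I/O but may not change counters like RECORDS_IN the way one would hope, which is consistent with the behavior reported at the top of this thread.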

ODBC-hiveserver2 question

2018-02-23 Thread Andy Srine
Team, Is ADD JAR from HDFS (ADD JAR hdfs:///hive_jars/hive-contrib-2.1.1.jar;) supported in hiveserver2 via an ODBC connection? Some relevant points: - I am able to do it in Hive 2.1.1 via JDBC (beeline), but not via an ODBC client. - In Hive 1.2.1, I can add a jar from the local node,

Re: ODBC-hiveserver2 question

2018-02-23 Thread Jörn Franke
Add jar works only with local files on the Hive server. > On 23. Feb 2018, at 21:08, Andy Srine wrote: > [quoted message above]

Re: Proposal: File based metastore

2018-02-23 Thread Alexander Kolbasov
Would it be useful to have a tool that can save database(s), table(s) and partition(s) metadata in a file and then import this file in another metastore? These files can be stored together with data files or elsewhere. This would allow for targeted exchange of metadata between multiple HMS service

Re: ODBC-hiveserver2 question

2018-02-23 Thread Andrew Sears
Add JAR works with HDFS, though perhaps not with ODBC drivers. ADD JAR hdfs://:8020/hive_jars/hive-contrib-2.1.1.jar should work (depending on your NameNode port; confirm the file exists). Alternative syntax: ADD JAR hdfs:/hive_jars/hive-contrib-2.1.1.jar. The ODBC driver could be having an issue with the
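If ADD JAR over ODBC keeps failing, a server-side workaround (hedged: it requires access to the HiveServer2 host, and the jar path below is hypothetical) is to register the jar in the server configuration so no per-session ADD JAR is needed:

```sql
-- In hive-site.xml on the HiveServer2 host (requires a restart):
--   <property>
--     <name>hive.aux.jars.path</name>
--     <value>file:///opt/hive/aux/hive-contrib-2.1.1.jar</value>
--   </property>
--
-- Or, if hive.reloadable.aux.jars.path points at a directory, drop the jar
-- there and pick it up without restarting HiveServer2:
RELOAD;
```

This sidesteps the ODBC driver entirely, at the cost of making the jar available to every session on that server.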