Re: Predicate Push Down Vs On Clause

2019-04-28 Thread Vineet Garg
Hi Varun,

Yes, both of these are valid ways of filtering data before a join in Hive.

As long as the join is not an outer join and the ON condition is not on the
non-null-generating side of the join, the Hive planner will try to push the
predicate down to the table scan.
In fact, Hive goes one step further and also generates IS NOT NULL predicates on
the join keys (e.g. a.some_id IS NOT NULL, b.some_other_id IS NOT NULL) and
pushes them down to the table scan if possible.
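
One way to check this on a given Hive version is to compare the plans of the
two forms with EXPLAIN. The following is only a sketch, using the table and
column names from the question quoted below. The LEFT OUTER JOIN variants are
an added illustration (not part of the original question) of why the
outer-join caveat matters; the exact plan output differs between versions.

```sql
-- Predicate pushdown is controlled by hive.optimize.ppd (enabled by default).
SET hive.optimize.ppd=true;

-- Form 1: filter in the WHERE clause.
EXPLAIN
SELECT * FROM a JOIN b ON a.some_id = b.some_other_id
WHERE a.some_name = 6;

-- Form 2: filter in the ON clause. For an inner join, both plans would be
-- expected to show the filter on a.some_name (and the IS NOT NULL filters on
-- the join keys) next to the table scan rather than after the join.
EXPLAIN
SELECT * FROM a JOIN b ON a.some_id = b.some_other_id AND a.some_name = 6;

-- With an outer join the two forms are no longer equivalent.
-- Here every row of a is kept; the ON condition only decides which rows
-- of a find a match in b.
SELECT * FROM a LEFT OUTER JOIN b
  ON a.some_id = b.some_other_id AND a.some_name = 6;

-- Here rows of a for which some_name = 6 does not hold are dropped
-- from the result.
SELECT * FROM a LEFT OUTER JOIN b
  ON a.some_id = b.some_other_id
WHERE a.some_name = 6;
```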

Regards,
Vineet Garg


On Sun, Apr 28, 2019 at 11:54 AM Varun Rao wrote:

> When performing a join in Hive and then filtering the output with a where
> clause, the Hive compiler will try to filter data before the tables are
> joined. This is known as predicate pushdown (
> http://allabouthadoop.net/what-is-predicate-pushdown-in-hive/)
>
> For example:
>
> SELECT * FROM a JOIN b ON a.some_id=b.some_other_id WHERE a.some_name=6
>
> Table a will be filtered down to the rows with some_name = 6 before the
> join is performed, if predicate pushdown is enabled (hive.optimize.ppd).
>
> However, I have also learned recently that there is another way of
> filtering data from a table before joining it with another table(
> https://vinaynotes.wordpress.com/2015/10/01/hive-tips-joins-occur-before-where-clause/
> ).
>
> One can provide the condition in the ON clause, and table a will be
> filtered before the join is performed
>
> For example:
>
> SELECT * FROM a JOIN b  ON a.some_id=b.some_other_id AND a.some_name=6
>
> Are these both valid ways of filtering data before joins?
>
> Thank you
>
> Yours Truly,
> Varun Rao
>


Re: Hive CBO - issues

2019-02-12 Thread Vineet Garg
Hi Venkatesh,

> 1) Queries which had single level join when run always had this thing in the
> log. "Not invoking CBO because the statement has too few joins".  Can anyone
> confirm this. Join optimization happens only when there are more than 1 join?
> Is there a way around this.

This is true. In early versions, Hive didn’t go through CBO for queries with
fewer than 2 joins.

> 2) Also, Can CBO automatically do a Map join(without Hive's set auto-convert
> statement) if assesses that one of the table involved is small and could be
> broadcasted?

You don’t need CBO for this, but you do need statistics. I believe you also need
the auto-convert setting; Hive will not attempt to convert to a map join without
this flag.
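
As a rough sketch of the settings involved: the table names orders and
small_dim and the column dim_id are hypothetical, chosen only for
illustration. hive.cbo.enable, hive.auto.convert.join and
hive.mapjoin.smalltable.filesize are existing Hive properties, but their
defaults and exact behaviour vary by version, so treat the values below as
assumptions.

```sql
-- Enable the cost-based optimizer (on by default in later Hive releases).
SET hive.cbo.enable=true;

-- Allow Hive to convert a common join into a map join automatically.
SET hive.auto.convert.join=true;
-- Size threshold in bytes below which a table counts as "small".
SET hive.mapjoin.smalltable.filesize=25000000;

-- Gather table and column statistics so the planner can estimate sizes.
ANALYZE TABLE small_dim COMPUTE STATISTICS;
ANALYZE TABLE small_dim COMPUTE STATISTICS FOR COLUMNS;

-- Inspect the plan: a Map Join Operator suggests the conversion happened.
EXPLAIN
SELECT o.*, d.*
FROM orders o JOIN small_dim d ON o.dim_id = d.dim_id;
```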

Regards,
Vineet Garg

On Feb 12, 2019, at 11:48 AM, Venkatesh Selvaraj
<venkateshselva...@pinterest.com> wrote:

Hello All,

I would like to know if any of you have faced this issue with Hive CBO, and
would also like some direction on how to go about resolving it.

We are using Hive 1.2.1. While evaluating the benefits of cost-based
optimization (CBO), we stumbled upon this.

1) Queries which had single level join when run always had this thing in the 
log. "Not invoking CBO because the statement has too few joins".  Can anyone 
confirm this. Join optimization happens only when there are more than 1 join? 
Is there a way around this.

2) Also, Can CBO automatically do a Map join(without Hive's set auto-convert 
statement) if assesses that one of the table involved is small and could be 
broadcasted?

Thanks in advance!!

Regards,
Venkatesh Selvaraj



Re: hive 3.1 mapjoin with complex predicate produce incorrect results

2018-12-21 Thread Vineet Garg
Hi Andrey,

I tried this on the latest master and wasn’t able to reproduce it. Would you mind
sharing the explain plan output (after setting hive.explain.user=false)?
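
For reference, a sketch of how that plan could be captured, using the query
from the report quoted below. hive.explain.user is the property name as
defined in HiveConf. Running EXPLAIN under both values of
hive.auto.convert.join is a suggestion added here so that the two plans can
be compared side by side.

```sql
-- Show the full operator plan instead of the condensed user-level plan.
SET hive.explain.user=false;

-- Plan with automatic map-join conversion (the case reported as wrong).
SET hive.auto.convert.join=true;
EXPLAIN
SELECT xs.key, dict.key, dict.b
FROM table_data AS xs
LEFT JOIN table_dict AS dict
  ON if((xs.key IS NULL) OR (xs.key = ''), 44, xs.key) = dict.key;

-- Plan without map-join conversion (the case reported as correct).
SET hive.auto.convert.join=false;
EXPLAIN
SELECT xs.key, dict.key, dict.b
FROM table_data AS xs
LEFT JOIN table_dict AS dict
  ON if((xs.key IS NULL) OR (xs.key = ''), 44, xs.key) = dict.key;
```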

Vineet

> On Dec 20, 2018, at 11:37 AM, Andrey Zinovyev wrote:
> 
> Hi,
> We stumbled on some weird behaviour of mapjoin in hive 3.1 
> Sample schema:
> > create table table_data(key int, a int);
> > insert into table_data values (1, 1), (2, 2), (1, 3), (2, 4), (3, 5);
> > create table table_dict(key int, b int);
> > insert into table_dict values (1, 42), (2, 43);
> 
> Query:
> > SELECT xs.key, dict.key, dict.b
> > FROM table_data as xs
> > LEFT JOIN table_dict as dict
> >   ON if((xs.key is null) or (xs.key = ''), 44, xs.key) = dict.key;
> 
> returns wrong result when hive.auto.convert.join=true;
> +---------+-----------+---------+
> | xs.key  | dict.key  | dict.b  |
> +---------+-----------+---------+
> | 1       | 1         | 42      |
> | 2       | 1         | 43      |
> | 1       | 1         | 42      |
> | 2       | 1         | 43      |
> | 3       | 1         | NULL    |
> +---------+-----------+---------+
> 
> xs.key != dict.key (but they should be equal, because I join on them), while
> the dict.b values are right.
> 
> When hive.auto.convert.join=false, the results are correct:
> +---------+-----------+---------+
> | xs.key  | dict.key  | dict.b  |
> +---------+-----------+---------+
> | 1       | 1         | 42      |
> | 1       | 1         | 42      |
> | 2       | 2         | 43      |
> | 2       | 2         | 43      |
> | 3       | NULL      | NULL    |
> +---------+-----------+---------+
> 
> It is definitely caused by if expression in ON. 
> 
> 
> -- 
> Andrey



Re: [feature request] auto-increment field in Hive

2018-09-15 Thread Vineet Garg
Not exactly a sequence, but the ability to generate unique numbers (with
limitations) is under development:
https://issues.apache.org/jira/browse/HIVE-20536


On Sep 15, 2018, at 11:10 AM, Shawn Weeks
<swe...@weeksconsulting.us> wrote:

It doesn't help if you need concurrent threads writing to a table but we are 
just using the row_number analytic and a max value subquery to generate 
sequences on our star schema warehouse. It has worked pretty well so far. To 
provide true sequence support would require changes on the hive meta database 
side as well as locking so nothing has been done on it in a long time. A simple 
UDF isn't capable of providing true unique sequence support.
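
A minimal sketch of the row_number plus max-value approach described above.
The tables dim_customer and staging_customer and their columns are
hypothetical, used only for illustration. As noted, this is not safe when
several writers load the same table concurrently.

```sql
-- Assign new surrogate keys by continuing from the current maximum key.
INSERT INTO TABLE dim_customer
SELECT
  m.max_key + ROW_NUMBER() OVER (ORDER BY s.customer_id) AS customer_key,
  s.customer_id,
  s.customer_name
FROM staging_customer s
CROSS JOIN (
  -- COALESCE handles the first load, when the dimension table is empty.
  SELECT COALESCE(MAX(customer_key), 0) AS max_key
  FROM dim_customer
) m;
```

The cross join is against a single-row subquery, so it does not multiply
rows, but some configurations that enforce strict checks on Cartesian
products may need to be relaxed for it to run.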

Thanks
Shawn

-----Original Message-----
From: Jörn Franke <jornfra...@gmail.com>
Sent: Saturday, September 15, 2018 6:09 AM
To: user@hive.apache.org
Subject: Re: [feature request] auto-increment field in Hive

If you really need it, then you can write a UDF for it.

On 15. Sep 2018, at 11:54, Nicolas Paris
<nicolas.pa...@riseup.net> wrote:

Hi

Hive does not provide auto-increment columns (i.e. sequences). Is there any
chance that this feature will be provided in the future?

This is currently one of the biggest limitations of Hive as a replacement
for an RDBMS in data warehousing.

Thanks,

--
nicolas




[ANNOUNCE] Apache Hive 3.1.0 Released

2018-07-30 Thread Vineet Garg
The Apache Hive team is proud to announce the release of Apache Hive
version 3.1.0.

The Apache Hive (TM) data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Built on top
of Apache Hadoop (TM), it provides, among others:

* Tools to enable easy data extract/transform/load (ETL)

* A mechanism to impose structure on a variety of data formats

* Access to files stored either directly in Apache HDFS (TM) or in other
  data storage systems such as Apache HBase (TM)

* Query execution via Apache Hadoop MapReduce, Apache Tez and Apache
  Spark frameworks.

For Hive release details and downloads, please visit:
https://hive.apache.org/downloads.html

Hive 3.1.0 Release Notes are available here:

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343014&styleName=Text&projectId=12310843


We would like to thank the many contributors who made this release
possible.

Regards,

The Apache Hive Team


[ANNOUNCE] Apache Hive 3.0.0 Released

2018-05-21 Thread Vineet Garg
The Apache Hive team is proud to announce the release of Apache Hive
version 3.0.0.

The Apache Hive (TM) data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Built on top
of Apache Hadoop (TM), it provides, among others:

* Tools to enable easy data extract/transform/load (ETL)

* A mechanism to impose structure on a variety of data formats

* Access to files stored either directly in Apache HDFS (TM) or in other
  data storage systems such as Apache HBase (TM)

* Query execution via Apache Hadoop MapReduce and Apache Tez frameworks.

For Hive release details and downloads, please visit:
https://hive.apache.org/downloads.html

Hive 3.0.0 Release Notes are available here:

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12342162&styleName=Text&projectId=12310843

We would like to thank the many contributors who made this release
possible.

Regards,

The Apache Hive Team



Re: Welcome Rui Li to Hive PMC

2017-05-25 Thread Vineet Garg
Congrats Rui!

> On May 24, 2017, at 9:19 PM, Xuefu Zhang wrote:
> 
> Hi all,
> 
> It's an honor to announce that the Apache Hive PMC has recently voted to
> invite Rui Li as a new Hive PMC member. Rui is a long-time Hive contributor
> and committer, and has made significant contributions to Hive, especially to
> Hive on Spark. Please join me in congratulating him; we look forward to the
> bigger role that he will play in the Apache Hive project.
> 
> Thanks,
> Xuefu



Re: Jimmy Xiang now a Hive PMC member

2017-05-25 Thread Vineet Garg
Congrats Jimmy!

> On May 24, 2017, at 9:16 PM, Xuefu Zhang wrote:
> 
> Hi all,
> 
> It's an honor to announce that the Apache Hive PMC has recently voted to
> invite Jimmy Xiang as a new Hive PMC member. Please join me in congratulating
> him; we look forward to the bigger role that he will play in the Apache Hive
> project.
> 
> Thanks,
> Xuefu



Re: Request write access to the Hive wiki

2017-03-02 Thread Vineet Garg
Thank you!

On Mar 1, 2017, at 9:54 PM, Lefty Leverenz
<leftylever...@gmail.com> wrote:

Done.  Welcome to the Hive wiki team, Vineet!

-- Lefty


On Wed, Mar 1, 2017 at 5:32 PM, Vineet Garg
<vg...@hortonworks.com> wrote:
Hello,

I would like to get permission to modify the Hive wiki.

Username: vgarg
email: vg...@hortonworks.com

Thanks,
Vineet Garg




Request write access to the Hive wiki

2017-03-01 Thread Vineet Garg
Hello,

I would like to get permission to modify the Hive wiki.

Username: vgarg
email: vg...@hortonworks.com

Thanks,
Vineet Garg


Re: What's the 'hive.metastore.fastpath' in hive site for?

2017-01-11 Thread Vineet Garg
According to HiveConf.java
(https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java):
"Used to avoid all of the proxies and object copies in the metastore. Note, if 
this is set, you MUST use a local metastore (hive.metastore.uris must be empty) 
otherwise undefined and most likely undesired behavior will result"

From: Huang Meilong <ims...@outlook.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Monday, January 9, 2017 at 3:22 AM
To: "user@hive.apache.org" <user@hive.apache.org>,
"hive-u...@hadoop.apache.org" <hive-u...@hadoop.apache.org>
Subject: What's the 'hive.metastore.fastpath' in hive site for?

'hive.metastore.fastpath