[jira] [Commented] (HIVE-6009) Add from_unixtime UDF that has controllable Timezone

2017-02-07 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857460#comment-15857460
 ] 

Alexander Pivovarov commented on HIVE-6009:
---

you can convert bigint to UTC timestamp and then convert UTC timestamp to GMT-5 
timestamp (EST)

{code}
select from_unixtime(129384);
2010-12-31 16:00:00   // in Greenwich

select from_utc_timestamp(from_unixtime(129384), 'GMT-5');
2010-12-31 11:00:00   // in NYC

> Add from_unixtime UDF that has controllable Timezone
> 
>
> Key: HIVE-6009
> URL: https://issues.apache.org/jira/browse/HIVE-6009
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.10.0
> Environment: CDH4.4
>Reporter: Johndee Burks
>Priority: Trivial
>
> Currently the from_unixtime UDF takes into a account timezone of the system 
> doing the transformation. I think that implementation is good, but it would 
> be nice to include or change the current UDF to have a configurable timezone. 
> It would be useful for looking at timestamp data from different regions in 
> the native region's timezone. 
> Example: 
> from_unixtime(unix_time, format, timezone)
> from_unixtime(129384, dd MMM , GMT-5)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-6009) Add from_unixtime UDF that has controllable Timezone

2017-02-07 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857460#comment-15857460
 ] 

Alexander Pivovarov edited comment on HIVE-6009 at 2/8/17 5:57 AM:
---

you can convert bigint to UTC timestamp and then convert UTC timestamp to GMT-5 
timestamp (EST)

{code}
select from_unixtime(129384);
2010-12-31 16:00:00   // in Greenwich

select from_utc_timestamp(from_unixtime(129384), 'GMT-5');
2010-12-31 11:00:00   // in NYC
{code}


was (Author: apivovarov):
you can convert bigint to UTC timestamp and then convert UTC timestamp to GMT-5 
timestamp (EST)

{code}
select from_unixtime(129384);
2010-12-31 16:00:00   // in Greenwich

select from_utc_timestamp(from_unixtime(129384), 'GMT-5');
2010-12-31 11:00:00   // in NYC

> Add from_unixtime UDF that has controllable Timezone
> 
>
> Key: HIVE-6009
> URL: https://issues.apache.org/jira/browse/HIVE-6009
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.10.0
> Environment: CDH4.4
>Reporter: Johndee Burks
>Priority: Trivial
>
> Currently the from_unixtime UDF takes into a account timezone of the system 
> doing the transformation. I think that implementation is good, but it would 
> be nice to include or change the current UDF to have a configurable timezone. 
> It would be useful for looking at timestamp data from different regions in 
> the native region's timezone. 
> Example: 
> from_unixtime(unix_time, format, timezone)
> from_unixtime(129384, dd MMM , GMT-5)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HIVE-3558) UDF LEFT(string,position) to HIVE

2017-02-07 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov resolved HIVE-3558.
---
Resolution: Won't Fix

> UDF  LEFT(string,position) to HIVE
> --
>
> Key: HIVE-3558
> URL: https://issues.apache.org/jira/browse/HIVE-3558
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 0.9.0
>Reporter: Aruna Babu
>Priority: Minor
> Attachments: HIVE-3558.1.patch.txt, udf_left.q, udf_left.q.out
>
>
> Introduction
>   UDF (User Defined Function) to obtain the left most 'n' characters from 
> a string in  HIVE. 
> Relevance
>   Current releases of Hive lacks a function which would returns the 
> leftmost len characters from the string str, or NULL if any argument is NULL. 
>   
> The function LEFT(string,length)  would return the leftmost 'n' characters 
> from the string , or NULL if any argument is NULL which would be useful while 
> using HiveQL. This would find its use  in all the technical aspects where the 
> concept of strings are used.
> Functionality :-
> Function Name: LEFT(string,length) 
>
> Returns the leftmost length characters from the string  or NULL if any 
> argument is NULL.  
> Example: hive>SELECT LEFT('https://www.irctc.co.in',5);
>   -> 'https'
> Usage :-
> Case 1: To query a table to find details based on an https request
> Table :-Transaction
> Request_id|date|period_id|url_name
> 0001|01/07/2012|110001|https://www.irctc.co.in
> 0002|02/07/2012|110001|https://nextstep.tcs.com
> 0003|03/07/2012|110001|https://www.hdfcbank.com
> 0005|01/07/2012|110001|http://www.lmnm.co.in
> 0006|08/07/2012|110001|http://nextstart.com
> 0007|10/07/2012|110001|https://netbanking.icicibank.com
> 0012|21/07/2012|110001|http://www.people.co.in
> 0026|08/07/2012|110001|http://nextprobs.com
> 00023|25/07/2012|110001|https://netbanking.canarabank.com
> Query : select * from transaction where LEFT(url_name,5)='https';
> Result :-
> 0001|01/07/2012|110001|https://www.irctc.com
> 0002|02/07/2012|110001|https://nextstep.tcs.com  
> 0003|03/07/2012|110001|https://www.hdfcbank.com
> 0007|10/07/2012|110001|https://netbanking.icicibank.com
> 00023|25/07/2012|110001|https://netbanking.canarabank.com



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-3558) UDF LEFT(string,position) to HIVE

2017-02-07 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857451#comment-15857451
 ] 

Alexander Pivovarov commented on HIVE-3558:
---

You can use substr to get LEFT and RIGHT
{code}
// get characters from 1st to 5th included
SELECT substr('https://www.irctc.co.in', 1, 5);
https

// all RIGHT characters starting from 6th
SELECT substr('https://www.irctc.co.in', 6);
://www.irctc.co.in
{code}

> UDF  LEFT(string,position) to HIVE
> --
>
> Key: HIVE-3558
> URL: https://issues.apache.org/jira/browse/HIVE-3558
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 0.9.0
>Reporter: Aruna Babu
>Priority: Minor
> Attachments: HIVE-3558.1.patch.txt, udf_left.q, udf_left.q.out
>
>
> Introduction
>   UDF (User Defined Function) to obtain the left most 'n' characters from 
> a string in  HIVE. 
> Relevance
>   Current releases of Hive lacks a function which would returns the 
> leftmost len characters from the string str, or NULL if any argument is NULL. 
>   
> The function LEFT(string,length)  would return the leftmost 'n' characters 
> from the string , or NULL if any argument is NULL which would be useful while 
> using HiveQL. This would find its use  in all the technical aspects where the 
> concept of strings are used.
> Functionality :-
> Function Name: LEFT(string,length) 
>
> Returns the leftmost length characters from the string  or NULL if any 
> argument is NULL.  
> Example: hive>SELECT LEFT('https://www.irctc.co.in',5);
>   -> 'https'
> Usage :-
> Case 1: To query a table to find details based on an https request
> Table :-Transaction
> Request_id|date|period_id|url_name
> 0001|01/07/2012|110001|https://www.irctc.co.in
> 0002|02/07/2012|110001|https://nextstep.tcs.com
> 0003|03/07/2012|110001|https://www.hdfcbank.com
> 0005|01/07/2012|110001|http://www.lmnm.co.in
> 0006|08/07/2012|110001|http://nextstart.com
> 0007|10/07/2012|110001|https://netbanking.icicibank.com
> 0012|21/07/2012|110001|http://www.people.co.in
> 0026|08/07/2012|110001|http://nextprobs.com
> 00023|25/07/2012|110001|https://netbanking.canarabank.com
> Query : select * from transaction where LEFT(url_name,5)='https';
> Result :-
> 0001|01/07/2012|110001|https://www.irctc.com
> 0002|02/07/2012|110001|https://nextstep.tcs.com  
> 0003|03/07/2012|110001|https://www.hdfcbank.com
> 0007|10/07/2012|110001|https://netbanking.icicibank.com
> 00023|25/07/2012|110001|https://netbanking.canarabank.com



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HIVE-2710) row_sequence UDF is not documented

2017-02-07 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov resolved HIVE-2710.
---
Resolution: Won't Fix

> row_sequence UDF is not documented
> --
>
> Key: HIVE-2710
> URL: https://issues.apache.org/jira/browse/HIVE-2710
> Project: Hive
>  Issue Type: Bug
>Reporter: Sho Shimauchi
>Priority: Minor
>
> row_sequence UDF was implemented in HIVE-1304, however the function is not 
> documented on hive wiki.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-2710) row_sequence UDF is not documented

2017-02-07 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857440#comment-15857440
 ] 

Alexander Pivovarov commented on HIVE-2710:
---

row_sequence UDF was moved to contrib package.
Usually we do not describe contrib package UDFs in LanguageManual UDF

> row_sequence UDF is not documented
> --
>
> Key: HIVE-2710
> URL: https://issues.apache.org/jira/browse/HIVE-2710
> Project: Hive
>  Issue Type: Bug
>Reporter: Sho Shimauchi
>Priority: Minor
>
> row_sequence UDF was implemented in HIVE-1304, however the function is not 
> documented on hive wiki.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-6046) add UDF for converting date time from one presentation to another

2017-02-07 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-6046:
--
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

> add  UDF for converting date time from one presentation to another
> --
>
> Key: HIVE-6046
> URL: https://issues.apache.org/jira/browse/HIVE-6046
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 0.13.0
>Reporter: Kostiantyn Kudriavtsev
>Assignee: Kostiantyn Kudriavtsev
> Attachments: Hive-6046-Feb15.patch, Hive-6046.patch, HIVE-6046.patch
>
>
> it'd be nice to have function for converting datetime to different formats, 
> for example:
> format_date('2013-12-12 00:00:00.0', '-MM-dd HH:mm:ss.S', '/MM/dd')
> There are two signatures to facilitate further using:
> format_date(datetime, fromFormat, toFormat)
> format_date(timestamp, toFormat)
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HIVE-6214) Need a UDF to convert a Date String from any standard format to another. Should be able to provide the Date String, current format and to the format into which it need to

2017-02-07 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov resolved HIVE-6214.
---
Resolution: Duplicate

> Need a UDF to convert a Date String from any standard format to another. 
> Should be able to provide the Date String, current format and to the format 
> into which it need to be converted and returned as String output of UDF
> 
>
> Key: HIVE-6214
> URL: https://issues.apache.org/jira/browse/HIVE-6214
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
> Environment: Software
>Reporter: Rony Pius Manakkal
>Priority: Minor
>  Labels: features
>
> Need a UDF to convert a Date String from any standard format to another. 
> Should be able to provide the Date String, current format and to the format 
> into which it need to be converted and returned as String output of UDF
> Example : String convertDateFormat(String dateString, String 
> currentDateFormat, String requiredFormat);



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-9988) Evaluating UDF before query is run

2017-02-07 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857435#comment-15857435
 ] 

Alexander Pivovarov commented on HIVE-9988:
---

You can assign the expression to variable before query is evaluated and then 
use the variable in WHERE
{code}
set dt=from_unixtime(unix_timestamp(),'MMdd');

select * from A where dt=${hiveconf:dt};
{code}

> Evaluating UDF before query is run
> --
>
> Key: HIVE-9988
> URL: https://issues.apache.org/jira/browse/HIVE-9988
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ådne Brunborg
>
> When using UDFs on partition column in Hive, all partitions are scanned 
> before the UDF is resolved. 
> If the UDF could be evaluated before query is run, this would greatly improve 
> performance in cases like this.
> Example - the table has a partition by datestamp (bigint): 
> The following where clause touches upon all 82 partitions:
> {{WHERE datestamp=cast(from_unixtime(unix_timestamp(),'MMdd') as bigint)}}
> {{15/03/16 09:21:53 INFO mapred.FileInputFormat: Total input paths to process 
> : 82}}
> …whereas the following only touches the one partition:
> {{WHERE datestamp=20150316}}
> {{15/03/16 09:23:06 INFO input.FileInputFormat: Total input paths to process 
> : 1}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15746) Fix default delimiter2 in str_to_map UDF or in method description

2017-02-02 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851014#comment-15851014
 ] 

Alexander Pivovarov commented on HIVE-15746:


str_to_map documentation was fixed as well (LanguageManual UDF wiki) 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions

> Fix default delimiter2 in str_to_map UDF or in method description
> -
>
> Key: HIVE-15746
> URL: https://issues.apache.org/jira/browse/HIVE-15746
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 2.1.1
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Trivial
> Fix For: 2.2.0
>
> Attachments: HIVE-15746.1.patch
>
>
> According to UDF wiki and to GenericUDFStringToMap.java class comments 
> default delimiter 2 should be '='.
> But in the code default_del2 = ":"
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFStringToMap.java#L53
> We need to fix code or fix the method description and UDF wiki
> Let me know what you think?
> {code}
> str_to_map("a=1,b=2")
> vs
> str_to_map("a:1,b:2")
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15746) Fix default delimiter2 in str_to_map UDF or in method description

2017-02-02 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-15746:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Fix default delimiter2 in str_to_map UDF or in method description
> -
>
> Key: HIVE-15746
> URL: https://issues.apache.org/jira/browse/HIVE-15746
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 2.1.1
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Trivial
> Fix For: 2.2.0
>
> Attachments: HIVE-15746.1.patch
>
>
> According to UDF wiki and to GenericUDFStringToMap.java class comments 
> default delimiter 2 should be '='.
> But in the code default_del2 = ":"
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFStringToMap.java#L53
> We need to fix code or fix the method description and UDF wiki
> Let me know what you think?
> {code}
> str_to_map("a=1,b=2")
> vs
> str_to_map("a:1,b:2")
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15746) Fix default delimiter2 in str_to_map UDF or in method description

2017-02-02 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-15746:
---
Affects Version/s: (was: 2.2.0)
   2.1.1
Fix Version/s: 2.2.0

> Fix default delimiter2 in str_to_map UDF or in method description
> -
>
> Key: HIVE-15746
> URL: https://issues.apache.org/jira/browse/HIVE-15746
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 2.1.1
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Trivial
> Fix For: 2.2.0
>
> Attachments: HIVE-15746.1.patch
>
>
> According to UDF wiki and to GenericUDFStringToMap.java class comments 
> default delimiter 2 should be '='.
> But in the code default_del2 = ":"
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFStringToMap.java#L53
> We need to fix code or fix the method description and UDF wiki
> Let me know what you think?
> {code}
> str_to_map("a=1,b=2")
> vs
> str_to_map("a:1,b:2")
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15746) Fix default delimiter2 in str_to_map UDF or in method description

2017-01-28 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-15746:
---
Attachment: HIVE-15746.1.patch

I changed default delimiter2 value to ":"  in UDF method description

> Fix default delimiter2 in str_to_map UDF or in method description
> -
>
> Key: HIVE-15746
> URL: https://issues.apache.org/jira/browse/HIVE-15746
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 2.2.0
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Trivial
> Attachments: HIVE-15746.1.patch
>
>
> According to UDF wiki and to GenericUDFStringToMap.java class comments 
> default delimiter 2 should be '='.
> But in the code default_del2 = ":"
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFStringToMap.java#L53
> We need to fix code or fix the method description and UDF wiki
> Let me know what you think?
> {code}
> str_to_map("a=1,b=2")
> vs
> str_to_map("a:1,b:2")
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15746) Fix default delimiter2 in str_to_map UDF or in method description

2017-01-28 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-15746:
---
Status: Patch Available  (was: Open)

> Fix default delimiter2 in str_to_map UDF or in method description
> -
>
> Key: HIVE-15746
> URL: https://issues.apache.org/jira/browse/HIVE-15746
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 2.2.0
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Trivial
> Attachments: HIVE-15746.1.patch
>
>
> According to UDF wiki and to GenericUDFStringToMap.java class comments 
> default delimiter 2 should be '='.
> But in the code default_del2 = ":"
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFStringToMap.java#L53
> We need to fix code or fix the method description and UDF wiki
> Let me know what you think?
> {code}
> str_to_map("a=1,b=2")
> vs
> str_to_map("a:1,b:2")
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11593) Add aes_encrypt and aes_decrypt UDFs

2017-01-27 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-11593:
---
Affects Version/s: (was: 2.2.0)

> Add aes_encrypt and aes_decrypt UDFs
> 
>
> Key: HIVE-11593
> URL: https://issues.apache.org/jira/browse/HIVE-11593
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11593.1.patch, HIVE-11593.2.patch
>
>
> AES (Advanced Encryption Standard) algorithm.
> Oracle JRE supports AES-128 out of the box
> AES-192 and AES-256 are supported if Cryptography Extension (JCE) Unlimited 
> Strength Jurisdiction Policy Files are installed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15746) Fix default delimiter2 in str_to_map UDF or in method description

2017-01-27 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-15746:
---
Affects Version/s: 2.2.0

> Fix default delimiter2 in str_to_map UDF or in method description
> -
>
> Key: HIVE-15746
> URL: https://issues.apache.org/jira/browse/HIVE-15746
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 2.2.0
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Trivial
>
> According to UDF wiki and to GenericUDFStringToMap.java class comments 
> default delimiter 2 should be '='.
> But in the code default_del2 = ":"
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFStringToMap.java#L53
> We need to fix code or fix the method description and UDF wiki
> Let me know what you think?
> {code}
> str_to_map("a=1,b=2")
> vs
> str_to_map("a:1,b:2")
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11593) Add aes_encrypt and aes_decrypt UDFs

2017-01-27 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-11593:
---
Affects Version/s: 2.2.0

> Add aes_encrypt and aes_decrypt UDFs
> 
>
> Key: HIVE-11593
> URL: https://issues.apache.org/jira/browse/HIVE-11593
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 2.2.0
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11593.1.patch, HIVE-11593.2.patch
>
>
> AES (Advanced Encryption Standard) algorithm.
> Oracle JRE supports AES-128 out of the box
> AES-192 and AES-256 are supported if Cryptography Extension (JCE) Unlimited 
> Strength Jurisdiction Policy Files are installed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11593) Add aes_encrypt and aes_decrypt UDFs

2015-09-05 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732169#comment-14732169
 ] 

Alexander Pivovarov commented on HIVE-11593:


Added functions descriptions to the wiki 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Misc.Functions

> Add aes_encrypt and aes_decrypt UDFs
> 
>
> Key: HIVE-11593
> URL: https://issues.apache.org/jira/browse/HIVE-11593
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11593.1.patch, HIVE-11593.2.patch
>
>
> AES (Advanced Encryption Standard) algorithm.
> Oracle JRE supports AES-128 out of the box
> AES-192 and AES-256 are supported if Cryptography Extension (JCE) Unlimited 
> Strength Jurisdiction Policy Files are installed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11593) Add aes_encrypt and aes_decrypt UDFs

2015-08-27 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717087#comment-14717087
 ] 

Alexander Pivovarov commented on HIVE-11593:


The error is not related to the patch #2.  Next build 5087 also has this test 
(TestHCatClient.testTableSchemaPropagation) failed

 Add aes_encrypt and aes_decrypt UDFs
 

 Key: HIVE-11593
 URL: https://issues.apache.org/jira/browse/HIVE-11593
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-11593.1.patch, HIVE-11593.2.patch


 AES (Advanced Encryption Standard) algorithm.
 Oracle JRE supports AES-128 out of the box
 AES-192 and AES-256 are supported if Cryptography Extension (JCE) Unlimited 
 Strength Jurisdiction Policy Files are installed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11593) Add aes_encrypt and aes_decrypt UDFs

2015-08-26 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-11593:
---
Attachment: HIVE-11593.2.patch

patch #2
- move Cipher.getInstance to initialize

 Add aes_encrypt and aes_decrypt UDFs
 

 Key: HIVE-11593
 URL: https://issues.apache.org/jira/browse/HIVE-11593
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-11593.1.patch, HIVE-11593.2.patch


 AES (Advanced Encryption Standard) algorithm.
 Oracle JRE supports AES-128 out of the box
 AES-192 and AES-256 are supported if Cryptography Extension (JCE) Unlimited 
 Strength Jurisdiction Policy Files are installed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11593) Add aes_encrypt and aes_decrypt UDFs

2015-08-18 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-11593:
---
Description: 
AES (Advanced Encryption Standard) algorithm.

Oracle JRE supports AES-128 out of the box
AES-192 and AES-256 are supported if Cryptography Extension (JCE) Unlimited 
Strength Jurisdiction Policy Files are installed


  was:
AES (Advanced Encryption Standard) algorithm.

Oracle JRE supports AES-128 out of the box
AES-192 and AES-256 are supported if Cryptography Extension (JCE) Unlimited 
Strength Jurisdiction Policy Files installed



 Add aes_encrypt and aes_decrypt UDFs
 

 Key: HIVE-11593
 URL: https://issues.apache.org/jira/browse/HIVE-11593
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov

 AES (Advanced Encryption Standard) algorithm.
 Oracle JRE supports AES-128 out of the box
 AES-192 and AES-256 are supported if Cryptography Extension (JCE) Unlimited 
 Strength Jurisdiction Policy Files are installed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11593) Add aes_encrypt and aes_decrypt UDFs

2015-08-18 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-11593:
---
Attachment: HIVE-11593.1.patch

patch #1

 Add aes_encrypt and aes_decrypt UDFs
 

 Key: HIVE-11593
 URL: https://issues.apache.org/jira/browse/HIVE-11593
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-11593.1.patch


 AES (Advanced Encryption Standard) algorithm.
 Oracle JRE supports AES-128 out of the box
 AES-192 and AES-256 are supported if Cryptography Extension (JCE) Unlimited 
 Strength Jurisdiction Policy Files are installed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-1262) Add security/checksum UDFs sha,crc32,md5,aes_encrypt, and aes_decrypt

2015-08-18 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov resolved HIVE-1262.
---
Resolution: Duplicate

 Add security/checksum UDFs sha,crc32,md5,aes_encrypt, and aes_decrypt
 -

 Key: HIVE-1262
 URL: https://issues.apache.org/jira/browse/HIVE-1262
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Affects Versions: 0.6.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: hive-1262-1.patch.txt


 Add security/checksum UDFs sha,crc32,md5,aes_encrypt, and aes_decrypt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10276) Implement date_format(timestamp, fmt) UDF

2015-08-14 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697363#comment-14697363
 ] 

Alexander Pivovarov commented on HIVE-10276:


Hive documentation for date_format UDF clearly says - Supported formats are 
Java SimpleDateFormat formats
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions

Feel free to submit patch for date_format_mysql UDF

 Implement date_format(timestamp, fmt) UDF
 -

 Key: HIVE-10276
 URL: https://issues.apache.org/jira/browse/HIVE-10276
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Fix For: 1.2.0

 Attachments: HIVE-10276.01.patch


 date_format(date/timestamp/string, fmt) converts a date/timestamp/string to a 
 value of String in the format specified by the java date format fmt.
 Supported formats listed here:
 https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10485) Create md5 UDF

2015-08-14 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698117#comment-14698117
 ] 

Alexander Pivovarov commented on HIVE-10485:


Ruslan, try unhex(md5(string/binary))

 Create md5 UDF
 --

 Key: HIVE-10485
 URL: https://issues.apache.org/jira/browse/HIVE-10485
 Project: Hive
  Issue Type: Task
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-10485.1.patch, HIVE-10485.2.patch, 
 HIVE-10485.3.patch


 MD5(str)
 Calculates an MD5 128-bit checksum for the string. The value is returned as a 
 string of 32 hex digits, or NULL if the argument was NULL. The return value 
 can, for example, be used as a hash key.
 Example:
 {code}
 SELECT MD5('udf_md5');
 'ce62ef0d2d27dc37b6d488b92f4b24fd'
 {code}
 online md5 generator: http://www.md5.cz/
 MySQL has md5 function: 
 https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_md5
 PostgreSQL also has md5 function: 
 http://www.postgresql.org/docs/9.1/static/functions-string.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11103) Add banker's rounding BROUND UDF

2015-08-13 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695730#comment-14695730
 ] 

Alexander Pivovarov commented on HIVE-11103:


Committed to master and branch-1. Thank you Jason for your review!

 Add banker's rounding BROUND UDF
 

 Key: HIVE-11103
 URL: https://issues.apache.org/jira/browse/HIVE-11103
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-11103.1.patch, HIVE-11103.1.patch, 
 HIVE-11103.2.patch, HIVE-11103.4.patch


 Banker's rounding: the value is rounded to the nearest even number. Also 
 known as Gaussian rounding, and, in German, mathematische Rundung.
 Example
 {code}
   2 digits2 digits
 UnroundedStandard roundingGaussian rounding
   54.1754  54.18  54.18
  343.2050 343.21 343.20
 +106.2038+106.20+106.20 
 =======
  503.5842 503.59 503.58
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11103) Add banker's rounding BROUND UDF

2015-08-10 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-11103:
---
Attachment: HIVE-11103.4.patch

patch #4:
- added BRoubs udf to Vectorizer 
- added non-default constructor to BRoundWithNumDigitsDoubleToDouble and 
FuncRoundWithNumDigitsDecimalToDecimal
- added BRound to TestVectorizationContext
- fixed vector_bround.q.out


 Add banker's rounding BROUND UDF
 

 Key: HIVE-11103
 URL: https://issues.apache.org/jira/browse/HIVE-11103
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-11103.1.patch, HIVE-11103.1.patch, 
 HIVE-11103.2.patch, HIVE-11103.4.patch


 Banker's rounding: the value is rounded to the nearest even number. Also 
 known as Gaussian rounding, and, in German, mathematische Rundung.
 Example
 {code}
   2 digits2 digits
 UnroundedStandard roundingGaussian rounding
   54.1754  54.18  54.18
  343.2050 343.21 343.20
 +106.2038+106.20+106.20 
 =======
  503.5842 503.59 503.58
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11143) Tests udf_from_utc_timestamp.q/udf_to_utc_timestamp.q do not work with updated Java timezone information

2015-07-27 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643033#comment-14643033
 ] 

Alexander Pivovarov commented on HIVE-11143:


+1

 Tests udf_from_utc_timestamp.q/udf_to_utc_timestamp.q do not work with 
 updated Java timezone information
 

 Key: HIVE-11143
 URL: https://issues.apache.org/jira/browse/HIVE-11143
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-11143.1.patch


 It looks like there were recent changes to the Europe/Moscow time zone in 
 2014. When udf_from_utc_timestamp.q/udf_to_utc_timestamp.q are run with more 
 recent versions of JDK or with an updated time zone database, the tests fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11103) Add banker's rounding BROUND UDF

2015-07-27 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643001#comment-14643001
 ] 

Alexander Pivovarov commented on HIVE-11103:


I do not think 2 tests failure relates to patch #2

 Add banker's rounding BROUND UDF
 

 Key: HIVE-11103
 URL: https://issues.apache.org/jira/browse/HIVE-11103
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-11103.1.patch, HIVE-11103.1.patch, 
 HIVE-11103.2.patch


 Banker's rounding: the value is rounded to the nearest even number. Also 
 known as Gaussian rounding, and, in German, mathematische Rundung.
 Example
 {code}
   2 digits2 digits
 UnroundedStandard roundingGaussian rounding
   54.1754  54.18  54.18
  343.2050 343.21 343.20
 +106.2038+106.20+106.20 
 =======
  503.5842 503.59 503.58
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11103) Add banker's rounding BROUND UDF

2015-07-27 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-11103:
---
Attachment: HIVE-11103.2.patch

patch #2
- added vector_bround.q
- added testMathExprBround

 Add banker's rounding BROUND UDF
 

 Key: HIVE-11103
 URL: https://issues.apache.org/jira/browse/HIVE-11103
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-11103.1.patch, HIVE-11103.1.patch, 
 HIVE-11103.2.patch


 Banker's rounding: the value is rounded to the nearest even number. Also 
 known as Gaussian rounding, and, in German, mathematische Rundung.
 Example
 {code}
   2 digits2 digits
 UnroundedStandard roundingGaussian rounding
   54.1754  54.18  54.18
  343.2050 343.21 343.20
 +106.2038+106.20+106.20 
 =======
  503.5842 503.59 503.58
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo

2015-07-07 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov reopened HIVE-9557:
---

I take it

 create UDF to measure strings similarity using Cosine Similarity algo
 -

 Key: HIVE-9557
 URL: https://issues.apache.org/jira/browse/HIVE-9557
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
  Labels: CosineSimilarity, SimilarityMetric, UDF

 algo description http://en.wikipedia.org/wiki/Cosine_similarity
 {code}
 --one word different, total 2 words
 str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f
 {code}
 reference implementation:
 https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo

2015-06-27 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604327#comment-14604327
 ] 

Alexander Pivovarov commented on HIVE-9557:
---

Hi Nishant, if I click on RB link I get this
{code}
You don't have access to this review request.

This review request is private. You must be a requested reviewer, either 
directly or on a requested group, and have permission to access the repository 
in order to view this review request.
{code}

I recommend to use RBTTools to upload and update patches.
for initial upload
{code}
rbt post -g yes
{code}
to update
{code}
rbt post -u -g yes 
{code}
https://www.reviewboard.org/downloads/rbtools/


 create UDF to measure strings similarity using Cosine Similarity algo
 -

 Key: HIVE-9557
 URL: https://issues.apache.org/jira/browse/HIVE-9557
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Nishant Kelkar
  Labels: CosineSimilarity, SimilarityMetric, UDF
 Attachments: HIVE-9557.1.patch, udf_cosine_similarity-v01.patch


 algo description http://en.wikipedia.org/wiki/Cosine_similarity
 {code}
 --one word different, total 2 words
 str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f
 {code}
 reference implementation:
 https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11134) HS2 should log open session failure

2015-06-27 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604359#comment-14604359
 ] 

Alexander Pivovarov commented on HIVE-11134:


+1

 HS2 should log open session failure
 ---

 Key: HIVE-11134
 URL: https://issues.apache.org/jira/browse/HIVE-11134
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-11134.1.patch


 HiveServer2 should log OpenSession failure.  If beeline is not running with 
 --verbose=true all stack trace information is not available for later 
 debugging, as it is not currently logged in server side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7150) FileInputStream is not closed in HiveConnection#getHttpClient()

2015-06-27 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-7150:
--
Component/s: JDBC

 FileInputStream is not closed in HiveConnection#getHttpClient()
 ---

 Key: HIVE-7150
 URL: https://issues.apache.org/jira/browse/HIVE-7150
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Reporter: Ted Yu
Assignee: Alexander Pivovarov
  Labels: jdbc
 Attachments: HIVE-7150.1.patch, HIVE-7150.2.patch, HIVE-7150.3.patch, 
 HIVE-7150.4.patch


 Here is related code:
 {code}
 sslTrustStore.load(new FileInputStream(sslTrustStorePath),
 sslTrustStorePassword.toCharArray());
 {code}
 The FileInputStream is not closed upon returning from the method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7180) BufferedReader is not closed in MetaStoreSchemaInfo ctor

2015-06-27 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-7180:
--
Attachment: HIVE-7180.4.patch

patch #4
- using java 7 try-with-resources

 BufferedReader is not closed in MetaStoreSchemaInfo ctor
 

 Key: HIVE-7180
 URL: https://issues.apache.org/jira/browse/HIVE-7180
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Ted Yu
Assignee: Alexander Pivovarov
Priority: Minor
  Labels: patch
 Attachments: HIVE-7180.3.patch, HIVE-7180.4.patch, HIVE-7180.patch, 
 HIVE-7180_001.patch


 Here is related code:
 {code}
   BufferedReader bfReader =
 new BufferedReader(new FileReader(upgradeListFile));
   String currSchemaVersion;
   while ((currSchemaVersion = bfReader.readLine()) != null) {
 upgradeOrderList.add(currSchemaVersion.trim());
 {code}
 BufferedReader / FileReader should be closed upon return from ctor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7150) FileInputStream is not closed in HiveConnection#getHttpClient()

2015-06-27 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-7150:
--
Attachment: HIVE-7150.4.patch

patch #4
- using try-with-resources

 FileInputStream is not closed in HiveConnection#getHttpClient()
 ---

 Key: HIVE-7150
 URL: https://issues.apache.org/jira/browse/HIVE-7150
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Alexander Pivovarov
  Labels: jdbc
 Attachments: HIVE-7150.1.patch, HIVE-7150.2.patch, HIVE-7150.3.patch, 
 HIVE-7150.4.patch


 Here is related code:
 {code}
 sslTrustStore.load(new FileInputStream(sslTrustStorePath),
 sslTrustStorePassword.toCharArray());
 {code}
 The FileInputStream is not closed upon returning from the method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7180) BufferedReader is not closed in MetaStoreSchemaInfo ctor

2015-06-27 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-7180:
--
Component/s: Metastore

 BufferedReader is not closed in MetaStoreSchemaInfo ctor
 

 Key: HIVE-7180
 URL: https://issues.apache.org/jira/browse/HIVE-7180
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.13.1
Reporter: Ted Yu
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: HIVE-7180.3.patch, HIVE-7180.4.patch, HIVE-7180.patch, 
 HIVE-7180_001.patch


 Here is related code:
 {code}
   BufferedReader bfReader =
 new BufferedReader(new FileReader(upgradeListFile));
   String currSchemaVersion;
   while ((currSchemaVersion = bfReader.readLine()) != null) {
 upgradeOrderList.add(currSchemaVersion.trim());
 {code}
 BufferedReader / FileReader should be closed upon return from ctor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7180) BufferedReader is not closed in MetaStoreSchemaInfo ctor

2015-06-27 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-7180:
--
Labels:   (was: patch)

 BufferedReader is not closed in MetaStoreSchemaInfo ctor
 

 Key: HIVE-7180
 URL: https://issues.apache.org/jira/browse/HIVE-7180
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.13.1
Reporter: Ted Yu
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: HIVE-7180.3.patch, HIVE-7180.4.patch, HIVE-7180.patch, 
 HIVE-7180_001.patch


 Here is related code:
 {code}
   BufferedReader bfReader =
 new BufferedReader(new FileReader(upgradeListFile));
   String currSchemaVersion;
   while ((currSchemaVersion = bfReader.readLine()) != null) {
 upgradeOrderList.add(currSchemaVersion.trim());
 {code}
 BufferedReader / FileReader should be closed upon return from ctor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo

2015-06-27 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604360#comment-14604360
 ] 

Alexander Pivovarov commented on HIVE-9557:
---

I can open the RB now.
I noticed that some people use github PR now as an alternative to RB. e.g. 
https://issues.apache.org/jira/browse/HIVE-11134

 create UDF to measure strings similarity using Cosine Similarity algo
 -

 Key: HIVE-9557
 URL: https://issues.apache.org/jira/browse/HIVE-9557
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Nishant Kelkar
  Labels: CosineSimilarity, SimilarityMetric, UDF
 Attachments: HIVE-9557.1.patch, udf_cosine_similarity-v01.patch


 algo description http://en.wikipedia.org/wiki/Cosine_similarity
 {code}
 --one word different, total 2 words
 str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f
 {code}
 reference implementation:
 https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo

2015-06-27 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604346#comment-14604346
 ] 

Alexander Pivovarov commented on HIVE-9557:
---

The review request should be public.
https://www.reviewboard.org/docs/manual/2.5/admin/configuration/access-control/#access-control

 create UDF to measure strings similarity using Cosine Similarity algo
 -

 Key: HIVE-9557
 URL: https://issues.apache.org/jira/browse/HIVE-9557
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Nishant Kelkar
  Labels: CosineSimilarity, SimilarityMetric, UDF
 Attachments: HIVE-9557.1.patch, udf_cosine_similarity-v01.patch


 algo description http://en.wikipedia.org/wiki/Cosine_similarity
 {code}
 --one word different, total 2 words
 str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f
 {code}
 reference implementation:
 https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11103) Add banker's rounding BROUND UDF

2015-06-27 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-11103:
---
Attachment: HIVE-11103.1.patch

attaching patch #1 again

 Add banker's rounding BROUND UDF
 

 Key: HIVE-11103
 URL: https://issues.apache.org/jira/browse/HIVE-11103
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-11103.1.patch, HIVE-11103.1.patch


 Banker's rounding: the value is rounded to the nearest even number. Also 
 known as Gaussian rounding, and, in German, mathematische Rundung.
 Example
 {code}
   2 digits2 digits
 UnroundedStandard roundingGaussian rounding
   54.1754  54.18  54.18
  343.2050 343.21 343.20
 +106.2038+106.20+106.20 
 =======
  503.5842 503.59 503.58
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7180) BufferedReader is not closed in MetaStoreSchemaInfo ctor

2015-06-27 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604536#comment-14604536
 ] 

Alexander Pivovarov commented on HIVE-7180:
---

The errors are unrelated. Previous build (4418) had the same 10 errors

 BufferedReader is not closed in MetaStoreSchemaInfo ctor
 

 Key: HIVE-7180
 URL: https://issues.apache.org/jira/browse/HIVE-7180
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.13.1
Reporter: Ted Yu
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: HIVE-7180.3.patch, HIVE-7180.4.patch, HIVE-7180.patch, 
 HIVE-7180_001.patch


 Here is related code:
 {code}
   BufferedReader bfReader =
 new BufferedReader(new FileReader(upgradeListFile));
   String currSchemaVersion;
   while ((currSchemaVersion = bfReader.readLine()) != null) {
 upgradeOrderList.add(currSchemaVersion.trim());
 {code}
 BufferedReader / FileReader should be closed upon return from ctor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-7150) FileInputStream is not closed in HiveConnection#getHttpClient()

2015-06-26 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov reassigned HIVE-7150:
-

Assignee: Alexander Pivovarov

 FileInputStream is not closed in HiveConnection#getHttpClient()
 ---

 Key: HIVE-7150
 URL: https://issues.apache.org/jira/browse/HIVE-7150
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Alexander Pivovarov
  Labels: jdbc
 Attachments: HIVE-7150.1.patch, HIVE-7150.2.patch


 Here is related code:
 {code}
 sslTrustStore.load(new FileInputStream(sslTrustStorePath),
 sslTrustStorePassword.toCharArray());
 {code}
 The FileInputStream is not closed upon returning from the method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7150) FileInputStream is not closed in HiveConnection#getHttpClient()

2015-06-26 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-7150:
--
Attachment: HIVE-7150.3.patch

patch #3
- close FileInputStream in 3 places in HiveConnection class

 FileInputStream is not closed in HiveConnection#getHttpClient()
 ---

 Key: HIVE-7150
 URL: https://issues.apache.org/jira/browse/HIVE-7150
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Alexander Pivovarov
  Labels: jdbc
 Attachments: HIVE-7150.1.patch, HIVE-7150.2.patch, HIVE-7150.3.patch


 Here is related code:
 {code}
 sslTrustStore.load(new FileInputStream(sslTrustStorePath),
 sslTrustStorePassword.toCharArray());
 {code}
 The FileInputStream is not closed upon returning from the method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-7180) BufferedReader is not closed in MetaStoreSchemaInfo ctor

2015-06-26 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov reassigned HIVE-7180:
-

Assignee: Alexander Pivovarov  (was: skrho)

 BufferedReader is not closed in MetaStoreSchemaInfo ctor
 

 Key: HIVE-7180
 URL: https://issues.apache.org/jira/browse/HIVE-7180
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Ted Yu
Assignee: Alexander Pivovarov
Priority: Minor
  Labels: patch
 Attachments: HIVE-7180.patch, HIVE-7180_001.patch


 Here is related code:
 {code}
   BufferedReader bfReader =
 new BufferedReader(new FileReader(upgradeListFile));
   String currSchemaVersion;
   while ((currSchemaVersion = bfReader.readLine()) != null) {
 upgradeOrderList.add(currSchemaVersion.trim());
 {code}
 BufferedReader / FileReader should be closed upon return from ctor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7180) BufferedReader is not closed in MetaStoreSchemaInfo ctor

2015-06-26 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603939#comment-14603939
 ] 

Alexander Pivovarov commented on HIVE-7180:
---

I do not think we should change constructor signature. Let me fix it

 BufferedReader is not closed in MetaStoreSchemaInfo ctor
 

 Key: HIVE-7180
 URL: https://issues.apache.org/jira/browse/HIVE-7180
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Ted Yu
Assignee: Alexander Pivovarov
Priority: Minor
  Labels: patch
 Attachments: HIVE-7180.patch, HIVE-7180_001.patch


 Here is related code:
 {code}
   BufferedReader bfReader =
 new BufferedReader(new FileReader(upgradeListFile));
   String currSchemaVersion;
   while ((currSchemaVersion = bfReader.readLine()) != null) {
 upgradeOrderList.add(currSchemaVersion.trim());
 {code}
 BufferedReader / FileReader should be closed upon return from ctor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7180) BufferedReader is not closed in MetaStoreSchemaInfo ctor

2015-06-26 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-7180:
--
Attachment: HIVE-7180.3.patch

patch #3
- add finally block and close reader quietly

 BufferedReader is not closed in MetaStoreSchemaInfo ctor
 

 Key: HIVE-7180
 URL: https://issues.apache.org/jira/browse/HIVE-7180
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Ted Yu
Assignee: Alexander Pivovarov
Priority: Minor
  Labels: patch
 Attachments: HIVE-7180.3.patch, HIVE-7180.patch, HIVE-7180_001.patch


 Here is related code:
 {code}
   BufferedReader bfReader =
 new BufferedReader(new FileReader(upgradeListFile));
   String currSchemaVersion;
   while ((currSchemaVersion = bfReader.readLine()) != null) {
 upgradeOrderList.add(currSchemaVersion.trim());
 {code}
 BufferedReader / FileReader should be closed upon return from ctor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7233) File hive-hwi-0.13.1 not found on lib folder

2015-06-26 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603952#comment-14603952
 ] 

Alexander Pivovarov commented on HIVE-7233:
---

Lots of modern web project use embedded webservers like Tomcat or Jetty.
They are not supposed to be deployed to web container and this is why they 
should not be packed as WAR.

Instead of it the webapp and embedded server are started as regular java 
program using HWIServer.main() (which calls HWIServer.start())

 File hive-hwi-0.13.1 not found on lib folder
 

 Key: HIVE-7233
 URL: https://issues.apache.org/jira/browse/HIVE-7233
 Project: Hive
  Issue Type: New Feature
  Components: Web UI
Affects Versions: 0.13.1
Reporter: Dinh Hoang Luong
Assignee: Thomas Friedrich
 Attachments: HIVE-7233.1.patch


 I found that: 
 line 27 of file 
 .../apache-hive-0.13.1-sr/hwi/pom.xml with 
 packagejar/package instead of packagewar/package
 sorry my english is bad. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7598) Potential null pointer dereference in MergeTask#closeJob()

2015-06-26 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603959#comment-14603959
 ] 

Alexander Pivovarov commented on HIVE-7598:
---

isCompressed and tableInfo can not be null because they are used in the code 
later

 Potential null pointer dereference in MergeTask#closeJob()
 --

 Key: HIVE-7598
 URL: https://issues.apache.org/jira/browse/HIVE-7598
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: SUYEON LEE
Priority: Minor
 Attachments: HIVE-7598.patch


 Call to Utilities.mvFileToFinalPath() passes null as second last parameter, 
 conf.
 null gets passed to createEmptyBuckets() which dereferences conf directly:
 {code}
 boolean isCompressed = conf.getCompressed();
 TableDesc tableInfo = conf.getTableInfo();
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11116) Can not select data from table which points to remote hdfs location

2015-06-25 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-6:
---
Description: 
I tried to create new table which points to remote hdfs location and select 
data from it.

It works for hive-0.14 and hive-1.0  but it does not work starting from hive-1.1

to reproduce the issue
1. create folder on remote hdfs
{code}
hadoop fs -mkdir -p hdfs://remote-nn/tmp/et1
{code}

2. create table 
{code}
CREATE TABLE et1 (
  a string
) stored as textfile
LOCATION 'hdfs://remote-nn/tmp/et1';
{code}

3. run select
{code}
select * from et1 limit 10;
{code}

4. Should get the following error
{code}
select * from et1;
15/06/25 13:43:44 [main]: ERROR parse.CalcitePlanner: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to determine if 
hdfs://remote-nn/tmp/et1is encrypted: java.lang.IllegalArgumentException: Wrong 
FS: hdfs://remote-nn/tmp/et1, expected: hdfs://localhost:8020
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1763)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1875)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1689)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1427)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10132)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10147)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:190)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.IllegalArgumentException: Wrong FS: 
hdfs://remote-nn/tmp/et1, expected: hdfs://localhost:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1906)
at 
org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
at 
org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1097)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1759)
... 25 more

FAILED: SemanticException Unable to determine if hdfs://remote-nn/tmp/et1is 
encrypted: java.lang.IllegalArgumentException: Wrong FS: 
hdfs://remote-nn/tmp/et1, expected: hdfs://localhost:8020
15/06/25 13:43:44 [main]: ERROR ql.Driver: FAILED: SemanticException Unable to 
determine if hdfs://remote-nn/tmp/et1is encrypted: 
java.lang.IllegalArgumentException: Wrong FS: hdfs://remote-nn/tmp/et1, 
expected: hdfs://localhost:8020
org.apache.hadoop.hive.ql.parse.SemanticException: Unable to determine if 
hdfs://remote-nn/tmp/et1is encrypted: java.lang.IllegalArgumentException: Wrong 
FS: hdfs://remote-nn/tmp/et1, expected: hdfs://localhost:8020
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1743)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1427)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10132)
at 

[jira] [Updated] (HIVE-11103) Add banker's rounding BROUND UDF

2015-06-25 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-11103:
---
Attachment: HIVE-11103.1.patch

patch #1

 Add banker's rounding BROUND UDF
 

 Key: HIVE-11103
 URL: https://issues.apache.org/jira/browse/HIVE-11103
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-11103.1.patch


 Banker's rounding: the value is rounded to the nearest even number. Also 
 known as Gaussian rounding, and, in German, mathematische Rundung.
 Example
 {code}
   2 digits2 digits
 UnroundedStandard roundingGaussian rounding
   54.1754  54.18  54.18
  343.2050 343.21 343.20
 +106.2038+106.20+106.20 
 =======
  503.5842 503.59 503.58
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo

2015-06-24 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600195#comment-14600195
 ] 

Alexander Pivovarov commented on HIVE-9557:
---

rename clientnegative/udf_cosine_similarity.q to
clientnegative/udf_cosine_similarity_error_1.q
then
{code}
# build hive
mvn clean install -Phadoop-2,dist -DskipTests
# build itest
cd itest
mvn clean install -Phadoop-2 -DskipTests
# build qtest
cd qtest
mvn clean install -Phadoop-2 -DskipTests
# run q test. it will overwrite q.out file
mvn test -Dtest=TestCliDriver -Dqfile=udf_cosine_similarity.q,show_functions.q 
-Dtest.output.overwrite=true -Phadoop-2
# run negative q file test
mvn test -Dtest=TestNegativeCliDriver -Dqfile=udf_cosine_similarity_error_1.q 
-Dtest.output.overwrite=true -Phadoop-2
{code}

 create UDF to measure strings similarity using Cosine Similarity algo
 -

 Key: HIVE-9557
 URL: https://issues.apache.org/jira/browse/HIVE-9557
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Nishant Kelkar
  Labels: CosineSimilarity, SimilarityMetric, UDF
 Attachments: udf_cosine_similarity-v01.patch


 algo description http://en.wikipedia.org/wiki/Cosine_similarity
 {code}
 --one word different, total 2 words
 str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f
 {code}
 reference implementation:
 https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo

2015-06-24 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14599737#comment-14599737
 ] 

Alexander Pivovarov commented on HIVE-9557:
---

Hi Nishant,

Thank you for the patch. Can you look at the following recommendations/issues
- usually patch name should look like HIVE-9557.1.patch
- can you attach RB link to the Jira? So, we can leave comments for particular 
code lines
- you have to provide integration tests for the function (q file and q.out file)
- the function should be registered
- probably you can look at HIVE-9556 as an example
- hive code uses 2 spaces for indent
- a*b should be separated by space a * b

 create UDF to measure strings similarity using Cosine Similarity algo
 -

 Key: HIVE-9557
 URL: https://issues.apache.org/jira/browse/HIVE-9557
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Nishant Kelkar
  Labels: CosineSimilarity, SimilarityMetric, UDF
 Attachments: udf_cosine_similarity-v01.patch


 algo description http://en.wikipedia.org/wiki/Cosine_similarity
 {code}
 --one word different, total 2 words
 str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f
 {code}
 reference implementation:
 https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-17 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590278#comment-14590278
 ] 

Alexander Pivovarov commented on HIVE-10841:


1.0.1 was released on May 20. How it can be fixed in 1.0.1?  
http://apache.mirrors.lucidnetworks.net/hive/hive-1.0.1/

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0
Reporter: Alexander Pivovarov
Assignee: Laljo John Pullokkaran
 Fix For: 1.3.0, 1.2.1, 2.0.0, 1.0.2

 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, 
 HIVE-10841.2.patch, HIVE-10841.patch


 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE

[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-17 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590280#comment-14590280
 ] 

Alexander Pivovarov commented on HIVE-10841:


Yes, 1.0.2 is better

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0
Reporter: Alexander Pivovarov
Assignee: Laljo John Pullokkaran
 Fix For: 1.3.0, 1.2.1, 2.0.0, 1.0.2

 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, 
 HIVE-10841.2.patch, HIVE-10841.patch


 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   

[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-16 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10841:
---
Fix Version/s: 2.0.0
   1.3.0

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0
Reporter: Alexander Pivovarov
Assignee: Laljo John Pullokkaran
 Fix For: 1.3.0, 1.2.1, 2.0.0

 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, 
 HIVE-10841.2.patch, HIVE-10841.patch


 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic 

[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-16 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10841:
---
Affects Version/s: 2.0.0

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0
Reporter: Alexander Pivovarov
Assignee: Laljo John Pullokkaran
 Fix For: 1.3.0, 1.2.1, 2.0.0

 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, 
 HIVE-10841.2.patch, HIVE-10841.patch


 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 

[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-16 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589081#comment-14589081
 ] 

Alexander Pivovarov commented on HIVE-10841:


Currently the patch is committed to

https://github.com/apache/hive/commits/branch-1
https://github.com/apache/hive/commits/branch-1.0
https://github.com/apache/hive/commits/branch-1.2
https://github.com/apache/hive/commits/master

I updated Fix Version/s field accordingly

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0
Reporter: Alexander Pivovarov
Assignee: Laljo John Pullokkaran
 Fix For: 1.3.0, 1.2.1, 2.0.0

 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, 
 HIVE-10841.2.patch, HIVE-10841.patch


 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
 

[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-16 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10841:
---
Affects Version/s: (was: 2.0.0)

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0
Reporter: Alexander Pivovarov
Assignee: Laljo John Pullokkaran
 Fix For: 1.3.0, 1.2.1, 2.0.0

 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, 
 HIVE-10841.2.patch, HIVE-10841.patch


 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: 

[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-12 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14583988#comment-14583988
 ] 

Alexander Pivovarov commented on HIVE-10841:


Laljo, could you cherry-pick the commit to branch-1  (hive-1.3.0)?
https://github.com/apache/hive/commits/branch-1

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0
Reporter: Alexander Pivovarov
Assignee: Laljo John Pullokkaran
 Fix For: 1.2.1

 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, 
 HIVE-10841.2.patch, HIVE-10841.patch


 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator

[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0

2015-06-11 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582813#comment-14582813
 ] 

Alexander Pivovarov commented on HIVE-10704:


+1

 Errors in Tez HashTableLoader when estimated table size is 0
 

 Key: HIVE-10704
 URL: https://issues.apache.org/jira/browse/HIVE-10704
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Jason Dere
Assignee: Mostafa Mokhtar
 Fix For: 1.2.1

 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, 
 HIVE-10704.3.patch


 Couple of issues:
 - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all 
 tables, the largest small table selection is wrong and could select the large 
 table (which results in NPE)
 - The memory estimates can either divide-by-zero, or allocate 0 memory if the 
 table size is 0. Try to come up with a sensible default for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-10 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581444#comment-14581444
 ] 

Alexander Pivovarov commented on HIVE-10841:


5. Can you add RB link?

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0
Reporter: Alexander Pivovarov
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch


 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: 

[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-10 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581443#comment-14581443
 ] 

Alexander Pivovarov commented on HIVE-10841:


3. New integration test which reproduces the issue is also needed
4. Is it possible to add JUnit tests for the fixes?

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0
Reporter: Alexander Pivovarov
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch


 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)

[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0

2015-06-10 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581360#comment-14581360
 ] 

Alexander Pivovarov commented on HIVE-10704:


I put my comment on RB on May 30, 2015

 Errors in Tez HashTableLoader when estimated table size is 0
 

 Key: HIVE-10704
 URL: https://issues.apache.org/jira/browse/HIVE-10704
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Jason Dere
Assignee: Mostafa Mokhtar
 Fix For: 1.2.1

 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, 
 HIVE-10704.3.patch


 Couple of issues:
 - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all 
 tables, the largest small table selection is wrong and could select the large 
 table (which results in NPE)
 - The memory estimates can either divide-by-zero, or allocate 0 memory if the 
 table size is 0. Try to come up with a sensible default for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0

2015-06-10 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581361#comment-14581361
 ] 

Alexander Pivovarov commented on HIVE-10704:


I put my comment on RB on May 30, 2015

 Errors in Tez HashTableLoader when estimated table size is 0
 

 Key: HIVE-10704
 URL: https://issues.apache.org/jira/browse/HIVE-10704
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Jason Dere
Assignee: Mostafa Mokhtar
 Fix For: 1.2.1

 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, 
 HIVE-10704.3.patch


 Couple of issues:
 - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all 
 tables, the largest small table selection is wrong and could select the large 
 table (which results in NPE)
 - The memory estimates can either divide-by-zero, or allocate 0 memory if the 
 table size is 0. Try to come up with a sensible default for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-09 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579530#comment-14579530
 ] 

Alexander Pivovarov commented on HIVE-10841:


Sure, np. Btw, my apache id is apivovarov

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0
Reporter: Alexander Pivovarov
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-10841.1.patch, HIVE-10841.patch


 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
 

[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-08 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578345#comment-14578345
 ] 

Alexander Pivovarov commented on HIVE-10841:


[~jpullokkaran], thank you for HIVE-10841.1.patch.
It changes 2 files:
- SemanticAnalyzer.java
- OpProcFactory.java

I tried the fix in SemanticAnalyzer.java only. It solve the issue with my test 
query.
So, looks like the fix in SemanticAnalyzer.java is enough to solve the issue.

Why do we need fixes in OpProcFactory.java? Should we open separate Jira for 
them?

2. Looks like we need to rerun bunch of Cli, Tez and Spark tests...

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-10841.1.patch, HIVE-10841.patch


 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
 

[jira] [Updated] (HIVE-10748) Replace StringBuffer with StringBuilder where possible

2015-06-04 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10748:
---
Fix Version/s: 2.0.0

 Replace StringBuffer with StringBuilder where possible
 --

 Key: HIVE-10748
 URL: https://issues.apache.org/jira/browse/HIVE-10748
 Project: Hive
  Issue Type: Improvement
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-10748.1.patch, HIVE-10748.1.patch, 
 HIVE-10748.2.patch


 I found 40 places in Hive where new StringBuffer( is used.
 Where possible, it is recommended that StringBuilder be used in preference 
 to StringBuffer as it will be faster under most implementations
 https://docs.oracle.com/javase/7/docs/api/java/lang/StringBuilder.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10427) collect_list() and collect_set() should accept struct types as argument

2015-06-04 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573363#comment-14573363
 ] 

Alexander Pivovarov commented on HIVE-10427:


+1

 collect_list() and collect_set() should accept struct types as argument
 ---

 Key: HIVE-10427
 URL: https://issues.apache.org/jira/browse/HIVE-10427
 Project: Hive
  Issue Type: Wish
  Components: UDF
Reporter: Alexander Behm
Assignee: Chao Sun
 Attachments: HIVE-10427.1.patch, HIVE-10427.2.patch, 
 HIVE-10427.3.patch, HIVE-10427.4.patch


 The collect_list() and collect_set() functions currently only accept scalar 
 argument types. It would be very useful if these functions could also accept 
 struct argument types for creating nested data from flat data.
 For example, suppose I wanted to create a nested customers/orders table from 
 two flat tables, customers and orders. Then it'd be very convenient to write 
 something like this:
 {code}
 insert into table nested_customers_orders
 select c.*, collect_list(named_struct(oid, o.oid, order_date: o.date...))
 from customers c inner join orders o on (c.cid = o.oid)
 group by c.cid
 {code}
 Thanks you for your consideration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-04 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573342#comment-14573342
 ] 

Alexander Pivovarov commented on HIVE-10841:


Better to put code/sql/plan to \{code\}...\{code\} blocks. It will be easier to 
read

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-10841.patch


 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 

[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-03 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571849#comment-14571849
 ] 

Alexander Pivovarov commented on HIVE-10841:


LOG info for the queries with different JOIN operators order

L, LA, FR, A, PI, acct - only 2 log messages from ppd.OpProcFactory contain = 
120
{code}
explain
select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE
  L.id = 4436
  and acct.brn = 120;

15/06/03 16:31:47 [main]: INFO ppd.OpProcFactory: Processing for FIL(25)
15/06/03 16:31:47 [main]: INFO ppd.OpProcFactory: Pushdown Predicates of FIL 
For Alias : acct
15/06/03 16:31:47 [main]: INFO ppd.OpProcFactory:   (_col20 = 120)

15/06/03 16:31:47 [main]: INFO ppd.OpProcFactory: Processing for JOIN(24)
15/06/03 16:31:47 [main]: INFO ppd.OpProcFactory: Pushdown Predicates of JOIN 
For Alias : acct
15/06/03 16:31:47 [main]: INFO ppd.OpProcFactory:   (VALUE._col19 = 120)

  Stage: Stage-9
Map Reduce
Select Operator
  expressions: _col19 (type: int), 120 (type: int)
{code}

L, LA, FR, A, acct, PI - 8 log lines from  ppd.OpProcFactory contain = 120

{code}
explain
select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN acct ON A.id = acct.aid
JOIN PI ON PI.id = LA.pi_id
WHERE
  L.id = 4436
  and acct.brn = 120;

15/06/03 15:45:25 [main]: INFO ppd.OpProcFactory: Processing for FIL(25)
15/06/03 15:45:39 [main]: INFO ppd.OpProcFactory: Pushdown Predicates of FIL 
For Alias : acct
15/06/03 15:45:39 [main]: INFO ppd.OpProcFactory:   (_col20 = 120)

15/06/03 15:46:23 [main]: INFO ppd.OpProcFactory: Processing for JOIN(24)
15/06/03 15:46:23 [main]: INFO ppd.OpProcFactory: Pushdown Predicates of JOIN 
For Alias : acct
15/06/03 15:46:23 [main]: INFO ppd.OpProcFactory:   (VALUE._col19 = 120)

15/06/03 15:46:26 [main]: INFO ppd.OpProcFactory: Processing for RS(21)
15/06/03 15:46:26 [main]: INFO ppd.OpProcFactory: Pushdown Predicates of RS For 
Alias : acct
15/06/03 15:46:26 [main]: INFO ppd.OpProcFactory:   (_col20 = 120)

15/06/03 15:46:43 [main]: INFO ppd.OpProcFactory: Processing for FIL(20)
15/06/03 15:46:49 [main]: INFO ppd.OpProcFactory: Pushdown Predicates of FIL 
For Alias : acct
15/06/03 15:46:49 [main]: INFO ppd.OpProcFactory:   (_col20 = 120)

15/06/03 15:46:52 [main]: INFO ppd.OpProcFactory: Processing for JOIN(19)
15/06/03 15:46:52 [main]: INFO ppd.OpProcFactory: Pushdown Predicates of JOIN 
For Alias : acct
15/06/03 15:46:52 [main]: INFO ppd.OpProcFactory:   (VALUE._col1 = 120)

15/06/03 15:59:18 [main]: INFO ppd.OpProcFactory: Processing for RS(18)
15/06/03 15:59:18 [main]: INFO ppd.OpProcFactory: Pushdown Predicates of RS For 
Alias : acct
15/06/03 15:59:18 [main]: INFO ppd.OpProcFactory:   (brn = 120)

15/06/03 15:59:19 [main]: INFO ppd.OpProcFactory: Processing for FIL(17)
15/06/03 15:59:50 [main]: INFO ppd.OpProcFactory: Pushdown Predicates of FIL 
For Alias : acct
15/06/03 15:59:50 [main]: INFO ppd.OpProcFactory:   (brn = 120)

15/06/03 16:00:20 [main]: INFO ppd.OpProcFactory: Processing for TS(4)
15/06/03 16:00:20 [main]: INFO ppd.OpProcFactory: Pushdown Predicates of TS For 
Alias : acct
15/06/03 16:00:20 [main]: INFO ppd.OpProcFactory:   aid is not null
15/06/03 16:00:20 [main]: INFO ppd.OpProcFactory:   (brn = 120)

15/06/03 16:01:38 [main]: INFO optimizer.ConstantPropagateProcFactory: expr 
Const int 120 fold from Column[VALUE._col19] is removed.

15/06/03 16:01:38 [main]: INFO optimizer.ColumnPrunerProcFactory: RS 21 
oldColExprMap: {VALUE._col5=Column[_col5], VALUE._col4=Const int 4436, 
VALUE._col3=Column[_col3], VALUE._col2=Column[_col2], 
VALUE._col1=Column[_col1], VALUE._col0=Const int 4436, 
KEY.reducesinkkey0=Column[_col6], VALUE._col14=Column[_col15], 
VALUE._col13=Column[_col14], VALUE._col16=Column[_col17], 
VALUE._col15=Column[_col16], VALUE._col18=Column[_col19], VALUE._col9=Const int 
4436, VALUE._col17=Column[_col18], VALUE._col8=Column[_col9], 
VALUE._col7=Column[_col8], VALUE._col19=Const int 120, 
VALUE._col6=Column[_col7], VALUE._col20=Column[_col21], 
VALUE._col11=Column[_col12], VALUE._col21=Column[_col22], 
VALUE._col12=Column[_col13], VALUE._col22=Column[_col23], 
VALUE._col10=Column[_col11]}

15/06/03 16:01:38 [main]: INFO optimizer.ColumnPrunerProcFactory: RS 18 
oldColExprMap: {VALUE._col4=Column[ROW__ID], 
VALUE._col3=Column[INPUT__FILE__NAME], 
VALUE._col2=Column[BLOCK__OFFSET__INSIDE__FILE], VALUE._col1=Const int 120, 
VALUE._col0=Column[acc_n], KEY.reducesinkkey0=Column[aid]}

STAGE PLANS:
  Stage: Stage-12
acct 
  TableScan
alias: acct
Statistics: Num rows: 5 Data size: 63 Basic stats: COMPLETE Column 
stats: NONE
Filter 

[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-03 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570374#comment-14570374
 ] 

Alexander Pivovarov commented on HIVE-10841:


Thank you Laljo John for looking at the issue. I tried the patch #1. Yes, it 
makes output to contain just 1 row. Probably the filter is added to Reduce step.
BUT, IMHO the patch #1 does not fix the root cause of the problem. I still do 
not see Filter Operator predicate (brn is not null) for acct table on Map step.
{code}
select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE
  L.id = 4436
  and acct.brn is not null;
{code}
{code}
acct 
  TableScan
alias: acct
Statistics: Num rows: 5 Data size: 63 Basic stats: COMPLETE Column 
stats: NONE
Filter Operator
  predicate: aid is not null (type: boolean)
  Statistics: Num rows: 3 Data size: 37 Basic stats: COMPLETE 
Column stats: NONE
  HashTable Sink Operator
keys:
  0 _col5 (type: int)
  1 id (type: int)
  2 aid (type: int)
{code}

Hovewer, if you change JOIN order and put acct before PI then the plan will 
be correct - Map Local Operator Tree for acct table will have Filter Operator 
predicate (brn is not null)
{code}
select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN acct ON A.id = acct.aid
JOIN PI ON PI.id = LA.pi_id
WHERE
  L.id = 4436
  and acct.brn is not null;
{code}
{code}
  TableScan
alias: acct
Statistics: Num rows: 5 Data size: 63 Basic stats: COMPLETE Column 
stats: NONE
Filter Operator
  predicate: (aid is not null and brn is not null) (type: boolean)
  Statistics: Num rows: 2 Data size: 25 Basic stats: COMPLETE 
Column stats: NONE
  HashTable Sink Operator
keys:
  0 _col5 (type: int)
  1 id (type: int)
  2 aid (type: int)
{code}

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-10841.patch


 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 

[jira] [Updated] (HIVE-10748) Replace StringBuffer with StringBuilder where possible

2015-06-01 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10748:
---
Attachment: HIVE-10748.2.patch

patch #2:
- rebased to the latest

 Replace StringBuffer with StringBuilder where possible
 --

 Key: HIVE-10748
 URL: https://issues.apache.org/jira/browse/HIVE-10748
 Project: Hive
  Issue Type: Improvement
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: HIVE-10748.1.patch, HIVE-10748.1.patch, 
 HIVE-10748.2.patch


 I found 40 places in Hive where new StringBuffer( is used.
 Where possible, it is recommended that StringBuilder be used in preference 
 to StringBuffer as it will be faster under most implementations
 https://docs.oracle.com/javase/7/docs/api/java/lang/StringBuilder.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-01 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568044#comment-14568044
 ] 

Alexander Pivovarov commented on HIVE-10841:


Changing the order of JOIN operators fixes the plan.
Filter Operator predicates are added to Map step for acct table.

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov

 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: 

[jira] [Commented] (HIVE-10748) Replace StringBuffer with StringBuilder where possible

2015-06-01 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568290#comment-14568290
 ] 

Alexander Pivovarov commented on HIVE-10748:


Build 4130 has 3 failed tests. I think errors are unrelated to the patch #2.
I checked prev build 4129 - it also has the same 3 failed tests.
So, everything looks good!

 Replace StringBuffer with StringBuilder where possible
 --

 Key: HIVE-10748
 URL: https://issues.apache.org/jira/browse/HIVE-10748
 Project: Hive
  Issue Type: Improvement
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: HIVE-10748.1.patch, HIVE-10748.1.patch, 
 HIVE-10748.2.patch


 I found 40 places in Hive where new StringBuffer( is used.
 Where possible, it is recommended that StringBuilder be used in preference 
 to StringBuffer as it will be faster under most implementations
 https://docs.oracle.com/javase/7/docs/api/java/lang/StringBuilder.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-31 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov reassigned HIVE-10841:
--

Assignee: Alexander Pivovarov

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov

 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col6 (type: int)
   1 id 

[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-30 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566239#comment-14566239
 ] 

Alexander Pivovarov commented on HIVE-10841:


disabling hive.ppd.remove.duplicatefilters helps
{code}
set hive.ppd.remove.duplicatefilters=false;
{code}

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
Reporter: Alexander Pivovarov

 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 

[jira] [Commented] (HIVE-686) add UDF substring_index

2015-05-30 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566375#comment-14566375
 ] 

Alexander Pivovarov commented on HIVE-686:
--

Added the function description to the Language Manual 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions

 add UDF substring_index
 ---

 Key: HIVE-686
 URL: https://issues.apache.org/jira/browse/HIVE-686
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Namit Jain
Assignee: Alexander Pivovarov
 Fix For: 1.3.0

 Attachments: HIVE-686.1.patch, HIVE-686.1.patch, HIVE-686.patch, 
 HIVE-686.patch


 SUBSTRING_INDEX(str,delim,count)
 Returns the substring from string str before count occurrences of the 
 delimiter delim. If count is positive, everything to the left of the final 
 delimiter (counting from the left) is returned. If count is negative, 
 everything to the right of the final delimiter (counting from the right) is 
 returned. SUBSTRING_INDEX() performs a case-sensitive match when searching 
 for delim.
 Examples:
 {code}
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', 3);
 --www.mysql.com
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);
 --www.mysql
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', 1);
 --www
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', 0);
 --''
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', -1);
 --com
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2);
 --mysql.com
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', -3);
 --www.mysql.com
 {code}
 {code}
 --#delim does not exist in str
 SELECT SUBSTRING_INDEX('www.mysql.com', 'Q', 1);
 --www.mysql.com
 --#delim is 2 chars
 SELECT SUBSTRING_INDEX('www||mysql||com', '||', 2);
 --www||mysql
 --#delim is empty string
 SELECT SUBSTRING_INDEX('www.mysql.com', '', 2);
 --''
 --#str is empty string
 SELECT SUBSTRING_INDEX('', '.', 2);
 --''
 {code}
 {code}
 --#null params
 SELECT SUBSTRING_INDEX(null, '.', 1);
 --null
 SELECT SUBSTRING_INDEX('www.mysql.com', null, 1);
 --null
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', null);
 --null
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-29 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564297#comment-14564297
 ] 

Alexander Pivovarov commented on HIVE-10841:


if I set hive.optimize.ppd=false then query returns 1 row BUT the plan does not 
have brn is not null

if I  set hive.optimize.ppd=true and change JOIN statements order to 
(A,acct,PI) then query returns 1 row AND the plan HAS brn is not null.
{code}
set hive.optimize.ppd=true;

select
  acct.ACC_N,
  acct.brn
FROM LA
JOIN A ON LA.aid = A.id
JOIN acct ON LA.aid = acct.aid
JOIN PI ON PI.id = LA.pi_id
WHERE
  LA.loan_id = 4436
  and acct.brn is not null;
OK
10  122
{code}




 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
Reporter: Alexander Pivovarov

 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)

[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-29 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564273#comment-14564273
 ] 

Alexander Pivovarov commented on HIVE-10841:


L tables also can be removed from the query - it does not affect the result.

But I found that order of JOIN statements matters.
I tries the following combinations of the JOIN statements
(PI,acct,A) (PI,A,acct) (A,PI,acct) (acct,PI,A) (A,act,PI) (act,A,PI)

3 rows are returned only for (A,PI,acct) combination
FROM LA
JOIN A
JOIN PI
JOIN acct

{code}
select
  acct.ACC_N,
  acct.brn
FROM LA
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON LA.aid = acct.aid
WHERE
  LA.loan_id = 4436
  and acct.brn is not null;
OK
10  122
NULLNULL
NULLNULL
{code}

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
Reporter: Alexander Pivovarov

 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 

[jira] [Updated] (HIVE-686) add UDF substring_index

2015-05-29 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-686:
-
Attachment: HIVE-686.1.patch

reattaching path #1 to rerun tests

 add UDF substring_index
 ---

 Key: HIVE-686
 URL: https://issues.apache.org/jira/browse/HIVE-686
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Namit Jain
Assignee: Alexander Pivovarov
 Attachments: HIVE-686.1.patch, HIVE-686.1.patch, HIVE-686.patch, 
 HIVE-686.patch


 SUBSTRING_INDEX(str,delim,count)
 Returns the substring from string str before count occurrences of the 
 delimiter delim. If count is positive, everything to the left of the final 
 delimiter (counting from the left) is returned. If count is negative, 
 everything to the right of the final delimiter (counting from the right) is 
 returned. SUBSTRING_INDEX() performs a case-sensitive match when searching 
 for delim.
 Examples:
 {code}
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', 3);
 --www.mysql.com
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);
 --www.mysql
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', 1);
 --www
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', 0);
 --''
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', -1);
 --com
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2);
 --mysql.com
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', -3);
 --www.mysql.com
 {code}
 {code}
 --#delim does not exist in str
 SELECT SUBSTRING_INDEX('www.mysql.com', 'Q', 1);
 --www.mysql.com
 --#delim is 2 chars
 SELECT SUBSTRING_INDEX('www||mysql||com', '||', 2);
 --www||mysql
 --#delim is empty string
 SELECT SUBSTRING_INDEX('www.mysql.com', '', 2);
 --''
 --#str is empty string
 SELECT SUBSTRING_INDEX('', '.', 2);
 --''
 {code}
 {code}
 --#null params
 SELECT SUBSTRING_INDEX(null, '.', 1);
 --null
 SELECT SUBSTRING_INDEX('www.mysql.com', null, 1);
 --null
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', null);
 --null
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-29 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565502#comment-14565502
 ] 

Alexander Pivovarov commented on HIVE-10841:


I tested one of my queries with set hive.optimize.ppd=false;
Disabling ppd kills the performance. 9min vs 60 min.

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
Reporter: Alexander Pivovarov

 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
  

[jira] [Commented] (HIVE-10788) Change sort_array to support non-primitive types

2015-05-29 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565780#comment-14565780
 ] 

Alexander Pivovarov commented on HIVE-10788:


+1

 Change sort_array to support non-primitive types
 

 Key: HIVE-10788
 URL: https://issues.apache.org/jira/browse/HIVE-10788
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Chao Sun
Assignee: Chao Sun
 Attachments: HIVE-10788.1.patch, HIVE-10788.2.patch, 
 HIVE-10788.3.patch


 Currently {{sort_array}} only support primitive types. As we already support 
 comparison between non-primitive types, it makes sense to remove this 
 restriction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10845) TezJobMonitor uses killedTaskCount instead of killedTaskAttemptCount

2015-05-28 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563339#comment-14563339
 ] 

Alexander Pivovarov commented on HIVE-10845:


Siddharth, do you want to change variable name killedTasks to 
killedTasksAttempts, to make it similar to failedTaskAttempts (1 lines above)

 TezJobMonitor uses killedTaskCount instead of killedTaskAttemptCount
 

 Key: HIVE-10845
 URL: https://issues.apache.org/jira/browse/HIVE-10845
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: 1.2.1

 Attachments: HIVE-10845.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10846) LLAP: preemption in AM due to failures / out of order scheduling

2015-05-28 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563345#comment-14563345
 ] 

Alexander Pivovarov commented on HIVE-10846:


what branch?

 LLAP: preemption in AM due to failures / out of order scheduling
 

 Key: HIVE-10846
 URL: https://issues.apache.org/jira/browse/HIVE-10846
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: llap

 Attachments: HIVE-10846.1.txt






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-28 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10841:
---
Description: 
The result from the following SELECT query is 3 rows but it should be 1 row.
I checked it in MySQL - it returned 1 row.

To reproduce the issue in Hive
1. prepare tables
{code}
drop table if exists L;
drop table if exists LA;
drop table if exists FR;
drop table if exists A;
drop table if exists PI;
drop table if exists acct;

create table L as select 4436 id;
create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
create table FR as select 4436 loan_id;
create table A as select 4748 id;
create table PI as select 4415 id;

create table acct as select 4748 aid, 10 acc_n, 122 brn;
insert into table acct values(4748, null, null);
insert into table acct values(4748, null, null);
{code}

2. run SELECT query
{code}
select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE
  L.id = 4436
  and acct.brn is not null;
{code}

the result is 3 rows
{code}
10  122
NULLNULL
NULLNULL
{code}

but it should be 1 row

{code}
10  122
{code}

2.1 explain select ... output for hive-1.3.0 MR
{code}
STAGE DEPENDENCIES:
  Stage-12 is a root stage
  Stage-9 depends on stages: Stage-12
  Stage-0 depends on stages: Stage-9

STAGE PLANS:
  Stage: Stage-12
Map Reduce Local Work
  Alias - Map Local Tables:
a 
  Fetch Operator
limit: -1
acct 
  Fetch Operator
limit: -1
fr 
  Fetch Operator
limit: -1
l 
  Fetch Operator
limit: -1
pi 
  Fetch Operator
limit: -1
  Alias - Map Local Operator Tree:
a 
  TableScan
alias: a
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
stats: NONE
Filter Operator
  predicate: id is not null (type: boolean)
  Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
stats: NONE
  HashTable Sink Operator
keys:
  0 _col5 (type: int)
  1 id (type: int)
  2 aid (type: int)
acct 
  TableScan
alias: acct
Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column 
stats: NONE
Filter Operator
  predicate: aid is not null (type: boolean)
  Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
Column stats: NONE
  HashTable Sink Operator
keys:
  0 _col5 (type: int)
  1 id (type: int)
  2 aid (type: int)
fr 
  TableScan
alias: fr
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
stats: NONE
Filter Operator
  predicate: (loan_id = 4436) (type: boolean)
  Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
stats: NONE
  HashTable Sink Operator
keys:
  0 4436 (type: int)
  1 4436 (type: int)
  2 4436 (type: int)
l 
  TableScan
alias: l
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
stats: NONE
Filter Operator
  predicate: (id = 4436) (type: boolean)
  Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
stats: NONE
  HashTable Sink Operator
keys:
  0 4436 (type: int)
  1 4436 (type: int)
  2 4436 (type: int)
pi 
  TableScan
alias: pi
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
stats: NONE
Filter Operator
  predicate: id is not null (type: boolean)
  Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
stats: NONE
  HashTable Sink Operator
keys:
  0 _col6 (type: int)
  1 id (type: int)

  Stage: Stage-9
Map Reduce
  Map Operator Tree:
  TableScan
alias: la
Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column 
stats: NONE
Filter Operator
  predicate: (((loan_id is not null and aid is not null) and pi_id 
is not null) and (loan_id = 4436)) (type: boolean)
  Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE 
Column stats: NONE
  Map Join Operator
condition map:
 Inner Join 0 to 1
 Inner Join 0 to 2
keys:
  0 4436 (type: int)
  

[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-28 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564229#comment-14564229
 ] 

Alexander Pivovarov commented on HIVE-10841:


if we change is not null to is null or to  = 122 then all 3 rows will be 
NULL or 122 for column brn (second column).

and acct.brn is null;
{code}
10  NULL
NULLNULL
NULLNULL
{code}

and acct.brn = 122;
{code}
10  122
NULL122
NULL122
{code}

and acct.brn is null;
{code}
10  122
NULLNULL
NULLNULL
{code}

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
Reporter: Alexander Pivovarov

 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: 

[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-28 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564213#comment-14564213
 ] 

Alexander Pivovarov commented on HIVE-10841:


if we look at hive-0.12 plan then we can see that it has brn is not null 
predicate in Filter Operator
{code}
acct 
  TableScan
alias: acct
Filter Operator
  predicate:
  expr: brn is not null
  type: boolean
{code}

But in hive-1.3.0 plan I do not see brn at all. It only has predicate: aid 
is not null for acct table.
Does it mean that hive-1.3.0 plan is wrong?

I checked ppd folder diff btw 0.12.0 and 0.13.0
It was two fixes
HIVE-4293 : Predicates following UDTF operator are removed by PPD 
HIVE-5411 : Migrate expression serialization to Kryo



 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
Reporter: Alexander Pivovarov

 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink 

[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-28 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10841:
---
Component/s: Query Planning

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
Reporter: Alexander Pivovarov

 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col6 (type: int)
   1 id (type: int)
   Stage: Stage-9
 Map Reduce
   

[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-28 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564215#comment-14564215
 ] 

Alexander Pivovarov commented on HIVE-10841:


hive-0.12.0 plan
{code}
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_JOIN (TOK_JOIN (TOK_JOIN 
(TOK_TABREF (TOK_TABNAME L)) (TOK_TABREF (TOK_TABNAME LA)) (= (. 
(TOK_TABLE_OR_COL L) id) (. (TOK_TABLE_OR_COL LA) loan_id))) (TOK_TABREF 
(TOK_TABNAME FR)) (= (. (TOK_TABLE_OR_COL L) id) (. (TOK_TABLE_OR_COL FR) 
loan_id))) (TOK_TABREF (TOK_TABNAME A)) (= (. (TOK_TABLE_OR_COL LA) aid) (. 
(TOK_TABLE_OR_COL A) id))) (TOK_TABREF (TOK_TABNAME PI)) (= (. 
(TOK_TABLE_OR_COL PI) id) (. (TOK_TABLE_OR_COL LA) pi_id))) (TOK_TABREF 
(TOK_TABNAME acct)) (= (. (TOK_TABLE_OR_COL A) id) (. (TOK_TABLE_OR_COL acct) 
aid (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT 
(TOK_SELEXPR (. (TOK_TABLE_OR_COL acct) ACC_N)) (TOK_SELEXPR (. 
(TOK_TABLE_OR_COL acct) brn))) (TOK_WHERE (and (= (. (TOK_TABLE_OR_COL L) id) 
4436) (TOK_FUNCTION TOK_ISNOTNULL (. (TOK_TABLE_OR_COL acct) brn))

STAGE DEPENDENCIES:
  Stage-11 is a root stage
  Stage-8 depends on stages: Stage-11
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-11
Map Reduce Local Work
  Alias - Map Local Tables:
a 
  Fetch Operator
limit: -1
acct 
  Fetch Operator
limit: -1
fr 
  Fetch Operator
limit: -1
l 
  Fetch Operator
limit: -1
pi 
  Fetch Operator
limit: -1
  Alias - Map Local Operator Tree:
a 
  TableScan
alias: a
HashTable Sink Operator
  condition expressions:
0 {_col5}
1 
2 {acc_n} {brn}
  handleSkewJoin: false
  keys:
0 [Column[_col4]]
1 [Column[id]]
2 [Column[aid]]
  Position of Big Table: 0
acct 
  TableScan
alias: acct
Filter Operator
  predicate:
  expr: brn is not null
  type: boolean
  HashTable Sink Operator
condition expressions:
  0 {_col5}
  1 
  2 {acc_n} {brn}
handleSkewJoin: false
keys:
  0 [Column[_col4]]
  1 [Column[id]]
  2 [Column[aid]]
Position of Big Table: 0
fr 
  TableScan
alias: fr
Filter Operator
  predicate:
  expr: (loan_id = 4436)
  type: boolean
  HashTable Sink Operator
condition expressions:
  0 
  1 {aid} {pi_id}
  2 
handleSkewJoin: false
keys:
  0 [Column[id]]
  1 [Column[loan_id]]
  2 [Column[loan_id]]
Position of Big Table: 1
l 
  TableScan
alias: l
Filter Operator
  predicate:
  expr: (id = 4436)
  type: boolean
  HashTable Sink Operator
condition expressions:
  0 
  1 {aid} {pi_id}
  2 
handleSkewJoin: false
keys:
  0 [Column[id]]
  1 [Column[loan_id]]
  2 [Column[loan_id]]
Position of Big Table: 1
pi 
  TableScan
alias: pi
HashTable Sink Operator
  condition expressions:
0 {_col15} {_col16}
1 
  handleSkewJoin: false
  keys:
0 [Column[_col2]]
1 [Column[id]]
  Position of Big Table: 0

  Stage: Stage-8
Map Reduce
  Alias - Map Operator Tree:
la 
  TableScan
alias: la
Filter Operator
  predicate:
  expr: (loan_id = 4436)
  type: boolean
  Map Join Operator
condition map:
 Inner Join 0 to 1
 Inner Join 0 to 2
condition expressions:
  0 
  1 {aid} {pi_id}
  2 
handleSkewJoin: false
keys:
  0 [Column[id]]
  1 [Column[loan_id]]
  2 [Column[loan_id]]
outputColumnNames: _col4, _col5
Position of Big Table: 1
Map Join Operator
  condition map:
   Inner Join 0 to 1
   Inner Join 1 to 2
  

[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-28 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564243#comment-14564243
 ] 

Alexander Pivovarov commented on HIVE-10841:


I found that JOIN FR can be removed - the result still will be 3 rows
But adding or removing JOIN PI changes Filter Operator predicate for acct 
table

if we remove JOIN PI then acct table Filter Operator predicate has brn is 
not null and query returns 1 row
{code}
acct 
  TableScan
alias: acct
Statistics: Num rows: 5 Data size: 63 Basic stats: COMPLETE Column 
stats: NONE
Filter Operator
  predicate: (aid is not null and brn is not null) (type: boolean)
{code}

How it can be possible that removing JOIN PI changes Filter Operator 
predicate for acct table?

The query below returns 1 row. Query plan has brn is not null predicate in 
Filter Operator for acct table.
But if we remove comment before JOIN PI then query plan will not have brn is 
not null predicate.
{code}
explain select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN A ON LA.aid = A.id
--JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE
  L.id = 4436
  and acct.brn is not null;
{code}


 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
Reporter: Alexander Pivovarov

 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   

[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0

2015-05-27 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561397#comment-14561397
 ] 

Alexander Pivovarov commented on HIVE-10704:


What is the negative side of dividing memory equally?
If we have 3 tables  (1KB, 0B, 2MB) how we should divided the memory? 
(MapInteger, Long tableMemorySizes). Is dividing memory equally the best 
solution?
Maybe we should treat 0B table as 1B table and give it minimum amount of the 
memory.

 Errors in Tez HashTableLoader when estimated table size is 0
 

 Key: HIVE-10704
 URL: https://issues.apache.org/jira/browse/HIVE-10704
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Jason Dere
Assignee: Mostafa Mokhtar
 Fix For: 1.2.1

 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, 
 HIVE-10704.3.patch


 Couple of issues:
 - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all 
 tables, the largest small table selection is wrong and could select the large 
 table (which results in NPE)
 - The memory estimates can either divide-by-zero, or allocate 0 memory if the 
 table size is 0. Try to come up with a sensible default for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-686) add UDF substring_index

2015-05-27 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561282#comment-14561282
 ] 

Alexander Pivovarov commented on HIVE-686:
--

prev builds - 4057, 4058 have 55 failed tests as well

 add UDF substring_index
 ---

 Key: HIVE-686
 URL: https://issues.apache.org/jira/browse/HIVE-686
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Namit Jain
Assignee: Alexander Pivovarov
 Attachments: HIVE-686.1.patch, HIVE-686.patch, HIVE-686.patch


 SUBSTRING_INDEX(str,delim,count)
 Returns the substring from string str before count occurrences of the 
 delimiter delim. If count is positive, everything to the left of the final 
 delimiter (counting from the left) is returned. If count is negative, 
 everything to the right of the final delimiter (counting from the right) is 
 returned. SUBSTRING_INDEX() performs a case-sensitive match when searching 
 for delim.
 Examples:
 {code}
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', 3);
 --www.mysql.com
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);
 --www.mysql
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', 1);
 --www
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', 0);
 --''
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', -1);
 --com
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2);
 --mysql.com
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', -3);
 --www.mysql.com
 {code}
 {code}
 --#delim does not exist in str
 SELECT SUBSTRING_INDEX('www.mysql.com', 'Q', 1);
 --www.mysql.com
 --#delim is 2 chars
 SELECT SUBSTRING_INDEX('www||mysql||com', '||', 2);
 --www||mysql
 --#delim is empty string
 SELECT SUBSTRING_INDEX('www.mysql.com', '', 2);
 --''
 --#str is empty string
 SELECT SUBSTRING_INDEX('', '.', 2);
 --''
 {code}
 {code}
 --#null params
 SELECT SUBSTRING_INDEX(null, '.', 1);
 --null
 SELECT SUBSTRING_INDEX('www.mysql.com', null, 1);
 --null
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', null);
 --null
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work for large queries

2015-05-27 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10841:
---
Description: 
The result from the following SELECT query is 3 rows but it should be 1 row.
I checked it in MySQL - it returned 1 row.

To reproduce the issue in Hive
1. prepare tables
{code}
drop table if exists L;
drop table if exists LA;
drop table if exists FR;
drop table if exists A;
drop table if exists PI;
drop table if exists acct;

create table L as select 4436 id;
create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
create table FR as select 4436 loan_id;
create table A as select 4748 id;
create table PI as select 4415 id;

create table acct as select 4748 aid, 10 acc_n, 122 brn;
insert into table acct values(4748, null, null);
insert into table acct values(4748, null, null);
{code}

2. run SELECT query
{code}
select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE
  L.id = 4436
  and acct.brn is not null;
{code}

the result is 3 rows
{code}
10  122
NULLNULL
NULLNULL
{code}

but it should be 1 row

{code}
10  122
{code}

3. workaround is to put acct.brn is not null to join condition
{code}
select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid and acct.brn is not null
WHERE
  L.id = 4436;
{code}

I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations 
have the issue

  was:
The result from the following SELCT query is 3 rows but it should be 1 row.
I checked it in MySQL - it returned 1 row.

To reproduce the issue in Hive
1. prepare tables
{code}
drop table if exists L;
drop table if exists LA;
drop table if exists FR;
drop table if exists A;
drop table if exists PI;
drop table if exists acct;

create table L as select 4436 id;
create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
create table FR as select 4436 loan_id;
create table A as select 4748 id;
create table PI as select 4415 id;

create table acct as select 4748 aid, 10 acc_n, 122 brn;
insert into table acct values(4748, null, null);
insert into table acct values(4748, null, null);
{code}

2. run SELECT query
{code}
select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE
  L.id = 4436
  and acct.brn is not null;
{code}

the result is 3 rows
{code}
10  122
NULLNULL
NULLNULL
{code}

but it should be 1 row

{code}
10  122
{code}

3. workaround is to put acct.brn is not null to join condition
{code}
select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid and acct.brn is not null
WHERE
  L.id = 4436;
{code}


 [WHERE col is not null] does not work for large queries
 ---

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Alexander Pivovarov

 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 3. workaround is to put acct.brn is not null to join condition
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid and acct.brn is not null
 WHERE
   L.id = 4436;
 {code}
 I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations 
 have the issue



--
This message was 

[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work for large queries

2015-05-27 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10841:
---
Description: 
The result from the following SELECT query is 3 rows but it should be 1 row.
I checked it in MySQL - it returned 1 row.

To reproduce the issue in Hive
1. prepare tables
{code}
drop table if exists L;
drop table if exists LA;
drop table if exists FR;
drop table if exists A;
drop table if exists PI;
drop table if exists acct;

create table L as select 4436 id;
create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
create table FR as select 4436 loan_id;
create table A as select 4748 id;
create table PI as select 4415 id;

create table acct as select 4748 aid, 10 acc_n, 122 brn;
insert into table acct values(4748, null, null);
insert into table acct values(4748, null, null);
{code}

2. run SELECT query
{code}
select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE
  L.id = 4436
  and acct.brn is not null;
{code}

the result is 3 rows
{code}
10  122
NULLNULL
NULLNULL
{code}

but it should be 1 row

{code}
10  122
{code}

3. The workaround is to put acct.brn is not null to join condition
{code}
select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid and acct.brn is not null
WHERE
  L.id = 4436;
OK
10  122
{code}

I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations 
have the issue

  was:
The result from the following SELECT query is 3 rows but it should be 1 row.
I checked it in MySQL - it returned 1 row.

To reproduce the issue in Hive
1. prepare tables
{code}
drop table if exists L;
drop table if exists LA;
drop table if exists FR;
drop table if exists A;
drop table if exists PI;
drop table if exists acct;

create table L as select 4436 id;
create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
create table FR as select 4436 loan_id;
create table A as select 4748 id;
create table PI as select 4415 id;

create table acct as select 4748 aid, 10 acc_n, 122 brn;
insert into table acct values(4748, null, null);
insert into table acct values(4748, null, null);
{code}

2. run SELECT query
{code}
select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE
  L.id = 4436
  and acct.brn is not null;
{code}

the result is 3 rows
{code}
10  122
NULLNULL
NULLNULL
{code}

but it should be 1 row

{code}
10  122
{code}

3. workaround is to put acct.brn is not null to join condition
{code}
select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid and acct.brn is not null
WHERE
  L.id = 4436;
{code}

I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations 
have the issue


 [WHERE col is not null] does not work for large queries
 ---

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Alexander Pivovarov

 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 3. The workaround is to put acct.brn is not null to join condition
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid and acct.brn is not null
 WHERE
   L.id = 4436;
 OK
 10   

[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-27 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10841:
---
Summary: [WHERE col is not null] does not work sometimes for queries with 
many JOIN statements  (was: [WHERE col is not null] does not work for large 
queries)

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Alexander Pivovarov

 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 3. The workaround is to put acct.brn is not null to join condition
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid and acct.brn is not null
 WHERE
   L.id = 4436;
 OK
 10122
 Time taken: 23.479 seconds, Fetched: 1 row(s)
 {code}
 I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations 
 have the issue



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work for large queries

2015-05-27 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10841:
---
Description: 
The result from the following SELECT query is 3 rows but it should be 1 row.
I checked it in MySQL - it returned 1 row.

To reproduce the issue in Hive
1. prepare tables
{code}
drop table if exists L;
drop table if exists LA;
drop table if exists FR;
drop table if exists A;
drop table if exists PI;
drop table if exists acct;

create table L as select 4436 id;
create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
create table FR as select 4436 loan_id;
create table A as select 4748 id;
create table PI as select 4415 id;

create table acct as select 4748 aid, 10 acc_n, 122 brn;
insert into table acct values(4748, null, null);
insert into table acct values(4748, null, null);
{code}

2. run SELECT query
{code}
select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE
  L.id = 4436
  and acct.brn is not null;
{code}

the result is 3 rows
{code}
10  122
NULLNULL
NULLNULL
{code}

but it should be 1 row

{code}
10  122
{code}

3. The workaround is to put acct.brn is not null to join condition
{code}
select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid and acct.brn is not null
WHERE
  L.id = 4436;

OK
10  122
Time taken: 23.479 seconds, Fetched: 1 row(s)
{code}

I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations 
have the issue

  was:
The result from the following SELECT query is 3 rows but it should be 1 row.
I checked it in MySQL - it returned 1 row.

To reproduce the issue in Hive
1. prepare tables
{code}
drop table if exists L;
drop table if exists LA;
drop table if exists FR;
drop table if exists A;
drop table if exists PI;
drop table if exists acct;

create table L as select 4436 id;
create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
create table FR as select 4436 loan_id;
create table A as select 4748 id;
create table PI as select 4415 id;

create table acct as select 4748 aid, 10 acc_n, 122 brn;
insert into table acct values(4748, null, null);
insert into table acct values(4748, null, null);
{code}

2. run SELECT query
{code}
select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE
  L.id = 4436
  and acct.brn is not null;
{code}

the result is 3 rows
{code}
10  122
NULLNULL
NULLNULL
{code}

but it should be 1 row

{code}
10  122
{code}

3. The workaround is to put acct.brn is not null to join condition
{code}
select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid and acct.brn is not null
WHERE
  L.id = 4436;
OK
10  122
{code}

I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations 
have the issue


 [WHERE col is not null] does not work for large queries
 ---

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Alexander Pivovarov

 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 3. The workaround is to put acct.brn is not null to join condition
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = 

[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0

2015-05-26 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560431#comment-14560431
 ] 

Alexander Pivovarov commented on HIVE-10704:


Mostafa, can you check RB link? I'm not sure it shows HIVE-10704.3.patch

 Errors in Tez HashTableLoader when estimated table size is 0
 

 Key: HIVE-10704
 URL: https://issues.apache.org/jira/browse/HIVE-10704
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Jason Dere
Assignee: Mostafa Mokhtar
 Fix For: 1.2.1

 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, 
 HIVE-10704.3.patch


 Couple of issues:
 - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all 
 tables, the largest small table selection is wrong and could select the large 
 table (which results in NPE)
 - The memory estimates can either divide-by-zero, or allocate 0 memory if the 
 table size is 0. Try to come up with a sensible default for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-686) add UDF substring_index

2015-05-26 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-686:
-
Attachment: HIVE-686.1.patch

patch #1
- derive substring_index from GenericUDF
- add Junit and qtest tests

 add UDF substring_index
 ---

 Key: HIVE-686
 URL: https://issues.apache.org/jira/browse/HIVE-686
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Namit Jain
Assignee: Alexander Pivovarov
 Attachments: HIVE-686.1.patch, HIVE-686.patch, HIVE-686.patch


 SUBSTRING_INDEX(str,delim,count)
 Returns the substring from string str before count occurrences of the 
 delimiter delim. If count is positive, everything to the left of the final 
 delimiter (counting from the left) is returned. If count is negative, 
 everything to the right of the final delimiter (counting from the right) is 
 returned. SUBSTRING_INDEX() performs a case-sensitive match when searching 
 for delim.
 Examples:
 {code}
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', 3);
 --www.mysql.com
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);
 --www.mysql
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', 1);
 --www
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', 0);
 --''
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', -1);
 --com
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2);
 --mysql.com
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', -3);
 --www.mysql.com
 {code}
 {code}
 --#delim does not exist in str
 SELECT SUBSTRING_INDEX('www.mysql.com', 'Q', 1);
 --www.mysql.com
 --#delim is 2 chars
 SELECT SUBSTRING_INDEX('www||mysql||com', '||', 2);
 --www||mysql
 --#delim is empty string
 SELECT SUBSTRING_INDEX('www.mysql.com', '', 2);
 --''
 --#str is empty string
 SELECT SUBSTRING_INDEX('', '.', 2);
 --''
 {code}
 {code}
 --#null params
 SELECT SUBSTRING_INDEX(null, '.', 1);
 --null
 SELECT SUBSTRING_INDEX('www.mysql.com', null, 1);
 --null
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', null);
 --null
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0

2015-05-26 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560432#comment-14560432
 ] 

Alexander Pivovarov commented on HIVE-10704:


Mostafa, can you check RB link? I'm not sure it shows HIVE-10704.3.patch

 Errors in Tez HashTableLoader when estimated table size is 0
 

 Key: HIVE-10704
 URL: https://issues.apache.org/jira/browse/HIVE-10704
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Jason Dere
Assignee: Mostafa Mokhtar
 Fix For: 1.2.1

 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, 
 HIVE-10704.3.patch


 Couple of issues:
 - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all 
 tables, the largest small table selection is wrong and could select the large 
 table (which results in NPE)
 - The memory estimates can either divide-by-zero, or allocate 0 memory if the 
 table size is 0. Try to come up with a sensible default for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   3   4   >