[jira] [Commented] (HIVE-6009) Add from_unixtime UDF that has controllable Timezone
[ https://issues.apache.org/jira/browse/HIVE-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857460#comment-15857460 ] Alexander Pivovarov commented on HIVE-6009: --- you can convert bigint to UTC timestamp and then convert UTC timestamp to GMT-5 timestamp (EST) {code} select from_unixtime(129384); 2010-12-31 16:00:00 // in Greenwich select from_utc_timestamp(from_unixtime(129384), 'GMT-5'); 2010-12-31 11:00:00 // in NYC > Add from_unixtime UDF that has controllable Timezone > > > Key: HIVE-6009 > URL: https://issues.apache.org/jira/browse/HIVE-6009 > Project: Hive > Issue Type: Improvement > Components: CLI >Affects Versions: 0.10.0 > Environment: CDH4.4 >Reporter: Johndee Burks >Priority: Trivial > > Currently the from_unixtime UDF takes into a account timezone of the system > doing the transformation. I think that implementation is good, but it would > be nice to include or change the current UDF to have a configurable timezone. > It would be useful for looking at timestamp data from different regions in > the native region's timezone. > Example: > from_unixtime(unix_time, format, timezone) > from_unixtime(129384, dd MMM , GMT-5) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HIVE-6009) Add from_unixtime UDF that has controllable Timezone
[ https://issues.apache.org/jira/browse/HIVE-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857460#comment-15857460 ] Alexander Pivovarov edited comment on HIVE-6009 at 2/8/17 5:57 AM: --- you can convert bigint to UTC timestamp and then convert UTC timestamp to GMT-5 timestamp (EST) {code} select from_unixtime(129384); 2010-12-31 16:00:00 // in Greenwich select from_utc_timestamp(from_unixtime(129384), 'GMT-5'); 2010-12-31 11:00:00 // in NYC {code} was (Author: apivovarov): you can convert bigint to UTC timestamp and then convert UTC timestamp to GMT-5 timestamp (EST) {code} select from_unixtime(129384); 2010-12-31 16:00:00 // in Greenwich select from_utc_timestamp(from_unixtime(129384), 'GMT-5'); 2010-12-31 11:00:00 // in NYC > Add from_unixtime UDF that has controllable Timezone > > > Key: HIVE-6009 > URL: https://issues.apache.org/jira/browse/HIVE-6009 > Project: Hive > Issue Type: Improvement > Components: CLI >Affects Versions: 0.10.0 > Environment: CDH4.4 >Reporter: Johndee Burks >Priority: Trivial > > Currently the from_unixtime UDF takes into a account timezone of the system > doing the transformation. I think that implementation is good, but it would > be nice to include or change the current UDF to have a configurable timezone. > It would be useful for looking at timestamp data from different regions in > the native region's timezone. > Example: > from_unixtime(unix_time, format, timezone) > from_unixtime(129384, dd MMM , GMT-5) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (HIVE-3558) UDF LEFT(string,position) to HIVE
[ https://issues.apache.org/jira/browse/HIVE-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov resolved HIVE-3558. --- Resolution: Won't Fix > UDF LEFT(string,position) to HIVE > -- > > Key: HIVE-3558 > URL: https://issues.apache.org/jira/browse/HIVE-3558 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.9.0 >Reporter: Aruna Babu >Priority: Minor > Attachments: HIVE-3558.1.patch.txt, udf_left.q, udf_left.q.out > > > Introduction > UDF (User Defined Function) to obtain the left most 'n' characters from > a string in HIVE. > Relevance > Current releases of Hive lacks a function which would returns the > leftmost len characters from the string str, or NULL if any argument is NULL. > > The function LEFT(string,length) would return the leftmost 'n' characters > from the string , or NULL if any argument is NULL which would be useful while > using HiveQL. This would find its use in all the technical aspects where the > concept of strings are used. > Functionality :- > Function Name: LEFT(string,length) > > Returns the leftmost length characters from the string or NULL if any > argument is NULL. > Example: hive>SELECT LEFT('https://www.irctc.co.in',5); > -> 'https' > Usage :- > Case 1: To query a table to find details based on an https request > Table :-Transaction > Request_id|date|period_id|url_name > 0001|01/07/2012|110001|https://www.irctc.co.in > 0002|02/07/2012|110001|https://nextstep.tcs.com > 0003|03/07/2012|110001|https://www.hdfcbank.com > 0005|01/07/2012|110001|http://www.lmnm.co.in > 0006|08/07/2012|110001|http://nextstart.com > 0007|10/07/2012|110001|https://netbanking.icicibank.com > 0012|21/07/2012|110001|http://www.people.co.in > 0026|08/07/2012|110001|http://nextprobs.com > 00023|25/07/2012|110001|https://netbanking.canarabank.com > Query : select * from transaction where LEFT(url_name,5)='https'; > Result :- > 0001|01/07/2012|110001|https://www.irctc.com > 0002|02/07/2012|110001|https://nextstep.tcs.com > 0003|03/07/2012|110001|https://www.hdfcbank.com > 0007|10/07/2012|110001|https://netbanking.icicibank.com > 00023|25/07/2012|110001|https://netbanking.canarabank.com -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-3558) UDF LEFT(string,position) to HIVE
[ https://issues.apache.org/jira/browse/HIVE-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857451#comment-15857451 ] Alexander Pivovarov commented on HIVE-3558: --- You can use substr to get LEFT and RIGHT {code} // get characters from 1st to 5th included SELECT substr('https://www.irctc.co.in', 1, 5); https // all RIGHT characters starting from 6th SELECT substr('https://www.irctc.co.in', 6); ://www.irctc.co.in {code} > UDF LEFT(string,position) to HIVE > -- > > Key: HIVE-3558 > URL: https://issues.apache.org/jira/browse/HIVE-3558 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.9.0 >Reporter: Aruna Babu >Priority: Minor > Attachments: HIVE-3558.1.patch.txt, udf_left.q, udf_left.q.out > > > Introduction > UDF (User Defined Function) to obtain the left most 'n' characters from > a string in HIVE. > Relevance > Current releases of Hive lacks a function which would returns the > leftmost len characters from the string str, or NULL if any argument is NULL. > > The function LEFT(string,length) would return the leftmost 'n' characters > from the string , or NULL if any argument is NULL which would be useful while > using HiveQL. This would find its use in all the technical aspects where the > concept of strings are used. > Functionality :- > Function Name: LEFT(string,length) > > Returns the leftmost length characters from the string or NULL if any > argument is NULL. > Example: hive>SELECT LEFT('https://www.irctc.co.in',5); > -> 'https' > Usage :- > Case 1: To query a table to find details based on an https request > Table :-Transaction > Request_id|date|period_id|url_name > 0001|01/07/2012|110001|https://www.irctc.co.in > 0002|02/07/2012|110001|https://nextstep.tcs.com > 0003|03/07/2012|110001|https://www.hdfcbank.com > 0005|01/07/2012|110001|http://www.lmnm.co.in > 0006|08/07/2012|110001|http://nextstart.com > 0007|10/07/2012|110001|https://netbanking.icicibank.com > 0012|21/07/2012|110001|http://www.people.co.in > 0026|08/07/2012|110001|http://nextprobs.com > 00023|25/07/2012|110001|https://netbanking.canarabank.com > Query : select * from transaction where LEFT(url_name,5)='https'; > Result :- > 0001|01/07/2012|110001|https://www.irctc.com > 0002|02/07/2012|110001|https://nextstep.tcs.com > 0003|03/07/2012|110001|https://www.hdfcbank.com > 0007|10/07/2012|110001|https://netbanking.icicibank.com > 00023|25/07/2012|110001|https://netbanking.canarabank.com -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (HIVE-2710) row_sequence UDF is not documented
[ https://issues.apache.org/jira/browse/HIVE-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov resolved HIVE-2710. --- Resolution: Won't Fix > row_sequence UDF is not documented > -- > > Key: HIVE-2710 > URL: https://issues.apache.org/jira/browse/HIVE-2710 > Project: Hive > Issue Type: Bug >Reporter: Sho Shimauchi >Priority: Minor > > row_sequence UDF was implemented in HIVE-1304, however the function is not > documented on hive wiki. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-2710) row_sequence UDF is not documented
[ https://issues.apache.org/jira/browse/HIVE-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857440#comment-15857440 ] Alexander Pivovarov commented on HIVE-2710: --- row_sequence UDF was moved to contrib package. Usually we do not describe contrib package UDFs in LanguageManual UDF > row_sequence UDF is not documented > -- > > Key: HIVE-2710 > URL: https://issues.apache.org/jira/browse/HIVE-2710 > Project: Hive > Issue Type: Bug >Reporter: Sho Shimauchi >Priority: Minor > > row_sequence UDF was implemented in HIVE-1304, however the function is not > documented on hive wiki. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-6046) add UDF for converting date time from one presentation to another
[ https://issues.apache.org/jira/browse/HIVE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-6046: -- Resolution: Duplicate Status: Resolved (was: Patch Available) > add UDF for converting date time from one presentation to another > -- > > Key: HIVE-6046 > URL: https://issues.apache.org/jira/browse/HIVE-6046 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.13.0 >Reporter: Kostiantyn Kudriavtsev >Assignee: Kostiantyn Kudriavtsev > Attachments: Hive-6046-Feb15.patch, Hive-6046.patch, HIVE-6046.patch > > > it'd be nice to have function for converting datetime to different formats, > for example: > format_date('2013-12-12 00:00:00.0', '-MM-dd HH:mm:ss.S', '/MM/dd') > There are two signatures to facilitate further using: > format_date(datetime, fromFormat, toFormat) > format_date(timestamp, toFormat) > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (HIVE-6214) Need a UDF to convert a Date String from any standard format to another. Should be able to provide the Date String, current format and to the format into which it need to
[ https://issues.apache.org/jira/browse/HIVE-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov resolved HIVE-6214. --- Resolution: Duplicate > Need a UDF to convert a Date String from any standard format to another. > Should be able to provide the Date String, current format and to the format > into which it need to be converted and returned as String output of UDF > > > Key: HIVE-6214 > URL: https://issues.apache.org/jira/browse/HIVE-6214 > Project: Hive > Issue Type: New Feature > Components: UDF > Environment: Software >Reporter: Rony Pius Manakkal >Priority: Minor > Labels: features > > Need a UDF to convert a Date String from any standard format to another. > Should be able to provide the Date String, current format and to the format > into which it need to be converted and returned as String output of UDF > Example : String convertDateFormat(String dateString, String > currentDateFormat, String requiredFormat); -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-9988) Evaluating UDF before query is run
[ https://issues.apache.org/jira/browse/HIVE-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857435#comment-15857435 ] Alexander Pivovarov commented on HIVE-9988: --- You can assign the expression to variable before query is evaluated and then use the variable in WHERE {code} set dt=from_unixtime(unix_timestamp(),'MMdd'); select * from A where dt=${hiveconf:dt}; {code} > Evaluating UDF before query is run > -- > > Key: HIVE-9988 > URL: https://issues.apache.org/jira/browse/HIVE-9988 > Project: Hive > Issue Type: Improvement >Reporter: Ådne Brunborg > > When using UDFs on partition column in Hive, all partitions are scanned > before the UDF is resolved. > If the UDF could be evaluated before query is run, this would greatly improve > performance in cases like this. > Example - the table has a partition by datestamp (bigint): > The following where clause touches upon all 82 partitions: > {{WHERE datestamp=cast(from_unixtime(unix_timestamp(),'MMdd') as bigint)}} > {{15/03/16 09:21:53 INFO mapred.FileInputFormat: Total input paths to process > : 82}} > …whereas the following only touches the one partition: > {{WHERE datestamp=20150316}} > {{15/03/16 09:23:06 INFO input.FileInputFormat: Total input paths to process > : 1}} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-15746) Fix default delimiter2 in str_to_map UDF or in method description
[ https://issues.apache.org/jira/browse/HIVE-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851014#comment-15851014 ] Alexander Pivovarov commented on HIVE-15746: str_to_map documentation was fixed as well (LanguageManual UDF wiki) https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions > Fix default delimiter2 in str_to_map UDF or in method description > - > > Key: HIVE-15746 > URL: https://issues.apache.org/jira/browse/HIVE-15746 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 2.1.1 >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Trivial > Fix For: 2.2.0 > > Attachments: HIVE-15746.1.patch > > > According to UDF wiki and to GenericUDFStringToMap.java class comments > default delimiter 2 should be '='. > But in the code default_del2 = ":" > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFStringToMap.java#L53 > We need to fix code or fix the method description and UDF wiki > Let me know what you think? > {code} > str_to_map("a=1,b=2") > vs > str_to_map("a:1,b:2") > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-15746) Fix default delimiter2 in str_to_map UDF or in method description
[ https://issues.apache.org/jira/browse/HIVE-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-15746: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Fix default delimiter2 in str_to_map UDF or in method description > - > > Key: HIVE-15746 > URL: https://issues.apache.org/jira/browse/HIVE-15746 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 2.1.1 >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Trivial > Fix For: 2.2.0 > > Attachments: HIVE-15746.1.patch > > > According to UDF wiki and to GenericUDFStringToMap.java class comments > default delimiter 2 should be '='. > But in the code default_del2 = ":" > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFStringToMap.java#L53 > We need to fix code or fix the method description and UDF wiki > Let me know what you think? > {code} > str_to_map("a=1,b=2") > vs > str_to_map("a:1,b:2") > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-15746) Fix default delimiter2 in str_to_map UDF or in method description
[ https://issues.apache.org/jira/browse/HIVE-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-15746: --- Affects Version/s: (was: 2.2.0) 2.1.1 Fix Version/s: 2.2.0 > Fix default delimiter2 in str_to_map UDF or in method description > - > > Key: HIVE-15746 > URL: https://issues.apache.org/jira/browse/HIVE-15746 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 2.1.1 >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Trivial > Fix For: 2.2.0 > > Attachments: HIVE-15746.1.patch > > > According to UDF wiki and to GenericUDFStringToMap.java class comments > default delimiter 2 should be '='. > But in the code default_del2 = ":" > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFStringToMap.java#L53 > We need to fix code or fix the method description and UDF wiki > Let me know what you think? > {code} > str_to_map("a=1,b=2") > vs > str_to_map("a:1,b:2") > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-15746) Fix default delimiter2 in str_to_map UDF or in method description
[ https://issues.apache.org/jira/browse/HIVE-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-15746: --- Attachment: HIVE-15746.1.patch I changed default delimiter2 value to ":" in UDF method description > Fix default delimiter2 in str_to_map UDF or in method description > - > > Key: HIVE-15746 > URL: https://issues.apache.org/jira/browse/HIVE-15746 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 2.2.0 >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Trivial > Attachments: HIVE-15746.1.patch > > > According to UDF wiki and to GenericUDFStringToMap.java class comments > default delimiter 2 should be '='. > But in the code default_del2 = ":" > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFStringToMap.java#L53 > We need to fix code or fix the method description and UDF wiki > Let me know what you think? > {code} > str_to_map("a=1,b=2") > vs > str_to_map("a:1,b:2") > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15746) Fix default delimiter2 in str_to_map UDF or in method description
[ https://issues.apache.org/jira/browse/HIVE-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-15746: --- Status: Patch Available (was: Open) > Fix default delimiter2 in str_to_map UDF or in method description > - > > Key: HIVE-15746 > URL: https://issues.apache.org/jira/browse/HIVE-15746 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 2.2.0 >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Trivial > Attachments: HIVE-15746.1.patch > > > According to UDF wiki and to GenericUDFStringToMap.java class comments > default delimiter 2 should be '='. > But in the code default_del2 = ":" > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFStringToMap.java#L53 > We need to fix code or fix the method description and UDF wiki > Let me know what you think? > {code} > str_to_map("a=1,b=2") > vs > str_to_map("a:1,b:2") > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11593) Add aes_encrypt and aes_decrypt UDFs
[ https://issues.apache.org/jira/browse/HIVE-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-11593: --- Affects Version/s: (was: 2.2.0) > Add aes_encrypt and aes_decrypt UDFs > > > Key: HIVE-11593 > URL: https://issues.apache.org/jira/browse/HIVE-11593 > Project: Hive > Issue Type: New Feature > Components: UDF >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11593.1.patch, HIVE-11593.2.patch > > > AES (Advanced Encryption Standard) algorithm. > Oracle JRE supports AES-128 out of the box > AES-192 and AES-256 are supported if Cryptography Extension (JCE) Unlimited > Strength Jurisdiction Policy Files are installed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15746) Fix default delimiter2 in str_to_map UDF or in method description
[ https://issues.apache.org/jira/browse/HIVE-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-15746: --- Affects Version/s: 2.2.0 > Fix default delimiter2 in str_to_map UDF or in method description > - > > Key: HIVE-15746 > URL: https://issues.apache.org/jira/browse/HIVE-15746 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 2.2.0 >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Trivial > > According to UDF wiki and to GenericUDFStringToMap.java class comments > default delimiter 2 should be '='. > But in the code default_del2 = ":" > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFStringToMap.java#L53 > We need to fix code or fix the method description and UDF wiki > Let me know what you think? > {code} > str_to_map("a=1,b=2") > vs > str_to_map("a:1,b:2") > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11593) Add aes_encrypt and aes_decrypt UDFs
[ https://issues.apache.org/jira/browse/HIVE-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-11593: --- Affects Version/s: 2.2.0 > Add aes_encrypt and aes_decrypt UDFs > > > Key: HIVE-11593 > URL: https://issues.apache.org/jira/browse/HIVE-11593 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 2.2.0 >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11593.1.patch, HIVE-11593.2.patch > > > AES (Advanced Encryption Standard) algorithm. > Oracle JRE supports AES-128 out of the box > AES-192 and AES-256 are supported if Cryptography Extension (JCE) Unlimited > Strength Jurisdiction Policy Files are installed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11593) Add aes_encrypt and aes_decrypt UDFs
[ https://issues.apache.org/jira/browse/HIVE-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732169#comment-14732169 ] Alexander Pivovarov commented on HIVE-11593: Added functions descriptions to the wiki https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Misc.Functions > Add aes_encrypt and aes_decrypt UDFs > > > Key: HIVE-11593 > URL: https://issues.apache.org/jira/browse/HIVE-11593 > Project: Hive > Issue Type: New Feature > Components: UDF >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11593.1.patch, HIVE-11593.2.patch > > > AES (Advanced Encryption Standard) algorithm. > Oracle JRE supports AES-128 out of the box > AES-192 and AES-256 are supported if Cryptography Extension (JCE) Unlimited > Strength Jurisdiction Policy Files are installed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11593) Add aes_encrypt and aes_decrypt UDFs
[ https://issues.apache.org/jira/browse/HIVE-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717087#comment-14717087 ] Alexander Pivovarov commented on HIVE-11593: The error is not related to the patch #2. Next build 5087 also has this test (TestHCatClient.testTableSchemaPropagation) failed Add aes_encrypt and aes_decrypt UDFs Key: HIVE-11593 URL: https://issues.apache.org/jira/browse/HIVE-11593 Project: Hive Issue Type: New Feature Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-11593.1.patch, HIVE-11593.2.patch AES (Advanced Encryption Standard) algorithm. Oracle JRE supports AES-128 out of the box AES-192 and AES-256 are supported if Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files are installed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11593) Add aes_encrypt and aes_decrypt UDFs
[ https://issues.apache.org/jira/browse/HIVE-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-11593: --- Attachment: HIVE-11593.2.patch patch #2 - move Cipher.getInstance to initialize Add aes_encrypt and aes_decrypt UDFs Key: HIVE-11593 URL: https://issues.apache.org/jira/browse/HIVE-11593 Project: Hive Issue Type: New Feature Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-11593.1.patch, HIVE-11593.2.patch AES (Advanced Encryption Standard) algorithm. Oracle JRE supports AES-128 out of the box AES-192 and AES-256 are supported if Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files are installed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11593) Add aes_encrypt and aes_decrypt UDFs
[ https://issues.apache.org/jira/browse/HIVE-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-11593: --- Description: AES (Advanced Encryption Standard) algorithm. Oracle JRE supports AES-128 out of the box AES-192 and AES-256 are supported if Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files are installed was: AES (Advanced Encryption Standard) algorithm. Oracle JRE supports AES-128 out of the box AES-192 and AES-256 are supported if Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files installed Add aes_encrypt and aes_decrypt UDFs Key: HIVE-11593 URL: https://issues.apache.org/jira/browse/HIVE-11593 Project: Hive Issue Type: New Feature Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov AES (Advanced Encryption Standard) algorithm. Oracle JRE supports AES-128 out of the box AES-192 and AES-256 are supported if Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files are installed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11593) Add aes_encrypt and aes_decrypt UDFs
[ https://issues.apache.org/jira/browse/HIVE-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-11593: --- Attachment: HIVE-11593.1.patch patch #1 Add aes_encrypt and aes_decrypt UDFs Key: HIVE-11593 URL: https://issues.apache.org/jira/browse/HIVE-11593 Project: Hive Issue Type: New Feature Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-11593.1.patch AES (Advanced Encryption Standard) algorithm. Oracle JRE supports AES-128 out of the box AES-192 and AES-256 are supported if Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files are installed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-1262) Add security/checksum UDFs sha,crc32,md5,aes_encrypt, and aes_decrypt
[ https://issues.apache.org/jira/browse/HIVE-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov resolved HIVE-1262. --- Resolution: Duplicate Add security/checksum UDFs sha,crc32,md5,aes_encrypt, and aes_decrypt - Key: HIVE-1262 URL: https://issues.apache.org/jira/browse/HIVE-1262 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.6.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: hive-1262-1.patch.txt Add security/checksum UDFs sha,crc32,md5,aes_encrypt, and aes_decrypt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10276) Implement date_format(timestamp, fmt) UDF
[ https://issues.apache.org/jira/browse/HIVE-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697363#comment-14697363 ] Alexander Pivovarov commented on HIVE-10276: Hive documentation for date_format UDF clearly says - Supported formats are Java SimpleDateFormat formats https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions Feel free to submit patch for date_format_mysql UDF Implement date_format(timestamp, fmt) UDF - Key: HIVE-10276 URL: https://issues.apache.org/jira/browse/HIVE-10276 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Fix For: 1.2.0 Attachments: HIVE-10276.01.patch date_format(date/timestamp/string, fmt) converts a date/timestamp/string to a value of String in the format specified by the java date format fmt. Supported formats listed here: https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10485) Create md5 UDF
[ https://issues.apache.org/jira/browse/HIVE-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698117#comment-14698117 ] Alexander Pivovarov commented on HIVE-10485: Ruslan, try unhex(md5(string/binary)) Create md5 UDF -- Key: HIVE-10485 URL: https://issues.apache.org/jira/browse/HIVE-10485 Project: Hive Issue Type: Task Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Fix For: 1.3.0, 2.0.0 Attachments: HIVE-10485.1.patch, HIVE-10485.2.patch, HIVE-10485.3.patch MD5(str) Calculates an MD5 128-bit checksum for the string. The value is returned as a string of 32 hex digits, or NULL if the argument was NULL. The return value can, for example, be used as a hash key. Example: {code} SELECT MD5('udf_md5'); 'ce62ef0d2d27dc37b6d488b92f4b24fd' {code} online md5 generator: http://www.md5.cz/ MySQL has md5 function: https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_md5 PostgreSQL also has md5 function: http://www.postgresql.org/docs/9.1/static/functions-string.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11103) Add banker's rounding BROUND UDF
[ https://issues.apache.org/jira/browse/HIVE-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695730#comment-14695730 ] Alexander Pivovarov commented on HIVE-11103: Committed to master and branch-1. Thank you Jason for your review! Add banker's rounding BROUND UDF Key: HIVE-11103 URL: https://issues.apache.org/jira/browse/HIVE-11103 Project: Hive Issue Type: New Feature Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11103.1.patch, HIVE-11103.1.patch, HIVE-11103.2.patch, HIVE-11103.4.patch Banker's rounding: the value is rounded to the nearest even number. Also known as Gaussian rounding, and, in German, mathematische Rundung. Example {code} 2 digits2 digits UnroundedStandard roundingGaussian rounding 54.1754 54.18 54.18 343.2050 343.21 343.20 +106.2038+106.20+106.20 ======= 503.5842 503.59 503.58 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11103) Add banker's rounding BROUND UDF
[ https://issues.apache.org/jira/browse/HIVE-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-11103: --- Attachment: HIVE-11103.4.patch patch #4: - added BRoubs udf to Vectorizer - added non-default constructor to BRoundWithNumDigitsDoubleToDouble and FuncRoundWithNumDigitsDecimalToDecimal - added BRound to TestVectorizationContext - fixed vector_bround.q.out Add banker's rounding BROUND UDF Key: HIVE-11103 URL: https://issues.apache.org/jira/browse/HIVE-11103 Project: Hive Issue Type: New Feature Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-11103.1.patch, HIVE-11103.1.patch, HIVE-11103.2.patch, HIVE-11103.4.patch Banker's rounding: the value is rounded to the nearest even number. Also known as Gaussian rounding, and, in German, mathematische Rundung. Example {code} 2 digits2 digits UnroundedStandard roundingGaussian rounding 54.1754 54.18 54.18 343.2050 343.21 343.20 +106.2038+106.20+106.20 ======= 503.5842 503.59 503.58 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11143) Tests udf_from_utc_timestamp.q/udf_to_utc_timestamp.q do not work with updated Java timezone information
[ https://issues.apache.org/jira/browse/HIVE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643033#comment-14643033 ] Alexander Pivovarov commented on HIVE-11143: +1 Tests udf_from_utc_timestamp.q/udf_to_utc_timestamp.q do not work with updated Java timezone information Key: HIVE-11143 URL: https://issues.apache.org/jira/browse/HIVE-11143 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11143.1.patch It looks like there were recent changes to the Europe/Moscow time zone in 2014. When udf_from_utc_timestamp.q/udf_to_utc_timestamp.q are run with more recent versions of JDK or with an updated time zone database, the tests fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11103) Add banker's rounding BROUND UDF
[ https://issues.apache.org/jira/browse/HIVE-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643001#comment-14643001 ] Alexander Pivovarov commented on HIVE-11103: I do not think 2 tests failure relates to patch #2 Add banker's rounding BROUND UDF Key: HIVE-11103 URL: https://issues.apache.org/jira/browse/HIVE-11103 Project: Hive Issue Type: New Feature Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-11103.1.patch, HIVE-11103.1.patch, HIVE-11103.2.patch Banker's rounding: the value is rounded to the nearest even number. Also known as Gaussian rounding, and, in German, mathematische Rundung. Example {code} 2 digits2 digits UnroundedStandard roundingGaussian rounding 54.1754 54.18 54.18 343.2050 343.21 343.20 +106.2038+106.20+106.20 ======= 503.5842 503.59 503.58 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11103) Add banker's rounding BROUND UDF
[ https://issues.apache.org/jira/browse/HIVE-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-11103: --- Attachment: HIVE-11103.2.patch patch #2 - added vector_bround.q - added testMathExprBround Add banker's rounding BROUND UDF Key: HIVE-11103 URL: https://issues.apache.org/jira/browse/HIVE-11103 Project: Hive Issue Type: New Feature Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-11103.1.patch, HIVE-11103.1.patch, HIVE-11103.2.patch Banker's rounding: the value is rounded to the nearest even number. Also known as Gaussian rounding, and, in German, mathematische Rundung. Example {code} 2 digits2 digits UnroundedStandard roundingGaussian rounding 54.1754 54.18 54.18 343.2050 343.21 343.20 +106.2038+106.20+106.20 ======= 503.5842 503.59 503.58 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov reopened HIVE-9557: --- I take it create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Labels: CosineSimilarity, SimilarityMetric, UDF algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604327#comment-14604327 ] Alexander Pivovarov commented on HIVE-9557: --- Hi Nishant, if I click on RB link I get this {code} You don't have access to this review request. This review request is private. You must be a requested reviewer, either directly or on a requested group, and have permission to access the repository in order to view this review request. {code} I recommend to use RBTTools to upload and update patches. for initial upload {code} rbt post -g yes {code} to update {code} rbt post -u -g yes {code} https://www.reviewboard.org/downloads/rbtools/ create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Nishant Kelkar Labels: CosineSimilarity, SimilarityMetric, UDF Attachments: HIVE-9557.1.patch, udf_cosine_similarity-v01.patch algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11134) HS2 should log open session failure
[ https://issues.apache.org/jira/browse/HIVE-11134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604359#comment-14604359 ] Alexander Pivovarov commented on HIVE-11134: +1 HS2 should log open session failure --- Key: HIVE-11134 URL: https://issues.apache.org/jira/browse/HIVE-11134 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-11134.1.patch HiveServer2 should log OpenSession failure. If beeline is not running with --verbose=true all stack trace information is not available for later debugging, as it is not currently logged in server side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7150) FileInputStream is not closed in HiveConnection#getHttpClient()
[ https://issues.apache.org/jira/browse/HIVE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7150: -- Component/s: JDBC FileInputStream is not closed in HiveConnection#getHttpClient() --- Key: HIVE-7150 URL: https://issues.apache.org/jira/browse/HIVE-7150 Project: Hive Issue Type: Bug Components: JDBC Reporter: Ted Yu Assignee: Alexander Pivovarov Labels: jdbc Attachments: HIVE-7150.1.patch, HIVE-7150.2.patch, HIVE-7150.3.patch, HIVE-7150.4.patch Here is related code: {code} sslTrustStore.load(new FileInputStream(sslTrustStorePath), sslTrustStorePassword.toCharArray()); {code} The FileInputStream is not closed upon returning from the method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7180) BufferedReader is not closed in MetaStoreSchemaInfo ctor
[ https://issues.apache.org/jira/browse/HIVE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7180: -- Attachment: HIVE-7180.4.patch patch #4 - using java 7 try-with-resources BufferedReader is not closed in MetaStoreSchemaInfo ctor Key: HIVE-7180 URL: https://issues.apache.org/jira/browse/HIVE-7180 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Ted Yu Assignee: Alexander Pivovarov Priority: Minor Labels: patch Attachments: HIVE-7180.3.patch, HIVE-7180.4.patch, HIVE-7180.patch, HIVE-7180_001.patch Here is related code: {code} BufferedReader bfReader = new BufferedReader(new FileReader(upgradeListFile)); String currSchemaVersion; while ((currSchemaVersion = bfReader.readLine()) != null) { upgradeOrderList.add(currSchemaVersion.trim()); {code} BufferedReader / FileReader should be closed upon return from ctor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7150) FileInputStream is not closed in HiveConnection#getHttpClient()
[ https://issues.apache.org/jira/browse/HIVE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7150: -- Attachment: HIVE-7150.4.patch patch #4 - using try-with-resources FileInputStream is not closed in HiveConnection#getHttpClient() --- Key: HIVE-7150 URL: https://issues.apache.org/jira/browse/HIVE-7150 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: Alexander Pivovarov Labels: jdbc Attachments: HIVE-7150.1.patch, HIVE-7150.2.patch, HIVE-7150.3.patch, HIVE-7150.4.patch Here is related code: {code} sslTrustStore.load(new FileInputStream(sslTrustStorePath), sslTrustStorePassword.toCharArray()); {code} The FileInputStream is not closed upon returning from the method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7180) BufferedReader is not closed in MetaStoreSchemaInfo ctor
[ https://issues.apache.org/jira/browse/HIVE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7180: -- Component/s: Metastore BufferedReader is not closed in MetaStoreSchemaInfo ctor Key: HIVE-7180 URL: https://issues.apache.org/jira/browse/HIVE-7180 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.1 Reporter: Ted Yu Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-7180.3.patch, HIVE-7180.4.patch, HIVE-7180.patch, HIVE-7180_001.patch Here is related code: {code} BufferedReader bfReader = new BufferedReader(new FileReader(upgradeListFile)); String currSchemaVersion; while ((currSchemaVersion = bfReader.readLine()) != null) { upgradeOrderList.add(currSchemaVersion.trim()); {code} BufferedReader / FileReader should be closed upon return from ctor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7180) BufferedReader is not closed in MetaStoreSchemaInfo ctor
[ https://issues.apache.org/jira/browse/HIVE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7180: -- Labels: (was: patch) BufferedReader is not closed in MetaStoreSchemaInfo ctor Key: HIVE-7180 URL: https://issues.apache.org/jira/browse/HIVE-7180 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.1 Reporter: Ted Yu Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-7180.3.patch, HIVE-7180.4.patch, HIVE-7180.patch, HIVE-7180_001.patch Here is related code: {code} BufferedReader bfReader = new BufferedReader(new FileReader(upgradeListFile)); String currSchemaVersion; while ((currSchemaVersion = bfReader.readLine()) != null) { upgradeOrderList.add(currSchemaVersion.trim()); {code} BufferedReader / FileReader should be closed upon return from ctor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604360#comment-14604360 ] Alexander Pivovarov commented on HIVE-9557: --- I can open the RB now. I noticed that some people use github PR now as an alternative to RB. e.g. https://issues.apache.org/jira/browse/HIVE-11134 create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Nishant Kelkar Labels: CosineSimilarity, SimilarityMetric, UDF Attachments: HIVE-9557.1.patch, udf_cosine_similarity-v01.patch algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604346#comment-14604346 ] Alexander Pivovarov commented on HIVE-9557: --- The review request should be public. https://www.reviewboard.org/docs/manual/2.5/admin/configuration/access-control/#access-control create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Nishant Kelkar Labels: CosineSimilarity, SimilarityMetric, UDF Attachments: HIVE-9557.1.patch, udf_cosine_similarity-v01.patch algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11103) Add banker's rounding BROUND UDF
[ https://issues.apache.org/jira/browse/HIVE-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-11103: --- Attachment: HIVE-11103.1.patch attaching patch #1 again Add banker's rounding BROUND UDF Key: HIVE-11103 URL: https://issues.apache.org/jira/browse/HIVE-11103 Project: Hive Issue Type: New Feature Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-11103.1.patch, HIVE-11103.1.patch Banker's rounding: the value is rounded to the nearest even number. Also known as Gaussian rounding, and, in German, mathematische Rundung. Example {code} 2 digits2 digits UnroundedStandard roundingGaussian rounding 54.1754 54.18 54.18 343.2050 343.21 343.20 +106.2038+106.20+106.20 ======= 503.5842 503.59 503.58 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7180) BufferedReader is not closed in MetaStoreSchemaInfo ctor
[ https://issues.apache.org/jira/browse/HIVE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604536#comment-14604536 ] Alexander Pivovarov commented on HIVE-7180: --- The errors are unrelated. Previous build (4418) had the same 10 errors BufferedReader is not closed in MetaStoreSchemaInfo ctor Key: HIVE-7180 URL: https://issues.apache.org/jira/browse/HIVE-7180 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.1 Reporter: Ted Yu Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-7180.3.patch, HIVE-7180.4.patch, HIVE-7180.patch, HIVE-7180_001.patch Here is related code: {code} BufferedReader bfReader = new BufferedReader(new FileReader(upgradeListFile)); String currSchemaVersion; while ((currSchemaVersion = bfReader.readLine()) != null) { upgradeOrderList.add(currSchemaVersion.trim()); {code} BufferedReader / FileReader should be closed upon return from ctor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-7150) FileInputStream is not closed in HiveConnection#getHttpClient()
[ https://issues.apache.org/jira/browse/HIVE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov reassigned HIVE-7150: - Assignee: Alexander Pivovarov FileInputStream is not closed in HiveConnection#getHttpClient() --- Key: HIVE-7150 URL: https://issues.apache.org/jira/browse/HIVE-7150 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: Alexander Pivovarov Labels: jdbc Attachments: HIVE-7150.1.patch, HIVE-7150.2.patch Here is related code: {code} sslTrustStore.load(new FileInputStream(sslTrustStorePath), sslTrustStorePassword.toCharArray()); {code} The FileInputStream is not closed upon returning from the method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7150) FileInputStream is not closed in HiveConnection#getHttpClient()
[ https://issues.apache.org/jira/browse/HIVE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7150: -- Attachment: HIVE-7150.3.patch patch #3 - close FileInputStream in 3 places in HiveConnection class FileInputStream is not closed in HiveConnection#getHttpClient() --- Key: HIVE-7150 URL: https://issues.apache.org/jira/browse/HIVE-7150 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: Alexander Pivovarov Labels: jdbc Attachments: HIVE-7150.1.patch, HIVE-7150.2.patch, HIVE-7150.3.patch Here is related code: {code} sslTrustStore.load(new FileInputStream(sslTrustStorePath), sslTrustStorePassword.toCharArray()); {code} The FileInputStream is not closed upon returning from the method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-7180) BufferedReader is not closed in MetaStoreSchemaInfo ctor
[ https://issues.apache.org/jira/browse/HIVE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov reassigned HIVE-7180: - Assignee: Alexander Pivovarov (was: skrho) BufferedReader is not closed in MetaStoreSchemaInfo ctor Key: HIVE-7180 URL: https://issues.apache.org/jira/browse/HIVE-7180 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Ted Yu Assignee: Alexander Pivovarov Priority: Minor Labels: patch Attachments: HIVE-7180.patch, HIVE-7180_001.patch Here is related code: {code} BufferedReader bfReader = new BufferedReader(new FileReader(upgradeListFile)); String currSchemaVersion; while ((currSchemaVersion = bfReader.readLine()) != null) { upgradeOrderList.add(currSchemaVersion.trim()); {code} BufferedReader / FileReader should be closed upon return from ctor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7180) BufferedReader is not closed in MetaStoreSchemaInfo ctor
[ https://issues.apache.org/jira/browse/HIVE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603939#comment-14603939 ] Alexander Pivovarov commented on HIVE-7180: --- I do not think we should change constructor signature. Let me fix it BufferedReader is not closed in MetaStoreSchemaInfo ctor Key: HIVE-7180 URL: https://issues.apache.org/jira/browse/HIVE-7180 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Ted Yu Assignee: Alexander Pivovarov Priority: Minor Labels: patch Attachments: HIVE-7180.patch, HIVE-7180_001.patch Here is related code: {code} BufferedReader bfReader = new BufferedReader(new FileReader(upgradeListFile)); String currSchemaVersion; while ((currSchemaVersion = bfReader.readLine()) != null) { upgradeOrderList.add(currSchemaVersion.trim()); {code} BufferedReader / FileReader should be closed upon return from ctor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7180) BufferedReader is not closed in MetaStoreSchemaInfo ctor
[ https://issues.apache.org/jira/browse/HIVE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7180: -- Attachment: HIVE-7180.3.patch patch #3 - add finally block and close reader quietly BufferedReader is not closed in MetaStoreSchemaInfo ctor Key: HIVE-7180 URL: https://issues.apache.org/jira/browse/HIVE-7180 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Ted Yu Assignee: Alexander Pivovarov Priority: Minor Labels: patch Attachments: HIVE-7180.3.patch, HIVE-7180.patch, HIVE-7180_001.patch Here is related code: {code} BufferedReader bfReader = new BufferedReader(new FileReader(upgradeListFile)); String currSchemaVersion; while ((currSchemaVersion = bfReader.readLine()) != null) { upgradeOrderList.add(currSchemaVersion.trim()); {code} BufferedReader / FileReader should be closed upon return from ctor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7233) File hive-hwi-0.13.1 not found on lib folder
[ https://issues.apache.org/jira/browse/HIVE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603952#comment-14603952 ] Alexander Pivovarov commented on HIVE-7233: --- Lots of modern web project use embedded webservers like Tomcat or Jetty. They are not supposed to be deployed to web container and this is why they should not be packed as WAR. Instead of it the webapp and embedded server are started as regular java program using HWIServer.main() (which calls HWIServer.start()) File hive-hwi-0.13.1 not found on lib folder Key: HIVE-7233 URL: https://issues.apache.org/jira/browse/HIVE-7233 Project: Hive Issue Type: New Feature Components: Web UI Affects Versions: 0.13.1 Reporter: Dinh Hoang Luong Assignee: Thomas Friedrich Attachments: HIVE-7233.1.patch I found that: line 27 of file .../apache-hive-0.13.1-sr/hwi/pom.xml with packagejar/package instead of packagewar/package sorry my english is bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7598) Potential null pointer dereference in MergeTask#closeJob()
[ https://issues.apache.org/jira/browse/HIVE-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603959#comment-14603959 ] Alexander Pivovarov commented on HIVE-7598: --- isCompressed and tableInfo can not be null because they are used in the code later Potential null pointer dereference in MergeTask#closeJob() -- Key: HIVE-7598 URL: https://issues.apache.org/jira/browse/HIVE-7598 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: SUYEON LEE Priority: Minor Attachments: HIVE-7598.patch Call to Utilities.mvFileToFinalPath() passes null as second last parameter, conf. null gets passed to createEmptyBuckets() which dereferences conf directly: {code} boolean isCompressed = conf.getCompressed(); TableDesc tableInfo = conf.getTableInfo(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11116) Can not select data from table which points to remote hdfs location
[ https://issues.apache.org/jira/browse/HIVE-6?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-6: --- Description: I tried to create new table which points to remote hdfs location and select data from it. It works for hive-0.14 and hive-1.0 but it does not work starting from hive-1.1 to reproduce the issue 1. create folder on remote hdfs {code} hadoop fs -mkdir -p hdfs://remote-nn/tmp/et1 {code} 2. create table {code} CREATE TABLE et1 ( a string ) stored as textfile LOCATION 'hdfs://remote-nn/tmp/et1'; {code} 3. run select {code} select * from et1 limit 10; {code} 4. Should get the following error {code} select * from et1; 15/06/25 13:43:44 [main]: ERROR parse.CalcitePlanner: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to determine if hdfs://remote-nn/tmp/et1is encrypted: java.lang.IllegalArgumentException: Wrong FS: hdfs://remote-nn/tmp/et1, expected: hdfs://localhost:8020 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1763) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1875) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1689) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1427) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10132) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10147) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:190) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://remote-nn/tmp/et1, expected: hdfs://localhost:8020 at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645) at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193) at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1906) at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262) at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1097) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1759) ... 25 more FAILED: SemanticException Unable to determine if hdfs://remote-nn/tmp/et1is encrypted: java.lang.IllegalArgumentException: Wrong FS: hdfs://remote-nn/tmp/et1, expected: hdfs://localhost:8020 15/06/25 13:43:44 [main]: ERROR ql.Driver: FAILED: SemanticException Unable to determine if hdfs://remote-nn/tmp/et1is encrypted: java.lang.IllegalArgumentException: Wrong FS: hdfs://remote-nn/tmp/et1, expected: hdfs://localhost:8020 org.apache.hadoop.hive.ql.parse.SemanticException: Unable to determine if hdfs://remote-nn/tmp/et1is encrypted: java.lang.IllegalArgumentException: Wrong FS: hdfs://remote-nn/tmp/et1, expected: hdfs://localhost:8020 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1743) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1427) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10132) at
[jira] [Updated] (HIVE-11103) Add banker's rounding BROUND UDF
[ https://issues.apache.org/jira/browse/HIVE-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-11103: --- Attachment: HIVE-11103.1.patch patch #1 Add banker's rounding BROUND UDF Key: HIVE-11103 URL: https://issues.apache.org/jira/browse/HIVE-11103 Project: Hive Issue Type: New Feature Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-11103.1.patch Banker's rounding: the value is rounded to the nearest even number. Also known as Gaussian rounding, and, in German, mathematische Rundung. Example {code} 2 digits2 digits UnroundedStandard roundingGaussian rounding 54.1754 54.18 54.18 343.2050 343.21 343.20 +106.2038+106.20+106.20 ======= 503.5842 503.59 503.58 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600195#comment-14600195 ] Alexander Pivovarov commented on HIVE-9557: --- rename clientnegative/udf_cosine_similarity.q to clientnegative/udf_cosine_similarity_error_1.q then {code} # build hive mvn clean install -Phadoop-2,dist -DskipTests # build itest cd itest mvn clean install -Phadoop-2 -DskipTests # build qtest cd qtest mvn clean install -Phadoop-2 -DskipTests # run q test. it will overwrite q.out file mvn test -Dtest=TestCliDriver -Dqfile=udf_cosine_similarity.q,show_functions.q -Dtest.output.overwrite=true -Phadoop-2 # run negative q file test mvn test -Dtest=TestNegativeCliDriver -Dqfile=udf_cosine_similarity_error_1.q -Dtest.output.overwrite=true -Phadoop-2 {code} create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Nishant Kelkar Labels: CosineSimilarity, SimilarityMetric, UDF Attachments: udf_cosine_similarity-v01.patch algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14599737#comment-14599737 ] Alexander Pivovarov commented on HIVE-9557: --- Hi Nishant, Thank you for the patch. Can you look at the following recommendations/issues - usually patch name should look like HIVE-9557.1.patch - can you attach RB link to the Jira? So, we can leave comments for particular code lines - you have to provide integration tests for the function (q file and q.out file) - the function should be registered - probably you can look at HIVE-9556 as an example - hive code uses 2 spaces for indent - a*b should be separated by space a * b create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Nishant Kelkar Labels: CosineSimilarity, SimilarityMetric, UDF Attachments: udf_cosine_similarity-v01.patch algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590278#comment-14590278 ] Alexander Pivovarov commented on HIVE-10841: 1.0.1 was released on May 20. How it can be fixed in 1.0.1? http://apache.mirrors.lucidnetworks.net/hive/hive-1.0.1/ [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Fix For: 1.3.0, 1.2.1, 2.0.0, 1.0.2 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590280#comment-14590280 ] Alexander Pivovarov commented on HIVE-10841: Yes, 1.0.2 is better [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Fix For: 1.3.0, 1.2.1, 2.0.0, 1.0.2 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean)
[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10841: --- Fix Version/s: 2.0.0 1.3.0 [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Fix For: 1.3.0, 1.2.1, 2.0.0 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic
[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10841: --- Affects Version/s: 2.0.0 [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Fix For: 1.3.0, 1.2.1, 2.0.0 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589081#comment-14589081 ] Alexander Pivovarov commented on HIVE-10841: Currently the patch is committed to https://github.com/apache/hive/commits/branch-1 https://github.com/apache/hive/commits/branch-1.0 https://github.com/apache/hive/commits/branch-1.2 https://github.com/apache/hive/commits/master I updated Fix Version/s field accordingly [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Fix For: 1.3.0, 1.2.1, 2.0.0 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi
[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10841: --- Affects Version/s: (was: 2.0.0) [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Fix For: 1.3.0, 1.2.1, 2.0.0 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats:
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14583988#comment-14583988 ] Alexander Pivovarov commented on HIVE-10841: Laljo, could you cherry-pick the commit to branch-1 (hive-1.3.0)? https://github.com/apache/hive/commits/branch-1 [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Fix For: 1.2.1 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator
[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0
[ https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582813#comment-14582813 ] Alexander Pivovarov commented on HIVE-10704: +1 Errors in Tez HashTableLoader when estimated table size is 0 Key: HIVE-10704 URL: https://issues.apache.org/jira/browse/HIVE-10704 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Jason Dere Assignee: Mostafa Mokhtar Fix For: 1.2.1 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, HIVE-10704.3.patch Couple of issues: - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all tables, the largest small table selection is wrong and could select the large table (which results in NPE) - The memory estimates can either divide-by-zero, or allocate 0 memory if the table size is 0. Try to come up with a sensible default for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581444#comment-14581444 ] Alexander Pivovarov commented on HIVE-10841: 5. Can you add RB link? [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Attachments: HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats:
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581443#comment-14581443 ] Alexander Pivovarov commented on HIVE-10841: 3. New integration test which reproduces the issue is also needed 4. Is it possible to add JUnit tests for the fixes? [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Attachments: HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean)
[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0
[ https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581360#comment-14581360 ] Alexander Pivovarov commented on HIVE-10704: I put my comment on RB on May 30, 2015 Errors in Tez HashTableLoader when estimated table size is 0 Key: HIVE-10704 URL: https://issues.apache.org/jira/browse/HIVE-10704 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Jason Dere Assignee: Mostafa Mokhtar Fix For: 1.2.1 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, HIVE-10704.3.patch Couple of issues: - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all tables, the largest small table selection is wrong and could select the large table (which results in NPE) - The memory estimates can either divide-by-zero, or allocate 0 memory if the table size is 0. Try to come up with a sensible default for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0
[ https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581361#comment-14581361 ] Alexander Pivovarov commented on HIVE-10704: I put my comment on RB on May 30, 2015 Errors in Tez HashTableLoader when estimated table size is 0 Key: HIVE-10704 URL: https://issues.apache.org/jira/browse/HIVE-10704 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Jason Dere Assignee: Mostafa Mokhtar Fix For: 1.2.1 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, HIVE-10704.3.patch Couple of issues: - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all tables, the largest small table selection is wrong and could select the large table (which results in NPE) - The memory estimates can either divide-by-zero, or allocate 0 memory if the table size is 0. Try to come up with a sensible default for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579530#comment-14579530 ] Alexander Pivovarov commented on HIVE-10841: Sure, np. Btw, my apache id is apivovarov [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Attachments: HIVE-10841.1.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578345#comment-14578345 ] Alexander Pivovarov commented on HIVE-10841: [~jpullokkaran], thank you for HIVE-10841.1.patch. It changes 2 files: - SemanticAnalyzer.java - OpProcFactory.java I tried the fix in SemanticAnalyzer.java only. It solve the issue with my test query. So, looks like the fix in SemanticAnalyzer.java is enough to solve the issue. Why do we need fixes in OpProcFactory.java? Should we open separate Jira for them? 2. Looks like we need to rerun bunch of Cli, Tez and Spark tests... [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-10841.1.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int)
[jira] [Updated] (HIVE-10748) Replace StringBuffer with StringBuilder where possible
[ https://issues.apache.org/jira/browse/HIVE-10748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10748: --- Fix Version/s: 2.0.0 Replace StringBuffer with StringBuilder where possible -- Key: HIVE-10748 URL: https://issues.apache.org/jira/browse/HIVE-10748 Project: Hive Issue Type: Improvement Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Fix For: 1.3.0, 2.0.0 Attachments: HIVE-10748.1.patch, HIVE-10748.1.patch, HIVE-10748.2.patch I found 40 places in Hive where new StringBuffer( is used. Where possible, it is recommended that StringBuilder be used in preference to StringBuffer as it will be faster under most implementations https://docs.oracle.com/javase/7/docs/api/java/lang/StringBuilder.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10427) collect_list() and collect_set() should accept struct types as argument
[ https://issues.apache.org/jira/browse/HIVE-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573363#comment-14573363 ] Alexander Pivovarov commented on HIVE-10427: +1 collect_list() and collect_set() should accept struct types as argument --- Key: HIVE-10427 URL: https://issues.apache.org/jira/browse/HIVE-10427 Project: Hive Issue Type: Wish Components: UDF Reporter: Alexander Behm Assignee: Chao Sun Attachments: HIVE-10427.1.patch, HIVE-10427.2.patch, HIVE-10427.3.patch, HIVE-10427.4.patch The collect_list() and collect_set() functions currently only accept scalar argument types. It would be very useful if these functions could also accept struct argument types for creating nested data from flat data. For example, suppose I wanted to create a nested customers/orders table from two flat tables, customers and orders. Then it'd be very convenient to write something like this: {code} insert into table nested_customers_orders select c.*, collect_list(named_struct(oid, o.oid, order_date: o.date...)) from customers c inner join orders o on (c.cid = o.oid) group by c.cid {code} Thanks you for your consideration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573342#comment-14573342 ] Alexander Pivovarov commented on HIVE-10841: Better to put code/sql/plan to \{code\}...\{code\} blocks. It will be easier to read [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0 Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571849#comment-14571849 ] Alexander Pivovarov commented on HIVE-10841: LOG info for the queries with different JOIN operators order L, LA, FR, A, PI, acct - only 2 log messages from ppd.OpProcFactory contain = 120 {code} explain select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn = 120; 15/06/03 16:31:47 [main]: INFO ppd.OpProcFactory: Processing for FIL(25) 15/06/03 16:31:47 [main]: INFO ppd.OpProcFactory: Pushdown Predicates of FIL For Alias : acct 15/06/03 16:31:47 [main]: INFO ppd.OpProcFactory: (_col20 = 120) 15/06/03 16:31:47 [main]: INFO ppd.OpProcFactory: Processing for JOIN(24) 15/06/03 16:31:47 [main]: INFO ppd.OpProcFactory: Pushdown Predicates of JOIN For Alias : acct 15/06/03 16:31:47 [main]: INFO ppd.OpProcFactory: (VALUE._col19 = 120) Stage: Stage-9 Map Reduce Select Operator expressions: _col19 (type: int), 120 (type: int) {code} L, LA, FR, A, acct, PI - 8 log lines from ppd.OpProcFactory contain = 120 {code} explain select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN acct ON A.id = acct.aid JOIN PI ON PI.id = LA.pi_id WHERE L.id = 4436 and acct.brn = 120; 15/06/03 15:45:25 [main]: INFO ppd.OpProcFactory: Processing for FIL(25) 15/06/03 15:45:39 [main]: INFO ppd.OpProcFactory: Pushdown Predicates of FIL For Alias : acct 15/06/03 15:45:39 [main]: INFO ppd.OpProcFactory: (_col20 = 120) 15/06/03 15:46:23 [main]: INFO ppd.OpProcFactory: Processing for JOIN(24) 15/06/03 15:46:23 [main]: INFO ppd.OpProcFactory: Pushdown Predicates of JOIN For Alias : acct 15/06/03 15:46:23 [main]: INFO ppd.OpProcFactory: (VALUE._col19 = 120) 15/06/03 15:46:26 [main]: INFO ppd.OpProcFactory: Processing for RS(21) 15/06/03 15:46:26 [main]: INFO ppd.OpProcFactory: Pushdown Predicates of RS For Alias : acct 15/06/03 15:46:26 [main]: INFO ppd.OpProcFactory: (_col20 = 120) 15/06/03 15:46:43 [main]: INFO ppd.OpProcFactory: Processing for FIL(20) 15/06/03 15:46:49 [main]: INFO ppd.OpProcFactory: Pushdown Predicates of FIL For Alias : acct 15/06/03 15:46:49 [main]: INFO ppd.OpProcFactory: (_col20 = 120) 15/06/03 15:46:52 [main]: INFO ppd.OpProcFactory: Processing for JOIN(19) 15/06/03 15:46:52 [main]: INFO ppd.OpProcFactory: Pushdown Predicates of JOIN For Alias : acct 15/06/03 15:46:52 [main]: INFO ppd.OpProcFactory: (VALUE._col1 = 120) 15/06/03 15:59:18 [main]: INFO ppd.OpProcFactory: Processing for RS(18) 15/06/03 15:59:18 [main]: INFO ppd.OpProcFactory: Pushdown Predicates of RS For Alias : acct 15/06/03 15:59:18 [main]: INFO ppd.OpProcFactory: (brn = 120) 15/06/03 15:59:19 [main]: INFO ppd.OpProcFactory: Processing for FIL(17) 15/06/03 15:59:50 [main]: INFO ppd.OpProcFactory: Pushdown Predicates of FIL For Alias : acct 15/06/03 15:59:50 [main]: INFO ppd.OpProcFactory: (brn = 120) 15/06/03 16:00:20 [main]: INFO ppd.OpProcFactory: Processing for TS(4) 15/06/03 16:00:20 [main]: INFO ppd.OpProcFactory: Pushdown Predicates of TS For Alias : acct 15/06/03 16:00:20 [main]: INFO ppd.OpProcFactory: aid is not null 15/06/03 16:00:20 [main]: INFO ppd.OpProcFactory: (brn = 120) 15/06/03 16:01:38 [main]: INFO optimizer.ConstantPropagateProcFactory: expr Const int 120 fold from Column[VALUE._col19] is removed. 15/06/03 16:01:38 [main]: INFO optimizer.ColumnPrunerProcFactory: RS 21 oldColExprMap: {VALUE._col5=Column[_col5], VALUE._col4=Const int 4436, VALUE._col3=Column[_col3], VALUE._col2=Column[_col2], VALUE._col1=Column[_col1], VALUE._col0=Const int 4436, KEY.reducesinkkey0=Column[_col6], VALUE._col14=Column[_col15], VALUE._col13=Column[_col14], VALUE._col16=Column[_col17], VALUE._col15=Column[_col16], VALUE._col18=Column[_col19], VALUE._col9=Const int 4436, VALUE._col17=Column[_col18], VALUE._col8=Column[_col9], VALUE._col7=Column[_col8], VALUE._col19=Const int 120, VALUE._col6=Column[_col7], VALUE._col20=Column[_col21], VALUE._col11=Column[_col12], VALUE._col21=Column[_col22], VALUE._col12=Column[_col13], VALUE._col22=Column[_col23], VALUE._col10=Column[_col11]} 15/06/03 16:01:38 [main]: INFO optimizer.ColumnPrunerProcFactory: RS 18 oldColExprMap: {VALUE._col4=Column[ROW__ID], VALUE._col3=Column[INPUT__FILE__NAME], VALUE._col2=Column[BLOCK__OFFSET__INSIDE__FILE], VALUE._col1=Const int 120, VALUE._col0=Column[acc_n], KEY.reducesinkkey0=Column[aid]} STAGE PLANS: Stage: Stage-12 acct TableScan alias: acct Statistics: Num rows: 5 Data size: 63 Basic stats: COMPLETE Column stats: NONE Filter
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570374#comment-14570374 ] Alexander Pivovarov commented on HIVE-10841: Thank you Laljo John for looking at the issue. I tried the patch #1. Yes, it makes output to contain just 1 row. Probably the filter is added to Reduce step. BUT, IMHO the patch #1 does not fix the root cause of the problem. I still do not see Filter Operator predicate (brn is not null) for acct table on Map step. {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} {code} acct TableScan alias: acct Statistics: Num rows: 5 Data size: 63 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 3 Data size: 37 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) {code} Hovewer, if you change JOIN order and put acct before PI then the plan will be correct - Map Local Operator Tree for acct table will have Filter Operator predicate (brn is not null) {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN acct ON A.id = acct.aid JOIN PI ON PI.id = LA.pi_id WHERE L.id = 4436 and acct.brn is not null; {code} {code} TableScan alias: acct Statistics: Num rows: 5 Data size: 63 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (aid is not null and brn is not null) (type: boolean) Statistics: Num rows: 2 Data size: 25 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) {code} [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0 Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0
[jira] [Updated] (HIVE-10748) Replace StringBuffer with StringBuilder where possible
[ https://issues.apache.org/jira/browse/HIVE-10748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10748: --- Attachment: HIVE-10748.2.patch patch #2: - rebased to the latest Replace StringBuffer with StringBuilder where possible -- Key: HIVE-10748 URL: https://issues.apache.org/jira/browse/HIVE-10748 Project: Hive Issue Type: Improvement Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-10748.1.patch, HIVE-10748.1.patch, HIVE-10748.2.patch I found 40 places in Hive where new StringBuffer( is used. Where possible, it is recommended that StringBuilder be used in preference to StringBuffer as it will be faster under most implementations https://docs.oracle.com/javase/7/docs/api/java/lang/StringBuilder.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568044#comment-14568044 ] Alexander Pivovarov commented on HIVE-10841: Changing the order of JOIN operators fixes the plan. Filter Operator predicates are added to Map step for acct table. [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0 Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats:
[jira] [Commented] (HIVE-10748) Replace StringBuffer with StringBuilder where possible
[ https://issues.apache.org/jira/browse/HIVE-10748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568290#comment-14568290 ] Alexander Pivovarov commented on HIVE-10748: Build 4130 has 3 failed tests. I think errors are unrelated to the patch #2. I checked prev build 4129 - it also has the same 3 failed tests. So, everything looks good! Replace StringBuffer with StringBuilder where possible -- Key: HIVE-10748 URL: https://issues.apache.org/jira/browse/HIVE-10748 Project: Hive Issue Type: Improvement Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-10748.1.patch, HIVE-10748.1.patch, HIVE-10748.2.patch I found 40 places in Hive where new StringBuffer( is used. Where possible, it is recommended that StringBuilder be used in preference to StringBuffer as it will be faster under most implementations https://docs.oracle.com/javase/7/docs/api/java/lang/StringBuilder.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov reassigned HIVE-10841: -- Assignee: Alexander Pivovarov [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0 Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col6 (type: int) 1 id
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566239#comment-14566239 ] Alexander Pivovarov commented on HIVE-10841: disabling hive.ppd.remove.duplicatefilters helps {code} set hive.ppd.remove.duplicatefilters=false; {code} [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0 Reporter: Alexander Pivovarov The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator
[jira] [Commented] (HIVE-686) add UDF substring_index
[ https://issues.apache.org/jira/browse/HIVE-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566375#comment-14566375 ] Alexander Pivovarov commented on HIVE-686: -- Added the function description to the Language Manual https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions add UDF substring_index --- Key: HIVE-686 URL: https://issues.apache.org/jira/browse/HIVE-686 Project: Hive Issue Type: New Feature Components: UDF Reporter: Namit Jain Assignee: Alexander Pivovarov Fix For: 1.3.0 Attachments: HIVE-686.1.patch, HIVE-686.1.patch, HIVE-686.patch, HIVE-686.patch SUBSTRING_INDEX(str,delim,count) Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. SUBSTRING_INDEX() performs a case-sensitive match when searching for delim. Examples: {code} SELECT SUBSTRING_INDEX('www.mysql.com', '.', 3); --www.mysql.com SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2); --www.mysql SELECT SUBSTRING_INDEX('www.mysql.com', '.', 1); --www SELECT SUBSTRING_INDEX('www.mysql.com', '.', 0); --'' SELECT SUBSTRING_INDEX('www.mysql.com', '.', -1); --com SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2); --mysql.com SELECT SUBSTRING_INDEX('www.mysql.com', '.', -3); --www.mysql.com {code} {code} --#delim does not exist in str SELECT SUBSTRING_INDEX('www.mysql.com', 'Q', 1); --www.mysql.com --#delim is 2 chars SELECT SUBSTRING_INDEX('www||mysql||com', '||', 2); --www||mysql --#delim is empty string SELECT SUBSTRING_INDEX('www.mysql.com', '', 2); --'' --#str is empty string SELECT SUBSTRING_INDEX('', '.', 2); --'' {code} {code} --#null params SELECT SUBSTRING_INDEX(null, '.', 1); --null SELECT SUBSTRING_INDEX('www.mysql.com', null, 1); --null SELECT SUBSTRING_INDEX('www.mysql.com', '.', null); --null {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564297#comment-14564297 ] Alexander Pivovarov commented on HIVE-10841: if I set hive.optimize.ppd=false then query returns 1 row BUT the plan does not have brn is not null if I set hive.optimize.ppd=true and change JOIN statements order to (A,acct,PI) then query returns 1 row AND the plan HAS brn is not null. {code} set hive.optimize.ppd=true; select acct.ACC_N, acct.brn FROM LA JOIN A ON LA.aid = A.id JOIN acct ON LA.aid = acct.aid JOIN PI ON PI.id = LA.pi_id WHERE LA.loan_id = 4436 and acct.brn is not null; OK 10 122 {code} [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0 Reporter: Alexander Pivovarov The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int)
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564273#comment-14564273 ] Alexander Pivovarov commented on HIVE-10841: L tables also can be removed from the query - it does not affect the result. But I found that order of JOIN statements matters. I tries the following combinations of the JOIN statements (PI,acct,A) (PI,A,acct) (A,PI,acct) (acct,PI,A) (A,act,PI) (act,A,PI) 3 rows are returned only for (A,PI,acct) combination FROM LA JOIN A JOIN PI JOIN acct {code} select acct.ACC_N, acct.brn FROM LA JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON LA.aid = acct.aid WHERE LA.loan_id = 4436 and acct.brn is not null; OK 10 122 NULLNULL NULLNULL {code} [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0 Reporter: Alexander Pivovarov The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0
[jira] [Updated] (HIVE-686) add UDF substring_index
[ https://issues.apache.org/jira/browse/HIVE-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-686: - Attachment: HIVE-686.1.patch reattaching path #1 to rerun tests add UDF substring_index --- Key: HIVE-686 URL: https://issues.apache.org/jira/browse/HIVE-686 Project: Hive Issue Type: New Feature Components: UDF Reporter: Namit Jain Assignee: Alexander Pivovarov Attachments: HIVE-686.1.patch, HIVE-686.1.patch, HIVE-686.patch, HIVE-686.patch SUBSTRING_INDEX(str,delim,count) Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. SUBSTRING_INDEX() performs a case-sensitive match when searching for delim. Examples: {code} SELECT SUBSTRING_INDEX('www.mysql.com', '.', 3); --www.mysql.com SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2); --www.mysql SELECT SUBSTRING_INDEX('www.mysql.com', '.', 1); --www SELECT SUBSTRING_INDEX('www.mysql.com', '.', 0); --'' SELECT SUBSTRING_INDEX('www.mysql.com', '.', -1); --com SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2); --mysql.com SELECT SUBSTRING_INDEX('www.mysql.com', '.', -3); --www.mysql.com {code} {code} --#delim does not exist in str SELECT SUBSTRING_INDEX('www.mysql.com', 'Q', 1); --www.mysql.com --#delim is 2 chars SELECT SUBSTRING_INDEX('www||mysql||com', '||', 2); --www||mysql --#delim is empty string SELECT SUBSTRING_INDEX('www.mysql.com', '', 2); --'' --#str is empty string SELECT SUBSTRING_INDEX('', '.', 2); --'' {code} {code} --#null params SELECT SUBSTRING_INDEX(null, '.', 1); --null SELECT SUBSTRING_INDEX('www.mysql.com', null, 1); --null SELECT SUBSTRING_INDEX('www.mysql.com', '.', null); --null {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565502#comment-14565502 ] Alexander Pivovarov commented on HIVE-10841: I tested one of my queries with set hive.optimize.ppd=false; Disabling ppd kills the performance. 9min vs 60 min. [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0 Reporter: Alexander Pivovarov The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator
[jira] [Commented] (HIVE-10788) Change sort_array to support non-primitive types
[ https://issues.apache.org/jira/browse/HIVE-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565780#comment-14565780 ] Alexander Pivovarov commented on HIVE-10788: +1 Change sort_array to support non-primitive types Key: HIVE-10788 URL: https://issues.apache.org/jira/browse/HIVE-10788 Project: Hive Issue Type: Bug Components: UDF Reporter: Chao Sun Assignee: Chao Sun Attachments: HIVE-10788.1.patch, HIVE-10788.2.patch, HIVE-10788.3.patch Currently {{sort_array}} only support primitive types. As we already support comparison between non-primitive types, it makes sense to remove this restriction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10845) TezJobMonitor uses killedTaskCount instead of killedTaskAttemptCount
[ https://issues.apache.org/jira/browse/HIVE-10845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563339#comment-14563339 ] Alexander Pivovarov commented on HIVE-10845: Siddharth, do you want to change variable name killedTasks to killedTasksAttempts, to make it similar to failedTaskAttempts (1 lines above) TezJobMonitor uses killedTaskCount instead of killedTaskAttemptCount Key: HIVE-10845 URL: https://issues.apache.org/jira/browse/HIVE-10845 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: 1.2.1 Attachments: HIVE-10845.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10846) LLAP: preemption in AM due to failures / out of order scheduling
[ https://issues.apache.org/jira/browse/HIVE-10846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563345#comment-14563345 ] Alexander Pivovarov commented on HIVE-10846: what branch? LLAP: preemption in AM due to failures / out of order scheduling Key: HIVE-10846 URL: https://issues.apache.org/jira/browse/HIVE-10846 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: llap Attachments: HIVE-10846.1.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10841: --- Description: The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10 122 NULLNULL NULLNULL {code} but it should be 1 row {code} 10 122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col6 (type: int) 1 id (type: int) Stage: Stage-9 Map Reduce Map Operator Tree: TableScan alias: la Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (((loan_id is not null and aid is not null) and pi_id is not null) and (loan_id = 4436)) (type: boolean) Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 Inner Join 0 to 2 keys: 0 4436 (type: int)
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564229#comment-14564229 ] Alexander Pivovarov commented on HIVE-10841: if we change is not null to is null or to = 122 then all 3 rows will be NULL or 122 for column brn (second column). and acct.brn is null; {code} 10 NULL NULLNULL NULLNULL {code} and acct.brn = 122; {code} 10 122 NULL122 NULL122 {code} and acct.brn is null; {code} 10 122 NULLNULL NULLNULL {code} [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0 Reporter: Alexander Pivovarov The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats:
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564213#comment-14564213 ] Alexander Pivovarov commented on HIVE-10841: if we look at hive-0.12 plan then we can see that it has brn is not null predicate in Filter Operator {code} acct TableScan alias: acct Filter Operator predicate: expr: brn is not null type: boolean {code} But in hive-1.3.0 plan I do not see brn at all. It only has predicate: aid is not null for acct table. Does it mean that hive-1.3.0 plan is wrong? I checked ppd folder diff btw 0.12.0 and 0.13.0 It was two fixes HIVE-4293 : Predicates following UDTF operator are removed by PPD HIVE-5411 : Migrate expression serialization to Kryo [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0 Reporter: Alexander Pivovarov The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink
[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10841: --- Component/s: Query Planning [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0 Reporter: Alexander Pivovarov The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col6 (type: int) 1 id (type: int) Stage: Stage-9 Map Reduce
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564215#comment-14564215 ] Alexander Pivovarov commented on HIVE-10841: hive-0.12.0 plan {code} ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_JOIN (TOK_JOIN (TOK_JOIN (TOK_TABREF (TOK_TABNAME L)) (TOK_TABREF (TOK_TABNAME LA)) (= (. (TOK_TABLE_OR_COL L) id) (. (TOK_TABLE_OR_COL LA) loan_id))) (TOK_TABREF (TOK_TABNAME FR)) (= (. (TOK_TABLE_OR_COL L) id) (. (TOK_TABLE_OR_COL FR) loan_id))) (TOK_TABREF (TOK_TABNAME A)) (= (. (TOK_TABLE_OR_COL LA) aid) (. (TOK_TABLE_OR_COL A) id))) (TOK_TABREF (TOK_TABNAME PI)) (= (. (TOK_TABLE_OR_COL PI) id) (. (TOK_TABLE_OR_COL LA) pi_id))) (TOK_TABREF (TOK_TABNAME acct)) (= (. (TOK_TABLE_OR_COL A) id) (. (TOK_TABLE_OR_COL acct) aid (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL acct) ACC_N)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL acct) brn))) (TOK_WHERE (and (= (. (TOK_TABLE_OR_COL L) id) 4436) (TOK_FUNCTION TOK_ISNOTNULL (. (TOK_TABLE_OR_COL acct) brn)) STAGE DEPENDENCIES: Stage-11 is a root stage Stage-8 depends on stages: Stage-11 Stage-0 is a root stage STAGE PLANS: Stage: Stage-11 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a HashTable Sink Operator condition expressions: 0 {_col5} 1 2 {acc_n} {brn} handleSkewJoin: false keys: 0 [Column[_col4]] 1 [Column[id]] 2 [Column[aid]] Position of Big Table: 0 acct TableScan alias: acct Filter Operator predicate: expr: brn is not null type: boolean HashTable Sink Operator condition expressions: 0 {_col5} 1 2 {acc_n} {brn} handleSkewJoin: false keys: 0 [Column[_col4]] 1 [Column[id]] 2 [Column[aid]] Position of Big Table: 0 fr TableScan alias: fr Filter Operator predicate: expr: (loan_id = 4436) type: boolean HashTable Sink Operator condition expressions: 0 1 {aid} {pi_id} 2 handleSkewJoin: false keys: 0 [Column[id]] 1 [Column[loan_id]] 2 [Column[loan_id]] Position of Big Table: 1 l TableScan alias: l Filter Operator predicate: expr: (id = 4436) type: boolean HashTable Sink Operator condition expressions: 0 1 {aid} {pi_id} 2 handleSkewJoin: false keys: 0 [Column[id]] 1 [Column[loan_id]] 2 [Column[loan_id]] Position of Big Table: 1 pi TableScan alias: pi HashTable Sink Operator condition expressions: 0 {_col15} {_col16} 1 handleSkewJoin: false keys: 0 [Column[_col2]] 1 [Column[id]] Position of Big Table: 0 Stage: Stage-8 Map Reduce Alias - Map Operator Tree: la TableScan alias: la Filter Operator predicate: expr: (loan_id = 4436) type: boolean Map Join Operator condition map: Inner Join 0 to 1 Inner Join 0 to 2 condition expressions: 0 1 {aid} {pi_id} 2 handleSkewJoin: false keys: 0 [Column[id]] 1 [Column[loan_id]] 2 [Column[loan_id]] outputColumnNames: _col4, _col5 Position of Big Table: 1 Map Join Operator condition map: Inner Join 0 to 1 Inner Join 1 to 2
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564243#comment-14564243 ] Alexander Pivovarov commented on HIVE-10841: I found that JOIN FR can be removed - the result still will be 3 rows But adding or removing JOIN PI changes Filter Operator predicate for acct table if we remove JOIN PI then acct table Filter Operator predicate has brn is not null and query returns 1 row {code} acct TableScan alias: acct Statistics: Num rows: 5 Data size: 63 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (aid is not null and brn is not null) (type: boolean) {code} How it can be possible that removing JOIN PI changes Filter Operator predicate for acct table? The query below returns 1 row. Query plan has brn is not null predicate in Filter Operator for acct table. But if we remove comment before JOIN PI then query plan will not have brn is not null predicate. {code} explain select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN A ON LA.aid = A.id --JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0 Reporter: Alexander Pivovarov The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int)
[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0
[ https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561397#comment-14561397 ] Alexander Pivovarov commented on HIVE-10704: What is the negative side of dividing memory equally? If we have 3 tables (1KB, 0B, 2MB) how we should divided the memory? (MapInteger, Long tableMemorySizes). Is dividing memory equally the best solution? Maybe we should treat 0B table as 1B table and give it minimum amount of the memory. Errors in Tez HashTableLoader when estimated table size is 0 Key: HIVE-10704 URL: https://issues.apache.org/jira/browse/HIVE-10704 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Jason Dere Assignee: Mostafa Mokhtar Fix For: 1.2.1 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, HIVE-10704.3.patch Couple of issues: - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all tables, the largest small table selection is wrong and could select the large table (which results in NPE) - The memory estimates can either divide-by-zero, or allocate 0 memory if the table size is 0. Try to come up with a sensible default for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-686) add UDF substring_index
[ https://issues.apache.org/jira/browse/HIVE-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561282#comment-14561282 ] Alexander Pivovarov commented on HIVE-686: -- prev builds - 4057, 4058 have 55 failed tests as well add UDF substring_index --- Key: HIVE-686 URL: https://issues.apache.org/jira/browse/HIVE-686 Project: Hive Issue Type: New Feature Components: UDF Reporter: Namit Jain Assignee: Alexander Pivovarov Attachments: HIVE-686.1.patch, HIVE-686.patch, HIVE-686.patch SUBSTRING_INDEX(str,delim,count) Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. SUBSTRING_INDEX() performs a case-sensitive match when searching for delim. Examples: {code} SELECT SUBSTRING_INDEX('www.mysql.com', '.', 3); --www.mysql.com SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2); --www.mysql SELECT SUBSTRING_INDEX('www.mysql.com', '.', 1); --www SELECT SUBSTRING_INDEX('www.mysql.com', '.', 0); --'' SELECT SUBSTRING_INDEX('www.mysql.com', '.', -1); --com SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2); --mysql.com SELECT SUBSTRING_INDEX('www.mysql.com', '.', -3); --www.mysql.com {code} {code} --#delim does not exist in str SELECT SUBSTRING_INDEX('www.mysql.com', 'Q', 1); --www.mysql.com --#delim is 2 chars SELECT SUBSTRING_INDEX('www||mysql||com', '||', 2); --www||mysql --#delim is empty string SELECT SUBSTRING_INDEX('www.mysql.com', '', 2); --'' --#str is empty string SELECT SUBSTRING_INDEX('', '.', 2); --'' {code} {code} --#null params SELECT SUBSTRING_INDEX(null, '.', 1); --null SELECT SUBSTRING_INDEX('www.mysql.com', null, 1); --null SELECT SUBSTRING_INDEX('www.mysql.com', '.', null); --null {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work for large queries
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10841: --- Description: The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10 122 NULLNULL NULLNULL {code} but it should be 1 row {code} 10 122 {code} 3. workaround is to put acct.brn is not null to join condition {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid and acct.brn is not null WHERE L.id = 4436; {code} I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations have the issue was: The result from the following SELCT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10 122 NULLNULL NULLNULL {code} but it should be 1 row {code} 10 122 {code} 3. workaround is to put acct.brn is not null to join condition {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid and acct.brn is not null WHERE L.id = 4436; {code} [WHERE col is not null] does not work for large queries --- Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Alexander Pivovarov The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 3. workaround is to put acct.brn is not null to join condition {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid and acct.brn is not null WHERE L.id = 4436; {code} I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations have the issue -- This message was
[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work for large queries
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10841: --- Description: The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10 122 NULLNULL NULLNULL {code} but it should be 1 row {code} 10 122 {code} 3. The workaround is to put acct.brn is not null to join condition {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid and acct.brn is not null WHERE L.id = 4436; OK 10 122 {code} I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations have the issue was: The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10 122 NULLNULL NULLNULL {code} but it should be 1 row {code} 10 122 {code} 3. workaround is to put acct.brn is not null to join condition {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid and acct.brn is not null WHERE L.id = 4436; {code} I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations have the issue [WHERE col is not null] does not work for large queries --- Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Alexander Pivovarov The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 3. The workaround is to put acct.brn is not null to join condition {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid and acct.brn is not null WHERE L.id = 4436; OK 10
[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10841: --- Summary: [WHERE col is not null] does not work sometimes for queries with many JOIN statements (was: [WHERE col is not null] does not work for large queries) [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Alexander Pivovarov The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 3. The workaround is to put acct.brn is not null to join condition {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid and acct.brn is not null WHERE L.id = 4436; OK 10122 Time taken: 23.479 seconds, Fetched: 1 row(s) {code} I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations have the issue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work for large queries
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10841: --- Description: The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10 122 NULLNULL NULLNULL {code} but it should be 1 row {code} 10 122 {code} 3. The workaround is to put acct.brn is not null to join condition {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid and acct.brn is not null WHERE L.id = 4436; OK 10 122 Time taken: 23.479 seconds, Fetched: 1 row(s) {code} I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations have the issue was: The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10 122 NULLNULL NULLNULL {code} but it should be 1 row {code} 10 122 {code} 3. The workaround is to put acct.brn is not null to join condition {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid and acct.brn is not null WHERE L.id = 4436; OK 10 122 {code} I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations have the issue [WHERE col is not null] does not work for large queries --- Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Alexander Pivovarov The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 3. The workaround is to put acct.brn is not null to join condition {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id =
[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0
[ https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560431#comment-14560431 ] Alexander Pivovarov commented on HIVE-10704: Mostafa, can you check RB link? I'm not sure it shows HIVE-10704.3.patch Errors in Tez HashTableLoader when estimated table size is 0 Key: HIVE-10704 URL: https://issues.apache.org/jira/browse/HIVE-10704 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Jason Dere Assignee: Mostafa Mokhtar Fix For: 1.2.1 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, HIVE-10704.3.patch Couple of issues: - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all tables, the largest small table selection is wrong and could select the large table (which results in NPE) - The memory estimates can either divide-by-zero, or allocate 0 memory if the table size is 0. Try to come up with a sensible default for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-686) add UDF substring_index
[ https://issues.apache.org/jira/browse/HIVE-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-686: - Attachment: HIVE-686.1.patch patch #1 - derive substring_index from GenericUDF - add Junit and qtest tests add UDF substring_index --- Key: HIVE-686 URL: https://issues.apache.org/jira/browse/HIVE-686 Project: Hive Issue Type: New Feature Components: UDF Reporter: Namit Jain Assignee: Alexander Pivovarov Attachments: HIVE-686.1.patch, HIVE-686.patch, HIVE-686.patch SUBSTRING_INDEX(str,delim,count) Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. SUBSTRING_INDEX() performs a case-sensitive match when searching for delim. Examples: {code} SELECT SUBSTRING_INDEX('www.mysql.com', '.', 3); --www.mysql.com SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2); --www.mysql SELECT SUBSTRING_INDEX('www.mysql.com', '.', 1); --www SELECT SUBSTRING_INDEX('www.mysql.com', '.', 0); --'' SELECT SUBSTRING_INDEX('www.mysql.com', '.', -1); --com SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2); --mysql.com SELECT SUBSTRING_INDEX('www.mysql.com', '.', -3); --www.mysql.com {code} {code} --#delim does not exist in str SELECT SUBSTRING_INDEX('www.mysql.com', 'Q', 1); --www.mysql.com --#delim is 2 chars SELECT SUBSTRING_INDEX('www||mysql||com', '||', 2); --www||mysql --#delim is empty string SELECT SUBSTRING_INDEX('www.mysql.com', '', 2); --'' --#str is empty string SELECT SUBSTRING_INDEX('', '.', 2); --'' {code} {code} --#null params SELECT SUBSTRING_INDEX(null, '.', 1); --null SELECT SUBSTRING_INDEX('www.mysql.com', null, 1); --null SELECT SUBSTRING_INDEX('www.mysql.com', '.', null); --null {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0
[ https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560432#comment-14560432 ] Alexander Pivovarov commented on HIVE-10704: Mostafa, can you check RB link? I'm not sure it shows HIVE-10704.3.patch Errors in Tez HashTableLoader when estimated table size is 0 Key: HIVE-10704 URL: https://issues.apache.org/jira/browse/HIVE-10704 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Jason Dere Assignee: Mostafa Mokhtar Fix For: 1.2.1 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, HIVE-10704.3.patch Couple of issues: - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all tables, the largest small table selection is wrong and could select the large table (which results in NPE) - The memory estimates can either divide-by-zero, or allocate 0 memory if the table size is 0. Try to come up with a sensible default for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)