[jira] [Commented] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly
[ https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525708#comment-14525708 ] Hive QA commented on HIVE-10565: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12729872/HIVE-10565.03.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3703/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3703/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3703/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-3703/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + 
[[ -z master ]] + [[ -d apache-git-master-source ]] + [[ ! -d apache-git-master-source/.git ]] + [[ ! -d apache-git-master-source ]] + cd apache-git-master-source + git fetch origin error: while accessing https://git-wip-us.apache.org/repos/asf/hive.git/info/refs fatal: HTTP request failed + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12729872 - PreCommit-HIVE-TRUNK-Build LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly Key: HIVE-10565 URL: https://issues.apache.org/jira/browse/HIVE-10565 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 1.2.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.2.0, 1.3.0 Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, HIVE-10565.03.patch Filtering can knock out some of the rows for a repeated key, but those knocked out rows need to be included in the LEFT OUTER JOIN result and are currently not when only some rows are filtered out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10339) Allow JDBC Driver to pass HTTP header Key/Value pairs
[ https://issues.apache.org/jira/browse/HIVE-10339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525718#comment-14525718 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-10339: -- [~leftylev] Thanks for the review, the changes look good. I have removed the TODOC1.2 label. Thanks Hari Allow JDBC Driver to pass HTTP header Key/Value pairs - Key: HIVE-10339 URL: https://issues.apache.org/jira/browse/HIVE-10339 Project: Hive Issue Type: Improvement Components: Beeline Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Fix For: 1.2.0 Attachments: HIVE-10339.1.patch, HIVE-10339.2.patch Currently the Beeline/ODBC driver does not support carrying user-specified HTTP headers. The Beeline JDBC driver's connection string in HTTP mode is jdbc:hive2://host:port/db?hive.server2.transport.mode=http;hive.server2.thrift.http.path=http_endpoint. When the transport mode is http, the Beeline/ODBC driver should allow the end user to send arbitrary HTTP header name/value pairs. All the Beeline driver needs to do is take the user-specified names and values and call the underlying HTTPClient API to set the headers. E.g. the Beeline connection string could be jdbc:hive2://host:port/db?hive.server2.transport.mode=http;hive.server2.thrift.http.path=http_endpoint,http.header.name1=value1, and Beeline will call the underlying client to set the HTTP header name1 to value1. This is required for the end user to send identity in an HTTP header down to Knox via Beeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10339) Allow JDBC Driver to pass HTTP header Key/Value pairs
[ https://issues.apache.org/jira/browse/HIVE-10339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-10339: - Labels: (was: TODOC1.2) Allow JDBC Driver to pass HTTP header Key/Value pairs - Key: HIVE-10339 URL: https://issues.apache.org/jira/browse/HIVE-10339 Project: Hive Issue Type: Improvement Components: Beeline Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Fix For: 1.2.0 Attachments: HIVE-10339.1.patch, HIVE-10339.2.patch Currently the Beeline/ODBC driver does not support carrying user-specified HTTP headers. The Beeline JDBC driver's connection string in HTTP mode is jdbc:hive2://host:port/db?hive.server2.transport.mode=http;hive.server2.thrift.http.path=http_endpoint. When the transport mode is http, the Beeline/ODBC driver should allow the end user to send arbitrary HTTP header name/value pairs. All the Beeline driver needs to do is take the user-specified names and values and call the underlying HTTPClient API to set the headers. E.g. the Beeline connection string could be jdbc:hive2://host:port/db?hive.server2.transport.mode=http;hive.server2.thrift.http.path=http_endpoint,http.header.name1=value1, and Beeline will call the underlying client to set the HTTP header name1 to value1. This is required for the end user to send identity in an HTTP header down to Knox via Beeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10579) Fix -Phadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525720#comment-14525720 ] Hive QA commented on HIVE-10579: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12729914/HIVE-10579.1.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3704/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3704/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3704/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-3704/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + 
[[ -z master ]] + [[ -d apache-git-master-source ]] + [[ ! -d apache-git-master-source/.git ]] + [[ ! -d apache-git-master-source ]] + cd apache-git-master-source + git fetch origin error: while accessing https://git-wip-us.apache.org/repos/asf/hive.git/info/refs fatal: HTTP request failed + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12729914 - PreCommit-HIVE-TRUNK-Build Fix -Phadoop-1 build Key: HIVE-10579 URL: https://issues.apache.org/jira/browse/HIVE-10579 Project: Hive Issue Type: Bug Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-10579.1.patch, HIVE-10579.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9908) vectorization error binary type not supported, group by with binary columns
[ https://issues.apache.org/jira/browse/HIVE-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-9908: --- Attachment: HIVE-9908.02.patch Rebase and resubmit. vectorization error binary type not supported, group by with binary columns --- Key: HIVE-9908 URL: https://issues.apache.org/jira/browse/HIVE-9908 Project: Hive Issue Type: Bug Reporter: Priyesh Raj Assignee: Matt McCline Attachments: HIVE-9908.01.patch, HIVE-9908.02.patch I am observing a runtime exception with binary data when vectorization is enabled and a binary column appears in the GROUP BY clause. The exception is "unsupported type: binary". Per the documentation, this exception should not occur; instead, execution should continue in the normal (non-vectorized) way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10584) fix Integer references comparison in hcat OutputJobInfo (122)
[ https://issues.apache.org/jira/browse/HIVE-10584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10584: --- Attachment: rb33791.patch patch #1 fix Integer references comparison in hcat OutputJobInfo (122) - Key: HIVE-10584 URL: https://issues.apache.org/jira/browse/HIVE-10584 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: rb33791.patch the code below compares Integer references instead of their int values {code} public int compare(Integer earlier, Integer later) { return (earlier < later) ? -1 : ((earlier == later) ? 0 : 1); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
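The pitfall behind HIVE-10584 can be reproduced outside Hive. The following standalone sketch (illustrative class name, not HCatalog's actual code) shows why == on boxed Integers compares object identity rather than value, and the usual fix with Integer.compare:

```java
// Sketch, not Hive code: comparing boxed Integers with == is unsafe because it
// compares references. Values outside the JVM's Integer cache (-128..127) get
// distinct boxed objects, so equal values compare as unequal.
public class IntegerCompareDemo {
    // Comparator in the style the issue describes: "earlier == later" is a
    // reference comparison, so equal large values are reported as unequal.
    static int compareBroken(Integer earlier, Integer later) {
        return (earlier < later) ? -1 : ((earlier == later) ? 0 : 1);
    }

    // Fixed version: compare the int values.
    static int compareFixed(Integer earlier, Integer later) {
        return Integer.compare(earlier, later);
    }

    public static void main(String[] args) {
        Integer a = 1000, b = 1000;              // distinct boxed objects
        System.out.println(compareBroken(a, b)); // 1 (wrong: values are equal)
        System.out.println(compareFixed(a, b));  // 0 (correct)
    }
}
```

Note the bug is value-dependent: for small values inside the Integer cache the broken comparator happens to return 0, which is why such defects often survive testing.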
[jira] [Updated] (HIVE-10579) Fix -Phadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-10579: - Attachment: HIVE-10579.1.patch Fix -Phadoop-1 build Key: HIVE-10579 URL: https://issues.apache.org/jira/browse/HIVE-10579 Project: Hive Issue Type: Bug Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-10579.1.patch, HIVE-10579.1.patch, HIVE-10579.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10581) fix comparison of String objects using == in Hive line 785
[ https://issues.apache.org/jira/browse/HIVE-10581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525735#comment-14525735 ] Prasanth Jayachandran commented on HIVE-10581: -- +1 fix comparison of String objects using == in Hive line 785 -- Key: HIVE-10581 URL: https://issues.apache.org/jira/browse/HIVE-10581 Project: Hive Issue Type: Bug Components: Metastore Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: rb33789.patch ...ql.metadata.Hive line 785 {code} baseTbl.getTableType() == TableType.VIRTUAL_VIEW.toString() {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
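The String comparison flagged in HIVE-10581 is the same class of bug: == on Strings compares references, not contents. A minimal standalone sketch (illustrative, not the Hive metastore code):

```java
// Sketch, not Hive code: a table-type string obtained at runtime (e.g. from the
// metastore) is generally not the same object as a compile-time literal, so ==
// can return false even when the contents match. Use equals() instead.
public class StringCompareDemo {
    public static void main(String[] args) {
        // new String(...) stands in for a value read at runtime
        String tableType = new String("VIRTUAL_VIEW");
        System.out.println(tableType == "VIRTUAL_VIEW");      // false: different objects
        System.out.println("VIRTUAL_VIEW".equals(tableType)); // true: content equality
    }
}
```

Writing the comparison as TableType.VIRTUAL_VIEW.toString().equals(baseTbl.getTableType()) also avoids a NullPointerException if the runtime value is null.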
[jira] [Commented] (HIVE-10584) fix Integer references comparison in hcat OutputJobInfo (122)
[ https://issues.apache.org/jira/browse/HIVE-10584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525734#comment-14525734 ] Prasanth Jayachandran commented on HIVE-10584: -- +1 fix Integer references comparison in hcat OutputJobInfo (122) - Key: HIVE-10584 URL: https://issues.apache.org/jira/browse/HIVE-10584 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: rb33791.patch the code below compares Integer references instead of their int values {code} public int compare(Integer earlier, Integer later) { return (earlier < later) ? -1 : ((earlier == later) ? 0 : 1); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10528) Hiveserver2 in HTTP mode is not applying auth_to_local rules
[ https://issues.apache.org/jira/browse/HIVE-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdelrahman Shettia updated HIVE-10528: --- Attachment: HIVE-10528.1.patch Hiveserver2 in HTTP mode is not applying auth_to_local rules Key: HIVE-10528 URL: https://issues.apache.org/jira/browse/HIVE-10528 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.0 Environment: Centos 6 Reporter: Abdelrahman Shettia Assignee: Abdelrahman Shettia Attachments: HIVE-10528.1.patch PROBLEM: Authenticating to HS2 in HTTP mode with Kerberos, auth_to_local mappings do not get applied. Because of this various permissions checks which rely on the local cluster name for a user are going to fail. STEPS TO REPRODUCE: 1. Create kerberos cluster and HS2 in HTTP mode 2. Create a new user, test, along with a kerberos principal for this user 3. Create a separate principal, mapped-test 4. Create an auth_to_local rule to make sure that mapped-test is mapped to test 5. As the test user, connect to HS2 with beeline and create a simple table: {code} CREATE TABLE permtest (field1 int); {code} There is no need to load anything into this table. 6. Establish that it works as the test user: {code} show create table permtest; {code} 7. Drop the test identity and become mapped-test 8. Re-connect to HS2 with beeline, re-run the above command: {code} show create table permtest; {code} You will find that when this is done in HTTP mode, you will get an HDFS error (because of StorageBasedAuthorization doing a HDFS permissions check) and the user will be mapped-test and NOT test as it should be. 
ANALYSIS: This appears to be HTTP specific and the problem seems to come in {{ThriftHttpServlet$HttpKerberosServerAction.getPrincipalWithoutRealmAndHost()}}: {code} try { fullKerberosName = ShimLoader.getHadoopShims().getKerberosNameShim(fullPrincipal); } catch (IOException e) { throw new HttpAuthenticationException(e); } return fullKerberosName.getServiceName(); {code} getServiceName applies no auth_to_local rules. Seems like maybe this should be getShortName()? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers
[ https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526023#comment-14526023 ] Selina Zhang commented on HIVE-10036: - [~gopalv] Thank you! I included the io.netty to ql/pom.xml and uploaded a new patch. Writing ORC format big table causes OOM - too many fixed sized stream buffers - Key: HIVE-10036 URL: https://issues.apache.org/jira/browse/HIVE-10036 Project: Hive Issue Type: Improvement Reporter: Selina Zhang Assignee: Selina Zhang Labels: orcfile Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch, HIVE-10036.7.patch The ORC writer keeps multiple output streams for each column, and each output stream is allocated a fixed-size ByteBuffer (configurable, default 256K). For a big table, the memory cost is unbearable, especially when HCatalog dynamic partitioning is involved and several hundred files may be open for writing at the same time (the same problem affects FileSinkOperator). The global ORC memory manager controls the buffer size, but it only kicks in at 5000-row intervals. An enhancement could be made there, but the problem is that reducing the buffer size causes worse compression and more IOs on the read path, and sacrificing read performance is never a good choice. I changed the fixed-size ByteBuffer to a dynamically growing buffer bounded by the existing configurable buffer size. Most streams do not need a large buffer, so performance improved significantly. Compared to Facebook's hive-dwrf, I measured a 2x performance gain with this fix. Solving OOM for ORC completely may take a lot of effort, but this is definitely low-hanging fruit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
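The dynamic-growth idea described in HIVE-10036 can be sketched as follows. This is an illustrative standalone class (GrowableBuffer and its methods are assumptions, not the actual patch code): allocate small, double on demand, and never grow past the existing configurable cap.

```java
import java.nio.ByteBuffer;

// Sketch of a grow-on-demand buffer bounded by a configured maximum, instead of
// pre-allocating the full buffer size (e.g. 256K) for every column stream.
public class GrowableBuffer {
    private ByteBuffer buf;
    private final int maxCapacity;

    public GrowableBuffer(int initialCapacity, int maxCapacity) {
        this.buf = ByteBuffer.allocate(initialCapacity);
        this.maxCapacity = maxCapacity;
    }

    public void write(byte[] data) {
        if (buf.remaining() < data.length) {
            int needed = buf.position() + data.length;
            if (needed > maxCapacity) {
                // at the cap: a real writer would flush/spill the stream here
                throw new IllegalStateException("buffer full: flush required");
            }
            // double the capacity (at least to 'needed'), but never beyond the cap
            int newCap = Math.min(maxCapacity, Math.max(buf.capacity() * 2, needed));
            ByteBuffer bigger = ByteBuffer.allocate(newCap);
            buf.flip();
            bigger.put(buf);
            buf = bigger;
        }
        buf.put(data);
    }

    public int capacity() { return buf.capacity(); }
    public int size() { return buf.position(); }
}
```

With many mostly-small streams, memory usage tracks actual data written rather than streams opened, which is the effect the patch description reports.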
[jira] [Commented] (HIVE-10541) Beeline requires newline at the end of each query in a file
[ https://issues.apache.org/jira/browse/HIVE-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526114#comment-14526114 ] Chaoyu Tang commented on HIVE-10541: Thanks [~thejas] for review and [~szehon] for committing the patch. Beeline requires newline at the end of each query in a file --- Key: HIVE-10541 URL: https://issues.apache.org/jira/browse/HIVE-10541 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 0.13.1 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Fix For: 1.3.0 Attachments: HIVE-10541.1.patch, HIVE-10541.patch Beeline requires newline at the end of each query in a file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10589) Thread.wait not in loop in HWISessionItem
[ https://issues.apache.org/jira/browse/HIVE-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10589: --- Attachment: rb33797.patch patch #1 Thread.wait not in loop in HWISessionItem - Key: HIVE-10589 URL: https://issues.apache.org/jira/browse/HIVE-10589 Project: Hive Issue Type: Improvement Components: Web UI Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: rb33797.patch In multi-threaded code, Object.wait() should be called inside a loop that re-checks the wait condition, to guard against spurious wakeups and missed state changes. The if statement below should therefore be replaced with a while loop. HWISessionItem (121-128) {code} synchronized (runnable) { if (status != WebSessionItemStatus.READY) { try { runnable.wait(); } catch (Exception ex) { } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
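The guarded-wait idiom HIVE-10589 asks for can be shown in a small standalone sketch (illustrative names, not the HWI code). The condition is re-checked in a while loop, so a spurious wakeup cannot let the waiting thread proceed before the state actually changes:

```java
// Sketch of the standard guarded-wait idiom: wait() inside a while loop that
// re-tests the condition after every wakeup.
public class GuardedWait {
    private final Object lock = new Object();
    private boolean ready = false;

    public void awaitReady() throws InterruptedException {
        synchronized (lock) {
            while (!ready) {   // while, not if: re-check after every wakeup
                lock.wait();
            }
        }
    }

    public void markReady() {
        synchronized (lock) {
            ready = true;
            lock.notifyAll();
        }
    }
}
```

With an if instead of the while, a spurious wakeup (explicitly permitted by the Object.wait contract) would let awaitReady() return while the item is still not READY.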
[jira] [Updated] (HIVE-10528) Hiveserver2 in HTTP mode is not applying auth_to_local rules
[ https://issues.apache.org/jira/browse/HIVE-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdelrahman Shettia updated HIVE-10528: --- Attachment: (was: HIVE-10528.1.patch) Hiveserver2 in HTTP mode is not applying auth_to_local rules Key: HIVE-10528 URL: https://issues.apache.org/jira/browse/HIVE-10528 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.0 Environment: Centos 6 Reporter: Abdelrahman Shettia Assignee: Abdelrahman Shettia PROBLEM: Authenticating to HS2 in HTTP mode with Kerberos, auth_to_local mappings do not get applied. Because of this various permissions checks which rely on the local cluster name for a user are going to fail. STEPS TO REPRODUCE: 1. Create kerberos cluster and HS2 in HTTP mode 2. Create a new user, test, along with a kerberos principal for this user 3. Create a separate principal, mapped-test 4. Create an auth_to_local rule to make sure that mapped-test is mapped to test 5. As the test user, connect to HS2 with beeline and create a simple table: {code} CREATE TABLE permtest (field1 int); {code} There is no need to load anything into this table. 6. Establish that it works as the test user: {code} show create table permtest; {code} 7. Drop the test identity and become mapped-test 8. Re-connect to HS2 with beeline, re-run the above command: {code} show create table permtest; {code} You will find that when this is done in HTTP mode, you will get an HDFS error (because of StorageBasedAuthorization doing a HDFS permissions check) and the user will be mapped-test and NOT test as it should be. 
ANALYSIS: This appears to be HTTP specific and the problem seems to come in {{ThriftHttpServlet$HttpKerberosServerAction.getPrincipalWithoutRealmAndHost()}}: {code} try { fullKerberosName = ShimLoader.getHadoopShims().getKerberosNameShim(fullPrincipal); } catch (IOException e) { throw new HttpAuthenticationException(e); } return fullKerberosName.getServiceName(); {code} getServiceName applies no auth_to_local rules. Seems like maybe this should be getShortName()? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch
[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10538: - Attachment: HIVE-10538.1.patch Fix NPE in FileSinkOperator from hashcode mismatch -- Key: HIVE-10538 URL: https://issues.apache.org/jira/browse/HIVE-10538 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.0.0, 1.2.0 Reporter: Peter Slawski Assignee: Peter Slawski Priority: Critical Fix For: 1.2.0, 1.3.0 Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch A Null Pointer Exception occurs when in FileSinkOperator when using bucketed tables and distribute by with multiFileSpray enabled. The following snippet query reproduces this issue: {code} set hive.enforce.bucketing = true; set hive.exec.reducers.max = 20; create table bucket_a(key int, value_a string) clustered by (key) into 256 buckets; create table bucket_b(key int, value_b string) clustered by (key) into 256 buckets; create table bucket_ab(key int, value_a string, value_b string) clustered by (key) into 256 buckets; -- Insert data into bucket_a and bucket_b insert overwrite table bucket_ab select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key = b.key) distribute by key; {code} The following stack trace is logged. 
{code} 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer (ExecReducer.java:reduce(255)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{},value:{_col0:113,_col1:val_113}} at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) ... 8 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers
[ https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Selina Zhang updated HIVE-10036: Attachment: HIVE-10036.7.patch Fixed ql/pom.xml Writing ORC format big table causes OOM - too many fixed sized stream buffers - Key: HIVE-10036 URL: https://issues.apache.org/jira/browse/HIVE-10036 Project: Hive Issue Type: Improvement Reporter: Selina Zhang Assignee: Selina Zhang Labels: orcfile Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch, HIVE-10036.7.patch ORC writer keeps multiple out steams for each column. Each output stream is allocated fixed size ByteBuffer (configurable, default to 256K). For a big table, the memory cost is unbearable. Specially when HCatalog dynamic partition involves, several hundreds files may be open and writing at the same time (same problems for FileSinkOperator). Global ORC memory manager controls the buffer size, but it only got kicked in at 5000 rows interval. An enhancement could be done here, but the problem is reducing the buffer size introduces worse compression and more IOs in read path. Sacrificing the read performance is always not a good choice. I changed the fixed size ByteBuffer to a dynamic growth buffer which up bound to the existing configurable buffer size. Most of the streams does not need large buffer so the performance got improved significantly. Comparing to Facebook's hive-dwrf, I monitored 2x performance gain with this fix. Solving OOM for ORC completely maybe needs lots of effort , but this is definitely a low hanging fruit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10571) HiveMetaStoreClient should close existing thrift connection before its reconnect
[ https://issues.apache.org/jira/browse/HIVE-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-10571: - Attachment: HIVE-10571.patch Attach one more time. HiveMetaStoreClient should close existing thrift connection before its reconnect Key: HIVE-10571 URL: https://issues.apache.org/jira/browse/HIVE-10571 Project: Hive Issue Type: Bug Components: Metastore Reporter: Chaoyu Tang Assignee: Chaoyu Tang Attachments: HIVE-10571.patch, HIVE-10571.patch, HIVE-10571.patch HiveMetaStoreClient should first close its existing thrift connection, whether it is already dead or still alive, before opening another connection in its reconnect() method. Otherwise it may lead to huge resource accumulation or leaks on the HMS side when the client keeps retrying. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
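The close-before-reopen pattern HIVE-10571 describes can be sketched with illustrative interfaces (Transport, ReconnectingClient, and the factory are assumptions, not the real HiveMetaStoreClient/Thrift API):

```java
import java.util.function.Supplier;

// Minimal stand-in for a client-side connection.
interface Transport {
    void open();
    void close();
}

// Sketch: always close the old transport before opening a new one on reconnect,
// so server-side resources are released even when the client retries repeatedly.
class ReconnectingClient {
    private Transport transport;
    private final Supplier<Transport> factory;

    ReconnectingClient(Supplier<Transport> factory) {
        this.factory = factory;
        this.transport = factory.get();
        this.transport.open();
    }

    void reconnect() {
        if (transport != null) {
            try {
                transport.close();   // close first, dead or alive
            } catch (RuntimeException ignored) {
                // a dead transport may throw on close; reconnect anyway
            }
        }
        transport = factory.get();
        transport.open();
    }
}
```

The try/catch around close() matters: if the old connection is already dead, close() may fail, and that failure must not prevent the new connection from being opened.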
[jira] [Updated] (HIVE-10588) implement hashCode method for HWISessionItem
[ https://issues.apache.org/jira/browse/HIVE-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10588: --- Description: HWISessionItem overrides the equals method but not the hashCode method. This violates the Java contract: if two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result. Currently the equals and compareTo methods use sessionName in their implementations, so sessionName.hashCode() can be used in HWISessionItem.hashCode as well. implement hashCode method for HWISessionItem Key: HIVE-10588 URL: https://issues.apache.org/jira/browse/HIVE-10588 Project: Hive Issue Type: Improvement Components: Web UI Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor HWISessionItem overrides the equals method but not the hashCode method. This violates the Java contract: if two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result. Currently the equals and compareTo methods use sessionName in their implementations, so sessionName.hashCode() can be used in HWISessionItem.hashCode as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
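The fix described in HIVE-10588 can be sketched in a standalone class (SessionItem is an illustrative stand-in, not the actual HWISessionItem): equals, compareTo, and hashCode all derive from the same field, sessionName, so the equals/hashCode contract holds and instances behave correctly in hash-based collections.

```java
// Sketch: equals, compareTo, and hashCode consistently based on one field.
public class SessionItem implements Comparable<SessionItem> {
    private final String sessionName;

    public SessionItem(String sessionName) {
        this.sessionName = sessionName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SessionItem)) return false;
        return sessionName.equals(((SessionItem) o).sessionName);
    }

    // Without this override, two equal items could land in different
    // HashMap/HashSet buckets, breaking lookups.
    @Override
    public int hashCode() {
        return sessionName.hashCode();
    }

    @Override
    public int compareTo(SessionItem other) {
        return sessionName.compareTo(other.sessionName);
    }
}
```

Consistency with compareTo also matters: equals returning true iff compareTo returns 0 keeps behavior uniform between sorted and hash-based collections.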
[jira] [Updated] (HIVE-10588) implement hashCode method for HWISessionItem
[ https://issues.apache.org/jira/browse/HIVE-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10588: --- Attachment: rb33796.patch patch #1 implement hashCode method for HWISessionItem Key: HIVE-10588 URL: https://issues.apache.org/jira/browse/HIVE-10588 Project: Hive Issue Type: Improvement Components: Web UI Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: rb33796.patch HWISessionItem overrides the equals method but not the hashCode method. This violates the Java contract: if two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result. Currently the equals and compareTo methods use sessionName in their implementations, so sessionName.hashCode() can be used in HWISessionItem.hashCode as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10589) Thread.wait not in loop in HWISessionItem
[ https://issues.apache.org/jira/browse/HIVE-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10589: --- Summary: Thread.wait not in loop in HWISessionItem (was: Thread.wait not in loop) Thread.wait not in loop in HWISessionItem - Key: HIVE-10589 URL: https://issues.apache.org/jira/browse/HIVE-10589 Project: Hive Issue Type: Improvement Components: Web UI Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor In multi-threaded code, Object.wait() should be called inside a loop that re-checks the wait condition, to guard against spurious wakeups and missed state changes. The if statement below should therefore be replaced with a while loop. HWISessionItem (121-128) {code} synchronized (runnable) { if (status != WebSessionItemStatus.READY) { try { runnable.wait(); } catch (Exception ex) { } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10528) Hiveserver2 in HTTP mode is not applying auth_to_local rules
[ https://issues.apache.org/jira/browse/HIVE-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdelrahman Shettia updated HIVE-10528: --- Attachment: HIVE-10528.1.patch Hiveserver2 in HTTP mode is not applying auth_to_local rules Key: HIVE-10528 URL: https://issues.apache.org/jira/browse/HIVE-10528 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 1.3.0 Environment: Centos 6 Reporter: Abdelrahman Shettia Assignee: Abdelrahman Shettia Attachments: HIVE-10528.1.patch PROBLEM: When authenticating to HS2 in HTTP mode with Kerberos, auth_to_local mappings do not get applied. Because of this, various permission checks that rely on the local cluster name for a user will fail. STEPS TO REPRODUCE: 1. Create a Kerberos cluster and HS2 in HTTP mode 2. Create a new user, test, along with a Kerberos principal for this user 3. Create a separate principal, mapped-test 4. Create an auth_to_local rule to make sure that mapped-test is mapped to test 5. As the test user, connect to HS2 with beeline and create a simple table: {code} CREATE TABLE permtest (field1 int); {code} There is no need to load anything into this table. 6. Establish that it works as the test user: {code} show create table permtest; {code} 7. Drop the test identity and become mapped-test 8. Re-connect to HS2 with beeline, re-run the above command: {code} show create table permtest; {code} You will find that when this is done in HTTP mode, you will get an HDFS error (because StorageBasedAuthorization does an HDFS permissions check) and the user will be mapped-test and NOT test as it should be. 
ANALYSIS: This appears to be HTTP specific and the problem seems to come in {{ThriftHttpServlet$HttpKerberosServerAction.getPrincipalWithoutRealmAndHost()}}: {code} try { fullKerberosName = ShimLoader.getHadoopShims().getKerberosNameShim(fullPrincipal); } catch (IOException e) { throw new HttpAuthenticationException(e); } return fullKerberosName.getServiceName(); {code} getServiceName applies no auth_to_local rules. Seems like maybe this should be getShortName()? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
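To make the getServiceName() vs getShortName() distinction concrete, here is a toy model. This is not the Hadoop KerberosName API; the rule table and method names are invented for illustration. Stripping the realm and host leaves the principal's first component unchanged, while applying auth_to_local-style rules can map it to a different local user.

```java
import java.util.Map;

// Toy model of the behavior difference described above; the real logic lives
// in Hadoop's KerberosName (getServiceName vs getShortName).
class AuthToLocalSketch {
  // Hypothetical auth_to_local rule: principal mapped-test maps to local user test.
  private static final Map<String, String> RULES = Map.of("mapped-test", "test");

  // Analogue of getServiceName(): strip host and realm, apply no rules.
  static String serviceName(String fullPrincipal) {
    return fullPrincipal.split("[/@]")[0];
  }

  // Analogue of getShortName(): strip host and realm, then apply the rules.
  static String shortName(String fullPrincipal) {
    String base = serviceName(fullPrincipal);
    return RULES.getOrDefault(base, base);
  }
}
```

In the reproduction above, serviceName("mapped-test@REALM") stays "mapped-test" (the behavior observed in HTTP mode), while shortName("mapped-test@REALM") would yield "test", which is what the permission checks expect.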
[jira] [Assigned] (HIVE-10552) hive 1.1.0 rename column fails: Invalid method name: 'alter_table_with_cascade'
[ https://issues.apache.org/jira/browse/HIVE-10552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang reassigned HIVE-10552: -- Assignee: Chaoyu Tang hive 1.1.0 rename column fails: Invalid method name: 'alter_table_with_cascade' --- Key: HIVE-10552 URL: https://issues.apache.org/jira/browse/HIVE-10552 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 1.1.0 Environment: centos 6.6, cloudera 5.3.3 Reporter: David Watzke Assignee: Chaoyu Tang Priority: Blocker Hi, we're trying out hive 1.1.0 with cloudera 5.3.3 and since hive 1.0.0 there's (what appears to be) a regression. This ALTER command that renames a table column used to work fine in older versions, but in hive 1.1.0 it throws this error: hive> CREATE TABLE test_change (a int, b int, c int); OK Time taken: 2.303 seconds hive> ALTER TABLE test_change CHANGE a a1 INT; FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. Invalid method name: 'alter_table_with_cascade' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10587) ExprNodeColumnDesc should be created with isPartitionColOrVirtualCol true for DP column
[ https://issues.apache.org/jira/browse/HIVE-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-10587: --- Attachment: HIVE-10587.patch A simple patch: pass true as the isPartitionColOrVirtualCol parameter to the ExprNodeColumnDesc constructor. [~ashutoshc], [~szehon] could you take a look to see if it makes sense? Thanks ExprNodeColumnDesc should be created with isPartitionColOrVirtualCol true for DP column --- Key: HIVE-10587 URL: https://issues.apache.org/jira/browse/HIVE-10587 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Attachments: HIVE-10587.patch In the SemanticAnalyzer method: Operator genConversionSelectOperator(String dest, QB qb, Operator input, TableDesc table_desc, DynamicPartitionCtx dpCtx) throws SemanticException, the DP column's ExprNodeColumnDesc is created by passing false as the isPartitionColOrVirtualCol parameter value: {code} // DP columns starts with tableFields.size() for (int i = tableFields.size() + (updating() ? 1 : 0); i < rowFields.size(); ++i) { TypeInfo rowFieldTypeInfo = rowFields.get(i).getType(); ExprNodeDesc column = new ExprNodeColumnDesc( rowFieldTypeInfo, rowFields.get(i).getInternalName(), "", false); expressions.add(column); } {code} I think it should be true instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10528) Hiveserver2 in HTTP mode is not applying auth_to_local rules
[ https://issues.apache.org/jira/browse/HIVE-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526004#comment-14526004 ] Abdelrahman Shettia commented on HIVE-10528: The patch is uploaded and I am going to wait for the test results. Thanks -Rahman Hiveserver2 in HTTP mode is not applying auth_to_local rules Key: HIVE-10528 URL: https://issues.apache.org/jira/browse/HIVE-10528 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 1.3.0 Environment: Centos 6 Reporter: Abdelrahman Shettia Assignee: Abdelrahman Shettia Attachments: HIVE-10528.1.patch PROBLEM: When authenticating to HS2 in HTTP mode with Kerberos, auth_to_local mappings do not get applied. Because of this, various permission checks that rely on the local cluster name for a user will fail. STEPS TO REPRODUCE: 1. Create a Kerberos cluster and HS2 in HTTP mode 2. Create a new user, test, along with a Kerberos principal for this user 3. Create a separate principal, mapped-test 4. Create an auth_to_local rule to make sure that mapped-test is mapped to test 5. As the test user, connect to HS2 with beeline and create a simple table: {code} CREATE TABLE permtest (field1 int); {code} There is no need to load anything into this table. 6. Establish that it works as the test user: {code} show create table permtest; {code} 7. Drop the test identity and become mapped-test 8. Re-connect to HS2 with beeline, re-run the above command: {code} show create table permtest; {code} You will find that when this is done in HTTP mode, you will get an HDFS error (because StorageBasedAuthorization does an HDFS permissions check) and the user will be mapped-test and NOT test as it should be. 
ANALYSIS: This appears to be HTTP specific and the problem seems to come in {{ThriftHttpServlet$HttpKerberosServerAction.getPrincipalWithoutRealmAndHost()}}: {code} try { fullKerberosName = ShimLoader.getHadoopShims().getKerberosNameShim(fullPrincipal); } catch (IOException e) { throw new HttpAuthenticationException(e); } return fullKerberosName.getServiceName(); {code} getServiceName applies no auth_to_local rules. Seems like maybe this should be getShortName()? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10552) hive 1.1.0 rename column fails: Invalid method name: 'alter_table_with_cascade'
[ https://issues.apache.org/jira/browse/HIVE-10552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526037#comment-14526037 ] Chaoyu Tang commented on HIVE-10552: [~dwatzke] I wonder if you are still having the problem. If so, could you let me know the details of your cluster setup and the steps to reproduce? Thanks hive 1.1.0 rename column fails: Invalid method name: 'alter_table_with_cascade' --- Key: HIVE-10552 URL: https://issues.apache.org/jira/browse/HIVE-10552 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 1.1.0 Environment: centos 6.6, cloudera 5.3.3 Reporter: David Watzke Assignee: Chaoyu Tang Priority: Blocker Hi, we're trying out hive 1.1.0 with cloudera 5.3.3 and since hive 1.0.0 there's (what appears to be) a regression. This ALTER command that renames a table column used to work fine in older versions, but in hive 1.1.0 it throws this error: hive> CREATE TABLE test_change (a int, b int, c int); OK Time taken: 2.303 seconds hive> ALTER TABLE test_change CHANGE a a1 INT; FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. Invalid method name: 'alter_table_with_cascade' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10587) ExprNodeColumnDesc should be created with isPartitionColOrVirtualCol true for DP column
[ https://issues.apache.org/jira/browse/HIVE-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526096#comment-14526096 ] Ashutosh Chauhan commented on HIVE-10587: - +1 pending tests ExprNodeColumnDesc should be created with isPartitionColOrVirtualCol true for DP column --- Key: HIVE-10587 URL: https://issues.apache.org/jira/browse/HIVE-10587 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Attachments: HIVE-10587.patch In the SemanticAnalyzer method: Operator genConversionSelectOperator(String dest, QB qb, Operator input, TableDesc table_desc, DynamicPartitionCtx dpCtx) throws SemanticException, the DP column's ExprNodeColumnDesc is created by passing false as the isPartitionColOrVirtualCol parameter value: {code} // DP columns starts with tableFields.size() for (int i = tableFields.size() + (updating() ? 1 : 0); i < rowFields.size(); ++i) { TypeInfo rowFieldTypeInfo = rowFields.get(i).getType(); ExprNodeDesc column = new ExprNodeColumnDesc( rowFieldTypeInfo, rowFields.get(i).getInternalName(), "", false); expressions.add(column); } {code} I think it should be true instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly
[ https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-10565: Attachment: HIVE-10565.05.patch Add more Q files. Note these Q files are from HIVE-9743. Currently, non-native vector map join produces the wrong results and that will be fixed when HIVE-9743 goes in. LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly Key: HIVE-10565 URL: https://issues.apache.org/jira/browse/HIVE-10565 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 1.2.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.2.0, 1.3.0 Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch Filtering can knock out some of the rows for a repeated key, but those knocked out rows need to be included in the LEFT OUTER JOIN result and are currently not when only some rows are filtered out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7375) Add option in test infra to compile in other profiles (like hadoop-1)
[ https://issues.apache.org/jira/browse/HIVE-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-7375: Attachment: HIVE-7375.2.patch Thanks for the review. I ended up moving the code to the source-prep script, as errors will then be part of the reporting phase; otherwise the error would be lost from the JIRA comment. It's unfortunate that the #if / #end Velocity directives add a tab to the next line; it is harmless, but I wasn't able to get around that without making the directive impossible to read. Add option in test infra to compile in other profiles (like hadoop-1) - Key: HIVE-7375 URL: https://issues.apache.org/jira/browse/HIVE-7375 Project: Hive Issue Type: Test Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-7375.2.patch, HIVE-7375.patch As we are seeing some commits breaking hadoop-1 compilation due to lack of pre-commit coverage, it might be nice to add an option in the test infra to compile on optional profiles as a pre-step before testing on the main profile. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7375) Add option in test infra to compile in other profiles (like hadoop-1)
[ https://issues.apache.org/jira/browse/HIVE-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526124#comment-14526124 ] Szehon Ho commented on HIVE-7375: - Again, tested the generated script locally. Add option in test infra to compile in other profiles (like hadoop-1) - Key: HIVE-7375 URL: https://issues.apache.org/jira/browse/HIVE-7375 Project: Hive Issue Type: Test Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-7375.2.patch, HIVE-7375.patch As we are seeing some commits breaking hadoop-1 compilation due to lack of pre-commit coverage, it might be nice to add an option in the test infra to compile on optional profiles as a pre-step before testing on the main profile. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly
[ https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-10565: Attachment: HIVE-10565.04.patch Add some new Q files. LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly Key: HIVE-10565 URL: https://issues.apache.org/jira/browse/HIVE-10565 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 1.2.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.2.0, 1.3.0 Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, HIVE-10565.03.patch, HIVE-10565.04.patch Filtering can knock out some of the rows for a repeated key, but those knocked out rows need to be included in the LEFT OUTER JOIN result and are currently not when only some rows are filtered out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10587) ExprNodeColumnDesc should be created with isPartitionColOrVirtualCol true for DP column
[ https://issues.apache.org/jira/browse/HIVE-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526173#comment-14526173 ] Chaoyu Tang commented on HIVE-10587: The failed test seems unrelated to this patch. ExprNodeColumnDesc should be created with isPartitionColOrVirtualCol true for DP column --- Key: HIVE-10587 URL: https://issues.apache.org/jira/browse/HIVE-10587 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Attachments: HIVE-10587.patch In the SemanticAnalyzer method: Operator genConversionSelectOperator(String dest, QB qb, Operator input, TableDesc table_desc, DynamicPartitionCtx dpCtx) throws SemanticException, the DP column's ExprNodeColumnDesc is created by passing false as the isPartitionColOrVirtualCol parameter value: {code} // DP columns starts with tableFields.size() for (int i = tableFields.size() + (updating() ? 1 : 0); i < rowFields.size(); ++i) { TypeInfo rowFieldTypeInfo = rowFields.get(i).getType(); ExprNodeDesc column = new ExprNodeColumnDesc( rowFieldTypeInfo, rowFields.get(i).getInternalName(), "", false); expressions.add(column); } {code} I think it should be true instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9534) incorrect result set for query that projects a windowed aggregate
[ https://issues.apache.org/jira/browse/HIVE-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526193#comment-14526193 ] Chaoyu Tang commented on HIVE-9534: --- [~rhbutani] I have not looked deeply into the code yet; do you think distinct with a window function is simply not supported in Hive at this moment, or might it be a bug? Is it related to HIVE-10586? incorrect result set for query that projects a windowed aggregate - Key: HIVE-9534 URL: https://issues.apache.org/jira/browse/HIVE-9534 Project: Hive Issue Type: Bug Components: SQL Reporter: N Campbell Assignee: Chaoyu Tang Result set returned by Hive has one row instead of 5 {code} select avg(distinct tsint.csint) over () from tsint create table if not exists TSINT (RNUM int , CSINT smallint) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE; 0|\N 1|-1 2|0 3|1 4|10 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10571) HiveMetaStoreClient should close existing thrift connection before its reconnect
[ https://issues.apache.org/jira/browse/HIVE-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526212#comment-14526212 ] Hive QA commented on HIVE-10571: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730072/HIVE-10571.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 8886 tests executed *Failed tests:* {noformat} org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3712/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3712/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3712/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12730072 - PreCommit-HIVE-TRUNK-Build HiveMetaStoreClient should close existing thrift connection before its reconnect Key: HIVE-10571 URL: https://issues.apache.org/jira/browse/HIVE-10571 Project: Hive Issue Type: Bug Components: Metastore Reporter: Chaoyu Tang Assignee: Chaoyu Tang Attachments: HIVE-10571.patch, HIVE-10571.patch, HIVE-10571.patch HiveMetaStoreClient should first close its existing thrift connection, whether it is already dead or still alive, before opening another connection in its reconnect() method. Otherwise, it might lead to huge resource accumulation or leaks on the HMS side when the client keeps retrying. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7375) Add option in test infra to compile in other profiles (like hadoop-1)
[ https://issues.apache.org/jira/browse/HIVE-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-7375: Description: As we are seeing some commits breaking hadoop-1 compilation due to lack of pre-commit coverage, it might be nice to add an option in the test infra to compile on optional profiles as a pre-step before testing on the main profile. NO PRECOMMIT TESTS was: As we are seeing some commits breaking hadoop-1 compilation due to lack of pre-commit coverage, it might be nice to add an option in the test infra to compile on optional profiles as a pre-step before testing on the main profile. Add option in test infra to compile in other profiles (like hadoop-1) - Key: HIVE-7375 URL: https://issues.apache.org/jira/browse/HIVE-7375 Project: Hive Issue Type: Test Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-7375.patch As we are seeing some commits breaking hadoop-1 compilation due to lack of pre-commit coverage, it might be nice to add an option in the test infra to compile on optional profiles as a pre-step before testing on the main profile. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10587) ExprNodeColumnDesc should be created with isPartitionColOrVirtualCol true for DP column
[ https://issues.apache.org/jira/browse/HIVE-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526160#comment-14526160 ] Hive QA commented on HIVE-10587: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730049/HIVE-10587.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 8886 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3709/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3709/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3709/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12730049 - PreCommit-HIVE-TRUNK-Build ExprNodeColumnDesc should be created with isPartitionColOrVirtualCol true for DP column --- Key: HIVE-10587 URL: https://issues.apache.org/jira/browse/HIVE-10587 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Attachments: HIVE-10587.patch In the SemanticAnalyzer method: Operator genConversionSelectOperator(String dest, QB qb, Operator input, TableDesc table_desc, DynamicPartitionCtx dpCtx) throws SemanticException, the DP column's ExprNodeColumnDesc is created by passing false as the isPartitionColOrVirtualCol parameter value: {code} // DP columns starts with tableFields.size() for (int i = tableFields.size() + (updating() ? 1 : 0); i < rowFields.size(); ++i) { TypeInfo rowFieldTypeInfo = rowFields.get(i).getType(); ExprNodeDesc column = new ExprNodeColumnDesc( rowFieldTypeInfo, rowFields.get(i).getInternalName(), "", false); expressions.add(column); } {code} I think it should be true instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10590) fix potential NPE in HiveMetaStore.equals
[ https://issues.apache.org/jira/browse/HIVE-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10590: --- Description: The following code will throw NPE if both v1 and v2 are null. HiveMetaStore (2028-2029) {code} String v1 = p1.getValues().get(i), v2 = p2.getValues().get(i); if ((v1 == null && v2 != null) || !v1.equals(v2)) return false; {code} was: the following code will throw NPE if both v1 and v2 are null {code} String v1 = p1.getValues().get(i), v2 = p2.getValues().get(i); if ((v1 == null && v2 != null) || !v1.equals(v2)) return false; {code} fix potential NPE in HiveMetaStore.equals - Key: HIVE-10590 URL: https://issues.apache.org/jira/browse/HIVE-10590 Project: Hive Issue Type: Bug Components: Metastore Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor The following code will throw NPE if both v1 and v2 are null. HiveMetaStore (2028-2029) {code} String v1 = p1.getValues().get(i), v2 = p2.getValues().get(i); if ((v1 == null && v2 != null) || !v1.equals(v2)) return false; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10590) fix potential NPE in HiveMetaStore.equals
[ https://issues.apache.org/jira/browse/HIVE-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10590: --- Description: the following code will throw NPE if both v1 and v2 are null {code} String v1 = p1.getValues().get(i), v2 = p2.getValues().get(i); if ((v1 == null && v2 != null) || !v1.equals(v2)) return false; {code} fix potential NPE in HiveMetaStore.equals - Key: HIVE-10590 URL: https://issues.apache.org/jira/browse/HIVE-10590 Project: Hive Issue Type: Bug Components: Metastore Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor the following code will throw NPE if both v1 and v2 are null {code} String v1 = p1.getValues().get(i), v2 = p2.getValues().get(i); if ((v1 == null && v2 != null) || !v1.equals(v2)) return false; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
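A null-safe version of that comparison, as a sketch (the helper name is invented for illustration): java.util.Objects.equals treats two nulls as equal and never dereferences a null operand, so the two-clause test collapses to one call.

```java
import java.util.Objects;

// Sketch of a null-safe replacement for the check quoted above: the original
// throws NPE when v1 and v2 are both null, because the first clause is false
// and !v1.equals(v2) then dereferences null.
class PartitionValueCompare {
  static boolean valuesDiffer(String v1, String v2) {
    return !Objects.equals(v1, v2); // null-safe: (null, null) counts as equal
  }
}
```

The equals method would then `return false` when valuesDiffer(v1, v2) is true, with no special-casing of nulls.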
[jira] [Commented] (HIVE-9534) incorrect result set for query that projects a windowed aggregate
[ https://issues.apache.org/jira/browse/HIVE-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526190#comment-14526190 ] Chaoyu Tang commented on HIVE-9534: --- I tried the tests (distinct with window function) in MySQL, PostgreSQL and Oracle with the following steps: {code} create table testwindow (col1 int, col2 int); insert into testwindow values (1, 1); insert into testwindow values (1, 2); insert into testwindow values (1, 3); insert into testwindow values (2, 1); insert into testwindow values (2, 2); insert into testwindow values (3, 3); --- select avg(distinct col1) over() from testwindow; {code} MySQL: did not work and got the error msg: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '() from testwindow' at line 1 PostgreSQL: did not work and got the error msg: ERROR: DISTINCT is not implemented for window functions Position: 8 Oracle: seemed to work, but I wonder if it is right; the average is correct (the average of 1, 2, 3) but it returned 6 rows: {code} 1 2 2 2 3 2 4 2 5 2 6 2 {code} Hive only returns one row, but with the correct average: 2 incorrect result set for query that projects a windowed aggregate - Key: HIVE-9534 URL: https://issues.apache.org/jira/browse/HIVE-9534 Project: Hive Issue Type: Bug Components: SQL Reporter: N Campbell Result set returned by Hive has one row instead of 5 {code} select avg(distinct tsint.csint) over () from tsint create table if not exists TSINT (RNUM int , CSINT smallint) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE; 0|\N 1|-1 2|0 3|1 4|10 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9534) incorrect result set for query that projects a windowed aggregate
[ https://issues.apache.org/jira/browse/HIVE-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang reassigned HIVE-9534: - Assignee: Chaoyu Tang incorrect result set for query that projects a windowed aggregate - Key: HIVE-9534 URL: https://issues.apache.org/jira/browse/HIVE-9534 Project: Hive Issue Type: Bug Components: SQL Reporter: N Campbell Assignee: Chaoyu Tang Result set returned by Hive has one row instead of 5 {code} select avg(distinct tsint.csint) over () from tsint create table if not exists TSINT (RNUM int , CSINT smallint) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE; 0|\N 1|-1 2|0 3|1 4|10 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10591) Support integer type promotion in ORC
[ https://issues.apache.org/jira/browse/HIVE-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10591: - Attachment: HIVE-10591.1.patch Support integer type promotion in ORC - Key: HIVE-10591 URL: https://issues.apache.org/jira/browse/HIVE-10591 Project: Hive Issue Type: New Feature Affects Versions: 1.3.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10591.1.patch ORC currently does not support schema-on-read. If we alter an ORC table column with 'int' type to 'bigint' and then query the altered table, a ClassCastException will be thrown, as the read schema from the table descriptor will expect LongWritable whereas ORC will return IntWritable based on the file schema stored within the ORC file. OrcSerde currently doesn't do any type conversions or type promotions, for performance reasons in the inner loop. Since smallints, ints and bigints are stored in the same way in ORC, it should be possible to allow such type promotions without hurting performance. The following type promotions can be supported without any casting: smallint -> int, smallint -> bigint, int -> bigint. Tinyint promotion is not possible without casting, as tinyints are stored using the RLE byte writer whereas smallints, ints and bigints are stored using the RLE integer writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10591) Support limited integer type promotion in ORC
[ https://issues.apache.org/jira/browse/HIVE-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10591: - Summary: Support limited integer type promotion in ORC (was: Support integer type promotion in ORC) Support limited integer type promotion in ORC - Key: HIVE-10591 URL: https://issues.apache.org/jira/browse/HIVE-10591 Project: Hive Issue Type: New Feature Affects Versions: 1.3.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10591.1.patch ORC currently does not support schema-on-read. If we alter an ORC table column with 'int' type to 'bigint' and then query the altered table, a ClassCastException will be thrown, as the read schema from the table descriptor will expect LongWritable whereas ORC will return IntWritable based on the file schema stored within the ORC file. OrcSerde currently doesn't do any type conversions or type promotions, for performance reasons in the inner loop. Since smallints, ints and bigints are stored in the same way in ORC, it should be possible to allow such type promotions without hurting performance. The following type promotions can be supported without any casting: smallint -> int, smallint -> bigint, int -> bigint. Tinyint promotion is not possible without casting, as tinyints are stored using the RLE byte writer whereas smallints, ints and bigints are stored using the RLE integer writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
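The reason the smallint/int/bigint promotions are cast-free can be seen in plain Java (illustrative only; ORC's actual reader works on its RLE-encoded streams): widening a smaller signed integer into a long is value-preserving, so a reader that always materializes a LongWritable loses nothing.

```java
// Illustrative: widening smallint (short) and int values to bigint (long)
// is a lossless implicit conversion, which is what makes the proposed
// promotions safe for types sharing ORC's integer RLE encoding.
class WideningSketch {
  static long widen(short v) {
    return v; // implicit widening conversion: no cast, no data change
  }

  static long widen(int v) {
    return v;
  }
}
```

Tinyint is the odd one out, as the issue notes, not because byte-to-long widening would lose data, but because tinyints live in a differently encoded stream (RLE byte writer) on disk.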
[jira] [Commented] (HIVE-10579) Fix -Phadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525774#comment-14525774 ] Hive QA commented on HIVE-10579: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730019/HIVE-10579.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 8885 tests executed *Failed tests:* {noformat} org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3706/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3706/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3706/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12730019 - PreCommit-HIVE-TRUNK-Build Fix -Phadoop-1 build Key: HIVE-10579 URL: https://issues.apache.org/jira/browse/HIVE-10579 Project: Hive Issue Type: Bug Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-10579.1.patch, HIVE-10579.1.patch, HIVE-10579.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10571) HiveMetaStoreClient should close existing thrift connection before its reconnect
[ https://issues.apache.org/jira/browse/HIVE-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-10571: --- Attachment: HIVE-10571.patch Reattach the patch to kick off the precommit build. HiveMetaStoreClient should close existing thrift connection before its reconnect Key: HIVE-10571 URL: https://issues.apache.org/jira/browse/HIVE-10571 Project: Hive Issue Type: Bug Components: Metastore Reporter: Chaoyu Tang Assignee: Chaoyu Tang Attachments: HIVE-10571.patch, HIVE-10571.patch HiveMetaStoreClient should first close its existing thrift connection, whether it is already dead or still alive, before opening another connection in its reconnect() method. Otherwise, it might lead to huge resource accumulation or leaks on the HMS side when the client keeps retrying. -- This message was sent by Atlassian JIRA (v6.3.4#6332)