[jira] [Updated] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-7193: Attachment: HIVE-7193.5.patch Incorporating additional suggestions from the review board. A big change is to revert treating Atn Providers as services (singleton instances through the life of the HS2). These instances will now be created on every Atn request. The concern was that we don't know what the user-coded CustomAuthenticationProvider could do. Since this is user-written code, we have no control over what it can and cannot do. If each request takes a long time, we could have a bottleneck. Similarly, the PAMAuthenticator could become a bottleneck too. So the decision was to have the AtnFactory be consistent across all forms of Atn. Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.5.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently hive has only the following authenticator parameters for LDAP authentication for hiveserver2:
{code:xml}
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://our_ldap_address</value>
</property>
{code}
We need to include other LDAP properties as part of hive-LDAP authentication, like below:
{noformat}
a group search base - dc=domain,dc=com
a group search filter - member={0}
a user search base - dc=domain,dc=com
a user search filter - sAMAccountName={0}
a list of valid user groups - group1,group2,group3
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
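For readers unfamiliar with how such group filters are typically enforced, the sketch below shows the general JNDI pattern an LDAP authenticator can follow: bind as the user, then apply a group search base and filter. It is illustrative only; the class name, method shape, and the choice to treat any group match as success are assumptions, not the HIVE-7193 patch.
{code:java}
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

public class LdapGroupFilterSketch {
  // Binds as the user (verifying the password), then checks group membership
  // with a filter such as "member={0}" under the configured group search base.
  public static boolean authenticate(String ldapUrl, String userDn, String password,
      String groupSearchBase, String groupSearchFilter) throws NamingException {
    Hashtable<String, String> env = new Hashtable<>();
    env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
    env.put(Context.PROVIDER_URL, ldapUrl);
    env.put(Context.SECURITY_AUTHENTICATION, "simple");
    env.put(Context.SECURITY_PRINCIPAL, userDn);
    env.put(Context.SECURITY_CREDENTIALS, password); // bad credentials throw here
    DirContext ctx = new InitialDirContext(env);
    try {
      SearchControls controls = new SearchControls();
      controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
      // JNDI substitutes {0} in the filter with the user's DN.
      NamingEnumeration<SearchResult> matches =
          ctx.search(groupSearchBase, groupSearchFilter, new Object[] {userDn}, controls);
      return matches.hasMore(); // member of at least one matching group
    } finally {
      ctx.close();
    }
  }
}
{code}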
[jira] [Updated] (HIVE-9511) Switch Tez to 0.6.1
[ https://issues.apache.org/jira/browse/HIVE-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-9511: --- Summary: Switch Tez to 0.6.1 (was: Switch Tez to 0.6.0) Switch Tez to 0.6.1 --- Key: HIVE-9511 URL: https://issues.apache.org/jira/browse/HIVE-9511 Project: Hive Issue Type: Improvement Components: Tez Reporter: Damien Carol Assignee: Damien Carol Attachments: HIVE-9511.2.patch, HIVE-9511.3.patch.txt, HIVE-9511.patch.txt Tez 0.6.0 has been released. Research to switch to version 0.6.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9511) Switch Tez to 0.6.1
[ https://issues.apache.org/jira/browse/HIVE-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-9511: --- Component/s: Tez Switch Tez to 0.6.1 --- Key: HIVE-9511 URL: https://issues.apache.org/jira/browse/HIVE-9511 Project: Hive Issue Type: Improvement Components: Tez Reporter: Damien Carol Assignee: Damien Carol Attachments: HIVE-9511.2.patch, HIVE-9511.3.patch.txt, HIVE-9511.patch.txt Tez 0.6.1 has been released. Research to switch to version 0.6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9511) Switch Tez to 0.6.1
[ https://issues.apache.org/jira/browse/HIVE-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-9511: --- Description: Tez 0.6.1 has been released. Research to switch to version 0.6.1 was: Tez 0.6.0 has been released. Research to switch to version 0.6.0 Switch Tez to 0.6.1 --- Key: HIVE-9511 URL: https://issues.apache.org/jira/browse/HIVE-9511 Project: Hive Issue Type: Improvement Components: Tez Reporter: Damien Carol Assignee: Damien Carol Attachments: HIVE-9511.2.patch, HIVE-9511.3.patch.txt, HIVE-9511.patch.txt Tez 0.6.1 has been released. Research to switch to version 0.6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9511) Switch Tez to 0.6.1
[ https://issues.apache.org/jira/browse/HIVE-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-9511: --- Attachment: HIVE-9511.4.patch.txt Updated to TEZ 0.6.1. Switch Tez to 0.6.1 --- Key: HIVE-9511 URL: https://issues.apache.org/jira/browse/HIVE-9511 Project: Hive Issue Type: Improvement Components: Tez Reporter: Damien Carol Assignee: Damien Carol Attachments: HIVE-9511.2.patch, HIVE-9511.3.patch.txt, HIVE-9511.4.patch.txt, HIVE-9511.patch.txt Tez 0.6.1 has been released. Research to switch to version 0.6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others
[ https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-7018: --- Attachment: HIVE-7018.5.patch The scripts are called in sequence; I should not put the same script in both upgrade-1.2.0-to-1.3.0.mysql.sql and upgrade-1.2.0-to-2.0.0.mysql.sql. Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others - Key: HIVE-7018 URL: https://issues.apache.org/jira/browse/HIVE-7018 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Yongzhi Chen Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch, HIVE-7018.3.patch, HIVE-7018.4.patch, HIVE-7018.5.patch It appears that at least postgres and oracle do not have the LINK_TARGET_ID column while mysql does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11018) Turn on cbo in more q files
[ https://issues.apache.org/jira/browse/HIVE-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11018: Attachment: HIVE-11018.1.patch Reupload to trigger QA run. Turn on cbo in more q files --- Key: HIVE-11018 URL: https://issues.apache.org/jira/browse/HIVE-11018 Project: Hive Issue Type: Task Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11018.1.patch, HIVE-11018.patch There are a few tests in which cbo was turned off for various reasons. Those reasons don't exist anymore. For those tests, we should turn on cbo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8329) Enable postgres for storing stats
[ https://issues.apache.org/jira/browse/HIVE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol resolved HIVE-8329. Resolution: Won't Fix Some JIRAs added postgres scripts to the testing infra. Enable postgres for storing stats - Key: HIVE-8329 URL: https://issues.apache.org/jira/browse/HIVE-8329 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Attachments: HIVE-8329.1.patch, HIVE-8329.1.patch, HIVE-8329.1.patch Simple patch to enable postgresql as a JDBC publisher for statistics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11023) Disable directSQL if datanucleus.identifierFactory = datanucleus2
[ https://issues.apache.org/jira/browse/HIVE-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11023: Priority: Critical (was: Major) Disable directSQL if datanucleus.identifierFactory = datanucleus2 - Key: HIVE-11023 URL: https://issues.apache.org/jira/browse/HIVE-11023 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.3.0, 1.2.1, 2.0.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Critical We hit an interesting bug in a case where datanucleus.identifierFactory = datanucleus2. The problem is that directSql hand-generates SQL strings assuming the datanucleus1 naming scheme. If a user has their metastore JDO managed by datanucleus.identifierFactory = datanucleus2, the SQL strings we generate are incorrect. One simple example of what this results in is the following: whenever DN persists a field which is held as a List<T>, it winds up storing each T as a separate line in the appropriate mapping table, with a column called INTEGER_IDX which holds the position in the list. Then, upon reading, it automatically reads all relevant lines with an ORDER BY INTEGER_IDX, which results in the list retaining its order. In the DN2 naming scheme, the column is called IDX instead of INTEGER_IDX. If the user has run the appropriate metatool upgrade scripts, it is highly likely that they have both columns, INTEGER_IDX and IDX. Whenever they use JDO, such as with all writes, it will then use the IDX field, and when they do any sort of optimized read, such as through directSQL, it will ORDER BY INTEGER_IDX. An immediate danger is seen when we consider that the schema of a table is stored as a List<FieldSchema>: while IDX holds 0,1,2,3,..., INTEGER_IDX will contain 0,0,0,0,..., and thus any attempt to describe the table or fetch its schema can come back in the table's native hashing order rather than sorted by the index. This can then result in schema ordering being different from the actual table. For example, if a user has a table (a:int, b:string, c:string), a describe on it may return (c:string, a:int, b:string), and thus queries which insert after selecting from another table can hit ClassCastExceptions when trying to insert data in the wrong order - this is how we discovered this bug. This problem, however, can be far worse if there are no type problems: it is possible, for example, that if a, b, c were all strings, the insert query would succeed but mix up the order, which then results in user table data being mixed up. This has the potential to be very bad. We should write a tool to help convert metastores that use datanucleus2 to datanucleus1 (more difficult, needs more one-time testing) or change directSql to support both (easier to code, but it increases the test-coverage matrix significantly and we should really then be testing against both schemes). But in the short term, we should disable directSql if we see that the identifier factory is datanucleus2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
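The short-term mitigation proposed in the last sentence amounts to a configuration check before directSQL is used. A minimal sketch, assuming a plain Properties view of the metastore configuration (the real MetaStoreDirectSql wiring differs):
{code:java}
import java.util.Properties;

public class DirectSqlGuard {
  // Returns false when the JDO identifier factory is datanucleus2, since the
  // hand-generated SQL assumes datanucleus1 column names (INTEGER_IDX vs. IDX).
  public static boolean directSqlAllowed(Properties metastoreProps) {
    String idFactory =
        metastoreProps.getProperty("datanucleus.identifierFactory", "datanucleus1");
    if ("datanucleus2".equalsIgnoreCase(idFactory)) {
      System.err.println("Disabling directSQL: identifierFactory=datanucleus2 does not"
          + " match the datanucleus1 naming scheme assumed by the generated SQL");
      return false;
    }
    return true;
  }
}
{code}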
[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10999: --- Attachment: HIVE-10999.1-spark.patch Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10999: --- Attachment: (was: HIVE-10999.1-spark.patch) Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8133) Support Postgres via DirectSQL
[ https://issues.apache.org/jira/browse/HIVE-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588121#comment-14588121 ] Damien Carol commented on HIVE-8133: As we switched to a MySQL-backed metastore in our production cluster, I can't continue to work on this one. Support Postgres via DirectSQL -- Key: HIVE-8133 URL: https://issues.apache.org/jira/browse/HIVE-8133 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Damien Carol -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10999: --- Attachment: (was: HIVE-10999.1-spark.patch) Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9853) Bad version tested in org/apache/hive/hcatalog/templeton/TestWebHCatE2e.java
[ https://issues.apache.org/jira/browse/HIVE-9853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588113#comment-14588113 ] Damien Carol commented on HIVE-9853: [~laurent.gay] As nobody wants to review the patch, do you mind if I close this one? Are you still blocked? Bad version tested in org/apache/hive/hcatalog/templeton/TestWebHCatE2e.java Key: HIVE-9853 URL: https://issues.apache.org/jira/browse/HIVE-9853 Project: Hive Issue Type: Test Affects Versions: 1.0.0 Reporter: Laurent GAY Assignee: Damien Carol Attachments: correct_version_test.patch The test getHiveVersion in class org.apache.hive.hcatalog.templeton.TestWebHCatE2e checks a bad version format. It checks 0.[0-9]+.[0-9]+.* and not 1.[0-9]+.[0-9]+.* This test fails for Hive tag release-1.0.0. I propose a patch to correct it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8133) Support Postgres via DirectSQL
[ https://issues.apache.org/jira/browse/HIVE-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-8133: --- Assignee: (was: Damien Carol) Support Postgres via DirectSQL -- Key: HIVE-8133 URL: https://issues.apache.org/jira/browse/HIVE-8133 Project: Hive Issue Type: Sub-task Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10999: --- Attachment: HIVE-10999.1-spark.patch Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10991) CBO: Calcite Operator To Hive Operator (Calcite Return Path): NonBlockingOpDeDupProc did not kick in rcfile_merge2.q
[ https://issues.apache.org/jira/browse/HIVE-10991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588258#comment-14588258 ] Ashutosh Chauhan commented on HIVE-10991: - +1 CBO: Calcite Operator To Hive Operator (Calcite Return Path): NonBlockingOpDeDupProc did not kick in rcfile_merge2.q Key: HIVE-10991 URL: https://issues.apache.org/jira/browse/HIVE-10991 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Jesus Camacho Rodriguez Attachments: HIVE-10991.patch NonBlockingOpDeDupProc did not kick in rcfile_merge2.q in the return path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11022) Support collecting lists in user defined order
[ https://issues.apache.org/jira/browse/HIVE-11022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Haeusler updated HIVE-11022: Description: Hive currently supports aggregation of lists in order of input rows with the UDF collect_list. Unfortunately, the order is not well defined when map-side aggregations are used. Hive could support collecting lists in user-defined order by providing a UDF COLLECT_LIST_SORTED(valueColumn, sortColumn[, limit]), that would return a list of values sorted in a user defined order. An optional limit parameter can restrict this to the n first values within that order. Especially in the limit case, this can be efficiently pre-aggregated and reduces the amount of data transferred to reducers. was: Hive currently supports aggregation of lists in order of input rows with the UDF collect_list. Unfortunately, the order is not well defined when map-side aggregations are used. Hive could support collecting lists in user-defined order by providing a UDF COLLECT_LIST_SORTED(valueColumn, sortColumn[, limit]), that would return a list of values sorted in a user defined order. An optional limit parameter can restrict this to the n first values within that order. Especially in the limit case, this can be efficiently pre-aggregated and reduce the amount of data transferred to reducers. Support collecting lists in user defined order -- Key: HIVE-11022 URL: https://issues.apache.org/jira/browse/HIVE-11022 Project: Hive Issue Type: New Feature Components: UDF Reporter: Michael Haeusler Hive currently supports aggregation of lists in order of input rows with the UDF collect_list. Unfortunately, the order is not well defined when map-side aggregations are used. Hive could support collecting lists in user-defined order by providing a UDF COLLECT_LIST_SORTED(valueColumn, sortColumn[, limit]), that would return a list of values sorted in a user defined order. An optional limit parameter can restrict this to the n first values within that order. Especially in the limit case, this can be efficiently pre-aggregated and reduces the amount of data transferred to reducers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
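The efficiency claim in the limited case deserves one concrete detail: with limit n, each map-side partial aggregate only has to retain the n best (sortKey, value) pairs per group, so both memory and shuffle traffic stay bounded. A self-contained sketch of such a bounded accumulator follows; it is illustrative only, not a shape the actual GenericUDAF implementation would have to take.
{code:java}
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopNAccumulator<K extends Comparable<K>, V> {
  private final int limit;
  // Max-heap on the sort key: the root is the worst entry currently retained.
  private final PriorityQueue<Map.Entry<K, V>> heap;

  public TopNAccumulator(int limit) {
    this.limit = limit;
    this.heap = new PriorityQueue<>((a, b) -> b.getKey().compareTo(a.getKey()));
  }

  // Called once per input row on the map side.
  public void add(K sortKey, V value) {
    heap.offer(new SimpleEntry<>(sortKey, value));
    if (heap.size() > limit) {
      heap.poll(); // evict the worst entry; memory stays O(limit) per group
    }
  }

  // Merging two partial aggregates is just re-adding the other side's entries.
  public void merge(TopNAccumulator<K, V> other) {
    for (Map.Entry<K, V> e : other.heap) {
      add(e.getKey(), e.getValue());
    }
  }

  // Final result: the retained values, in the user-defined sort order.
  public List<V> sortedValues() {
    List<Map.Entry<K, V>> entries = new ArrayList<>(heap);
    entries.sort((a, b) -> a.getKey().compareTo(b.getKey()));
    List<V> out = new ArrayList<>(entries.size());
    for (Map.Entry<K, V> e : entries) {
      out.add(e.getValue());
    }
    return out;
  }
}
{code}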
[jira] [Commented] (HIVE-11008) webhcat GET /jobs retries on getting job details from history server is too aggressive
[ https://issues.apache.org/jira/browse/HIVE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588483#comment-14588483 ] Thejas M Nair commented on HIVE-11008: -- [~jianhe] would have more background on the fix from [~cwelch]. What is the behavior in the above case mentioned by [~ekoifman]? I understand that in the above case as well we can have the RM having the job information but the History server not having it. Would you recommend having retries in that case? Can that result in timeouts? webhcat GET /jobs retries on getting job details from history server is too aggressive - Key: HIVE-11008 URL: https://issues.apache.org/jira/browse/HIVE-11008 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 1.2.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-11008.1.patch The webhcat jobs API gets the list of jobs from the RM and then gets details from the history server. The RM has a policy of retaining a fixed number of jobs to accommodate the memory it has, while the HistoryServer retains jobs based on their age. As a result, jobs that the RM returns might not be present in the HistoryServer, which can result in a failure. WebHCat also ends up retrying on failures even if they happen because the job actually does not exist. The retries to get details from the HistoryServer in such cases are too aggressive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11008) webhcat GET /jobs retries on getting job details from history server is too aggressive
[ https://issues.apache.org/jira/browse/HIVE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588367#comment-14588367 ] Thejas M Nair commented on HIVE-11008: -- As mentioned in the description, this issue happens because of the difference between the jobs retained by the RM and the job history server, and it is applicable only to the showJobList() call, when showDetails is set to true. This is not an ideal solution, but since the jobclient is not able to distinguish between real failures that it needs to retry on (e.g. transient fs errors) and failures due to the job not existing, we don't have any good alternative. For showJobId(), it is better to still retry. If we move this to StatusDelegator.run(), we will have to pass some boolean to it, so that this is set only in the case of the showJobList() call. Please let me know if you think that is better. webhcat GET /jobs retries on getting job details from history server is too aggressive - Key: HIVE-11008 URL: https://issues.apache.org/jira/browse/HIVE-11008 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 1.2.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-11008.1.patch The webhcat jobs API gets the list of jobs from the RM and then gets details from the history server. The RM has a policy of retaining a fixed number of jobs to accommodate the memory it has, while the HistoryServer retains jobs based on their age. As a result, jobs that the RM returns might not be present in the HistoryServer, which can result in a failure. WebHCat also ends up retrying on failures even if they happen because the job actually does not exist. The retries to get details from the HistoryServer in such cases are too aggressive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
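The distinction being debated here, retrying transient failures while giving up fast on jobs that genuinely are not in the history server, can be pictured with a small sketch. Everything below is hypothetical scaffolding, not the WebHCat code; in particular isJobNotFound() stands in for whatever signal the real jobclient can actually extract.
{code:java}
public class HistoryRetrySketch {
  interface Fetch<T> { T call() throws Exception; }

  static <T> T fetchWithRetry(Fetch<T> fetch, int maxAttempts, long sleepMs)
      throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return fetch.call();
      } catch (Exception e) {
        if (isJobNotFound(e)) {
          throw e; // not transient: retrying just burns time per stale job
        }
        last = e;           // transient (e.g. fs hiccup):
        Thread.sleep(sleepMs); // back off and try again
      }
    }
    throw last;
  }

  // Heuristic stand-in: the client cannot always distinguish "missing" from
  // "broken", which is exactly the difficulty raised in the comments above.
  static boolean isJobNotFound(Exception e) {
    String msg = String.valueOf(e.getMessage());
    return msg.contains("not found") || msg.contains("does not exist");
  }
}
{code}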
[jira] [Updated] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.
[ https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-9736: --- Affects Version/s: 1.2.1 StorageBasedAuthProvider should batch namenode-calls where possible. Key: HIVE-9736 URL: https://issues.apache.org/jira/browse/HIVE-9736 Project: Hive Issue Type: Bug Components: Metastore, Security Affects Versions: 1.2.1 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Labels: TODOC1.2 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch Consider a table partitioned by 2 keys (dt, region). Say a dt partition could have 1 associated regions. Consider that the user does: {code:sql} ALTER TABLE my_table DROP PARTITION (dt='20150101'); {code} As things stand now, {{StorageBasedAuthProvider}} will make individual {{DistributedFileSystem.listStatus()}} calls for each partition-directory, and authorize each one separately. It'd be faster to batch the calls, and examine multiple FileStatus objects at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
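As a rough picture of the batching idea, the stock Hadoop FileSystem API already accepts an array of paths, which lets the authorization loop run over one combined FileStatus array instead of issuing a listStatus() per directory at each call site. This is a simplified sketch, not the actual patch: only the user permission bits are checked, and listStatus(Path[]) may still loop per path under the hood.
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;

public class BatchedAuthSketch {
  // Lists all partition directories in one call, then authorizes each
  // returned FileStatus in memory.
  public static void checkAll(FileSystem fs, Path[] partitionDirs, FsAction required)
      throws IOException {
    FileStatus[] statuses = fs.listStatus(partitionDirs);
    for (FileStatus status : statuses) {
      if (!status.getPermission().getUserAction().implies(required)) {
        throw new IOException("Permission denied on " + status.getPath());
      }
    }
  }
}
{code}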
[jira] [Commented] (HIVE-11006) improve logging wrt ACID module
[ https://issues.apache.org/jira/browse/HIVE-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588395#comment-14588395 ] Sushanth Sowmyan commented on HIVE-11006: - +1 for inclusion to 1.2.1, please add it to the Release Status wiki page, and when you commit to master/branch-1, you can commit it to 1.2.1 as well. improve logging wrt ACID module --- Key: HIVE-11006 URL: https://issues.apache.org/jira/browse/HIVE-11006 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11006.patch especially around metastore DB operations (TxnHandler) which are retried or fail for some reason. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11018) Turn on cbo in more q files
[ https://issues.apache.org/jira/browse/HIVE-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588488#comment-14588488 ] Hive QA commented on HIVE-11018: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739884/HIVE-11018.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9008 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_hybridgrace_hashjoin_2 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4273/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4273/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4273/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12739884 - PreCommit-HIVE-TRUNK-Build Turn on cbo in more q files --- Key: HIVE-11018 URL: https://issues.apache.org/jira/browse/HIVE-11018 Project: Hive Issue Type: Task Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11018.1.patch, HIVE-11018.patch There are a few tests in which cbo was turned off for various reasons. Those reasons don't exist anymore. For those tests, we should turn on cbo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11008) webhcat GET /jobs retries on getting job details from history server is too aggressive
[ https://issues.apache.org/jira/browse/HIVE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588401#comment-14588401 ] Eugene Koifman commented on HIVE-11008: --- Suppose the call http://www.myserver.com/templeton/v1/jobs/job123 is made and job123 doesn't exist. Why would the same retry logic not kick in? webhcat GET /jobs retries on getting job details from history server is too aggressive - Key: HIVE-11008 URL: https://issues.apache.org/jira/browse/HIVE-11008 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 1.2.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-11008.1.patch The webhcat jobs API gets the list of jobs from the RM and then gets details from the history server. The RM has a policy of retaining a fixed number of jobs to accommodate the memory it has, while the HistoryServer retains jobs based on their age. As a result, jobs that the RM returns might not be present in the HistoryServer, which can result in a failure. WebHCat also ends up retrying on failures even if they happen because the job actually does not exist. The retries to get details from the HistoryServer in such cases are too aggressive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588466#comment-14588466 ] Hive QA commented on HIVE-10999: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739892/HIVE-10999.1-spark.patch {color:red}ERROR:{color} -1 due to 706 failed/errored test(s), 7406 tests executed *Failed tests:* {noformat} TestCliDriver-alter_file_format.q-udf_tan.q-bucket_map_join_tez1.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-alter_table_not_sorted.q-ppd_join3.q-authorization_delete_own_table.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-authorization_1_sql_std.q-disallow_incompatible_type_change_off.q-encryption_insert_values.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-authorization_create_temp_table.q-skewjoinopt10.q-mapjoin_subquery2.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-authorization_parts.q-parquet_map_of_maps.q-join_vc.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-authorization_role_grant2.q-alter_char2.q-avro_joins_native.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-authorization_update.q-udf_pmod.q-leadlag_queries.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-auto_join18.q-smb_mapjoin_7.q-join_merge_multi_expressions.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-auto_join9.q-udtf_posexplode.q-udf_least.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-auto_join_reordering_values.q-authorization_cli_stdconfigauth.q-subquery_in.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-avro_decimal_native.q-udf_E.q-bucketmapjoin4.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-ba_table3.q-tez_union_dynamic_partition.q-union30.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-bool_literal.q-authorization_cli_createtab.q-udf_when.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-bucketcontext_4.q-orc_ends_with_nulls.q-correlationoptimizer9.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-bucketmapjoin3.q-vector_partition_diff_num_cols.q-stats2.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-bucketsortoptimize_insert_7.q-dynpart_sort_optimization2.q-decimal_3.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-cluster.q-groupby_sort_6.q-tez_schema_evolution.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-columnstats_partlvl_dp.q-input31.q-leadlag.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-compute_stats_string.q-show_columns.q-noalias_subq1.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-cp_mj_rc.q-decimal_2.q-union32.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-create_func1.q-enforce_order.q-interval_comparison.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-create_genericudf.q-dynamic_partition_insert.q-auto_join10.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-describe_xpath.q-autogen_colalias.q-skewjoinopt3.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-escape_distributeby1.q-ambiguitycheck.q-udf_bitwise_and.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-groupby3_map.q-current_date_timestamp.q-skewjoinopt8.q-and-12-more - did not produce a TEST-*.xml file 
TestCliDriver-groupby4.q-convert_enum_to_string.q-load_dyn_part3.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-groupby8_map.q-insert_values_tmp_table.q-union_remove_11.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-groupby_complex_types.q-groupby_map_ppr_multi_distinct.q-vector_decimal_round.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-groupby_grouping_id2.q-udf_decode.q-protectmode.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-groupby_grouping_sets5.q-auto_sortmerge_join_13.q-show_tblproperties.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-index_bitmap_rc.q-desc_tbl_part_cols.q-bucketmapjoin10.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-infer_bucket_sort.q-nonreserved_keywords_input37.q-udf_nvl.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-infer_bucket_sort_list_bucket.q-parquet_avro_array_of_primitives.q-fileformat_sequencefile.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-infer_bucket_sort_multi_insert.q-insert_compressed.q-udf4.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-input10.q-orc_empty_files.q-ppd_multi_insert.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-input19.q-index_auth.q-input16.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-interval_udf.q-metadataonly1.q-union13.q-and-12-more -
[jira] [Updated] (HIVE-11007) CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depend on the last SEL
[ https://issues.apache.org/jira/browse/HIVE-11007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11007: --- Attachment: HIVE-11007.02.patch CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depend on the last SEL - Key: HIVE-11007 URL: https://issues.apache.org/jira/browse/HIVE-11007 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11007.01.patch, HIVE-11007.02.patch In the dynamic partitioning case, for example, we are going to have TS0-SEL1-SEL2-FS3. The dpCtx's mapInputToDP is populated by SEL1 rather than SEL2, which causes an error in the return path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10999: --- Attachment: (was: HIVE-10999.1-spark.patch) Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11008) webhcat GET /jobs retries on getting job details from history server is too aggressive
[ https://issues.apache.org/jira/browse/HIVE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588584#comment-14588584 ] Eugene Koifman commented on HIVE-11008: --- Whether you start with Server.showJobId() or Server.showJobList(), you end up in StatusDelegator.run(); i.e., the calls to the Hadoop daemons are exactly the same, so this has to behave the same way... webhcat GET /jobs retries on getting job details from history server is too aggressive - Key: HIVE-11008 URL: https://issues.apache.org/jira/browse/HIVE-11008 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 1.2.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-11008.1.patch The webhcat jobs API gets the list of jobs from the RM and then gets details from the history server. The RM has a policy of retaining a fixed number of jobs to accommodate the memory it has, while the HistoryServer retains jobs based on their age. As a result, jobs that the RM returns might not be present in the HistoryServer, which can result in a failure. WebHCat also ends up retrying on failures even if they happen because the job actually does not exist. The retries to get details from the HistoryServer in such cases are too aggressive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-1643) support range scans and non-key columns in HBase filter pushdown
[ https://issues.apache.org/jira/browse/HIVE-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588759#comment-14588759 ] Ran Postar commented on HIVE-1643: -- I understand that pushing down the expression to HBase is complicated, especially when working with multiple tables. But is it possible to add a hinting mechanism? We could add to each table in the FROM clause a hint with a startRow and stopRow for the scan, and when HiveHBaseTableInputFormat scans the table it would apply the startRow/stopRow. For this first part we would leave the user the possibility to optimize a specific table scan. support range scans and non-key columns in HBase filter pushdown Key: HIVE-1643 URL: https://issues.apache.org/jira/browse/HIVE-1643 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.9.0 Reporter: John Sichi Assignee: bharath v Labels: patch Attachments: HIVE-1643.patch, Hive-1643.2.patch, hbase_handler.patch HIVE-1226 added support for WHERE rowkey=3. We would like to support WHERE rowkey BETWEEN 10 and 20, as well as predicates on non-rowkeys (plus conjunctions etc). Non-rowkey conditions can't be used to filter out entire ranges, but they can be used to push the per-row filter processing as far down as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
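Whatever syntax such a hint takes, it would ultimately just narrow the Scan that HiveHBaseTableInputFormat builds. A minimal sketch against the classic HBase client API follows; the hint plumbing itself is hypothetical.
{code:java}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class HintedScanSketch {
  // If the table-level hint supplies startRow/stopRow, narrow the Scan so
  // HBase never reads outside the range.
  public static Scan buildScan(byte[] startRow, byte[] stopRow) {
    Scan scan = new Scan();
    if (startRow != null) {
      scan.setStartRow(startRow); // inclusive lower bound from the hint
    }
    if (stopRow != null) {
      scan.setStopRow(stopRow);   // exclusive upper bound from the hint
    }
    return scan;
  }

  public static void main(String[] args) {
    // e.g. the WHERE rowkey BETWEEN 10 AND 20 case from the description
    Scan scan = buildScan(Bytes.toBytes("10"), Bytes.toBytes("21"));
    System.out.println(scan);
  }
}
{code}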
[jira] [Updated] (HIVE-10984) After lock table shared explicit lock, lock database exclusive should fail.
[ https://issues.apache.org/jira/browse/HIVE-10984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10984: Description: The following statements will fail since tbl1 and its database are locked in shared mode, and the exclusive lock on the database fails as expected. {noformat} use db1; lock table tbl1 shared; lock database db1 exclusive; {noformat} While the following similar statements will pass since the current database is different. {noformat} use default; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} It seems both cases should fail. Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable failure cases. was: There is an issue in ZooKeeperHiveLockManager.java, in which when locking exclusively on a table, it doesn't lock the database object (which it does when the lock comes from a query). The current implementation of ZooKeeperHiveLockManager will lock the object and its parents, but won't check the children when it tries to acquire a lock on a certain object. This causes the following scenario, which should not be allowed but right now goes through. {noformat} use default; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable failure cases. After lock table shared explicit lock, lock database exclusive should fail. --- Key: HIVE-10984 URL: https://issues.apache.org/jira/browse/HIVE-10984 Project: Hive Issue Type: Bug Components: Locking Reporter: Aihua Xu Assignee: Aihua Xu The following statements will fail since tbl1 and its database are locked in shared mode, and the exclusive lock on the database fails as expected. {noformat} use db1; lock table tbl1 shared; lock database db1 exclusive; {noformat} While the following similar statements will pass since the current database is different. {noformat} use default; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} It seems both cases should fail. Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable failure cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10984) After lock table shared explicit lock, lock database exclusive should fail.
[ https://issues.apache.org/jira/browse/HIVE-10984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10984: Description: The following statements will fail since tbl1 and its database are locked in shared mode, and the exclusive lock on the database fails as expected. {noformat} use db1; lock table tbl1 shared; lock database db1 exclusive; {noformat} While the following similar statements will pass just because the current database is different. {noformat} use default; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} It seems both cases should fail. Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable failure cases. was: The following statements will fail since tbl1 and its database are locked in shared mode, and the exclusive lock on the database fails as expected. {noformat} use db1; lock table tbl1 shared; lock database db1 exclusive; {noformat} While the following similar statements will pass since the current database is different. {noformat} use default; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} It seems both cases should fail. Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable failure cases. After lock table shared explicit lock, lock database exclusive should fail. --- Key: HIVE-10984 URL: https://issues.apache.org/jira/browse/HIVE-10984 Project: Hive Issue Type: Bug Components: Locking Reporter: Aihua Xu Assignee: Aihua Xu The following statements will fail since tbl1 and its database are locked in shared mode, and the exclusive lock on the database fails as expected. {noformat} use db1; lock table tbl1 shared; lock database db1 exclusive; {noformat} While the following similar statements will pass just because the current database is different. {noformat} use default; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} It seems both cases should fail. Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable failure cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10984) After lock table shared explicit lock, lock database exclusive should fail.
[ https://issues.apache.org/jira/browse/HIVE-10984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10984: Description: The following statements will fail since tbl1 and its database are locked in shared mode, and the exclusive lock on the database fails as expected. {noformat} use db1; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} While the following similar statements will pass just because the current database is different. {noformat} use default; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} It seems both cases should fail. Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable failure cases. was: The following statements will fail since tbl1 and its database are locked in shared mode, and the exclusive lock on the database fails as expected. {noformat} use db1; lock table tbl1 shared; lock database db1 exclusive; {noformat} While the following similar statements will pass just because the current database is different. {noformat} use default; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} It seems both cases should fail. Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable failure cases. After lock table shared explicit lock, lock database exclusive should fail. --- Key: HIVE-10984 URL: https://issues.apache.org/jira/browse/HIVE-10984 Project: Hive Issue Type: Bug Components: Locking Reporter: Aihua Xu Assignee: Aihua Xu The following statements will fail since tbl1 and its database are locked in shared mode, and the exclusive lock on the database fails as expected. {noformat} use db1; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} While the following similar statements will pass just because the current database is different. {noformat} use default; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} It seems both cases should fail. Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable failure cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
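Both cases above point at the same missing check: before granting an exclusive lock on a database znode, the lock manager should look for existing lock nodes under the database's children. An illustrative ZooKeeper sketch follows; the znode layout and the lock-node prefix are assumptions, not the actual ZooKeeperHiveLockManager code.
{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class ChildLockCheckSketch {
  // e.g. dbPath = "/hive_zookeeper_namespace/db1"; returns true when any
  // table znode under the database already holds a lock node, in which case
  // an exclusive database lock request should be refused.
  public static boolean hasLockedChildren(ZooKeeper zk, String dbPath)
      throws KeeperException, InterruptedException {
    for (String child : zk.getChildren(dbPath, false)) {
      String childPath = dbPath + "/" + child;
      for (String node : zk.getChildren(childPath, false)) {
        if (node.startsWith("LOCK-")) { // lock-node naming is an assumption
          return true;
        }
      }
    }
    return false;
  }
}
{code}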
[jira] [Updated] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-7193: --- Description: Currently hive has only the following authenticator parameters for LDAP authentication for hiveserver2:
{code:xml}
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://our_ldap_address</value>
</property>
{code}
We need to include other LDAP properties as part of hive-LDAP authentication, like below:
{noformat}
a group search base - dc=domain,dc=com
a group search filter - member={0}
a user search base - dc=domain,dc=com
a user search filter - sAMAccountName={0}
a list of valid user groups - group1,group2,group3
{noformat}
was: Currently hive has only the following authenticator parameters for LDAP authentication for hiveserver2. <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://our_ldap_address</value> </property> We need to include other LDAP properties as part of hive-LDAP authentication like below: a group search base - dc=domain,dc=com a group search filter - member={0} a user search base - dc=domain,dc=com a user search filter - sAMAccountName={0} a list of valid user groups - group1,group2,group3 Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.5.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently hive has only the following authenticator parameters for LDAP authentication for hiveserver2:
{code:xml}
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://our_ldap_address</value>
</property>
{code}
We need to include other LDAP properties as part of hive-LDAP authentication, like below:
{noformat}
a group search base - dc=domain,dc=com
a group search filter - member={0}
a user search base - dc=domain,dc=com
a user search filter - sAMAccountName={0}
a list of valid user groups - group1,group2,group3
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11026) Make vector_outer_join* test more robust
[ https://issues.apache.org/jira/browse/HIVE-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11026: Attachment: HIVE-11026.patch Make vector_outer_join* test more robust Key: HIVE-11026 URL: https://issues.apache.org/jira/browse/HIVE-11026 Project: Hive Issue Type: Test Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11026.patch Different file sizes on different OSes result in different Data Size in explain output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.
[ https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588792#comment-14588792 ] Hive QA commented on HIVE-9736: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739895/HIVE-9736.8.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9008 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbort {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4275/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4275/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4275/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12739895 - PreCommit-HIVE-TRUNK-Build StorageBasedAuthProvider should batch namenode-calls where possible. Key: HIVE-9736 URL: https://issues.apache.org/jira/browse/HIVE-9736 Project: Hive Issue Type: Bug Components: Metastore, Security Affects Versions: 1.2.1 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Labels: TODOC1.2 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch, HIVE-9736.8.patch Consider a table partitioned by 2 keys (dt, region). Say a dt partition could have 1 associated regions. Consider that the user does: {code:sql} ALTER TABLE my_table DROP PARTITION (dt='20150101'); {code} As things stand now, {{StorageBasedAuthProvider}} will make individual {{DistributedFileSystem.listStatus()}} calls for each partition-directory, and authorize each one separately. It'd be faster to batch the calls, and examine multiple FileStatus objects at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.
[ https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588814#comment-14588814 ] Sushanth Sowmyan commented on HIVE-9736: Looks like we have a regression: org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition is failing while it shouldn't. This happened in the 9th May run as well. Error Message: expected:<1> but was:<0> Stacktrace: {noformat} java.lang.AssertionError: expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.dropPartitionByOtherUser(TestStorageBasedMetastoreAuthorizationDrops.java:202) at org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition(TestStorageBasedMetastoreAuthorizationDrops.java:172) {noformat} [~mithun], if we can look at this and resolve this, we can get this into 1.2.1, but if not, then I'm afraid this will have to be deferred out of branch-1.2, and make it in 1.3/2.0 . StorageBasedAuthProvider should batch namenode-calls where possible. Key: HIVE-9736 URL: https://issues.apache.org/jira/browse/HIVE-9736 Project: Hive Issue Type: Bug Components: Metastore, Security Affects Versions: 1.2.1 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Labels: TODOC1.2 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch, HIVE-9736.8.patch Consider a table partitioned by 2 keys (dt, region). Say a dt partition could have 1 associated regions. Consider that the user does: {code:sql} ALTER TABLE my_table DROP PARTITION (dt='20150101'); {code} As things stand now, {{StorageBasedAuthProvider}} will make individual {{DistributedFileSystem.listStatus()}} calls for each partition-directory, and authorize each one separately. It'd be faster to batch the calls, and examine multiple FileStatus objects at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7292) Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588837#comment-14588837 ] Raj Sharma commented on HIVE-7292: -- Can I run the below command in Hive 1.1 or 1.2 to switch the engine from MapReduce to Spark? hive> set hive.execution.engine=spark; Hive on Spark - Key: HIVE-7292 URL: https://issues.apache.org/jira/browse/HIVE-7292 Project: Hive Issue Type: Improvement Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5 Attachments: Hive-on-Spark.pdf Spark as an open-source data analytics cluster computing framework has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone. To take advantage of Hive, they still need to have either MapReduce or Tez on their cluster. This initiative will provide users a new alternative so that those users can consolidate their backend. Secondly, providing such an alternative further increases Hive's adoption as it exposes Spark users to a viable, feature-rich, de facto standard SQL tool on Hadoop. Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, thus improving user experience as Tez does. This is an umbrella JIRA which will cover many coming subtasks. Design doc will be attached here shortly, and will be on the wiki as well. Feedback from the community is greatly appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10479) CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias in columnInfo which triggers PPD
[ https://issues.apache.org/jira/browse/HIVE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-10479: --- Attachment: HIVE-10479.02.patch address [~jcamachorodriguez]'s comments CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias in columnInfo which triggers PPD Key: HIVE-10479 URL: https://issues.apache.org/jira/browse/HIVE-10479 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-10479.01.patch, HIVE-10479.02.patch, HIVE-10479.patch In ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 477, when aliases contains an empty string and key is an empty string too, it assumes that aliases contains key. This will trigger incorrect PPD. To reproduce it, apply the HIVE-10455 patch and run cbo_subq_notin.q. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11027) Hive on tez: Bucket map joins fail when hashcode goes negative
[ https://issues.apache.org/jira/browse/HIVE-11027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-11027: --- Affects Version/s: 0.13 0.14.0 Hive on tez: Bucket map joins fail when hashcode goes negative -- Key: HIVE-11027 URL: https://issues.apache.org/jira/browse/HIVE-11027 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0, 1.0.0, 0.13 Reporter: Vikram Dixit K Assignee: Prasanth Jayachandran Seeing an issue when dynamic sort optimization is enabled while doing an insert into a bucketed table. We seem to be flipping the negative sign on the hashcode instead of taking the complement of it for routing the data correctly. This results in correctness issues in bucket map joins in hive on tez when the hash code goes negative. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
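The sign issue described above is the classic Java hash-routing pitfall: flipping the sign (or calling Math.abs()) still yields a negative number for Integer.MIN_VALUE, and a negative modulus routes the row to a nonexistent bucket, whereas masking the sign bit always lands in range. A standalone illustration, not the actual patch:
{code:java}
public class BucketRoutingSketch {
  static int wrongBucket(int hashCode, int numBuckets) {
    return Math.abs(hashCode) % numBuckets; // Math.abs(Integer.MIN_VALUE) < 0
  }

  static int correctBucket(int hashCode, int numBuckets) {
    return (hashCode & Integer.MAX_VALUE) % numBuckets; // always in [0, numBuckets)
  }

  public static void main(String[] args) {
    System.out.println(wrongBucket(Integer.MIN_VALUE, 31));   // prints a negative index
    System.out.println(correctBucket(Integer.MIN_VALUE, 31)); // prints a valid bucket
  }
}
{code}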
[jira] [Updated] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.
[ https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-9736: --- Attachment: HIVE-9736.8.patch One more attempt to get this patch in - updating the patch slightly so as to not remove the dependency on java.util.Set in HadoopShims.java (since there is another function that now depends on it) Once the tests pass this time, I will get it in. StorageBasedAuthProvider should batch namenode-calls where possible. Key: HIVE-9736 URL: https://issues.apache.org/jira/browse/HIVE-9736 Project: Hive Issue Type: Bug Components: Metastore, Security Affects Versions: 1.2.1 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Labels: TODOC1.2 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch, HIVE-9736.8.patch Consider a table partitioned by 2 keys (dt, region). Say a dt partition could have 1 associated regions. Consider that the user does: {code:sql} ALTER TABLE my_table DROP PARTITION (dt='20150101'); {code} As things stand now, {{StorageBasedAuthProvider}} will make individual {{DistributedFileSystem.listStatus()}} calls for each partition-directory, and authorize each one separately. It'd be faster to batch the calls, and examine multiple FileStatus objects at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11026) Make vector_outer_join* test more robust
[ https://issues.apache.org/jira/browse/HIVE-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan reassigned HIVE-11026: --- Assignee: Ashutosh Chauhan Make vector_outer_join* test more robust Key: HIVE-11026 URL: https://issues.apache.org/jira/browse/HIVE-11026 Project: Hive Issue Type: Test Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Different file sizes on different OSes result in different Data Size in explain output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10984) After lock table shared explicit lock, lock database exclusive should fail.
[ https://issues.apache.org/jira/browse/HIVE-10984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10984: Summary: After lock table shared explicit lock, lock database exclusive should fail. (was: Lock table explicit lock command doesn't lock the database object.) After lock table shared explicit lock, lock database exclusive should fail. --- Key: HIVE-10984 URL: https://issues.apache.org/jira/browse/HIVE-10984 Project: Hive Issue Type: Bug Components: Locking Reporter: Aihua Xu Assignee: Aihua Xu There is an issue in ZooKeeperHiveLockManager.java, in which when locking exclusively on a table, it doesn't lock the database object (which it does when the lock comes from a query). The current implementation of ZooKeeperHiveLockManager will lock the object and its parents, but won't check the children when it tries to acquire a lock on a certain object. This causes the following scenario, which should not be allowed but right now goes through. {noformat} use default; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable failure cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11023) Disable directSQL if datanucleus.identifierFactory = datanucleus2
[ https://issues.apache.org/jira/browse/HIVE-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588641#comment-14588641 ] Hive QA commented on HIVE-11023: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739893/HIVE-11023.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9008 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4274/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4274/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4274/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12739893 - PreCommit-HIVE-TRUNK-Build Disable directSQL if datanucleus.identifierFactory = datanucleus2 - Key: HIVE-11023 URL: https://issues.apache.org/jira/browse/HIVE-11023 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.3.0, 1.2.1, 2.0.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Critical Attachments: HIVE-11023.patch We hit an interesting bug in a case where datanucleus.identifierFactory = datanucleus2. The problem is that directSql hand-generates SQL strings assuming the datanucleus1 naming scheme. If a user has their metastore JDO managed by datanucleus.identifierFactory = datanucleus2, the SQL strings we generate are incorrect. One simple example of what this results in is the following: whenever DN persists a field which is held as a List<T>, it winds up storing each T as a separate line in the appropriate mapping table, and has a column called INTEGER_IDX, which holds the position in the list. Then, upon reading, it automatically reads all relevant lines with an ORDER BY INTEGER_IDX, which results in the list retaining its order. In the DN2 naming scheme, the column is called IDX instead of INTEGER_IDX. If the user has run the appropriate metatool upgrade scripts, it is highly likely that they have both columns, INTEGER_IDX and IDX. Whenever they use JDO, such as with all writes, it will then use the IDX field, and when they do any sort of optimized reads, such as through directSQL, it will ORDER BY INTEGER_IDX. An immediate danger is seen when we consider that the schema of a table is stored as a List<FieldSchema>, and while IDX has 0,1,2,3,..., INTEGER_IDX will contain 0,0,0,0,... and thus, any attempt to describe the table or fetch the schema for the table can come up mixed up in the table's native hashing order, rather than sorted by the index. This can then result in the schema ordering being different from the actual table. For example, if a user has a table (a:int, b:string, c:string), a describe on this may return (c:string, a:int, b:string), and thus, queries which are inserting after selecting from another table can have ClassCastExceptions when trying to insert data in the wrong order - this is how we discovered this bug.
This problem, however, can be far worse if there are no type problems - it is possible, for example, that if a, b, c were all strings, that insert query would succeed but mix up the order, which then results in user table data being mixed up. This has the potential to be very bad. We should write a tool to help convert metastores that use datanucleus2 to datanucleus1 (more difficult, needs more one-time testing), or change directSql to support both (easier to code, but it increases the test-coverage matrix significantly and we should really then be testing against both schemes). But in the short term, we should disable directSql if we see that the identifierFactory is datanucleus2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
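The short-term mitigation amounts to one configuration check at metastore startup. A sketch of the guard (the two property names are real; the class and method are hypothetical):
{code}
import org.apache.hadoop.conf.Configuration;

public class DirectSqlGuard {
  // Keep directSql only when the identifier factory matches the datanucleus1
  // naming scheme that the hand-generated SQL strings assume.
  static boolean directSqlUsable(Configuration conf) {
    String idFactory = conf.get("datanucleus.identifierFactory", "datanucleus1");
    boolean tryDirectSql = conf.getBoolean("hive.metastore.try.direct.sql", true);
    return tryDirectSql && !"datanucleus2".equalsIgnoreCase(idFactory);
  }
}
{code}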
[jira] [Updated] (HIVE-11025) In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly
[ https://issues.apache.org/jira/browse/HIVE-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-11025: Attachment: HIVE-11025.patch In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly Key: HIVE-11025 URL: https://issues.apache.org/jira/browse/HIVE-11025 Project: Hive Issue Type: Sub-task Components: PTF-Windowing Affects Versions: 2.0.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-11025.patch Given data and the following query, {noformat} deptno empno bonus salary 30 7698 NULL 2850.0 30 7900 NULL 950.0 30 7844 0 1500.0 select avg(salary) over (partition by deptno order by bonus range 200 preceding) from emp2; {noformat} It produces an incorrect result for the row in which bonus=0: 1900.0 1900.0 1766.7 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11025) In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly
[ https://issues.apache.org/jira/browse/HIVE-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-11025: Description: Given data and the following query, {noformat} deptno empno bonus salary 30 7698 NULL 2850.0 30 7900 NULL 950.0 30 7844 0 1500.0 select avg(salary) over (partition by deptno order by bonus range 200 preceding) from emp2; {noformat} It produces an incorrect result for the row in which bonus=0: 1900.0 1900.0 1766.7 was: Given data and the following query, {noformat} deptno empno bonus salary 30 7698 NULL 2850.0 30 7900 NULL 950.0 30 7844 0 1500.0 select avg(salary) over (partition by deptno order by bonus range 200 preceding) from emp2; {noformat} It produces an incorrect result for the row in which bonus=0: 1900.0 1900.0 1766.7 In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly Key: HIVE-11025 URL: https://issues.apache.org/jira/browse/HIVE-11025 Project: Hive Issue Type: Sub-task Components: PTF-Windowing Affects Versions: 2.0.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-11025.patch Given data and the following query, {noformat} deptno empno bonus salary 30 7698 NULL 2850.0 30 7900 NULL 950.0 30 7844 0 1500.0 select avg(salary) over (partition by deptno order by bonus range 200 preceding) from emp2; {noformat} It produces an incorrect result for the row in which bonus=0: 1900.0 1900.0 1766.7 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
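For a "range N preceding" frame, the boundary test compares each row's ORDER BY key against the current row's key minus N; a NULL key cannot participate in that arithmetic comparison and needs an explicit check. A minimal illustration of a null-safe test (simplified, not the PTF code itself):
{code}
import java.math.BigDecimal;

public class RangeBoundarySketch {
  // True when rowKey falls inside [currentKey - amt, currentKey].
  // Rows with NULL keys form their own peer group: NULL only matches NULL.
  static boolean inRangePreceding(BigDecimal currentKey, BigDecimal rowKey, BigDecimal amt) {
    if (currentKey == null || rowKey == null) {
      return currentKey == null && rowKey == null;
    }
    return rowKey.compareTo(currentKey.subtract(amt)) >= 0
        && rowKey.compareTo(currentKey) <= 0;
  }
}
{code}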
[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588746#comment-14588746 ] Hive QA commented on HIVE-10999: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739930/HIVE-10999.1-spark.patch {color:red}ERROR:{color} -1 due to 604 failed/errored test(s), 7286 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_auto_sortmerge_join_16 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketizedhiveinputformat org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin6 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin7 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_empty_dir_in_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_external_table_with_space_in_location_path org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap_auto org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_bucketed_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_merge org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_leftsemijoin_mr org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_parallel_orderby org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_quotedid_smb org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_remote_script org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority2 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_smb_mapjoin_8 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter_partitioned 
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_truncate_column_buckets org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_uber_reduce org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_add_part_multiple org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_orc org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_stats_orc org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join0 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join10 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join11 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join12 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join14 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join15 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join16 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join17 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18_multi_distinct
[jira] [Commented] (HIVE-11006) improve logging wrt ACID module
[ https://issues.apache.org/jira/browse/HIVE-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588711#comment-14588711 ] Eugene Koifman commented on HIVE-11006: --- actually it did pick it up but there was a glitch http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4272/console improve logging wrt ACID module --- Key: HIVE-11006 URL: https://issues.apache.org/jira/browse/HIVE-11006 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11006.2.patch, HIVE-11006.patch especially around metastore DB operations (TxnHandler) which are retried or fail for some reason. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7292) Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588593#comment-14588593 ] Raj Sharma commented on HIVE-7292: -- When will Spark be shipped with Hive as an option for the Hive engine, along with Tez and MapReduce? Hive on Spark - Key: HIVE-7292 URL: https://issues.apache.org/jira/browse/HIVE-7292 Project: Hive Issue Type: Improvement Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5 Attachments: Hive-on-Spark.pdf Spark as an open-source data analytics cluster computing framework has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone. To take advantage of Hive, they still need to have either MapReduce or Tez on their cluster. This initiative will provide users a new alternative so that those users can consolidate their backends. Secondly, providing such an alternative further increases Hive's adoption as it exposes Spark users to a viable, feature-rich, de facto standard SQL tool on Hadoop. Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, thus improving user experience as Tez does. This is an umbrella JIRA which will cover many coming subtasks. The design doc will be attached here shortly, and will be on the wiki as well. Feedback from the community is greatly appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7292) Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588596#comment-14588596 ] Chao Sun commented on HIVE-7292: Hi Raj, as mentioned by Xuefu above, Hive on Spark is already available in Hive 1.1 and 1.2. Please check it out. Hive on Spark - Key: HIVE-7292 URL: https://issues.apache.org/jira/browse/HIVE-7292 Project: Hive Issue Type: Improvement Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5 Attachments: Hive-on-Spark.pdf Spark as an open-source data analytics cluster computing framework has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone. To take advantage of Hive, they still need to have either MapReduce or Tez on their cluster. This initiative will provide users a new alternative so that those users can consolidate their backends. Secondly, providing such an alternative further increases Hive's adoption as it exposes Spark users to a viable, feature-rich, de facto standard SQL tool on Hadoop. Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, thus improving user experience as Tez does. This is an umbrella JIRA which will cover many coming subtasks. The design doc will be attached here shortly, and will be on the wiki as well. Feedback from the community is greatly appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11006) improve logging wrt ACID module
[ https://issues.apache.org/jira/browse/HIVE-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-11006: -- Attachment: HIVE-11006.2.patch attaching patch again - for some reason the build bot didn't pick it up improve logging wrt ACID module --- Key: HIVE-11006 URL: https://issues.apache.org/jira/browse/HIVE-11006 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11006.2.patch, HIVE-11006.patch especially around metastore DB operations (TxnHandler) which are retried or fail for some reason. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11018) Turn on cbo in more q files
[ https://issues.apache.org/jira/browse/HIVE-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11018: Attachment: HIVE-11018.2.patch Turn on cbo in more q files --- Key: HIVE-11018 URL: https://issues.apache.org/jira/browse/HIVE-11018 Project: Hive Issue Type: Task Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11018.1.patch, HIVE-11018.2.patch, HIVE-11018.patch There are a few tests in which cbo was turned off for various reasons. Those reasons don't exist anymore. For those tests, we should turn on cbo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11007) CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depends on the last SEL
[ https://issues.apache.org/jira/browse/HIVE-11007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588952#comment-14588952 ] Hive QA commented on HIVE-11007: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739912/HIVE-11007.02.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 8981 tests executed *Failed tests:* {noformat} TestContribCliDriver - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4276/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4276/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4276/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12739912 - PreCommit-HIVE-TRUNK-Build CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depends on the last SEL - Key: HIVE-11007 URL: https://issues.apache.org/jira/browse/HIVE-11007 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11007.01.patch, HIVE-11007.02.patch In the dynamic partitioning case, for example, we are going to have TS0-SEL1-SEL2-FS3. The dpCtx's mapInputToDP is populated by SEL1 rather than SEL2, which causes an error in the return path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11028) Tez: table self join and join with another table fails with IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HIVE-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588988#comment-14588988 ] Jason Dere commented on HIVE-11028: --- It looks like this is caused because TezCompiler invokes ConstantPropagate and this is removing some columns, but without a corresponding call to ColumnPruner to remove outputColumnNames from the join operator. Talking to [~jpullokkaran] and [~hagleitn], the use of ConstantPropagate in TezCompiler is to remove extra (and unnecessary) AND true predicates generated during dynamic partition pruning. One solution is to eliminate just those expressions (referred to in ConstantPropagate as short-cutting), as opposed to doing full constant folding. I'll try to add an option to ConstantPropagate where we can specify that we only want to perform expression short-cutting rather than full constant folding. Tez: table self join and join with another table fails with IndexOutOfBoundsException - Key: HIVE-11028 URL: https://issues.apache.org/jira/browse/HIVE-11028 Project: Hive Issue Type: Bug Components: Query Planning Reporter: Jason Dere Assignee: Jason Dere {noformat} create table tez_self_join1(id1 int, id2 string, id3 string); insert into table tez_self_join1 values(1, 'aa','bb'), (2, 'ab','ab'), (3,'ba','ba'); create table tez_self_join2(id1 int); insert into table tez_self_join2 values(1),(2),(3); explain select s.id2, s.id3 from ( select self1.id1, self1.id2, self1.id3 from tez_self_join1 self1 join tez_self_join1 self2 on self1.id2=self2.id3 ) s join tez_self_join2 on s.id1=tez_self_join2.id1 where s.id2='ab'; {noformat} fails with error: {noformat} 2015-06-16 15:41:55,759 ERROR [main]: ql.Driver (SessionState.java:printError(979)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. 
Vertex failed, vertexName=Reducer 3, vertexId=vertex_1434494327112_0002_4_04, diagnostics=[Task failed, taskId=task_1434494327112_0002_4_04_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:118) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:109) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:290) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:275) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getJoinOutputObjectInspector(CommonJoinOperator.java:175) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.initializeOp(CommonJoinOperator.java:313) at org.apache.hadoop.hive.ql.exec.AbstractMapJoinOperator.initializeOp(AbstractMapJoinOperator.java:71) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.initializeOp(CommonMergeJoinOperator.java:99) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
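The proposed "short-cutting only" mode would simplify just the boolean identities that dynamic partition pruning leaves behind ({{x AND true}}, {{x OR false}}) without folding arbitrary constant subexpressions, so no columns disappear from the join's output. A toy sketch of the idea on a miniature expression tree (all types here are hypothetical stand-ins for Hive's ExprNodeDesc):
{code}
import java.util.List;
import java.util.stream.Collectors;

public class ShortCutSketch {
  interface Expr {}
  record Const(Object value) implements Expr {}
  record Call(String op, List<Expr> args) implements Expr {}

  // Rewrite "x AND true" to "x"; leave everything else untouched, so full
  // constant folding (and the column removal it triggers) never happens.
  static Expr shortCut(Expr e) {
    if (e instanceof Call c && c.op().equals("and")) {
      List<Expr> kept = c.args().stream()
          .map(ShortCutSketch::shortCut)
          .filter(a -> !(a instanceof Const k && Boolean.TRUE.equals(k.value())))
          .collect(Collectors.toList());
      if (kept.isEmpty()) return new Const(Boolean.TRUE);
      if (kept.size() == 1) return kept.get(0);
      return new Call("and", kept);
    }
    return e;
  }
}
{code}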
[jira] [Updated] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10996: --- Affects Version/s: 2.0.0 1.3.0 Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Attachments: explain_q1.txt, explain_q2.txt We see the following problem on 1.1.0 and 1.2.0 but not 0.13, which seems like a regression. The following query (Q1) produces no results: {code} select s from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp > mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; {code} While this one (Q2) does produce results: {code} select * from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp > mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; 1 21 20 Bob 1234 1 31 30 Bob 1234 3 51 50 Jeff 1234 {code} The setup to test this is: {code} create table purchase_history (s string, product string, price double, timestamp int); insert into purchase_history values ('1', 'Belt', 20.00, 21); insert into purchase_history values ('1', 'Socks', 3.50, 31); insert into purchase_history values ('3', 'Belt', 20.00, 51); insert into purchase_history values ('4', 'Shirt', 15.50, 59); create table cart_history (s string, cart_id int, timestamp int); insert into cart_history values ('1', 1, 10); insert into cart_history values ('1', 2, 20); insert into cart_history values ('1', 3, 30); insert into cart_history values ('1', 4, 40); insert into cart_history values ('3', 5, 50); insert into cart_history values ('4', 6, 60); create table events (s string, st2 string, n int, timestamp int); insert into events values ('1', 'Bob', 1234, 20); insert into events values ('1', 'Bob', 1234, 30); insert into events values ('1', 'Bob', 1234, 25); insert into events values ('2', 'Sam', 1234, 30); insert into events values ('3', 'Jeff', 1234, 50); insert into events values ('4', 'Ted', 1234, 60); {code} I realize select * and select s are not all that interesting in this context, but what led us to this issue was select count(distinct s) not returning results. The above queries are the simplified queries that produce the issue. I will note that if I convert the inner join to a table and select from that, the issue does not appear. Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10996: --- Attachment: HIVE-10996.patch Triggering a QA run. Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Attachments: HIVE-10996.patch, explain_q1.txt, explain_q2.txt We see the following problem on 1.1.0 and 1.2.0 but not 0.13, which seems like a regression. The following query (Q1) produces no results: {code} select s from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp > mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; {code} While this one (Q2) does produce results: {code} select * from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp > mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; 1 21 20 Bob 1234 1 31 30 Bob 1234 3 51 50 Jeff 1234 {code} The setup to test this is: {code} create table purchase_history (s string, product string, price double, timestamp int); insert into purchase_history values ('1', 'Belt', 20.00, 21); insert into purchase_history values ('1', 'Socks', 3.50, 31); insert into purchase_history values ('3', 'Belt', 20.00, 51); insert into purchase_history values ('4', 'Shirt', 15.50, 59); create table cart_history (s string, cart_id int, timestamp int); insert into cart_history values ('1', 1, 10); insert into cart_history values ('1', 2, 20); insert into cart_history values ('1', 3, 30); insert into cart_history values ('1', 4, 40); insert into cart_history values ('3', 5, 50); insert into cart_history values ('4', 6, 60); create table events (s string, st2 string, n int, timestamp int); insert into events values ('1', 'Bob', 1234, 20); insert into events values ('1', 'Bob', 1234, 30); insert into events values ('1', 'Bob', 1234, 25); insert into events values ('2', 'Sam', 1234, 30); insert into events values ('3', 'Jeff', 1234, 50); insert into events values ('4', 'Ted', 1234, 60); {code} I realize select * and select s are not all that interesting in this context, but what led us to this issue was select count(distinct s) not returning results. The above queries are the simplified queries that produce the issue. I will note that if I convert the inner join to a table and select from that, the issue does not appear. Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7292) Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589152#comment-14589152 ] Rui Li commented on HIVE-7292: -- [~riomario] - Yes you can. You can follow this [wiki|https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started] to see what else you need to do to run Hive on Spark. Hive on Spark - Key: HIVE-7292 URL: https://issues.apache.org/jira/browse/HIVE-7292 Project: Hive Issue Type: Improvement Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5 Attachments: Hive-on-Spark.pdf Spark as an open-source data analytics cluster computing framework has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone. To take advantage of Hive, they still need to have either MapReduce or Tez on their cluster. This initiative will provide users a new alternative so that those users can consolidate their backends. Secondly, providing such an alternative further increases Hive's adoption as it exposes Spark users to a viable, feature-rich, de facto standard SQL tool on Hadoop. Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, thus improving user experience as Tez does. This is an umbrella JIRA which will cover many coming subtasks. The design doc will be attached here shortly, and will be on the wiki as well. Feedback from the community is greatly appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589250#comment-14589250 ] Xuefu Zhang commented on HIVE-10999: Yeah. I built that jar from Spark branch-1.4, using make-distribution.sh, and renamed it. Do you know how to make a non-SNAPSHOT build from Spark? Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-10707) CBO: debug logging OOMs
[ https://issues.apache.org/jira/browse/HIVE-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V reassigned HIVE-10707: -- Assignee: Gopal V CBO: debug logging OOMs --- Key: HIVE-10707 URL: https://issues.apache.org/jira/browse/HIVE-10707 Project: Hive Issue Type: Bug Components: CBO Reporter: Gopal V Assignee: Gopal V Priority: Trivial {code} hive> source xcross.sql; OK Time taken: 0.837 seconds Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:3332) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) at java.lang.StringBuilder.append(StringBuilder.java:136) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:111) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) {code} The query contains 360 join clauses, wrapped in a UNION ALL. Looks like {{genOpTree}} does {code} this.ctx.setCboInfo("Plan optimized by CBO."); this.ctx.setCboSucceeded(true); LOG.debug(newAST.dump()); } {code} and the debug logging OOMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
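The usual fix pattern for this class of OOM (a sketch; the attached patch may differ, and Hive's actual logger class may not be slf4j) is to never materialize the dump string unless debug logging is live:
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class GuardedDumpSketch {
  private static final Logger LOG = LoggerFactory.getLogger(GuardedDumpSketch.class);

  static void logPlan(Object newAST) {
    // dump() builds one giant String for the whole AST; with 360 join
    // clauses that alone can exhaust the heap, so only build it on demand.
    if (LOG.isDebugEnabled()) {
      LOG.debug(String.valueOf(newAST)); // stands in for newAST.dump()
    }
  }
}
{code}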
[jira] [Commented] (HIVE-11026) Make vector_outer_join* test more robust
[ https://issues.apache.org/jira/browse/HIVE-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589317#comment-14589317 ] Ashutosh Chauhan commented on HIVE-11026: - Test failure is unrelated. [~prasanth_j] can you take a look? Make vector_outer_join* test more robust Key: HIVE-11026 URL: https://issues.apache.org/jira/browse/HIVE-11026 Project: Hive Issue Type: Test Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11026.patch Different file sizes on different OSes result in different Data Size in explain output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-3958) support partial scan for analyze command - RCFile
[ https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-3958: - Description: The analyze command allows us to collect statistics on existing tables/partitions. It works great but might be slow since it scans all files. There are 2 ways to speed it up: 1. collect stats without file scan. It may not collect all stats but is good and fast enough for the use case. HIVE-3917 addresses it. 2. collect stats via partial file scan. It doesn't scan the full content of files but only part of it to get file metadata. Some examples are https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) and HFile of HBase (Edit: That link should be https://cwiki.apache.org/confluence/display/Hive/RCFileCat.) This jira is targeted to address #2, more specifically the RCFile format. was: The analyze command allows us to collect statistics on existing tables/partitions. It works great but might be slow since it scans all files. There are 2 ways to speed it up: 1. collect stats without file scan. It may not collect all stats but is good and fast enough for the use case. HIVE-3917 addresses it. 2. collect stats via partial file scan. It doesn't scan the full content of files but only part of it to get file metadata. Some examples are https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) and HFile of HBase This jira is targeted to address #2, more specifically the RCFile format. support partial scan for analyze command - RCFile - Key: HIVE-3958 URL: https://issues.apache.org/jira/browse/HIVE-3958 Project: Hive Issue Type: Improvement Reporter: Gang Tim Liu Assignee: Gang Tim Liu Fix For: 0.11.0 Attachments: HIVE-3958.patch.1, HIVE-3958.patch.2, HIVE-3958.patch.3, HIVE-3958.patch.4, HIVE-3958.patch.5, HIVE-3958.patch.6 The analyze command allows us to collect statistics on existing tables/partitions. It works great but might be slow since it scans all files. There are 2 ways to speed it up: 1. collect stats without file scan. It may not collect all stats but is good and fast enough for the use case. HIVE-3917 addresses it. 2. collect stats via partial file scan. It doesn't scan the full content of files but only part of it to get file metadata. Some examples are https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) and HFile of HBase (Edit: That link should be https://cwiki.apache.org/confluence/display/Hive/RCFileCat.) This jira is targeted to address #2, more specifically the RCFile format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10707) CBO: debug logging OOMs
[ https://issues.apache.org/jira/browse/HIVE-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10707: --- Attachment: HIVE-10707.1.patch CBO: debug logging OOMs --- Key: HIVE-10707 URL: https://issues.apache.org/jira/browse/HIVE-10707 Project: Hive Issue Type: Bug Components: CBO Reporter: Gopal V Assignee: Gopal V Priority: Trivial Attachments: HIVE-10707.1.patch {code} hive> source xcross.sql; OK Time taken: 0.837 seconds Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:3332) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) at java.lang.StringBuilder.append(StringBuilder.java:136) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:111) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) {code} The query contains 360 join clauses, wrapped in a UNION ALL. Looks like {{genOpTree}} does {code} this.ctx.setCboInfo("Plan optimized by CBO."); this.ctx.setCboSucceeded(true); LOG.debug(newAST.dump()); } {code} and the debug logging OOMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10707) CBO: debug logging OOMs
[ https://issues.apache.org/jira/browse/HIVE-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10707: --- Affects Version/s: 2.0.0 CBO: debug logging OOMs --- Key: HIVE-10707 URL: https://issues.apache.org/jira/browse/HIVE-10707 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 2.0.0 Reporter: Gopal V Assignee: Gopal V Priority: Trivial Attachments: HIVE-10707.1.patch {code} hive> source xcross.sql; OK Time taken: 0.837 seconds Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:3332) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) at java.lang.StringBuilder.append(StringBuilder.java:136) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:111) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) {code} The query contains 360 join clauses, wrapped in a UNION ALL. Looks like {{genOpTree}} does {code} this.ctx.setCboInfo("Plan optimized by CBO."); this.ctx.setCboSucceeded(true); LOG.debug(newAST.dump()); } {code} and the debug logging OOMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589289#comment-14589289 ] Rui Li commented on HIVE-10999: --- I think you can use the [release tag|https://github.com/apache/spark/releases/tag/v1.4.0] to get a non-SNAPSHOT build. Also you should rename the dir to {{spark-1.4.0-bin-hadoop2-without-hive}} as I mentioned above. Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11007) CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depends on the last SEL
[ https://issues.apache.org/jira/browse/HIVE-11007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589320#comment-14589320 ] Pengcheng Xiong commented on HIVE-11007: The test failures are unrelated. [~ashutoshc] or [~jpullokkaran], could you please take a look? Thanks. CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depends on the last SEL - Key: HIVE-11007 URL: https://issues.apache.org/jira/browse/HIVE-11007 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11007.01.patch, HIVE-11007.02.patch In the dynamic partitioning case, for example, we are going to have TS0-SEL1-SEL2-FS3. The dpCtx's mapInputToDP is populated by SEL1 rather than SEL2, which causes an error in the return path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11026) Make vector_outer_join* test more robust
[ https://issues.apache.org/jira/browse/HIVE-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589337#comment-14589337 ] Prasanth Jayachandran commented on HIVE-11026: -- +1. Were you able to verify this on other OSes? Make vector_outer_join* test more robust Key: HIVE-11026 URL: https://issues.apache.org/jira/browse/HIVE-11026 Project: Hive Issue Type: Test Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11026.patch Different file sizes on different OSes result in different Data Size in explain output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10479) CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias in columnInfo which triggers PPD
[ https://issues.apache.org/jira/browse/HIVE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589247#comment-14589247 ] Hive QA commented on HIVE-10479: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739980/HIVE-10479.02.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9008 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4280/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4280/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4280/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12739980 - PreCommit-HIVE-TRUNK-Build CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias in columnInfo which triggers PPD Key: HIVE-10479 URL: https://issues.apache.org/jira/browse/HIVE-10479 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-10479.01.patch, HIVE-10479.02.patch, HIVE-10479.patch In ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 477, when aliases contains an empty string and the key is an empty string too, it assumes that aliases contains the key. This triggers incorrect PPD. To reproduce it, apply HIVE-10455 and run cbo_subq_notin.q. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
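The faulty containment test is easy to reproduce in isolation: an alias set that holds the empty string will "match" a column whose table alias is also empty, and the predicate gets pushed where it doesn't belong. A simplified, self-contained illustration (not the OpProcFactory code):
{code}
import java.util.HashSet;
import java.util.Set;

public class EmptyAliasSketch {
  public static void main(String[] args) {
    Set<String> aliases = new HashSet<>();
    aliases.add("");   // empty tabAlias produced on the return path
    String key = "";   // lookup key for a column with no table alias
    // Prints "true", so the caller wrongly concludes the predicate belongs
    // to this operator and pushes it down incorrectly.
    System.out.println(aliases.contains(key));
  }
}
{code}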
[jira] [Commented] (HIVE-10974) Use Configuration::getRaw() for the Base64 data
[ https://issues.apache.org/jira/browse/HIVE-10974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589259#comment-14589259 ] Gopal V commented on HIVE-10974: Committed to master, thanks [~sershe] Use Configuration::getRaw() for the Base64 data --- Key: HIVE-10974 URL: https://issues.apache.org/jira/browse/HIVE-10974 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.0.0 Reporter: Gopal V Assignee: Gopal V Fix For: 2.0.0 Attachments: HIVE-10974.1.patch Inspired by the Twitter HadoopSummit talk {code} if (HiveConf.getBoolVar(conf, ConfVars.HIVE_RPC_QUERY_PLAN)) { LOG.debug("Loading plan from string: " + path.toUri().getPath()); String planString = conf.get(path.toUri().getPath()); {code} Use getRaw() in other places where Base64 data is present. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10974) Use Configuration::getRaw() for the Base64 data
[ https://issues.apache.org/jira/browse/HIVE-10974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10974: --- Affects Version/s: (was: 1.2.0) 2.0.0 Use Configuration::getRaw() for the Base64 data --- Key: HIVE-10974 URL: https://issues.apache.org/jira/browse/HIVE-10974 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.0.0 Reporter: Gopal V Assignee: Gopal V Fix For: 2.0.0 Attachments: HIVE-10974.1.patch Inspired by the Twitter HadoopSummit talk {code} if (HiveConf.getBoolVar(conf, ConfVars.HIVE_RPC_QUERY_PLAN)) { LOG.debug("Loading plan from string: " + path.toUri().getPath()); String planString = conf.get(path.toUri().getPath()); {code} Use getRaw() in other places where Base64 data is present. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11026) Make vector_outer_join* test more robust
[ https://issues.apache.org/jira/browse/HIVE-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589312#comment-14589312 ] Hive QA commented on HIVE-11026: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739981/HIVE-11026.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9008 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4281/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4281/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4281/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12739981 - PreCommit-HIVE-TRUNK-Build Make vector_outer_join* test more robust Key: HIVE-11026 URL: https://issues.apache.org/jira/browse/HIVE-11026 Project: Hive Issue Type: Test Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11026.patch Different file sizes on different OSes result in different Data Size in explain output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10479) CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias in columnInfo which triggers PPD
[ https://issues.apache.org/jira/browse/HIVE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589322#comment-14589322 ] Pengcheng Xiong commented on HIVE-10479: [~jcamachorodriguez], the test failures are unrelated, and they failed on the previous runs too. Could you please take a look? Thanks. CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias in columnInfo which triggers PPD Key: HIVE-10479 URL: https://issues.apache.org/jira/browse/HIVE-10479 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-10479.01.patch, HIVE-10479.02.patch, HIVE-10479.patch In ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 477, when aliases contains an empty string and the key is an empty string too, it assumes that aliases contains the key. This triggers incorrect PPD. To reproduce it, apply HIVE-10455 and run cbo_subq_notin.q. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11031) ORC concatenation of old files can fail while merging column statistics
[ https://issues.apache.org/jira/browse/HIVE-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11031: - Attachment: HIVE-11031.patch ORC concatenation of old files can fail while merging column statistics --- Key: HIVE-11031 URL: https://issues.apache.org/jira/browse/HIVE-11031 Project: Hive Issue Type: Bug Affects Versions: 1.0.0, 1.2.0, 1.1.0, 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11031.patch Column statistics in ORC are optional protobuf fields. Old ORC files might not have statistics for newly added types like decimal, date, timestamp, etc., but column statistics merging assumes statistics exist for these types and invokes merge. For example, merging of TimestampColumnStatistics directly casts the received ColumnStatistics object without doing an instanceof check. If the ORC file contains timestamp column statistics this will work; otherwise it throws a ClassCastException. Also, the file merge operator swallows the exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
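The defensive version of the merge checks the runtime type before casting, since an old file's statistics object may simply not carry the newer fields. A sketch with stand-in types (the real ORC classes have richer APIs; treat this as an illustration, not the actual patch):
{code}
public class StatsMergeSketch {
  interface ColumnStatistics {}
  interface TimestampColumnStatistics extends ColumnStatistics {
    long getMinimum();
    long getMaximum();
  }

  // Merge "other" into a running [min, max] pair, but verify the type first:
  // old ORC files may lack timestamp statistics, and a blind cast would throw
  // a ClassCastException that the file merge operator then swallows.
  static void merge(long[] minMax, ColumnStatistics other) {
    if (!(other instanceof TimestampColumnStatistics)) {
      throw new IllegalArgumentException("Incompatible timestamp column statistics");
    }
    TimestampColumnStatistics ts = (TimestampColumnStatistics) other;
    minMax[0] = Math.min(minMax[0], ts.getMinimum());
    minMax[1] = Math.max(minMax[1], ts.getMaximum());
  }
}
{code}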
[jira] [Assigned] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang reassigned HIVE-10999: -- Assignee: Rui Li (was: Xuefu Zhang) Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11006) improve logging wrt ACID module
[ https://issues.apache.org/jira/browse/HIVE-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589176#comment-14589176 ] Hive QA commented on HIVE-11006: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739960/HIVE-11006.2.patch {color:green}SUCCESS:{color} +1 9008 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4279/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4279/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4279/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12739960 - PreCommit-HIVE-TRUNK-Build improve logging wrt ACID module --- Key: HIVE-11006 URL: https://issues.apache.org/jira/browse/HIVE-11006 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11006.2.patch, HIVE-11006.patch especially around metastore DB operations (TxnHandler) which are retried or fail for some reason. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11018) Turn on cbo in more q files
[ https://issues.apache.org/jira/browse/HIVE-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589177#comment-14589177 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-11018: -- +1 Turn on cbo in more q files --- Key: HIVE-11018 URL: https://issues.apache.org/jira/browse/HIVE-11018 Project: Hive Issue Type: Task Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11018.1.patch, HIVE-11018.2.patch, HIVE-11018.patch There are a few tests in which cbo was turned off for various reasons. Those reasons don't exist anymore. For those tests, we should turn on cbo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11023) Disable directSQL if datanucleus.identifierFactory = datanucleus2
[ https://issues.apache.org/jira/browse/HIVE-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589182#comment-14589182 ] Sushanth Sowmyan commented on HIVE-11023: - [~sershe], could you please review? Thanks! Disable directSQL if datanucleus.identifierFactory = datanucleus2 - Key: HIVE-11023 URL: https://issues.apache.org/jira/browse/HIVE-11023 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.3.0, 1.2.1, 2.0.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Critical Attachments: HIVE-11023.patch We hit an interesting bug in a case where datanucleus.identifierFactory = datanucleus2. The problem is that directSql hand-generates SQL strings assuming the datanucleus1 naming scheme. If a user has their metastore JDO managed by datanucleus.identifierFactory = datanucleus2, the SQL strings we generate are incorrect. One simple example of what this results in is the following: whenever DN persists a field which is held as a List<T>, it winds up storing each T as a separate line in the appropriate mapping table, and has a column called INTEGER_IDX, which holds the position in the list. Then, upon reading, it automatically reads all relevant lines with an ORDER BY INTEGER_IDX, which results in the list retaining its order. In the DN2 naming scheme, the column is called IDX instead of INTEGER_IDX. If the user has run the appropriate metatool upgrade scripts, it is highly likely that they have both columns, INTEGER_IDX and IDX. Whenever they use JDO, such as with all writes, it will then use the IDX field, and when they do any sort of optimized reads, such as through directSQL, it will ORDER BY INTEGER_IDX. An immediate danger is seen when we consider that the schema of a table is stored as a List<FieldSchema>, and while IDX has 0,1,2,3,..., INTEGER_IDX will contain 0,0,0,0,... and thus, any attempt to describe the table or fetch the schema for the table can come up mixed up in the table's native hashing order, rather than sorted by the index. This can then result in the schema ordering being different from the actual table. For example, if a user has a table (a:int, b:string, c:string), a describe on this may return (c:string, a:int, b:string), and thus, queries which are inserting after selecting from another table can have ClassCastExceptions when trying to insert data in the wrong order - this is how we discovered this bug. This problem, however, can be far worse if there are no type problems - it is possible, for example, that if a, b, c were all strings, that insert query would succeed but mix up the order, which then results in user table data being mixed up. This has the potential to be very bad. We should write a tool to help convert metastores that use datanucleus2 to datanucleus1 (more difficult, needs more one-time testing), or change directSql to support both (easier to code, but it increases the test-coverage matrix significantly and we should really then be testing against both schemes). But in the short term, we should disable directSql if we see that the identifierFactory is datanucleus2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-6384) Implement all Hive data types in Parquet
[ https://issues.apache.org/jira/browse/HIVE-6384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu resolved HIVE-6384. Resolution: Fixed Fix Version/s: 1.2.0 Resolved since all subtasks are resolved in 1.2.0. Implement all Hive data types in Parquet Key: HIVE-6384 URL: https://issues.apache.org/jira/browse/HIVE-6384 Project: Hive Issue Type: Task Reporter: Brock Noland Assignee: Ferdinand Xu Labels: Parquet Fix For: 1.2.0 Uber JIRA to track implementation of binary, timestamp, date, char, varchar, and decimal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7261) Calculation works wrong when hive.groupby.skewindata is true and count(*) count(distinct) group by work simultaneously
[ https://issues.apache.org/jira/browse/HIVE-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589193#comment-14589193 ] Chengbing Liu commented on HIVE-7261: - HIVE-10971 solved the same problem; marking this as a duplicate. Calculation works wrong when hive.groupby.skewindata is true and count(*) count(distinct) group by work simultaneously Key: HIVE-7261 URL: https://issues.apache.org/jira/browse/HIVE-7261 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Environment: hive0.12 hadoop1.0.4 Reporter: Chris Chen 【Phenomenon】 The query results are not the same when hive.groupby.skewindata is set to true vs. false. 【my question】 I want to calculate count(*) and count(distinct) simultaneously, otherwise it will cost 2 MR jobs to calculate. But when I set hive.groupby.skewindata to true, the count(*) result should not be the same as the count(distinct), but the real result is the same, so it's wrong. And I found a difference in the query plans: the Reduce Operator Tree - Group By Operator - mode is mergepartial when skew is set to false and complete when skew is set to true. So I'm confused about the root cause of the error. 【sql】 select ds,appid,eventname,active,{color:red}count(distinct(guid)), count(*) {color}from eventinfo_tmp where ds='20140612' and length(eventname)1000 and eventname like '%alibaba%' group by ds,appid,eventname,active; 【other hive configuration excluding hive.groupby.skewindata】 hive.exec.compress.output=true hive.exec.compress.intermediate=true io.seqfile.compression.type=BLOCK mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec hive.map.aggr=true hive.stats.autogather=false hive.exec.scratchdir=/user/complat/tmp mapred.job.queue.name=complat hive.exec.mode.local.auto=false hive.exec.mode.local.auto.inputbytes.max=500 hive.exec.mode.local.auto.tasks.max=10 hive.exec.mode.local.auto.input.files.max=1000 hive.exec.dynamic.partition=true hive.exec.dynamic.partition.mode=nonstrict hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat mapred.max.split.size=1 mapred.min.split.size.per.node=1 mapred.min.split.size.per.rack=1 【result】 when hive.groupby.skewindata=true the result is: 20140612 8 alibaba 1 {color:red}87 147{color} when it=false the result is: 20140612 8 alibaba 1 {color:red}87 87{color} 【query plan】 ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME eventinfo_tmp))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL ds)) (TOK_SELEXPR (TOK_TABLE_OR_COL appid)) (TOK_SELEXPR (TOK_TABLE_OR_COL eventname)) (TOK_SELEXPR (TOK_TABLE_OR_COL active)) (TOK_SELEXPR (TOK_FUNCTIONDI count (TOK_TABLE_OR_COL guid))) (TOK_SELEXPR (TOK_FUNCTIONSTAR count))) (TOK_WHERE (and (and (= (TOK_TABLE_OR_COL ds) '20140612') ( (TOK_FUNCTION length (TOK_TABLE_OR_COL eventname)) 1000)) (like (TOK_TABLE_OR_COL eventname) '%tvvideo_setting%'))) (TOK_GROUPBY (TOK_TABLE_OR_COL ds) (TOK_TABLE_OR_COL appid) (TOK_TABLE_OR_COL eventname) (TOK_TABLE_OR_COL active STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: eventinfo_tmp TableScan alias: eventinfo_tmp Filter Operator predicate: expr: ((length(eventname) 1000) and (eventname like '%tvvideo_setting%')) type: boolean Select Operator expressions: expr: ds type: string expr: appid type: string expr: eventname type: string expr: active type: int expr:
guid type: string outputColumnNames: ds, appid, eventname, active, guid Group By Operator aggregations: expr: count(DISTINCT guid) expr: count() bucketGroup: false keys: expr: ds type: string expr: appid type: string expr: eventname type: string expr: active type: int expr: guid
[jira] [Resolved] (HIVE-7261) Calculation works wrong when hive.groupby.skewindata is true and count(*) count(distinct) group by work simultaneously
[ https://issues.apache.org/jira/browse/HIVE-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengbing Liu resolved HIVE-7261. - Resolution: Duplicate Calculation works wrong when hive.groupby.skewindata is true and count(*) count(distinct) group by work simultaneously Key: HIVE-7261 URL: https://issues.apache.org/jira/browse/HIVE-7261 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Environment: hive0.12 hadoop1.0.4 Reporter: Chris Chen 【Phenomenon】 The query results are not the same when hive.groupby.skewindata is set to true vs. false. 【my question】 I want to calculate count(*) and count(distinct) simultaneously, otherwise it will cost 2 MR jobs to calculate. But when I set hive.groupby.skewindata to true, the count(*) result should not be the same as the count(distinct), but the real result is the same, so it's wrong. And I found a difference in the query plans: the Reduce Operator Tree - Group By Operator - mode is mergepartial when skew is set to false and complete when skew is set to true. So I'm confused about the root cause of the error. 【sql】 select ds,appid,eventname,active,{color:red}count(distinct(guid)), count(*) {color}from eventinfo_tmp where ds='20140612' and length(eventname)1000 and eventname like '%alibaba%' group by ds,appid,eventname,active; 【other hive configuration excluding hive.groupby.skewindata】 hive.exec.compress.output=true hive.exec.compress.intermediate=true io.seqfile.compression.type=BLOCK mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec hive.map.aggr=true hive.stats.autogather=false hive.exec.scratchdir=/user/complat/tmp mapred.job.queue.name=complat hive.exec.mode.local.auto=false hive.exec.mode.local.auto.inputbytes.max=500 hive.exec.mode.local.auto.tasks.max=10 hive.exec.mode.local.auto.input.files.max=1000 hive.exec.dynamic.partition=true hive.exec.dynamic.partition.mode=nonstrict hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat mapred.max.split.size=1 mapred.min.split.size.per.node=1 mapred.min.split.size.per.rack=1 【result】 when hive.groupby.skewindata=true the result is: 20140612 8 alibaba 1 {color:red}87 147{color} when it=false the result is: 20140612 8 alibaba 1 {color:red}87 87{color} 【query plan】 ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME eventinfo_tmp))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL ds)) (TOK_SELEXPR (TOK_TABLE_OR_COL appid)) (TOK_SELEXPR (TOK_TABLE_OR_COL eventname)) (TOK_SELEXPR (TOK_TABLE_OR_COL active)) (TOK_SELEXPR (TOK_FUNCTIONDI count (TOK_TABLE_OR_COL guid))) (TOK_SELEXPR (TOK_FUNCTIONSTAR count))) (TOK_WHERE (and (and (= (TOK_TABLE_OR_COL ds) '20140612') ( (TOK_FUNCTION length (TOK_TABLE_OR_COL eventname)) 1000)) (like (TOK_TABLE_OR_COL eventname) '%tvvideo_setting%'))) (TOK_GROUPBY (TOK_TABLE_OR_COL ds) (TOK_TABLE_OR_COL appid) (TOK_TABLE_OR_COL eventname) (TOK_TABLE_OR_COL active STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: eventinfo_tmp TableScan alias: eventinfo_tmp Filter Operator predicate: expr: ((length(eventname) 1000) and (eventname like '%tvvideo_setting%')) type: boolean Select Operator expressions: expr: ds type: string expr: appid type: string expr: eventname type: string expr: active type: int expr: guid type: string outputColumnNames: ds, appid, eventname, active, guid Group By
Operator aggregations: expr: count(DISTINCT guid) expr: count() bucketGroup: false keys: expr: ds type: string expr: appid type: string expr: eventname type: string expr: active type: int expr: guid type: string mode: hash
[jira] [Updated] (HIVE-11033) BloomFilter index is not honored by ORC reader
[ https://issues.apache.org/jira/browse/HIVE-11033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Yan updated HIVE-11033: - Description: There is a bug in the org.apache.hadoop.hive.ql.io.orc.ReaderImpl class which causes the bloom filter index saved in the ORC file not to be used. The root cause is that the bloomFilterIndices variable defined in the SargApplier class supersedes the one defined in its parent class. Therefore, in ReaderImpl.pickRowGroups(): {code} protected boolean[] pickRowGroups() throws IOException { // if we don't have a sarg or indexes, we read everything if (sargApp == null) { return null; } readRowIndex(currentStripe, included, sargApp.sargColumns); return sargApp.pickRowGroups(stripes.get(currentStripe), indexes); } {code} The bloomFilterIndices populated by readRowIndex() is not picked up by the sargApp object. One solution is to simply pass it to sargApp.pickRowGroups(): {noformat} 18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original 174d173 < bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 178c177 < sarg, options.getColumnNames(), strideRate, types, included.length, bloomFilterIndices); --- > sarg, options.getColumnNames(), strideRate, types, included.length); 204a204 > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 673c673 < List<OrcProto.Type> types, int includedCount, OrcProto.BloomFilterIndex[] bloomFilterIndices) { --- > List<OrcProto.Type> types, int includedCount) { 677c677 < this.bloomFilterIndices = bloomFilterIndices; --- > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; {noformat} was: There is a bug in the org.apache.hadoop.hive.ql.io.orc.ReaderImpl class which causes the bloom filter index saved in the ORC file not to be used. The reason is that the bloomFilterIndices variable defined in the SargApplier class supersedes the one from its parent class. Here is one way to fix it: {noformat} 18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original 174d173 < bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 178c177 < sarg, options.getColumnNames(), strideRate, types, included.length, bloomFilterIndices); --- > sarg, options.getColumnNames(), strideRate, types, included.length); 204a204 > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 673c673 < List<OrcProto.Type> types, int includedCount, OrcProto.BloomFilterIndex[] bloomFilterIndices) { --- > List<OrcProto.Type> types, int includedCount) { 677c677 < this.bloomFilterIndices = bloomFilterIndices; --- > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; {noformat} BloomFilter index is not honored by ORC reader -- Key: HIVE-11033 URL: https://issues.apache.org/jira/browse/HIVE-11033 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Allan Yan There is a bug in the org.apache.hadoop.hive.ql.io.orc.ReaderImpl class which causes the bloom filter index saved in the ORC file not to be used. The root cause is that the bloomFilterIndices variable defined in the SargApplier class supersedes the one defined in its parent class.
Therefore, in ReaderImpl.pickRowGroups(): {code} protected boolean[] pickRowGroups() throws IOException { // if we don't have a sarg or indexes, we read everything if (sargApp == null) { return null; } readRowIndex(currentStripe, included, sargApp.sargColumns); return sargApp.pickRowGroups(stripes.get(currentStripe), indexes); } {code} The bloomFilterIndices populated by readRowIndex() is not picked up by the sargApp object. One solution is to simply pass it to sargApp.pickRowGroups(): {noformat} 18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original 174d173 < bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 178c177 < sarg, options.getColumnNames(), strideRate, types, included.length, bloomFilterIndices); --- > sarg, options.getColumnNames(), strideRate, types, included.length); 204a204 > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 673c673 < List<OrcProto.Type> types, int includedCount, OrcProto.BloomFilterIndex[] bloomFilterIndices) { --- > List<OrcProto.Type> types, int includedCount) { 677c677 < this.bloomFilterIndices = bloomFilterIndices; --- > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; {noformat} -- This
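For clarity, here is a hedged sketch of what the sharing proposed in the diff above amounts to (simplified; class and field names are stand-ins, not the exact Hive source): the SargApplier holds a reference to the reader's array instead of allocating a private copy, so the entries populated by readRowIndex() are the same objects consulted when picking row groups.
{code}
class BloomIndex {}  // stand-in for OrcProto.BloomFilterIndex

class SargApplierSketch {
  private final BloomIndex[] bloomFilterIndices;  // shared with the reader

  SargApplierSketch(BloomIndex[] readerOwnedIndices) {
    // before the fix: this.bloomFilterIndices = new BloomIndex[n];
    // (a private copy the reader never fills in)
    this.bloomFilterIndices = readerOwnedIndices;
  }

  boolean hasBloomFilter(int column) {
    // now sees exactly what the reader's readRowIndex() wrote
    return bloomFilterIndices[column] != null;
  }
}
{code}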
[jira] [Updated] (HIVE-11033) BloomFilter index is not honored by ORC reader
[ https://issues.apache.org/jira/browse/HIVE-11033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Yan updated HIVE-11033: - Description: There is a bug in the org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl class which causes the bloom filter index saved in the ORC file not to be used. The root cause is that the bloomFilterIndices variable defined in the SargApplier class supersedes the one defined in its parent class. Therefore, in ReaderImpl.pickRowGroups(): {code} protected boolean[] pickRowGroups() throws IOException { // if we don't have a sarg or indexes, we read everything if (sargApp == null) { return null; } readRowIndex(currentStripe, included, sargApp.sargColumns); return sargApp.pickRowGroups(stripes.get(currentStripe), indexes); } {code} The bloomFilterIndices populated by readRowIndex() is not picked up by the sargApp object. One solution is to make SargApplier.bloomFilterIndices a reference to the one defined in its parent class. {noformat} 18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original 174d173 < bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 178c177 < sarg, options.getColumnNames(), strideRate, types, included.length, bloomFilterIndices); --- > sarg, options.getColumnNames(), strideRate, types, included.length); 204a204 > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 673c673 < List<OrcProto.Type> types, int includedCount, OrcProto.BloomFilterIndex[] bloomFilterIndices) { --- > List<OrcProto.Type> types, int includedCount) { 677c677 < this.bloomFilterIndices = bloomFilterIndices; --- > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; {noformat} was: There is a bug in the org.apache.hadoop.hive.ql.io.orc.ReaderImpl class which causes the bloom filter index saved in the ORC file not to be used. The root cause is that the bloomFilterIndices variable defined in the SargApplier class supersedes the one defined in its parent class. Therefore, in ReaderImpl.pickRowGroups(): {code} protected boolean[] pickRowGroups() throws IOException { // if we don't have a sarg or indexes, we read everything if (sargApp == null) { return null; } readRowIndex(currentStripe, included, sargApp.sargColumns); return sargApp.pickRowGroups(stripes.get(currentStripe), indexes); } {code} The bloomFilterIndices populated by readRowIndex() is not picked up by the sargApp object.
One solution is to simply pass it to sargApp.pickRowGroups(): {noformat} 18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original 174d173 < bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 178c177 < sarg, options.getColumnNames(), strideRate, types, included.length, bloomFilterIndices); --- > sarg, options.getColumnNames(), strideRate, types, included.length); 204a204 > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 673c673 < List<OrcProto.Type> types, int includedCount, OrcProto.BloomFilterIndex[] bloomFilterIndices) { --- > List<OrcProto.Type> types, int includedCount) { 677c677 < this.bloomFilterIndices = bloomFilterIndices; --- > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; {noformat} BloomFilter index is not honored by ORC reader -- Key: HIVE-11033 URL: https://issues.apache.org/jira/browse/HIVE-11033 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Allan Yan There is a bug in the org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl class which causes the bloom filter index saved in the ORC file not to be used. The root cause is that the bloomFilterIndices variable defined in the SargApplier class supersedes the one defined in its parent class. Therefore, in ReaderImpl.pickRowGroups(): {code} protected boolean[] pickRowGroups() throws IOException { // if we don't have a sarg or indexes, we read everything if (sargApp == null) { return null; } readRowIndex(currentStripe, included, sargApp.sargColumns); return sargApp.pickRowGroups(stripes.get(currentStripe), indexes); } {code} The bloomFilterIndices populated by readRowIndex() is not picked up by the sargApp object. One solution is to make SargApplier.bloomFilterIndices a reference to the one defined in its parent class. {noformat} 18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original 174d173 < bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 178c177
[jira] [Updated] (HIVE-11033) BloomFilter index is not honored by ORC reader
[ https://issues.apache.org/jira/browse/HIVE-11033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Yan updated HIVE-11033: - Description: There is a bug in the org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl class which causes the bloom filter index saved in the ORC file not to be used. The root cause is that the bloomFilterIndices variable defined in the SargApplier class supersedes the one defined in its parent class. Therefore, in ReaderImpl.pickRowGroups(): {code} protected boolean[] pickRowGroups() throws IOException { // if we don't have a sarg or indexes, we read everything if (sargApp == null) { return null; } readRowIndex(currentStripe, included, sargApp.sargColumns); return sargApp.pickRowGroups(stripes.get(currentStripe), indexes); } {code} The bloomFilterIndices populated by readRowIndex() is not picked up by the sargApp object. One solution is to make SargApplier.bloomFilterIndices a reference to its parent counterpart. {noformat} 18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original 174d173 < bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 178c177 < sarg, options.getColumnNames(), strideRate, types, included.length, bloomFilterIndices); --- > sarg, options.getColumnNames(), strideRate, types, included.length); 204a204 > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 673c673 < List<OrcProto.Type> types, int includedCount, OrcProto.BloomFilterIndex[] bloomFilterIndices) { --- > List<OrcProto.Type> types, int includedCount) { 677c677 < this.bloomFilterIndices = bloomFilterIndices; --- > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; {noformat} was: There is a bug in the org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl class which causes the bloom filter index saved in the ORC file not to be used. The root cause is that the bloomFilterIndices variable defined in the SargApplier class supersedes the one defined in its parent class. Therefore, in ReaderImpl.pickRowGroups(): {code} protected boolean[] pickRowGroups() throws IOException { // if we don't have a sarg or indexes, we read everything if (sargApp == null) { return null; } readRowIndex(currentStripe, included, sargApp.sargColumns); return sargApp.pickRowGroups(stripes.get(currentStripe), indexes); } {code} The bloomFilterIndices populated by readRowIndex() is not picked up by the sargApp object. One solution is to make SargApplier.bloomFilterIndices a reference to the one defined in its parent class.
{noformat} 18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original 174d173 < bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 178c177 < sarg, options.getColumnNames(), strideRate, types, included.length, bloomFilterIndices); --- > sarg, options.getColumnNames(), strideRate, types, included.length); 204a204 > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 673c673 < List<OrcProto.Type> types, int includedCount, OrcProto.BloomFilterIndex[] bloomFilterIndices) { --- > List<OrcProto.Type> types, int includedCount) { 677c677 < this.bloomFilterIndices = bloomFilterIndices; --- > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; {noformat} BloomFilter index is not honored by ORC reader -- Key: HIVE-11033 URL: https://issues.apache.org/jira/browse/HIVE-11033 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Allan Yan There is a bug in the org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl class which causes the bloom filter index saved in the ORC file not to be used. The root cause is that the bloomFilterIndices variable defined in the SargApplier class supersedes the one defined in its parent class. Therefore, in ReaderImpl.pickRowGroups(): {code} protected boolean[] pickRowGroups() throws IOException { // if we don't have a sarg or indexes, we read everything if (sargApp == null) { return null; } readRowIndex(currentStripe, included, sargApp.sargColumns); return sargApp.pickRowGroups(stripes.get(currentStripe), indexes); } {code} The bloomFilterIndices populated by readRowIndex() is not picked up by the sargApp object. One solution is to make SargApplier.bloomFilterIndices a reference to its parent counterpart. {noformat} 18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original 174d173 < bloomFilterIndices = new
[jira] [Commented] (HIVE-11034) Multiple join table producing different results
[ https://issues.apache.org/jira/browse/HIVE-11034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589214#comment-14589214 ] Srini Pindi commented on HIVE-11034: Please see attachments for test data and query info. Multiple join table producing different results --- Key: HIVE-11034 URL: https://issues.apache.org/jira/browse/HIVE-11034 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Environment: Linux 2.6.32-279.19.1.el6.x86_64 Reporter: Srini Pindi Priority: Critical Attachments: hive_issue.zip, steps_to_reproduce_.docx Joining one main table with other tables on different join columns returns wrong results in Hive. Changing the order of the joins between the main table and the other tables produces different results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11034) Multiple join table producing different results
[ https://issues.apache.org/jira/browse/HIVE-11034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srini Pindi updated HIVE-11034: --- Attachment: steps_to_reproduce_.docx hive_issue.zip Multiple join table producing different results --- Key: HIVE-11034 URL: https://issues.apache.org/jira/browse/HIVE-11034 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Environment: Linux 2.6.32-279.19.1.el6.x86_64 Reporter: Srini Pindi Priority: Critical Attachments: hive_issue.zip, steps_to_reproduce_.docx Joining one main table with other tables on different join columns returns wrong results in Hive. Changing the order of the joins between the main table and the other tables produces different results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11034) Joining multiple tables producing different results with different order of join
[ https://issues.apache.org/jira/browse/HIVE-11034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srini Pindi updated HIVE-11034: --- Summary: Joining multiple tables producing different results with different order of join (was: Multiple join table producing different results) Joining multiple tables producing different results with different order of join Key: HIVE-11034 URL: https://issues.apache.org/jira/browse/HIVE-11034 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Environment: Linux 2.6.32-279.19.1.el6.x86_64 Reporter: Srini Pindi Priority: Critical Attachments: hive_issue.zip, steps_to_reproduce_.docx Joining one main table with other tables on different join columns returns wrong results in Hive. Changing the order of the joins between the main table and the other tables produces different results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589223#comment-14589223 ] Rui Li commented on HIVE-10999: --- Hi [~xuefuz], the problem seems to be incorrect naming of the spark-bin tar we packed. We expect the decompressed dir to be {noformat}spark-${spark.version}-bin-hadoop2-without-hive{noformat} Previously we got {{spark-1.3.1-bin-hadoop2-without-hive}}, which was correct. But now we have {{spark-1.4.0-SNAPSHOT-bin-2.4.0}}. So during tests we can't locate spark-submit properly. Would you mind taking a look at how we packed the tar, especially why it's still a SNAPSHOT? Thanks. Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589224#comment-14589224 ] Chaoyu Tang commented on HIVE-7193: --- Thanks [~ngangam] for the patch. It looks good to me. Regarding the concern you had about whether the AtnProvider should be changed to be implemented as a singleton, I agree with you that you should not address it in this patch, for the following reasons: 1. The existing code does not implement AtnProvider as a singleton. Making such a change might have backward compatibility issues. For example, what if a user has already implemented and used a CustomAuthenticationProvider which is not designed as a singleton? 2. The patch only adds some additional reading and processing of HiveConf properties in the LdapAuthenticationProviderImpl constructor. Compared to LDAP authentication itself, its overhead should be trivial and it should not be a performance bottleneck. 3. In case it turns out the performance is not desirable due to AtnProvider instantiation, we might consider moving some static logic from the constructor to a static block to improve runtime performance, or open a separate JIRA to investigate the performance implications (including a singleton etc.). But this patch mainly focuses on the LDAP enhancement. 4. As for your concern that we don't know what the user-coded CustomAuthenticationProvider could do: even if you change the AuthenticationProviderFactory and allow it to be implemented as a singleton, like you said, we still have no control over how the user implements the singleton. In addition, the enhancement, including its new configuration properties, should be properly documented. Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.5.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently hive has only the following authenticator parameters for LDAP authentication for hiveserver2: {code:xml} <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://our_ldap_address</value> </property> {code} We need to include other LDAP properties as part of hive-LDAP authentication like below: {noformat} a group search base - dc=domain,dc=com a group search filter - member={0} a user search base - dc=domain,dc=com a user search filter - sAMAAccountName={0} a list of valid user groups - group1,group2,group3 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
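As an aside, point 3 above could look roughly like the following sketch (hypothetical names, not the Hive source): parsing that is identical for every request moves out of the constructor into a static block, so per-request construction of the provider stays cheap.
{code}
public class LdapProviderSketch {
  private static final String[] GROUP_FILTERS;

  static {
    // runs once per class load, not once per authentication request
    GROUP_FILTERS = System.getProperty("ldap.groupFilter", "member={0}").split(",");
  }

  public LdapProviderSketch() {
    // only per-request state is set up here; static config is already parsed
  }

  public boolean authenticate(String user, String password) {
    // placeholder check: a real provider would bind against the LDAP server
    return user != null && password != null && GROUP_FILTERS.length > 0;
  }
}
{code}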
[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10841: --- Fix Version/s: 2.0.0 1.3.0 [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Fix For: 1.3.0, 1.2.1, 2.0.0 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id 
is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic
[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10841: --- Affects Version/s: 2.0.0 [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Fix For: 1.3.0, 1.2.1, 2.0.0 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id 
is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589081#comment-14589081 ] Alexander Pivovarov commented on HIVE-10841: Currently the patch is committed to https://github.com/apache/hive/commits/branch-1 https://github.com/apache/hive/commits/branch-1.0 https://github.com/apache/hive/commits/branch-1.2 https://github.com/apache/hive/commits/master I updated Fix Version/s field accordingly [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Fix For: 1.3.0, 1.2.1, 2.0.0 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... 
output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi
[jira] [Commented] (HIVE-11018) Turn on cbo in more q files
[ https://issues.apache.org/jira/browse/HIVE-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589080#comment-14589080 ] Hive QA commented on HIVE-11018: {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739953/HIVE-11018.2.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9008 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28 org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4277/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4277/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4277/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12739953 - PreCommit-HIVE-TRUNK-Build Turn on cbo in more q files --- Key: HIVE-11018 URL: https://issues.apache.org/jira/browse/HIVE-11018 Project: Hive Issue Type: Task Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11018.1.patch, HIVE-11018.2.patch, HIVE-11018.patch There are a few tests in which cbo was turned off for various reasons. Those reasons don't exist anymore. For those tests, we should turn on cbo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10841: --- Affects Version/s: (was: 2.0.0) [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Fix For: 1.3.0, 1.2.1, 2.0.0 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator 
predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats:
[jira] [Updated] (HIVE-10994) Hive.moveFile should not fail on a no-op move
[ https://issues.apache.org/jira/browse/HIVE-10994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-10994: Fix Version/s: 2.0.0 Hive.moveFile should not fail on a no-op move - Key: HIVE-10994 URL: https://issues.apache.org/jira/browse/HIVE-10994 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 1.2.1, 2.0.0 Attachments: HIVE-10994.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587589#comment-14587589 ] Lefty Leverenz commented on HIVE-10841: --- Does branch-1.0 mean version 1.0.1 or is it the same as branch-1 (version 1.3.0)? Today I've seen three commits to refs/heads/branch-1.0 but they don't show Fix Version 1.0.1 on the jira (HIVE-10273, HIVE-10685, and HIVE-10841). Many other commits go to refs/heads/branch-1 so I'm confused. Perhaps we need more details in the wiki. * [Understanding Hive Branches | https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-UnderstandingHiveBranches] [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Fix For: 1.2.1 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... 
output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
[jira] [Updated] (HIVE-11004) PermGen OOM error in Hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-11004: Description: Periodically Hiveserver2 will become unresponsive and looking in the logs there is the following error: {noformat} 2:28:22.965 PM ERROR org.apache.hadoop.hive.ql.io.orc.OrcInputFormat Unexpected Exception java.lang.OutOfMemoryError: PermGen space 2:28:22.969 PM WARN org.apache.hive.service.cli.thrift.ThriftCLIService Error fetching results: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.RuntimeException: serious problem at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:343) at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:250) at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:656) at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:451) at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672) at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553) at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: java.lang.RuntimeException: serious problem at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1655) at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:338) ... 13 more Caused by: java.lang.RuntimeException: serious problem at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$Context.waitForTasks(OrcInputFormat.java:478) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:944) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:969) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:362) at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:294) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445) ... 17 more Caused by: java.lang.OutOfMemoryError: PermGen space {noformat} There does not appear to be an obvious trigger for this (other than the fact that the error mentions ORC). If further details would be helpful in diagnosing the issue please let me know and I'll supply them.
was: Periodically Hiveserver2 will become unresponsive and looking in the logs there is the following error: 2:28:22.965 PM ERROR org.apache.hadoop.hive.ql.io.orc.OrcInputFormat Unexpected Exception java.lang.OutOfMemoryError: PermGen space 2:28:22.969 PM WARN org.apache.hive.service.cli.thrift.ThriftCLIService Error fetching results: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.RuntimeException: serious problem at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:343) at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:250) at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:656) at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:451) at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672) at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553) at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at
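One common mitigation while such a leak is being diagnosed (an assumption on our part, not something stated in this ticket) is to raise the PermGen cap for the HiveServer2 JVM, e.g. via hive-env.sh on a Java 7-era deployment:
{noformat}
# assumption: Java 7-era JVM (PermGen was removed in Java 8); path and size
# are illustrative and should match the actual deployment
export HADOOP_OPTS="$HADOOP_OPTS -XX:MaxPermSize=512m"
{noformat}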
[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587618#comment-14587618 ] Daniel Haviv commented on HIVE-10165: - Hi, Are there any plans to merge this to the trunk or is it going to be available only as a patch? Thanks, Daniel Improve hive-hcatalog-streaming extensibility and support updates and deletes. -- Key: HIVE-10165 URL: https://issues.apache.org/jira/browse/HIVE-10165 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 1.2.0 Reporter: Elliot West Assignee: Elliot West Labels: streaming_api Attachments: HIVE-10165.0.patch, HIVE-10165.4.patch, HIVE-10165.5.patch, HIVE-10165.6.patch, HIVE-10165.7.patch, mutate-system-overview.png h3. Overview I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts. h3. Motivation We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by: reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small when compared with the records contained in a partition. This approach results in a number of operational issues: * Excessive amount of write activity required for small data changes. * Downstream applications cannot robustly read these datasets while they are being updated. * Due to scale of the updates (hundreds of partitions) the scope for contention is high. I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data. Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API which will then have the required data to perform an update or insert in a transactional manner. h3. Benefits * Enables the creation of large-scale dataset merge processes * Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587638#comment-14587638 ] Elliot West commented on HIVE-10165: My hope is that it'll be merged to trunk. Thanks.
Improve hive-hcatalog-streaming extensibility and support updates and deletes. -- Key: HIVE-10165 URL: https://issues.apache.org/jira/browse/HIVE-10165 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 1.2.0 Reporter: Elliot West Assignee: Elliot West Labels: streaming_api Attachments: HIVE-10165.0.patch, HIVE-10165.4.patch, HIVE-10165.5.patch, HIVE-10165.6.patch, HIVE-10165.7.patch, mutate-system-overview.png
h3. Overview
I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts.
h3. Motivation
We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence, and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small compared with the number of records contained in a partition. This approach results in a number of operational issues:
* An excessive amount of write activity is required for small data changes.
* Downstream applications cannot robustly read these datasets while they are being updated.
* Due to the scale of the updates (hundreds of partitions), the scope for contention is high.
I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data. Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API, which will then have the required data to perform an update or insert in a transactional manner.
h3. Benefits
* Enables the creation of large-scale dataset merge processes.
* Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10815) Let HiveMetaStoreClient Choose MetaStore Randomly
[ https://issues.apache.org/jira/browse/HIVE-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587666#comment-14587666 ] Thiruvel Thirumoolan commented on HIVE-10815: - Thanks for pointing that out. Would it help to reuse that in the constructor or open()? Let HiveMetaStoreClient Choose MetaStore Randomly - Key: HIVE-10815 URL: https://issues.apache.org/jira/browse/HIVE-10815 Project: Hive Issue Type: Improvement Components: HiveServer2, Metastore Affects Versions: 1.2.0 Reporter: Nemon Lou Assignee: Nemon Lou Attachments: HIVE-10815.patch Currently HiveMetaStoreClient uses a fixed order to choose MetaStore URIs when multiple metastores are configured. Choosing a MetaStore randomly would be good for load balancing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11019) Can't create an Avro table with uniontype column correctly
[ https://issues.apache.org/jira/browse/HIVE-11019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li reassigned HIVE-11019: -- Assignee: Bing Li Can't create an Avro table with uniontype column correctly -- Key: HIVE-11019 URL: https://issues.apache.org/jira/browse/HIVE-11019 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Bing Li Assignee: Bing Li I tried the example in https://cwiki.apache.org/confluence/display/Hive/AvroSerDe and found that it can't create an Avro table with a uniontype column correctly:
hive> create table avro_union(union1 uniontype<FLOAT, BOOLEAN, STRING>) STORED AS AVRO;
OK
Time taken: 0.083 seconds
hive> describe avro_union;
OK
union1 uniontype<void,float,boolean,string>
Time taken: 0.058 seconds, Fetched: 1 row(s)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
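For reference, the union the DDL above declares can be constructed directly with the Avro Java API. The describe output suggests Hive's generated schema carries an extra null branch (surfaced as void in Hive) that the user never declared. A minimal sketch contrasting the two, using standard org.apache.avro.Schema calls; the class name is illustrative:
{code:java}
import org.apache.avro.Schema;
import java.util.Arrays;

public class UnionSchemaDemo {
    public static void main(String[] args) {
        // The union the DDL declares: float | boolean | string.
        Schema declared = Schema.createUnion(Arrays.asList(
                Schema.create(Schema.Type.FLOAT),
                Schema.create(Schema.Type.BOOLEAN),
                Schema.create(Schema.Type.STRING)));
        System.out.println(declared); // ["float","boolean","string"]

        // What the describe output suggests Hive actually generated:
        // a union with a leading null branch, shown as "void" by Hive.
        Schema generated = Schema.createUnion(Arrays.asList(
                Schema.create(Schema.Type.NULL),
                Schema.create(Schema.Type.FLOAT),
                Schema.create(Schema.Type.BOOLEAN),
                Schema.create(Schema.Type.STRING)));
        System.out.println(generated); // ["null","float","boolean","string"]
    }
}
{code}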
[jira] [Commented] (HIVE-10815) Let HiveMetaStoreClient Choose MetaStore Randomly
[ https://issues.apache.org/jira/browse/HIVE-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587743#comment-14587743 ] Nemon Lou commented on HIVE-10815: -- The mechanism in promoteRandomMetaStoreURI() has some limitations: if there are only two metastores, then the second one will always be promoted, making the order fixed again. Swapping the first metastore with a random one is reasonable when a retry is needed, and is better than randomly reordering all of the metastores before each reconnect. That is why I keep it and add a new random mechanism in the constructor. Here is the piece of code that does the promotion:
{code:java}
/**
 * Swaps the first element of the metastoreUris array with a random element from the
 * remainder of the array.
 */
private void promoteRandomMetaStoreURI() {
  if (metastoreUris.length <= 1) {
    return;
  }
  Random rng = new Random();
  int index = rng.nextInt(metastoreUris.length - 1) + 1;
  URI tmp = metastoreUris[0];
  metastoreUris[0] = metastoreUris[index];
  metastoreUris[index] = tmp;
}
{code}
Let HiveMetaStoreClient Choose MetaStore Randomly - Key: HIVE-10815 URL: https://issues.apache.org/jira/browse/HIVE-10815 Project: Hive Issue Type: Improvement Components: HiveServer2, Metastore Affects Versions: 1.2.0 Reporter: Nemon Lou Assignee: Nemon Lou Attachments: HIVE-10815.patch Currently HiveMetaStoreClient uses a fixed order to choose MetaStore URIs when multiple metastores are configured. Choosing a MetaStore randomly would be good for load balancing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
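The "new random mechanism in the constructor" mentioned in the comment is not shown. A minimal sketch of what such a one-time randomization could look like; the class and method names are illustrative assumptions, not the actual HIVE-10815 patch:
{code:java}
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration: shuffle the configured URIs once at client
// construction so each client starts from a random metastore, while
// promoteRandomMetaStoreURI() still handles re-ordering on retry.
public class MetaStoreUriShuffler {
    static URI[] shuffleInitialOrder(URI[] configuredUris) {
        // Clone first so the caller's configured order is left untouched.
        List<URI> uris = Arrays.asList(configuredUris.clone());
        Collections.shuffle(uris); // random starting order per client
        return uris.toArray(new URI[0]);
    }
}
{code}
Shuffling once in the constructor spreads fresh clients evenly across metastores, while the swap-on-retry above only changes order after a failure.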
[jira] [Commented] (HIVE-11005) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : Regression on the latest master
[ https://issues.apache.org/jira/browse/HIVE-11005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587897#comment-14587897 ] Jesus Camacho Rodriguez commented on HIVE-11005: These issues should be solved when HIVE-10533 goes in. CBO: Calcite Operator To Hive Operator (Calcite Return Path) : Regression on the latest master -- Key: HIVE-11005 URL: https://issues.apache.org/jira/browse/HIVE-11005 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Jesus Camacho Rodriguez Tests cbo_join.q and cbo_views.q fail on the return path. Part of the stack trace is:
{code}
2015-06-15 09:51:53,377 ERROR [main]: parse.CalcitePlanner (CalcitePlanner.java:genOPTree(282)) - CBO failed, skipping CBO.
java.lang.IndexOutOfBoundsException: index (0) must be less than size (0)
at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305)
at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284)
at com.google.common.collect.EmptyImmutableList.get(EmptyImmutableList.java:80)
at org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveInsertExchange4JoinRule.onMatch(HiveInsertExchange4JoinRule.java:101)
at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:326)
at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:515)
at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:392)
at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:255)
at org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:125)
at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:207)
at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:194)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:888)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:771)
at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109)
at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:876)
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
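The exception message in the trace is Guava's standard Preconditions message for indexing into an empty ImmutableList, which the trace shows HiveInsertExchange4JoinRule.onMatch doing at line 101 (apparently on an empty key list). A two-line repro of the same failure mode, not Hive code, shown only to make the trace readable:
{code:java}
import com.google.common.collect.ImmutableList;

public class EmptyListRepro {
    public static void main(String[] args) {
        ImmutableList<String> keys = ImmutableList.of();
        // Throws java.lang.IndexOutOfBoundsException:
        //   index (0) must be less than size (0)
        keys.get(0);
    }
}
{code}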