[jira] [Created] (HIVE-23771) After loading data into Hive, LIMIT displays Chinese user names correctly, but WHERE shows them garbled and cannot match on user name
wang created HIVE-23771: --- Summary: After loading data into Hive, LIMIT displays Chinese user names correctly, but WHERE shows them garbled and cannot match on user name Key: HIVE-23771 URL: https://issues.apache.org/jira/browse/HIVE-23771 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 2.1.1 Reporter: wang Fix For: 2.1.1 Attachments: image-2020-06-29-15-04-23-999.png, image-2020-06-29-15-08-25-923.png, image-2020-06-29-15-10-10-310.png Table creation statement: create table smg_t_usr_inf_23( Usr_ID string, RlgnSvcPltfrmUsr_TpCd string, Rlgn_InsID string, Usr_Nm string , ) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH SERDEPROPERTIES ("field.delim"="|@|") stored as textfile Data load: LOAD DATA LOCAL INPATH '/home/ap/USR_INF 20200622_0001.dat' INTO TABLE usr_inf select * from usr_inf limit 10; returns data: !image-2020-06-29-15-04-23-999.png! select * from usr_inf where usr_nm = '胡学玲' ; returns no data: !image-2020-06-29-15-08-25-923.png! Other queries, such as select * from usr_inf where usr_id='***'; return data: !image-2020-06-29-15-10-10-310.png! . Could someone explain why the loaded data is Chinese and yet WHERE has problems with it? Running insert into table aa select * from usr_inf; directly leaves the usr_nm field of the new table in the same state. -- This message was sent by Atlassian Jira (v8.3.4#803005)
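One possible explanation, which the report does not confirm, is an encoding mismatch: the data file may be stored in a legacy Chinese encoding such as GBK while the query literal is compared as UTF-8 bytes. The sketch below only demonstrates that general effect in Python. The choice of GBK is an assumption, and nothing here reproduces Hive or MultiDelimitSerDe behaviour.

```python
# Illustration only: assumes (hypothetically) that the file bytes are GBK
# while the query literal is UTF-8. Not taken from the report.
name = "胡学玲"

utf8_bytes = name.encode("utf-8")  # 3 bytes per character here
gbk_bytes = name.encode("gbk")     # 2 bytes per character here

# A byte-wise comparison of the two encodings of the same name fails.
same_bytes = utf8_bytes == gbk_bytes

# Reading GBK bytes as if they were UTF-8 yields garbled text,
# so an equality test against the literal also fails.
garbled = gbk_bytes.decode("utf-8", errors="replace")
matches_literal = garbled == name

# Decoding with the encoding that was actually used recovers the name.
recovered = gbk_bytes.decode("gbk")
```

If this were the cause, checking the file's encoding and converting it to UTF-8 before loading would be a reasonable first test.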
[jira] Updated: (HIVE-1746) Support for using ALTER to set IDXPROPERTIES
[ https://issues.apache.org/jira/browse/HIVE-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1746: --- Attachment: 1746.prelim.patch Preliminary patch. I think I need to add an alter_index function to the HiveMetaStoreClient, which I think requires editing the thrift files. I'm not sure if that is the correct way to go about that... is there a better way to allow us to change the properties on an existing index? If that is correct, how do I generate the new ThriftHiveMetaStoreClient.java? Support for using ALTER to set IDXPROPERTIES Key: HIVE-1746 URL: https://issues.apache.org/jira/browse/HIVE-1746 Project: Hive Issue Type: Improvement Reporter: Marquis Wang Assignee: Marquis Wang Attachments: 1746.prelim.patch Hive-1498 has support for IDXPROPERTIES on index creation, so now we want to support ALTERing those properties. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1746) Support for using ALTER to set IDXPROPERTIES
[ https://issues.apache.org/jira/browse/HIVE-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1746: --- Attachment: HIVE-1746.2.patch New patch. Includes thrift generated files and should work now. Support for using ALTER to set IDXPROPERTIES Key: HIVE-1746 URL: https://issues.apache.org/jira/browse/HIVE-1746 Project: Hive Issue Type: Improvement Reporter: Marquis Wang Assignee: Marquis Wang Attachments: 1746.prelim.patch, HIVE-1746.2.patch Hive-1498 has support for IDXPROPERTIES on index creation, so now we want to support ALTERing those properties. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1746) Support for using ALTER to set IDXPROPERTIES
[ https://issues.apache.org/jira/browse/HIVE-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1746: --- Attachment: HIVE-1746.3.patch Support for using ALTER to set IDXPROPERTIES Key: HIVE-1746 URL: https://issues.apache.org/jira/browse/HIVE-1746 Project: Hive Issue Type: Improvement Components: Indexing Affects Versions: 0.7.0 Reporter: Marquis Wang Assignee: Marquis Wang Fix For: 0.7.0 Attachments: 1746.prelim.patch, HIVE-1746.2.patch, HIVE-1746.3.patch Hive-1498 has support for IDXPROPERTIES on index creation, so now we want to support ALTERing those properties. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1746) Support for using ALTER to set IDXPROPERTIES
[ https://issues.apache.org/jira/browse/HIVE-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930483#action_12930483 ] Marquis Wang commented on HIVE-1746: New patch. It eliminates the println calls, adds a private updateModifiedParameters method, and passes the database name into AlterIndexDesc. Otherwise it is the same. Support for using ALTER to set IDXPROPERTIES Key: HIVE-1746 URL: https://issues.apache.org/jira/browse/HIVE-1746 Project: Hive Issue Type: Improvement Components: Indexing Affects Versions: 0.7.0 Reporter: Marquis Wang Assignee: Marquis Wang Fix For: 0.7.0 Attachments: 1746.prelim.patch, HIVE-1746.2.patch, HIVE-1746.3.patch Hive-1498 has support for IDXPROPERTIES on index creation, so now we want to support ALTERing those properties. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-1496) enhance CREATE INDEX to support immediate index build
[ https://issues.apache.org/jira/browse/HIVE-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang reassigned HIVE-1496: -- Assignee: Marquis Wang (was: Russell Melick) enhance CREATE INDEX to support immediate index build - Key: HIVE-1496 URL: https://issues.apache.org/jira/browse/HIVE-1496 Project: Hive Issue Type: Improvement Components: Indexing Affects Versions: 0.7.0 Reporter: John Sichi Assignee: Marquis Wang Fix For: 0.7.0 Currently we only support WITH DEFERRED REBUILD. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1803) Implement bitmap indexing in Hive
Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Reporter: Marquis Wang Assignee: Marquis Wang Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Attachment: bitmap_index_2.png Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Reporter: Marquis Wang Assignee: Marquis Wang Attachments: bitmap_index_1.png, bitmap_index_2.png Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Attachment: bitmap_index_1.png Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Reporter: Marquis Wang Assignee: Marquis Wang Attachments: bitmap_index_1.png, bitmap_index_2.png Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934088#action_12934088 ] Marquis Wang commented on HIVE-1803: Added a proposed design document to the Hive wiki at http://wiki.apache.org/hadoop/Hive/IndexDev/Bitmap Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Reporter: Marquis Wang Assignee: Marquis Wang Attachments: bitmap_index_1.png, bitmap_index_2.png Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Attachment: HIVE-1803.2.patch We fixed the problem in BitmapCollectSet by looking at the PercentileApprox UDAF to figure out how to use an array as an input to a UDAF. This new patch is a working implementation of bitmap indexing. The new test index_bitmap.q shows how to use the index. However, I am unable to add the test itself; I get errors when I run ant test -Dtestcase=TestCliDriver -Dqfile=index_bitmap.q -Doverwrite=true -Dtest.silent=false The error is Exception: java.lang.RuntimeException: The table default__srcpart_srcpart_index_proj__ is an index table. Please do drop index instead. It refers to the ALTER INDEX REBUILD line in the test. We're unsure whether we have written the new test incorrectly and would appreciate any help. While we work around that, we're also going to start on a compressed bitmap, since this implementation does no compression. Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, bitmap_index_1.png, bitmap_index_2.png Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
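For readers unfamiliar with the technique, the sketch below gives a rough picture of an uncompressed bitmap index: one bitmap per distinct column value, one bit per row, with predicates on several columns answered by ANDing bitmaps. It is an illustration only and not code from the patch; `build_bitmap_index`, `matching_rows`, and the sample columns are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = defaultdict(int)
    for row_id, value in enumerate(column):
        index[value] |= 1 << row_id
    return dict(index)


def matching_rows(bitmap):
    """Expand a bitmap into the sorted list of row ids whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical sample data: two columns of a six-row table.
color = ["red", "blue", "red", "green", "blue", "red"]
size = ["S", "M", "M", "S", "M", "S"]

color_index = build_bitmap_index(color)
size_index = build_bitmap_index(size)

# WHERE color = 'red' AND size = 'S' becomes a bitwise AND of two bitmaps.
red_and_small = color_index["red"] & size_index["S"]
```

Because every bitmap here stores one bit per row regardless of content, long runs of zeros cost as much as dense regions, which is the cost a compressed representation is meant to reduce.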
[jira] Commented: (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999657#comment-12999657 ] Marquis Wang commented on HIVE-1803: Thanks Jeff. We've actually seen this and have a patch in the works (next couple days) that uses it. Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, bitmap_index_1.png, bitmap_index_2.png Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Status: Patch Available (was: Open) Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Attachment: HIVE-1803.7.patch New patch which I believe takes care of all the issues in the review for patch 6. Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Status: Patch Available (was: Open) Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2078) Row-level indexing in bitmap indexes
Row-level indexing in bitmap indexes Key: HIVE-2078 URL: https://issues.apache.org/jira/browse/HIVE-2078 Project: Hive Issue Type: Improvement Reporter: Marquis Wang Priority: Minor Row-level indexing would greatly improve bitmap indexes. Without it, a bitmap index is of little use unless several indexes are used and their bitmaps combined: with millions of rows in one block, a block is likely to contain every distinct value of a column, so a single index rarely rules any block out. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
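The point about blocks can be seen in a small sketch. It assumes, purely for illustration, a column with four distinct values spread evenly over 100,000 rows and a block size of 10,000 rows; the function names are hypothetical and this is not Hive's index format.

```python
def block_level_bitmap(rows, block_size, value):
    """One bit per block: set when any row in the block holds `value`."""
    return [
        any(v == value for v in rows[start:start + block_size])
        for start in range(0, len(rows), block_size)
    ]


def row_level_bitmap(rows, value):
    """One bit per row: set when that row holds `value`."""
    return [v == value for v in rows]


def fraction_selected(bitmap):
    """Share of entries (blocks or rows) the bitmap says must be read."""
    return sum(bitmap) / len(bitmap)


# Assumed data: four distinct values cycling over 100,000 rows.
rows = ["ABCD"[i % 4] for i in range(100_000)]

block_bits = block_level_bitmap(rows, block_size=10_000, value="A")
row_bits = row_level_bitmap(rows, value="A")

block_fraction = fraction_selected(block_bits)  # every block contains "A"
row_fraction = fraction_selected(row_bits)      # only a quarter of the rows do
```

Under these assumptions the block-level bitmap selects every block and so prunes nothing, while the row-level bitmap narrows the scan to the matching quarter of the rows.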
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Attachment: HIVE-1803.8.patch New patch with minimal changes (got rid of some unused imports) Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Status: Patch Available (was: Open) John, I'm resubmitting the patch for inclusion and opened a new ticket for creating row-level indexing. Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Attachment: HIVE-1803.9.patch Uploaded new patch that addresses John's comments on patch 8. Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Status: Patch Available (was: Open) Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Attachment: HIVE-1803.10.patch Update patch to include more missing javadocs. Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Status: Patch Available (was: Open) Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.patch Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Attachment: unit-tests.patch HIVE-1803.11.patch New patch that addresses the minor javadoc comments from patch 10, plus a unit-tests patch that updates all the unit tests affected by the virtual column change. Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.patch Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Attachment: unit-tests.2.patch New unit tests patch that should fix some more tests. John, I didn't see any failures in TestMTQueries even before adding this new patch. I'm not sure why that would be, but I definitely fixed some things in the other two tests. Also this patch only includes the unit tests, so you will need to include patch 11 as well. Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, unit-tests.patch Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017069#comment-13017069 ] Marquis Wang commented on HIVE-1803: I re-pulled from trunk and made a new patch, and there was no difference between the two. If you have the original unit-tests.patch applied, then this patch will fail. Can you try applying HIVE-1803.11.patch followed by unit-tests.2.patch on a clean checkout? Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, unit-tests.patch Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Status: Patch Available (was: Open) Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, unit-tests.patch Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Attachment: unit-tests.3.patch New patch for unit tests that hopefully won't conflict this time. I looked into changing the code so that the outputColumnNames in explains are not affected by virtual columns, but didn't really get anywhere. Besides, wouldn't I have the same problem with commits, since the unit tests were changed for the first two virtual columns added? I figured I'd go ahead and submit this patch again; if you think I should keep looking into that, you can simply not accept it. :-) Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, unit-tests.3.patch, unit-tests.patch Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Status: Patch Available (was: Open) Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, unit-tests.3.patch, unit-tests.patch Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Attachment: HIVE-1803.12.patch New patch that implements John's suggestion of adding the hive.exec.rowoffset configuration variable. This patch fixes the issues with column numbers in explains. John, I'm still seeing some test failures in tests such as combine2.q, bucketmapjoin1.q, and bucketmapjoin4.q. In an explain in each of these tests, one of the numRows outputs now reports zero rows where it previously reported a non-zero number. I'm not really sure what could be causing this and don't see anything in this patch that could affect these tests. Do you have any ideas? Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.11.patch, HIVE-1803.12.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, unit-tests.3.patch, unit-tests.patch Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023097#comment-13023097 ] Marquis Wang commented on HIVE-1803: I don't see anything that needs to be deleted in my checkout. Where is the stats temp database? Also, if you think it might just be something on our side, can you run the tests and see whether they pass for you? As far as I recall, I didn't see any other issues besides those when I ran them. Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.11.patch, HIVE-1803.12.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, unit-tests.3.patch, unit-tests.patch Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Status: Patch Available (was: Open) Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.11.patch, HIVE-1803.12.patch, HIVE-1803.13.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, unit-tests.3.patch, unit-tests.patch Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Attachment: HIVE-1803.13.patch New patch that updates HADOOP_CLASSPATH and doesn't change tests except adding new tests and show_functions.q. Fingers crossed for this one passing. I'm optimistic. Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.11.patch, HIVE-1803.12.patch, HIVE-1803.13.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, unit-tests.3.patch, unit-tests.patch Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Attachment: HIVE-1803.14.patch HIVE-1803.14.patch The issue with the last patch was the order in which VirtualColumn.getRegistry().iterator() returned the columns. The old code stored the virtual column registry as a HashMap, so I added the columns to the registry in the order the HashMap would have returned them. This patch fixes that. I'm still seeing errors in groupby1.q through groupby6.q. Various numbers are coming back wrong, but it doesn't appear to be related to the virtual columns, and I can't tell whether there is a pattern to it. Can you take a look?
{noformat}
[junit] <string>CNTR_NAME_GBY_28_NUM_INPUT_ROWS</string>
[junit] 1345c1341
[junit] <string>CNTR_NAME_GBY_4_NUM_OUTPUT_ROWS</string>
[junit] ---
[junit] <string>CNTR_NAME_GBY_28_NUM_OUTPUT_ROWS</string>
[junit] 1348c1344
[junit] <string>CNTR_NAME_GBY_4_TIME_TAKEN</string>
[junit] ---
[junit] <string>CNTR_NAME_GBY_28_TIME_TAKEN</string>
[junit] 1351c1347
[junit] <string>CNTR_NAME_GBY_4_FATAL_ERROR</string>
[junit] ---
[junit] <string>CNTR_NAME_GBY_28_FATAL_ERROR</string>
{/noformat}
Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.11.patch, HIVE-1803.12.patch, HIVE-1803.13.patch, HIVE-1803.14.patch, HIVE-1803.14.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, unit-tests.3.patch, unit-tests.patch Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
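The ordering problem described in the HIVE-1803.14 note can be illustrated with a minimal sketch. It assumes nothing about the real VirtualColumn registry: the class RegistryOrderSketch and the method iterationOrder are hypothetical, and the column names are used only as plain strings. It shows that a LinkedHashMap iterates in insertion order, whereas a HashMap iterates in an order determined by key hashes, so code or test output that depends on the iteration order can change when the backing collection changes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hive code: compares the iteration order of two map types.
class RegistryOrderSketch {
    // Registers the names in the given order and reports the order the map iterates them in.
    static List<String> iterationOrder(Map<String, Integer> registry, String... names) {
        for (int i = 0; i < names.length; i++) {
            registry.put(names[i], i);
        }
        return new ArrayList<>(registry.keySet());
    }

    public static void main(String[] args) {
        // Names are illustrative strings only.
        String[] names = {"INPUT__FILE__NAME", "BLOCK__OFFSET__INSIDE__FILE", "ROW__OFFSET__INSIDE__BLOCK"};
        // LinkedHashMap: always the insertion order.
        System.out.println("LinkedHashMap: " + iterationOrder(new LinkedHashMap<>(), names));
        // HashMap: order depends on the key hashes and is not guaranteed.
        System.out.println("HashMap:       " + iterationOrder(new HashMap<>(), names));
    }
}
```

Golden-file tests that print the registry contents are sensitive to exactly this difference, which may be why the expected output had to follow the old HashMap order.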
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Attachment: HIVE-1803.15.patch HIVE-1803.15.patch New patch that updates the groupby tests in TestParse. The number in the operator ID was not consistent: it differs depending on whether I run one test at a time or all the tests at once, which is why I thought those results needed to be updated. The results as they were before still work for those tests. One other thing did need to change for me in the groupby tests, though:
{noformat}
@@ -521,7 +521,8 @@
 <string>sum</string>
 </void>
 <void property="mode">
- <object class="org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator$Mode" method="valueOf">
+ <object class="java.lang.Enum" method="valueOf">
+ <class>org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator$Mode</class>
 <string>PARTIAL1</string>
 </object>
 </void>
{noformat}
The new patch updates those tests. Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.11.patch, HIVE-1803.12.patch, HIVE-1803.13.patch, HIVE-1803.14.patch, HIVE-1803.14.patch, HIVE-1803.15.patch, HIVE-1803.15.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, unit-tests.3.patch, unit-tests.patch Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2131) Bitmap Operation UDF doesn't clear return list
Bitmap Operation UDF doesn't clear return list -- Key: HIVE-2131 URL: https://issues.apache.org/jira/browse/HIVE-2131 Project: Hive Issue Type: Bug Reporter: Marquis Wang Assignee: Marquis Wang The AbstractGenericUDFEWAHBitmapBop.java does not clear the return list when evaluate() is called, causing each subsequent call to a bitmap operation to return the wrong values. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
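The failure mode reported in HIVE-2131 can be sketched in a few lines. This is not the code of AbstractGenericUDFEWAHBitmapBop: the class BitmapOp, its clearFirst flag, and the word-wise AND are hypothetical stand-ins, assuming only what the report states, namely that evaluate() appends to a list that is reused across calls. Without a clear() at the start of each call, results from earlier calls leak into later ones.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the HIVE-2131 failure mode, not the actual Hive UDF.
class ReusedReturnListSketch {
    // Stand-in for a UDF that reuses one output list across evaluate() calls.
    static class BitmapOp {
        private final List<Long> ret = new ArrayList<>();
        private final boolean clearFirst;

        BitmapOp(boolean clearFirst) {
            this.clearFirst = clearFirst;
        }

        // ANDs two bitmaps word by word and returns the (reused) result list.
        List<Long> evaluate(long[] a, long[] b) {
            if (clearFirst) {
                ret.clear(); // the fix: drop the previous call's words
            }
            for (int i = 0; i < Math.min(a.length, b.length); i++) {
                ret.add(a[i] & b[i]);
            }
            return ret;
        }
    }

    public static void main(String[] args) {
        BitmapOp buggy = new BitmapOp(false);
        buggy.evaluate(new long[] {0b1100L}, new long[] {0b1010L});
        // The second result still contains the word from the first call.
        System.out.println("without clear(): " + buggy.evaluate(new long[] {0b0011L}, new long[] {0b0001L}));

        BitmapOp fixed = new BitmapOp(true);
        fixed.evaluate(new long[] {0b1100L}, new long[] {0b1010L});
        System.out.println("with clear():    " + fixed.evaluate(new long[] {0b0011L}, new long[] {0b0001L}));
    }
}
```

A test that calls the operation only once per query would not notice the bug; it takes at least two calls on the same instance to see the stale values.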
[jira] [Updated] (HIVE-2131) Bitmap Operation UDF doesn't clear return list
[ https://issues.apache.org/jira/browse/HIVE-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-2131: --- Attachment: HIVE-2131.1.patch Small patch that solves this problem. Bitmap Operation UDF doesn't clear return list -- Key: HIVE-2131 URL: https://issues.apache.org/jira/browse/HIVE-2131 Project: Hive Issue Type: Bug Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-2131.1.patch The AbstractGenericUDFEWAHBitmapBop.java does not clear the return list when evaluate() is called, causing each subsequent call to a bitmap operation to return the wrong values. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2131) Bitmap Operation UDF doesn't clear return list
[ https://issues.apache.org/jira/browse/HIVE-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-2131: --- Status: Patch Available (was: Open) Bitmap Operation UDF doesn't clear return list -- Key: HIVE-2131 URL: https://issues.apache.org/jira/browse/HIVE-2131 Project: Hive Issue Type: Bug Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-2131.1.patch The AbstractGenericUDFEWAHBitmapBop.java does not clear the return list when evaluate() is called, causing each subsequent call to a bitmap operation to return the wrong values. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2131) Bitmap Operation UDF doesn't clear return list
[ https://issues.apache.org/jira/browse/HIVE-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-2131: --- Attachment: HIVE-2131.2.patch I've updated the udf_bitmap_and and udf_bitmap_or tests so that they would have detected the bug in the old code. Bitmap Operation UDF doesn't clear return list -- Key: HIVE-2131 URL: https://issues.apache.org/jira/browse/HIVE-2131 Project: Hive Issue Type: Bug Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-2131.1.patch, HIVE-2131.2.patch The AbstractGenericUDFEWAHBitmapBop.java does not clear the return list when evaluate() is called, causing each subsequent call to a bitmap operation to return the wrong values. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2036) Update bitmap indexes for automatic usage
[ https://issues.apache.org/jira/browse/HIVE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036449#comment-13036449 ] Marquis Wang commented on HIVE-2036: Making notes on how to do this: One of the difficult/different parts about using bitmap indexes is that they only become useful when multiple indexes are combined. Thus, you need a query that joins the various bitmap index tables and returns the blocks that contain the rows we want. The two parts to writing the automatic-use index handler for bitmap indexes are therefore:
1. Figuring out which indexes to use: As mentioned above, you may need to extend the IndexPredicateAnalyzer to support ORs and possibly to return a tree of predicates (I don't think it already does this).
2. Building a query that accesses the index tables: This is an example query that I know works for querying the index tables for the query
{noformat}
SELECT * FROM lineitem WHERE L_QUANTITY = 50.0 AND L_DISCOUNT = 0.08 AND L_TAX = 0.01;
{noformat}
{noformat}
SELECT bucketname AS `_bucketname`, COLLECT_SET(offset) as `_offsets`
FROM (SELECT `_bucketname` AS bucketname, `_offset` AS offset
      FROM (SELECT ab.`_bucketname`, ab.`_offset`, EWAH_BITMAP_AND(ab.bitmap, c.`_bitmaps`) as bitmap
            FROM (SELECT a.`_bucketname`, b.`_offset`, EWAH_BITMAP_AND(a.`_bitmaps`, b.`_bitmaps`) as bitmap
                  FROM (SELECT * FROM default__lineitem_quantity__ WHERE L_QUANTITY = 50.0) a
                  JOIN (SELECT * FROM default__lineitem_discount__ WHERE L_DISCOUNT = 0.08) b
                  ON a.`_bucketname` = b.`_bucketname` AND a.`_offset` = b.`_offset`) ab
            JOIN (SELECT * FROM default__lineitem_tax__ WHERE L_TAX = 0.01) c
            ON ab.`_bucketname` = c.`_bucketname` AND ab.`_offset` = c.`_offset`) abc
      WHERE NOT EWAH_BITMAP_EMPTY(abc.bitmap)
) t
GROUP BY bucketname;
{noformat}
This format works for joining any number of AND predicates. I'm pretty sure you can figure out how to expand it to include OR predicates and different groupings of predicates as well. If you make any changes/extensions to the format, be sure to test them to make sure they have the performance characteristics you want. Update bitmap indexes for automatic usage - Key: HIVE-2036 URL: https://issues.apache.org/jira/browse/HIVE-2036 Project: Hive Issue Type: Improvement Components: Indexing Affects Versions: 0.8.0 Reporter: Russell Melick Assignee: Jeffrey Lym HIVE-1644 will provide automatic usage of indexes, and HIVE-1803 adds bitmap index support. The bitmap code will need to be extended after it is committed to enable automatic use of indexing. Most work will be focused in the BitmapIndexHandler, which needs to generate the re-entrant QL index query. There may also be significant work in the IndexPredicateAnalyzer to support predicates with OR's, instead of just AND's as it is currently. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
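The logic of the index query in the HIVE-2036 note (join the index tables on bucket name and offset, AND the bitmaps, keep the non-empty results) can be sketched outside Hive. The sketch below is an assumption-laden illustration: it uses java.util.BitSet as an uncompressed stand-in for EWAH bitmaps, models each index table as a map from a hypothetical "bucketname:offset" key to a bitmap, and the names BitmapAndSketch, andIndexes, and bits are invented for the example.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of combining per-column bitmap indexes with AND; not Hive code.
class BitmapAndSketch {
    // Each map plays the role of one filtered index table: block key -> bitmap of matching rows.
    static Map<String, BitSet> andIndexes(List<Map<String, BitSet>> indexes) {
        Map<String, BitSet> result = new TreeMap<>();
        if (indexes.isEmpty()) {
            return result;
        }
        // Start from a copy of the first index so the inputs are not modified.
        for (Map.Entry<String, BitSet> e : indexes.get(0).entrySet()) {
            result.put(e.getKey(), (BitSet) e.getValue().clone());
        }
        for (Map<String, BitSet> index : indexes.subList(1, indexes.size())) {
            // Inner join on the block key: drop blocks that this index does not contain.
            result.keySet().retainAll(index.keySet());
            // Counterpart of EWAH_BITMAP_AND on the joined rows.
            for (Map.Entry<String, BitSet> e : result.entrySet()) {
                e.getValue().and(index.get(e.getKey()));
            }
        }
        // Counterpart of WHERE NOT EWAH_BITMAP_EMPTY(...).
        result.values().removeIf(BitSet::isEmpty);
        return result;
    }

    // Convenience helper: builds a bitmap with the given row positions set.
    static BitSet bits(int... rows) {
        BitSet b = new BitSet();
        for (int r : rows) {
            b.set(r);
        }
        return b;
    }

    public static void main(String[] args) {
        Map<String, BitSet> quantity = new TreeMap<>();
        quantity.put("file1:0", bits(1, 3, 5));
        quantity.put("file1:64", bits(2));
        Map<String, BitSet> discount = new TreeMap<>();
        discount.put("file1:0", bits(3, 5, 7));
        discount.put("file1:64", bits(4));
        // Only blocks whose ANDed bitmap is non-empty survive.
        System.out.println(andIndexes(List.of(quantity, discount)));
    }
}
```

An OR predicate would replace and() with or() and the inner join with a full outer join on the block key, which is one reason the predicate analyzer would need to return a tree rather than a flat list.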
[jira] [Commented] (HIVE-2036) Update bitmap indexes for automatic usage
[ https://issues.apache.org/jira/browse/HIVE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036709#comment-13036709 ] Marquis Wang commented on HIVE-2036: Russell is right. hive.index.compact.file is deprecated and replaced with hive.index.blockfilter.file (I think). I kept the former around for backwards-compatibility reasons, but we should try to avoid using it. Update bitmap indexes for automatic usage - Key: HIVE-2036 URL: https://issues.apache.org/jira/browse/HIVE-2036 Project: Hive Issue Type: Improvement Components: Indexing Affects Versions: 0.8.0 Reporter: Russell Melick Assignee: Syed S. Albiz HIVE-1644 will provide automatic usage of indexes, and HIVE-1803 adds bitmap index support. The bitmap code will need to be extended after it is committed to enable automatic use of indexing. Most work will be focused in the BitmapIndexHandler, which needs to generate the re-entrant QL index query. There may also be significant work in the IndexPredicateAnalyzer to support predicates with OR's, instead of just AND's as it is currently. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-942) use bucketing for group by
[ https://issues.apache.org/jira/browse/HIVE-942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13404730#comment-13404730 ] Lianhui Wang commented on HIVE-942: --- I think that in HIVE-931 the group-by keys must be the same as the sort keys. But when the group-by keys contain the sort keys, it may still be possible to complete the aggregation with the hash table on the mapper. For example, t is a bucketed table sorted by c1, c2, and the SQL is: select t.c1, t.c2, t.c3, sum(t.c4) from t group by t.c1, t.c2, t.c3. I think that in general this needs only the hash table on the mapper, so nothing has to be done on the reducer. use bucketing for group by -- Key: HIVE-942 URL: https://issues.apache.org/jira/browse/HIVE-942 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Group by on a bucketed column can be completely performed on the mapper if the split can be adjusted to span the key boundary. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3254) Reuse RunningJob
[ https://issues.apache.org/jira/browse/HIVE-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13424569#comment-13424569 ] Lianhui Wang commented on HIVE-3254: Yes, I think that can be done. But newRj may be null, so you must check for null: the JobTracker caches information for only a fixed number of completed jobs, so if the job you ask for has already completed, the JT may have removed its information. Reuse RunningJob - Key: HIVE-3254 URL: https://issues.apache.org/jira/browse/HIVE-3254 Project: Hive Issue Type: Bug Reporter: binlijin
private MapRedStats progress(ExecDriverTaskHandle th) throws IOException {
  while (!rj.isComplete()) {
    try {
      Thread.sleep(pullInterval);
    } catch (InterruptedException e) {
    }
    RunningJob newRj = jc.getJob(rj.getJobID());
  }
}
Should we reuse the RunningJob? If not, why? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3329) Support bucket filtering when where expression or join key expression has the bucket key
Lianhui Wang created HIVE-3329: -- Summary: Support bucket filtering when where expression or join key expression has the bucket key Key: HIVE-3329 URL: https://issues.apache.org/jira/browse/HIVE-3329 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Lianhui Wang HIVE-3306 introduces a context. Example: select /* + MAPJOIN(a) */ count(*) FROM bucket_small a JOIN bucket_big b ON a.key + a.key = b.key. There are also other contexts. I know of the following examples: 1. The join expression is ON (a.key = b.key and a.key=10); 2. select * from bucket_small a where a.key=10; 3. The table is a partitioned table, which may be more complex. Example: CREATE TABLE srcbucket_part (key string, value string) partitioned by (ds string) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS RCFile; select * from srcbucket_part where key='455' and ds='2008-04-08'; A more complex SQL statement may be: select * from srcbucket_part where (key='455' and ds='2008-04-08') or ds='2008-04-09'; In these contexts Hive should not scan all of the table's files, but only the relevant bucket files in the table path. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3306) SMBJoin/BucketMapJoin should be allowed only when join key expression is exactly matches with sort/cluster key
[ https://issues.apache.org/jira/browse/HIVE-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427127#comment-13427127 ] Lianhui Wang commented on HIVE-3306: @Namit, I created a new JIRA, HIVE-3329; it may contain several tasks. I have finished the work for the case where the table is not partitioned. Next I will work on partitioned tables. SMBJoin/BucketMapJoin should be allowed only when join key expression is exactly matches with sort/cluster key -- Key: HIVE-3306 URL: https://issues.apache.org/jira/browse/HIVE-3306 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Navis Assignee: Navis Priority: Minor
CREATE TABLE bucket_small (key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket1outof4.txt' INTO TABLE bucket_small;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket2outof4.txt' INTO TABLE bucket_small;
CREATE TABLE bucket_big (key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket1outof4.txt' INTO TABLE bucket_big;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket2outof4.txt' INTO TABLE bucket_big;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket3outof4.txt' INTO TABLE bucket_big;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket4outof4.txt' INTO TABLE bucket_big;
select count(*) FROM bucket_small a JOIN bucket_big b ON a.key + a.key = b.key;
select /* + MAPJOIN(a) */ count(*) FROM bucket_small a JOIN bucket_big b ON a.key + a.key = b.key;
Both return 116. But with BucketMapJoin or SMBJoin, the query returns 61. This should not be allowed, because hash(a.key) != hash(a.key + a.key). The bucket context should be used only when the join expression exactly matches the sort/cluster key. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
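The reasoning behind hash(a.key) != hash(a.key + a.key) in HIVE-3306 can be made concrete with a small sketch. Everything here is an assumption for illustration: the bucket function (non-negative hash modulo bucket count, with an int hashing to itself) and the pairing rule (big-table bucket j is read together with small-table bucket j % smallBuckets) are simplified models, and the names BucketJoinSketch, bucket, and wouldMeet are hypothetical. The point is that rows which satisfy a.key + a.key = b.key can sit in buckets that a bucketed join never pairs up.

```java
// Hypothetical sketch of why a bucketed join misses rows when the join
// expression is not the bucketing key itself; not Hive code.
class BucketJoinSketch {
    // Assumed bucket function: non-negative hash modulo the number of buckets.
    static int bucket(int key, int numBuckets) {
        return (Integer.hashCode(key) & Integer.MAX_VALUE) % numBuckets;
    }

    // Assumed pairing rule: big bucket j is joined only with small bucket j % smallBuckets.
    static boolean wouldMeet(int smallKey, int bigKey, int smallBuckets, int bigBuckets) {
        return bucket(bigKey, bigBuckets) % smallBuckets == bucket(smallKey, smallBuckets);
    }

    public static void main(String[] args) {
        int smallBuckets = 2;
        int bigBuckets = 4;
        for (int k = 0; k < 6; k++) {
            // Rows with a.key = k and b.key = 2k satisfy a.key + a.key = b.key.
            System.out.println("a.key=" + k + ", b.key=" + (2 * k)
                    + " -> buckets paired: " + wouldMeet(k, 2 * k, smallBuckets, bigBuckets));
        }
    }
}
```

Under these assumptions, matching rows with an odd a.key are never compared, because a.key lands in small bucket 1 while a.key + a.key is even and lands in a big bucket that is paired with small bucket 0. That is consistent with the bucketed join returning fewer rows than the plain join.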
[jira] [Commented] (HIVE-3430) group by followed by join with the same key should be optimized
[ https://issues.apache.org/jira/browse/HIVE-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717883#comment-13717883 ] Lianhui Wang commented on HIVE-3430: Yin Huai,very nice work! group by followed by join with the same key should be optimized --- Key: HIVE-3430 URL: https://issues.apache.org/jira/browse/HIVE-3430 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.10.0 Reporter: Namit Jain -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x
[ https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739221#comment-13739221 ] tagus wang commented on HIVE-4423: -- There is a bug in this patch: System.arraycopy(buffer, buffer.length - prefix - 1, buffer, 0, prefix); should be System.arraycopy(buffer, buffer.length - prefix, buffer, 0, prefix); Improve RCFile::sync(long) 10x -- Key: HIVE-4423 URL: https://issues.apache.org/jira/browse/HIVE-4423 Project: Hive Issue Type: Improvement Environment: Ubuntu LXC (1 SSD, 1 disk, 32 gigs of RAM) Reporter: Gopal V Assignee: Gopal V Priority: Minor Labels: optimization Fix For: 0.12.0 Attachments: HIVE-4423.patch RCFile::sync(long) takes approximately 1 second every time it gets called because of the inner loops in the function. From what was observed with HDFS-4710, single-byte reads are an order of magnitude slower than larger 512-byte buffer reads. Even when disk I/O is buffered to this size, there is overhead due to the synchronized read() methods in the BlockReaderLocal and RemoteBlockReader classes. Replacing the readByte() calls in RCFile.sync(long) with a readFully(512 byte) call will speed this function up 10x. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5100) RCFile::sync(long) missing 1 byte in System.arraycopy()
tagus wang created HIVE-5100: Summary: RCFile::sync(long) missing 1 byte in System.arraycopy() Key: HIVE-5100 URL: https://issues.apache.org/jira/browse/HIVE-5100 Project: Hive Issue Type: Bug Reporter: tagus wang There is a bug in RCFile::sync(long): System.arraycopy(buffer, buffer.length - prefix - 1, buffer, 0, prefix); should be System.arraycopy(buffer, buffer.length - prefix, buffer, 0, prefix); The current call starts one byte too early, so the last byte of the buffer is not copied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
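The off-by-one reported in HIVE-5100 can be checked in isolation. The sketch below is not the RCFile code: it assumes only that the intent of the call is to move the last prefix bytes of buffer to the front of the same array, and the class TailCopySketch and the method keepTail are hypothetical names. It runs both forms of the System.arraycopy call on a small array and returns the bytes that end up at the front.

```java
import java.util.Arrays;

// Hypothetical sketch comparing the two System.arraycopy calls from HIVE-5100.
class TailCopySketch {
    // Copies `prefix` bytes toward the front of a copy of `buffer` and returns them.
    // With offByOne = true the source offset matches the reported bug.
    static byte[] keepTail(byte[] buffer, int prefix, boolean offByOne) {
        byte[] copy = buffer.clone(); // leave the caller's array untouched
        int start = offByOne ? copy.length - prefix - 1 : copy.length - prefix;
        System.arraycopy(copy, start, copy, 0, prefix);
        return Arrays.copyOf(copy, prefix);
    }

    public static void main(String[] args) {
        byte[] buffer = {1, 2, 3, 4, 5, 6, 7, 8};
        // Reported form: starts one byte early and drops the final byte.
        System.out.println("length - prefix - 1: " + Arrays.toString(keepTail(buffer, 3, true)));
        // Proposed form: copies exactly the last three bytes.
        System.out.println("length - prefix:     " + Arrays.toString(keepTail(buffer, 3, false)));
    }
}
```

If the carried-over bytes are later compared against a marker that may straddle two reads, a window shifted by one byte would make such a comparison fail at the boundary; whether that is the exact consequence in RCFile is not stated in the report.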
[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x
[ https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740559#comment-13740559 ] tagus wang commented on HIVE-4423: -- Gopal V, I reported it in HIVE-5100, but I cannot assign it to you, so you will need to assign it to yourself. Improve RCFile::sync(long) 10x -- Key: HIVE-4423 URL: https://issues.apache.org/jira/browse/HIVE-4423 Project: Hive Issue Type: Improvement Environment: Ubuntu LXC (1 SSD, 1 disk, 32 gigs of RAM) Reporter: Gopal V Assignee: Gopal V Priority: Minor Labels: optimization Fix For: 0.12.0 Attachments: HIVE-4423.patch RCFile::sync(long) takes approximately 1 second every time it gets called because of the inner loops in the function. From what was observed with HDFS-4710, single-byte reads are an order of magnitude slower than larger 512-byte buffer reads. Even when disk I/O is buffered to this size, there is overhead due to the synchronized read() methods in the BlockReaderLocal and RemoteBlockReader classes. Replacing the readByte() calls in RCFile.sync(long) with a readFully(512 byte) call will speed this function up 10x. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5168) Extend Hive for spatial query support
Fusheng Wang created HIVE-5168: -- Summary: Extend Hive for spatial query support Key: HIVE-5168 URL: https://issues.apache.org/jira/browse/HIVE-5168 Project: Hive Issue Type: New Feature Reporter: Fusheng Wang I would like to propose to incorporate a newly developed spatial querying component into Hive. We have recently developed a high performance MapReduce based spatial querying system Hadoop-GIS, to support large scale spatial queries and analytics. Hadoop-GIS is a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through space partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects on MapReduce. Hadoop-GIS takes advantage of global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. We have an alpha release. We look forward to contributors in Hive community to contribute to the system. github: https://github.com/hadoop-gis Hadoop-GIS wiki: https://web.cci.emory.edu/confluence/display/HadoopGIS References: 1. Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong Zhang, Joel Saltz: Hadoop-GIS: A High Performance Spatial Data Warehousing System Over MapReduce. In Proceedings of the 39th International Conference on Very Large Databases (VLDB'2013), Trento, Italy, August 26-30, 2013. http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p726-aji.pdf 2. Ablimit Aji, Fusheng Wang and Joel Saltz: Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data. 
In Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2012), Redondo Beach, California, USA, November 6-9, 2012. http://confluence.cci.emory.edu:8090/download/attachments/6193390/SIGSpatial2012TechReport.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5168) Extend Hive for spatial query support
[ https://issues.apache.org/jira/browse/HIVE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754055#comment-13754055 ] Fusheng Wang commented on HIVE-5168: The DesignDocs wiki doesn't allow uploads from non-admin users. Should I update it here? Extend Hive for spatial query support - Key: HIVE-5168 URL: https://issues.apache.org/jira/browse/HIVE-5168 Project: Hive Issue Type: New Feature Reporter: Fusheng Wang Labels: Hadoop-GIS, Spatial, I would like to propose to incorporate a newly developed spatial querying component into Hive. We have recently developed a high performance MapReduce based spatial querying system Hadoop-GIS, to support large scale spatial queries and analytics. Hadoop-GIS is a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through space partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects on MapReduce. Hadoop-GIS takes advantage of global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. We have an alpha release. We look forward to contributors in Hive community to contribute to the system. github: https://github.com/hadoop-gis Hadoop-GIS wiki: https://web.cci.emory.edu/confluence/display/HadoopGIS References: 1. Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong Zhang, Joel Saltz: Hadoop-GIS: A High Performance Spatial Data Warehousing System Over MapReduce. In Proceedings of the 39th International Conference on Very Large Databases (VLDB'2013), Trento, Italy, August 26-30, 2013. http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p726-aji.pdf 2. 
Ablimit Aji, Fusheng Wang and Joel Saltz: Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data. In Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2012), Redondo Beach, California, USA, November 6-9, 2012. http://confluence.cci.emory.edu:8090/download/attachments/6193390/SIGSpatial2012TechReport.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5168) Extend Hive for spatial query support
[ https://issues.apache.org/jira/browse/HIVE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771817#comment-13771817 ] Fusheng Wang commented on HIVE-5168: A draft design document has been uploaded: https://cwiki.apache.org/confluence/display/Hive/Spatial+queries Extend Hive for spatial query support - Key: HIVE-5168 URL: https://issues.apache.org/jira/browse/HIVE-5168 Project: Hive Issue Type: New Feature Reporter: Fusheng Wang Labels: Hadoop-GIS, Spatial, I would like to propose to incorporate a newly developed spatial querying component into Hive. We have recently developed a high performance MapReduce based spatial querying system Hadoop-GIS, to support large scale spatial queries and analytics. Hadoop-GIS is a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through space partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects on MapReduce. Hadoop-GIS takes advantage of global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. We have an alpha release. We look forward to contributors in Hive community to contribute to the system. github: https://github.com/hadoop-gis Hadoop-GIS wiki: https://web.cci.emory.edu/confluence/display/HadoopGIS References: 1. Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong Zhang, Joel Saltz: Hadoop-GIS: A High Performance Spatial Data Warehousing System Over MapReduce. In Proceedings of the 39th International Conference on Very Large Databases (VLDB'2013), Trento, Italy, August 26-30, 2013. 
http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p726-aji.pdf 2. Ablimit Aji, Fusheng Wang and Joel Saltz: Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data. In Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2012), Redondo Beach, California, USA, November 6-9, 2012. http://confluence.cci.emory.edu:8090/download/attachments/6193390/SIGSpatial2012TechReport.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4501) HS2 memory leak - FileSystem objects in FileSystem.CACHE
[ https://issues.apache.org/jira/browse/HIVE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Wang updated HIVE-4501: - Priority: Critical (was: Major) HS2 memory leak - FileSystem objects in FileSystem.CACHE Key: HIVE-4501 URL: https://issues.apache.org/jira/browse/HIVE-4501 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Priority: Critical Attachments: HIVE-4501.1.patch org.apache.hadoop.fs.FileSystem objects are getting accumulated in FileSystem.CACHE, with HS2 in unsecure mode. As a workaround, it is possible to set fs.hdfs.impl.disable.cache and fs.file.impl.disable.cache to true. Users should not have to bother with this extra configuration. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4501) HS2 memory leak - FileSystem objects in FileSystem.CACHE
[ https://issues.apache.org/jira/browse/HIVE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Wang updated HIVE-4501: - Description: org.apache.hadoop.fs.FileSystem objects are getting accumulated in FileSystem.CACHE, with HS2 in unsecure mode. As a workaround, it is possible to set fs.hdfs.impl.disable.cache and fs.file.impl.disable.cache to true. Users should not have to bother with this extra configuration. As a workaround disable impersonation by setting hive.server2.enable.doAs to false. was: org.apache.hadoop.fs.FileSystem objects are getting accumulated in FileSystem.CACHE, with HS2 in unsecure mode. As a workaround, it is possible to set fs.hdfs.impl.disable.cache and fs.file.impl.disable.cache to true. Users should not have to bother with this extra configuration. HS2 memory leak - FileSystem objects in FileSystem.CACHE Key: HIVE-4501 URL: https://issues.apache.org/jira/browse/HIVE-4501 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Priority: Critical Attachments: HIVE-4501.1.patch org.apache.hadoop.fs.FileSystem objects are getting accumulated in FileSystem.CACHE, with HS2 in unsecure mode. As a workaround, it is possible to set fs.hdfs.impl.disable.cache and fs.file.impl.disable.cache to true. Users should not have to bother with this extra configuration. As a workaround disable impersonation by setting hive.server2.enable.doAs to false. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4501) HS2 memory leak - FileSystem objects in FileSystem.CACHE
[ https://issues.apache.org/jira/browse/HIVE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778247#comment-13778247 ] Henry Wang commented on HIVE-4501: -- Setting hive.server2.enable.doAs to false is a workaround. HS2 memory leak - FileSystem objects in FileSystem.CACHE Key: HIVE-4501 URL: https://issues.apache.org/jira/browse/HIVE-4501 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Priority: Critical Attachments: HIVE-4501.1.patch org.apache.hadoop.fs.FileSystem objects are getting accumulated in FileSystem.CACHE, with HS2 in unsecure mode. As a workaround, it is possible to set fs.hdfs.impl.disable.cache and fs.file.impl.disable.cache to true. Users should not have to bother with this extra configuration. As a workaround disable impersonation by setting hive.server2.enable.doAs to false.
[jira] [Commented] (HIVE-4014) Hive+RCFile is not doing column pruning and reading much more data than necessary
[ https://issues.apache.org/jira/browse/HIVE-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586701#comment-13586701 ] Lianhui Wang commented on HIVE-4014: I don't think so. I looked at the code: getRecordReader() in HiveInputFormat and CombineHiveInputFormat calls pushProjectionsAndFilters(). pushProjectionsAndFilters() gets the needed columns from the TableScanOperator and sets their ids in hive.io.file.readcolumn.ids. RCFile.Reader then reads hive.io.file.readcolumn.ids to skip the columns that are not needed. Maybe the counter is wrong. If I am mistaken, please tell me. Thanks. Hive+RCFile is not doing column pruning and reading much more data than necessary - Key: HIVE-4014 URL: https://issues.apache.org/jira/browse/HIVE-4014 Project: Hive Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli With even simple projection queries, I see that HDFS bytes read counter doesn't show any reduction in the amount of data read.
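The flow described in this comment can be sketched roughly as follows. This is only an illustration, not Hive's code: the class ReadColumnIdsDemo and its methods are hypothetical, and the comma-separated format of the id list is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a reader that honours a list of needed column ids.
public class ReadColumnIdsDemo {

    // Parses an id list such as "0,2" (comma-separated format assumed for this sketch).
    static Set<Integer> parseIds(String value) {
        Set<Integer> ids = new TreeSet<>();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                ids.add(Integer.parseInt(trimmed));
            }
        }
        return ids;
    }

    // Keeps only the wanted columns of one row. A columnar reader would
    // skip the bytes of the other columns instead of materializing them.
    static List<String> project(List<String> row, Set<Integer> ids) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < row.size(); i++) {
            if (ids.contains(i)) {
                out.add(row.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<Integer> ids = parseIds("0,2");
        // prints [k1, x1]
        System.out.println(project(Arrays.asList("k1", "v1", "x1"), ids));
    }
}
```

If pruning works like this, the bytes-read counter should drop for narrow projections, which is why the comment suspects the counter rather than the reader.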
[jira] [Commented] (HIVE-4014) Hive+RCFile is not doing column pruning and reading much more data than necessary
[ https://issues.apache.org/jira/browse/HIVE-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589226#comment-13589226 ] Lianhui Wang commented on HIVE-4014: Hi Tamas, thank you very much, you are right. I also think RCFile.Reader is not very efficient. The ids of the columns to read are passed to RCFile.Reader. Hive+RCFile is not doing column pruning and reading much more data than necessary - Key: HIVE-4014 URL: https://issues.apache.org/jira/browse/HIVE-4014 Project: Hive Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli With even simple projection queries, I see that HDFS bytes read counter doesn't show any reduction in the amount of data read.
[jira] [Commented] (HIVE-3430) group by followed by join with the same key should be optimized
[ https://issues.apache.org/jira/browse/HIVE-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590321#comment-13590321 ] Lianhui Wang commented on HIVE-3430: We should also consider the following query: SELECT a.key, a.cnt, b.key, a.cnt FROM (SELECT x.key as key, count(x.value) AS cnt FROM src x group by x.key) a JOIN src b ON (a.key = b.key); group by followed by join with the same key should be optimized --- Key: HIVE-3430 URL: https://issues.apache.org/jira/browse/HIVE-3430 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.10.0 Reporter: Namit Jain
[jira] [Commented] (HIVE-3963) Allow Hive to connect to RDBMS
[ https://issues.apache.org/jira/browse/HIVE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596639#comment-13596639 ] Lianhui Wang commented on HIVE-3963: I think this must support an AS clause, like the TRANSFORM syntax. For example: SELECT jdbcload('driver','url','user','password','sql') as c1,c2 FROM dual; Allow Hive to connect to RDBMS -- Key: HIVE-3963 URL: https://issues.apache.org/jira/browse/HIVE-3963 Project: Hive Issue Type: New Feature Components: Import/Export, JDBC, SQL, StorageHandler Affects Versions: 0.9.0, 0.10.0, 0.9.1, 0.11.0 Reporter: Maxime LANCIAUX I am thinking about something like: SELECT jdbcload('driver','url','user','password','sql') FROM dual; There is already a JIRA for JDBCStorageHandler: https://issues.apache.org/jira/browse/HIVE-1555
[jira] [Commented] (HIVE-4137) optimize group by followed by joins for bucketed/sorted tables
[ https://issues.apache.org/jira/browse/HIVE-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596690#comment-13596690 ] Lianhui Wang commented on HIVE-4137: In addition, for bucketed/sorted tables, a single group by only needs the map-side group by operator and does not need the reduce-side group by operator. Example: select key,aggr() from T1 group by key. The current plan is TS-SEL-GBY-RS-GBY-SEL-FS, but it can be changed to the following plan: TS-SEL-GBY-SEL-FS. optimize group by followed by joins for bucketed/sorted tables -- Key: HIVE-4137 URL: https://issues.apache.org/jira/browse/HIVE-4137 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Consider the following scenario: create table T1 (...) clustered by (key) sorted by (key) into 2 buckets; create table T2 (...) clustered by (key) sorted by (key) into 2 buckets; create table T3 (...) clustered by (key) sorted by (key) into 2 buckets; SET hive.enforce.sorting=true; SET hive.enforce.bucketing=true; insert overwrite table T3 select .. from (select key, aggr() from T1 group by key) s1 full outer join (select key, aggr() from T2 group by key) s2 on s1.key=s2.key; Ideally, this query can be performed in a single map-only job. Group By - SortMerge Join.
[jira] [Updated] (HIVE-2905) Desc table can't read Chinese (UTF-8 character code)
[ https://issues.apache.org/jira/browse/HIVE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaozhe Wang updated HIVE-2905: --- Affects Version/s: 0.10.0 Desc table can't read Chinese (UTF-8 character code) Key: HIVE-2905 URL: https://issues.apache.org/jira/browse/HIVE-2905 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.7.0, 0.10.0 Environment: hive 0.7.0, mysql 5.1.45 Reporter: Sheng Zhou When describing a table from the command line or through Hive JDBC, the table's comment can't be read. 1. I have updated the javax.jdo.option.ConnectionURL parameter in the hive-site.xml file: jdbc:mysql://*.*.*.*:3306/hive?characterEncoding=UTF-8 2. In the MySQL database, the comment field of the COLUMNS table can be read normally.
[jira] [Updated] (HIVE-2905) Desc table can't read Chinese (UTF-8 character code)
[ https://issues.apache.org/jira/browse/HIVE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaozhe Wang updated HIVE-2905: --- Environment: hive 0.7.0, mysql 5.1.45; hive 0.10.0, mysql 5.5.30 (was: hive 0.7.0, mysql 5.1.45) Desc table can't read Chinese (UTF-8 character code) Key: HIVE-2905 URL: https://issues.apache.org/jira/browse/HIVE-2905 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.7.0, 0.10.0 Environment: hive 0.7.0, mysql 5.1.45; hive 0.10.0, mysql 5.5.30 Reporter: Sheng Zhou When describing a table from the command line or through Hive JDBC, the table's comment can't be read. 1. I have updated the javax.jdo.option.ConnectionURL parameter in the hive-site.xml file: jdbc:mysql://*.*.*.*:3306/hive?characterEncoding=UTF-8 2. In the MySQL database, the comment field of the COLUMNS table can be read normally.
[jira] [Updated] (HIVE-2905) Desc table can't read Chinese (UTF-8 character code)
[ https://issues.apache.org/jira/browse/HIVE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaozhe Wang updated HIVE-2905: --- Labels: patch (was: ) Status: Patch Available (was: Open) The problem is that org.apache.hadoop.hive.ql.metadata.formatting.TextMetaDataFormatter.describeTable() uses DataOutputStream.writeBytes() to output the column info string. Unfortunately, DataOutputStream.writeBytes() writes out only the low-order byte of each character in the String, which garbles the output when a column comment contains non-Latin-1 characters. This simple patch solves the Unicode character garbling problem when describing a table in the Hive client. Desc table can't read Chinese (UTF-8 character code) Key: HIVE-2905 URL: https://issues.apache.org/jira/browse/HIVE-2905 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.10.0, 0.7.0 Environment: hive 0.7.0, mysql 5.1.45; hive 0.10.0, mysql 5.5.30 Reporter: Sheng Zhou Labels: patch When describing a table from the command line or through Hive JDBC, the table's comment can't be read. 1. I have updated the javax.jdo.option.ConnectionURL parameter in the hive-site.xml file: jdbc:mysql://*.*.*.*:3306/hive?characterEncoding=UTF-8 2. In the MySQL database, the comment field of the COLUMNS table can be read normally.
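The writeBytes() behaviour described in this update can be reproduced with a small standalone program. It is only a sketch: the class name WriteBytesDemo and its helpers are made up for illustration, and writing explicit UTF-8 bytes is assumed here as one possible remedy, not necessarily what the attached patch does.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Hive.
public class WriteBytesDemo {

    // Lossy path: writeBytes() keeps only the low-order byte of each char.
    static byte[] viaWriteBytes(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeBytes(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Lossless path: encode the string as UTF-8 and write the raw bytes.
    static byte[] viaUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.write(s.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // Three Chinese characters, written as escapes so the source file stays ASCII.
        String comment = "\u7528\u6237\u540d";
        String lossy = new String(viaWriteBytes(comment), StandardCharsets.UTF_8);
        String intact = new String(viaUtf8(comment), StandardCharsets.UTF_8);
        System.out.println(comment.equals(lossy));  // prints false
        System.out.println(comment.equals(intact)); // prints true
    }
}
```

For this input, writeBytes() emits 3 bytes (one per char) while the UTF-8 encoding needs 9, so the original text cannot be recovered from the lossy output.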
[jira] [Updated] (HIVE-2905) Desc table can't read Chinese (UTF-8 character code)
[ https://issues.apache.org/jira/browse/HIVE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaozhe Wang updated HIVE-2905: --- Attachment: utf8-desc-comment.patch Simple patch to resolve the garbling of column comments that contain Unicode characters. Desc table can't read Chinese (UTF-8 character code) Key: HIVE-2905 URL: https://issues.apache.org/jira/browse/HIVE-2905 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.7.0, 0.10.0 Environment: hive 0.7.0, mysql 5.1.45; hive 0.10.0, mysql 5.5.30 Reporter: Sheng Zhou Labels: patch Attachments: utf8-desc-comment.patch When describing a table from the command line or through Hive JDBC, the table's comment can't be read. 1. I have updated the javax.jdo.option.ConnectionURL parameter in the hive-site.xml file: jdbc:mysql://*.*.*.*:3306/hive?characterEncoding=UTF-8 2. In the MySQL database, the comment field of the COLUMNS table can be read normally.
[jira] [Commented] (HIVE-4365) wrong result in left semi join
[ https://issues.apache.org/jira/browse/HIVE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633664#comment-13633664 ] Lianhui Wang commented on HIVE-4365: Hi ransom, the problem also exists in my environment. I ran EXPLAIN and found that predicate pushdown (ppd) is wrong for the second query: TableScan alias: t2 Filter Operator predicate: expr: (c1 = 1) type: boolean. The ppd optimizer pushes the filter c1='1' to both table t1 and table t2, but it should only be pushed to t1, not t2. wrong result in left semi join -- Key: HIVE-4365 URL: https://issues.apache.org/jira/browse/HIVE-4365 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.9.0, 0.10.0 Reporter: ransom.hezhiqiang Wrong result in left semi join while hive.optimize.ppd=true. For example:
1. Create the tables: create table t1(c1 int,c2 int, c3 int, c4 int, c5 double,c6 int,c7 string) row format DELIMITED FIELDS TERMINATED BY '|'; create table t2(c1 int) ;
2. Load the data: load data local inpath '/home/test/t1.txt' OVERWRITE into table t1; load data local inpath '/home/test/t2.txt' OVERWRITE into table t2;
t1 data:
1|3|10003|52|781.96|555|201203
1|3|10003|39|782.96|555|201203
1|3|10003|87|783.96|555|201203
2|5|10004|24|789.96|555|201203
2|5|10004|58|788.96|555|201203
t2 data:
555
3. Execute the queries: select t1.c1,t1.c2,t1.c3,t1.c4,t1.c5,t1.c6,t1.c7 from t1 left semi join t2 on t1.c6 = t2.c1 and t1.c1 = '1' and t1.c7 = '201203' ; returns a result. select t1.c1,t1.c2,t1.c3,t1.c4,t1.c5,t1.c6,t1.c7 from t1 left semi join t2 on t1.c6 = t2.c1 where t1.c1 = '1' and t1.c7 = '201203' ; returns no result.
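The effect of pushing the filter to the wrong table can be simulated on the sample data from the report. This is a sketch for illustration only and not Hive's optimizer code: the class SemiJoinPpdDemo is hypothetical, t1 is reduced to its c1 and c6 columns, and the c7 predicate is left out because every sample row satisfies it.

```java
import java.util.Arrays;

// Hypothetical simulation of the left semi join from the report.
public class SemiJoinPpdDemo {

    // t1 rows reduced to {c1, c6}, taken from the sample data in the issue.
    static final int[][] T1 = {{1, 555}, {1, 555}, {1, 555}, {2, 555}, {2, 555}};

    // t2 has a single column c1 with one row.
    static final int[] T2 = {555};

    // Counts t1 rows with c1 = 1 that have a match in t2 on t1.c6 = t2.c1.
    // When pushFilterToT2 is true, the predicate c1 = 1 is also (wrongly)
    // applied to t2, as in the EXPLAIN output quoted in the comment.
    static long countMatches(boolean pushFilterToT2) {
        final int[] right = pushFilterToT2
                ? Arrays.stream(T2).filter(c1 -> c1 == 1).toArray()
                : T2;
        return Arrays.stream(T1)
                .filter(row -> row[0] == 1)
                .filter(row -> Arrays.stream(right).anyMatch(c1 -> c1 == row[1]))
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countMatches(false)); // prints 3
        System.out.println(countMatches(true));  // prints 0
    }
}
```

With the filter applied only to t1, the three rows with c1 = 1 survive; once it is also applied to t2, the only t2 row (555) is filtered out and the semi join returns nothing, which matches the empty result in the report.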
[jira] [Commented] (HIVE-4429) Nested ORDER BY produces incorrect result
[ https://issues.apache.org/jira/browse/HIVE-4429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643453#comment-13643453 ] Lianhui Wang commented on HIVE-4429: Hi Mihir Kulkarni, I ran the first query from your cases, but on my hive-0.9 it produces the correct result, as follows:
30.0 1.0
20.0 1.0
10.0 1.0
30.0 2.0
20.0 2.0
10.0 2.0
30.0 3.0
20.0 3.0
10.0 3.0
60.0 4.0
50.0 4.0
40.0 4.0
60.0 5.0
50.0 5.0
40.0 5.0
60.0 6.0
50.0 6.0
40.0 6.0
Can you tell me which version you used? Nested ORDER BY produces incorrect result - Key: HIVE-4429 URL: https://issues.apache.org/jira/browse/HIVE-4429 Project: Hive Issue Type: Bug Components: Query Processor, SQL, UDF Affects Versions: 0.9.0 Environment: Red Hat Linux VM with Hive 0.9 and Hadoop 2.0 Reporter: Mihir Kulkarni Priority: Critical Attachments: Hive_Command_Script.txt, HiveQuery.txt, Test_Data.txt A nested ORDER BY clause doesn't honor the outer one in a specific case. The query below produces a result that honors only the inner ORDER BY clause (it produces only 1 MapRed job). {code:borderStyle=solid} SELECT alias.b0 as d0, alias.b1 as d1 FROM (SELECT test.a0 as b0, test.a1 as b1 FROM test ORDER BY b1 ASC, b0 DESC) alias ORDER BY d0 ASC, d1 DESC; {code} On the other hand, the query below honors the outer ORDER BY clause and produces the correct result (it produces 2 MapRed jobs). {code:borderStyle=solid} SELECT alias.b0 as d0, alias.b1 as d1 FROM (SELECT test.a0 as b0, test.a1 as b1 FROM test ORDER BY b1 ASC, b0 DESC) alias ORDER BY d0 DESC, d1 DESC; {code} Any other combination of nested ORDER BY clauses does produce the correct result. Please see the attachments for the query, schema and Hive commands for a repro case.
[jira] [Commented] (HIVE-4506) use one map reduce to join multiple small tables
[ https://issues.apache.org/jira/browse/HIVE-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650371#comment-13650371 ] Lianhui Wang commented on HIVE-4506: Fern, can you provide your SQL? If these tables use the same column in the join clauses, one MR job is used. Example: explain SELECT /*+mapjoin(src2,src3)*/ src1.key, src3.value FROM src src1 JOIN src src2 ON (src1.key = src2.key) JOIN src src3 ON (src1.key = src3.key); use one map reduce to join multiple small tables - Key: HIVE-4506 URL: https://issues.apache.org/jira/browse/HIVE-4506 Project: Hive Issue Type: Wish Affects Versions: 0.10.0 Reporter: Fern Priority: Minor I know we can use a map-side join for a small table. In my test, if I use HQL like this -- select /*+mapjoin(b,c)*/... from a left join b on ... left join c on ... --- where b and c are both small tables, I expect the join to be done in one map-reduce job using a map-side join. Actually, it generates two map-reduce jobs that run in sequence. Sorry, currently I am just a user of Hive and have not dug into the code, so this is what I expect, but I have no idea how to improve it now.
[jira] [Commented] (HIVE-4506) use one map reduce to join multiple small tables
[ https://issues.apache.org/jira/browse/HIVE-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650380#comment-13650380 ] Lianhui Wang commented on HIVE-4506: If the joins use different columns, HIVE-3784 resolved the case of one big table joined with multiple small tables. use one map reduce to join multiple small tables - Key: HIVE-4506 URL: https://issues.apache.org/jira/browse/HIVE-4506 Project: Hive Issue Type: Wish Affects Versions: 0.10.0 Reporter: Fern Priority: Minor I know we can use a map-side join for a small table. In my test, if I use HQL like this -- select /*+mapjoin(b,c)*/... from a left join b on ... left join c on ... --- where b and c are both small tables, I expect the join to be done in one map-reduce job using a map-side join. Actually, it generates two map-reduce jobs that run in sequence. Sorry, currently I am just a user of Hive and have not dug into the code, so this is what I expect, but I have no idea how to improve it now.
[jira] [Commented] (HIVE-4233) The TGT gotten from class 'CLIService' should be renewed on time
[ https://issues.apache.org/jira/browse/HIVE-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652877#comment-13652877 ] Dongyong Wang commented on HIVE-4233: - Thanks to Thejas and Jitendra for the replies. I agree with your solution; my patch is too complex, so I will wait for your new patch. The TGT gotten from class 'CLIService' should be renewed on time - Key: HIVE-4233 URL: https://issues.apache.org/jira/browse/HIVE-4233 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.10.0 Environment: CentOS release 6.3 (Final) jdk1.6.0_31 HiveServer2 0.10.0-cdh4.2.0 Kerberos Security Reporter: Dongyong Wang Priority: Critical Attachments: 0001-FIX-HIVE-4233.patch When HiveServer2 has been running for more than 7 days and I use the beeline shell to connect to it, all operations fail. The HiveServer2 log shows that this is caused by a Kerberos authentication failure. The exception stack trace is: 2013-03-26 11:55:20,932 ERROR hive.ql.metadata.Hive: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1084) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.init(RetryingMetaStoreClient.java:51) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:61) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2140) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2151) at org.apache.hadoop.hive.ql.metadata.Hive.getDelegationToken(Hive.java:2275) at org.apache.hive.service.cli.CLIService.getDelegationTokenFromMetaStore(CLIService.java:358) at org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:127) at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1073) at
org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1058) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:565) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedConstructorAccessor52.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1082) ... 
16 more Caused by: java.lang.IllegalStateException: This ticket is no longer valid at javax.security.auth.kerberos.KerberosTicket.toString(KerberosTicket.java:601) at java.lang.String.valueOf(String.java:2826) at java.lang.StringBuilder.append(StringBuilder.java:115) at sun.security.jgss.krb5.SubjectComber.findAux(SubjectComber.java:120) at sun.security.jgss.krb5.SubjectComber.find(SubjectComber.java:41) at sun.security.jgss.krb5.Krb5Util.getTicket(Krb5Util.java:130) at sun.security.jgss.krb5.Krb5InitCredential$1.run(Krb5InitCredential.java:328) at java.security.AccessController.doPrivileged(Native Method) at sun.security.jgss.krb5.Krb5InitCredential.getTgt(Krb5InitCredential.java:325) at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:128) at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:106) at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:172) at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:209) at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:195) at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:162) at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:175) at
[jira] [Commented] (HIVE-4233) The TGT gotten from class 'CLIService' should be renewed on time
[ https://issues.apache.org/jira/browse/HIVE-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13653625#comment-13653625 ] Dongyong Wang commented on HIVE-4233: - Thanks Thejas, I will try your workaround later. But we use Cloudera Manager to configure HiveServer2 and the Metastore Server, so I have to find the correct way to configure the embedded metastore. The TGT gotten from class 'CLIService' should be renewed on time - Key: HIVE-4233 URL: https://issues.apache.org/jira/browse/HIVE-4233 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.10.0 Environment: CentOS release 6.3 (Final) jdk1.6.0_31 HiveServer2 0.10.0-cdh4.2.0 Kerberos Security Reporter: Dongyong Wang Priority: Critical Attachments: 0001-FIX-HIVE-4233.patch, HIVE-4233-2.patch When HiveServer2 has been running for more than 7 days and I use the beeline shell to connect to it, all operations fail. The HiveServer2 log shows that this is caused by a Kerberos authentication failure.
[jira] [Commented] (HIVE-4233) The TGT gotten from class 'CLIService' should be renewed on time
[ https://issues.apache.org/jira/browse/HIVE-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655120#comment-13655120 ] Dongyong Wang commented on HIVE-4233: - We have been running the initial patch since 2013-04-15 and the TGT exception has not occurred again. I just used the beeline client to connect to HiveServer2, and the query executed successfully. The TGT gotten from class 'CLIService' should be renewed on time - Key: HIVE-4233 URL: https://issues.apache.org/jira/browse/HIVE-4233 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.10.0 Environment: CentOS release 6.3 (Final) jdk1.6.0_31 HiveServer2 0.10.0-cdh4.2.0 Kerberos Security Reporter: Dongyong Wang Assignee: Thejas M Nair Priority: Critical Attachments: 0001-FIX-HIVE-4233.patch, HIVE-4233-2.patch, HIVE-4233-3.patch When HiveServer2 has been running for more than 7 days and I use the beeline shell to connect to it, all operations fail. The HiveServer2 log shows that this is caused by a Kerberos authentication failure.
[jira] [Commented] (HIVE-4233) The TGT gotten from class 'CLIService' should be renewed on time
[ https://issues.apache.org/jira/browse/HIVE-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655123#comment-13655123 ] Dongyong Wang commented on HIVE-4233: - Sorry, I haven't tried your patch. The TGT gotten from class 'CLIService' should be renewed on time - Key: HIVE-4233 URL: https://issues.apache.org/jira/browse/HIVE-4233 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.10.0 Environment: CentOS release 6.3 (Final) jdk1.6.0_31 HiveServer2 0.10.0-cdh4.2.0 Kerberos Security Reporter: Dongyong Wang Assignee: Thejas M Nair Priority: Critical Attachments: 0001-FIX-HIVE-4233.patch, HIVE-4233-2.patch, HIVE-4233-3.patch

When HiveServer2 has been running for more than 7 days and I connect to it with the beeline shell, every operation fails. The HiveServer2 log shows that the failure is caused by a Kerberos authentication failure. The exception stack trace is:

2013-03-26 11:55:20,932 ERROR hive.ql.metadata.Hive: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1084)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.init(RetryingMetaStoreClient.java:51)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:61)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2140)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2151)
    at org.apache.hadoop.hive.ql.metadata.Hive.getDelegationToken(Hive.java:2275)
    at org.apache.hive.service.cli.CLIService.getDelegationTokenFromMetaStore(CLIService.java:358)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:127)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1073)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1058)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:565)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedConstructorAccessor52.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1082)
    ... 16 more
Caused by: java.lang.IllegalStateException: This ticket is no longer valid
    at javax.security.auth.kerberos.KerberosTicket.toString(KerberosTicket.java:601)
    at java.lang.String.valueOf(String.java:2826)
    at java.lang.StringBuilder.append(StringBuilder.java:115)
    at sun.security.jgss.krb5.SubjectComber.findAux(SubjectComber.java:120)
    at sun.security.jgss.krb5.SubjectComber.find(SubjectComber.java:41)
    at sun.security.jgss.krb5.Krb5Util.getTicket(Krb5Util.java:130)
    at sun.security.jgss.krb5.Krb5InitCredential$1.run(Krb5InitCredential.java:328)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.security.jgss.krb5.Krb5InitCredential.getTgt(Krb5InitCredential.java:325)
    at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:128)
    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:106)
    at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:172)
    at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:209)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:195)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:162)
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:175)
    at
[jira] [Updated] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiyan Wang updated HIVE-2247: -- Attachment: HIVE-2247.3.patch.txt Implement the ALTER TABLE PARTITION RENAME function to rename a partition. Add the HiveQL syntax ALTER TABLE bar PARTITION (k1='v1', k2='v2') RENAME TO PARTITION (k1='v3', k2='v4'); This is my first Hive diff. I learned everything from the existing codebase and may not understand it well yet, so feel free to tell me if I did something wrong. Thanks. ALTER TABLE RENAME PARTITION Key: HIVE-2247 URL: https://issues.apache.org/jira/browse/HIVE-2247 Project: Hive Issue Type: New Feature Reporter: Siying Dong Assignee: Weiyan Wang Attachments: HIVE-2247.3.patch.txt We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiyan Wang updated HIVE-2247: -- Attachment: HIVE-2247.4.patch.txt Refactored the code to rename the old partition data directory to the new partition data directory. ALTER TABLE RENAME PARTITION Key: HIVE-2247 URL: https://issues.apache.org/jira/browse/HIVE-2247 Project: Hive Issue Type: New Feature Reporter: Siying Dong Assignee: Weiyan Wang Attachments: HIVE-2247.3.patch.txt, HIVE-2247.4.patch.txt We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiyan Wang updated HIVE-2247: -- Attachment: HIVE-2247.5.patch.txt Use alter_partition(db_name, tbl_name, newPart, part_vals) to replace the rename_partition Thrift API. Add one authorization unit test to check that the new partition has the same privileges as the old one. ALTER TABLE RENAME PARTITION Key: HIVE-2247 URL: https://issues.apache.org/jira/browse/HIVE-2247 Project: Hive Issue Type: New Feature Reporter: Siying Dong Assignee: Weiyan Wang Attachments: HIVE-2247.3.patch.txt, HIVE-2247.4.patch.txt, HIVE-2247.5.patch.txt We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiyan Wang updated HIVE-2247: -- Attachment: HIVE-2247.7.patch.txt Refactored the code. ALTER TABLE RENAME PARTITION Key: HIVE-2247 URL: https://issues.apache.org/jira/browse/HIVE-2247 Project: Hive Issue Type: New Feature Reporter: Siying Dong Assignee: Weiyan Wang Attachments: HIVE-2247.3.patch.txt, HIVE-2247.4.patch.txt, HIVE-2247.5.patch.txt, HIVE-2247.6.patch.txt, HIVE-2247.7.patch.txt We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiyan Wang updated HIVE-2247: -- Attachment: HIVE-2247.8.patch.txt +work.getInputs().add(new ReadEntity(oldPart)); +work.getOutputs().add(new WriteEntity(newPart)); ALTER TABLE RENAME PARTITION Key: HIVE-2247 URL: https://issues.apache.org/jira/browse/HIVE-2247 Project: Hive Issue Type: New Feature Reporter: Siying Dong Assignee: Weiyan Wang Attachments: HIVE-2247.3.patch.txt, HIVE-2247.4.patch.txt, HIVE-2247.5.patch.txt, HIVE-2247.6.patch.txt, HIVE-2247.7.patch.txt, HIVE-2247.8.patch.txt We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-7645) Hive CompactorMR job set NUM_BUCKETS mistake
Xiaoyu Wang created HIVE-7645: - Summary: Hive CompactorMR job set NUM_BUCKETS mistake Key: HIVE-7645 URL: https://issues.apache.org/jira/browse/HIVE-7645 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.13.1 Reporter: Xiaoyu Wang code: job.setInt(NUM_BUCKETS, sd.getBucketColsSize()); should change to: job.setInt(NUM_BUCKETS, sd.getNumBuckets()); -- This message was sent by Atlassian JIRA (v6.2#6252)
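The difference between the two accessors named in this report can be illustrated with a minimal sketch. `StorageDescriptorSketch` below is a hypothetical stand-in written for this example, not the metastore class itself; it models only the two getters the report mentions. For a table declared as CLUSTERED BY (user_id) INTO 8 BUCKETS, the bucket column list has one entry while the bucket count is eight, so passing the column count as NUM_BUCKETS would give the job the wrong value.

```java
import java.util.Arrays;
import java.util.List;

class NumBucketsSketch {
    public static void main(String[] args) {
        // Models: CLUSTERED BY (user_id) INTO 8 BUCKETS
        StorageDescriptorSketch sd =
                new StorageDescriptorSketch(Arrays.asList("user_id"), 8);

        // What the reported code passes as NUM_BUCKETS: the number of bucketing columns.
        System.out.println("getBucketColsSize() = " + sd.getBucketColsSize());
        // What the report says should be passed instead: the number of buckets.
        System.out.println("getNumBuckets()     = " + sd.getNumBuckets());
    }
}

// Hypothetical stand-in for a storage descriptor; only the two accessors
// named in the report are modeled.
class StorageDescriptorSketch {
    private final List<String> bucketCols;
    private final int numBuckets;

    StorageDescriptorSketch(List<String> bucketCols, int numBuckets) {
        this.bucketCols = bucketCols;
        this.numBuckets = numBuckets;
    }

    // Number of columns the table is clustered by.
    int getBucketColsSize() {
        return bucketCols.size();
    }

    // Number of buckets the table is divided into.
    int getNumBuckets() {
        return numBuckets;
    }
}
```

The two values coincide only when the number of bucketing columns happens to equal the number of buckets, which may be why such a mix-up could go unnoticed on some tables.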
[jira] [Updated] (HIVE-7645) Hive CompactorMR job set NUM_BUCKETS mistake
[ https://issues.apache.org/jira/browse/HIVE-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Wang updated HIVE-7645: -- Attachment: HIVE-7645.patch Hive CompactorMR job set NUM_BUCKETS mistake Key: HIVE-7645 URL: https://issues.apache.org/jira/browse/HIVE-7645 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.13.1 Reporter: Xiaoyu Wang Attachments: HIVE-7645.patch code: job.setInt(NUM_BUCKETS, sd.getBucketColsSize()); should change to: job.setInt(NUM_BUCKETS, sd.getNumBuckets()); -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7645) Hive CompactorMR job set NUM_BUCKETS mistake
[ https://issues.apache.org/jira/browse/HIVE-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Wang updated HIVE-7645: -- Status: Patch Available (was: Open) Hive CompactorMR job set NUM_BUCKETS mistake Key: HIVE-7645 URL: https://issues.apache.org/jira/browse/HIVE-7645 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.13.1 Reporter: Xiaoyu Wang Attachments: HIVE-7645.patch code: job.setInt(NUM_BUCKETS, sd.getBucketColsSize()); should change to: job.setInt(NUM_BUCKETS, sd.getNumBuckets()); -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7645) Hive CompactorMR job set NUM_BUCKETS mistake
[ https://issues.apache.org/jira/browse/HIVE-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090728#comment-14090728 ] Xiaoyu Wang commented on HIVE-7645: --- This error should not be caused by this patch! Hive CompactorMR job set NUM_BUCKETS mistake Key: HIVE-7645 URL: https://issues.apache.org/jira/browse/HIVE-7645 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.13.1 Reporter: Xiaoyu Wang Attachments: HIVE-7645.patch code: job.setInt(NUM_BUCKETS, sd.getBucketColsSize()); should change to: job.setInt(NUM_BUCKETS, sd.getNumBuckets()); -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7384) Research into reduce-side join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105301#comment-14105301 ] Lianhui Wang commented on HIVE-7384: I think current Spark already supports hashing by join_col and sorting by {join_col, tag}, because Spark's map-side shuffleWriter hashes by Key.hashcode and sorts by Key, and Hive's HiveKey class already defines the hashcode. So it can support hashing by HiveKey.hashcode and sorting by HiveKey's bytes. Research into reduce-side join [Spark Branch] - Key: HIVE-7384 URL: https://issues.apache.org/jira/browse/HIVE-7384 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Szehon Ho Attachments: Hive on Spark Reduce Side Join.docx, sales_items.txt, sales_products.txt, sales_stores.txt Hive's join operator is very sophisticated, especially for reduce-side join. While we expect that other types of join, such as map-side join and SMB map-side join, will work out of the box with our design, there may be some complications in reduce-side join, which extensively utilizes key tag and shuffle behavior. Our design principle prefers making the Hive implementation work out of the box as well, which might require new functionality from Spark. The task is to research this area, identifying requirements for the Spark community and the work to be done on Hive to make reduce-side join work. A design doc might be needed for this. For more information, please refer to the overall design doc on wiki. -- This message was sent by Atlassian JIRA (v6.2#6252)
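The shuffle behavior discussed in this comment (hash by the join column, sort by the join column and then the tag) can be sketched in plain Java. This is only an illustration under stated assumptions: `Row`, `partitionFor`, and `shuffle` are hypothetical names made up for this example, and the code uses neither Spark's shuffle writer nor Hive's HiveKey.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ReduceSideJoinShuffleSketch {
    // One shuffled row: the join key, the tag of the table it came from, and a payload.
    static final class Row {
        final String joinKey;
        final int tag;
        final String value;

        Row(String joinKey, int tag, String value) {
            this.joinKey = joinKey;
            this.tag = tag;
            this.value = value;
        }

        @Override
        public String toString() {
            return joinKey + "/" + tag + "/" + value;
        }
    }

    // Hash-partition on the join key only, so rows with the same key land in the
    // same partition regardless of their tag.
    static int partitionFor(String joinKey, int numPartitions) {
        return Math.floorMod(joinKey.hashCode(), numPartitions);
    }

    // Distribute rows by hash of the join key, then sort each partition by (joinKey, tag).
    static List<List<Row>> shuffle(List<Row> rows, int numPartitions) {
        List<List<Row>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Row row : rows) {
            partitions.get(partitionFor(row.joinKey, numPartitions)).add(row);
        }
        Comparator<Row> byKeyThenTag =
                Comparator.comparing((Row r) -> r.joinKey).thenComparingInt(r -> r.tag);
        for (List<Row> partition : partitions) {
            partition.sort(byKeyThenTag);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("b", 1, "right-b"));
        rows.add(new Row("a", 1, "right-a"));
        rows.add(new Row("a", 0, "left-a"));
        rows.add(new Row("b", 0, "left-b"));

        List<List<Row>> partitions = shuffle(rows, 2);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("partition " + i + ": " + partitions.get(i));
        }
    }
}
```

Within each partition, all rows for a given key are adjacent and ordered by tag, which is the property a reduce-side join relies on. No global ordering across partitions is needed for that.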
[jira] [Commented] (HIVE-7384) Research into reduce-side join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106343#comment-14106343 ] Lianhui Wang commented on HIVE-7384: @Szehon Ho Yes, I read the OrderedRDDFunctions code and discovered that sortByKey actually does a range partition. We need to replace the range partition with a hash partition, so Spark should perhaps add a new interface, for example partitionSortByKey. @Brock Noland 1) The code means that Hive does a total order sort when it samples data and there is more than one reducer. A join does not sample data, so it does not need a total order sort. 2) I think we really need auto-parallelism. I discussed it with Reynold Xin earlier; Spark needs to support re-partitioning the map output data in the same way Tez does. Research into reduce-side join [Spark Branch] - Key: HIVE-7384 URL: https://issues.apache.org/jira/browse/HIVE-7384 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Szehon Ho Attachments: Hive on Spark Reduce Side Join.docx, sales_items.txt, sales_products.txt, sales_stores.txt Hive's join operator is very sophisticated, especially for reduce-side join. While we expect that other types of join, such as map-side join and SMB map-side join, will work out of the box with our design, there may be some complications in reduce-side join, which extensively utilizes key tag and shuffle behavior. Our design principle prefers making the Hive implementation work out of the box as well, which might require new functionality from Spark. The task is to research this area, identifying requirements for the Spark community and the work to be done on Hive to make reduce-side join work. A design doc might be needed for this. For more information, please refer to the overall design doc on wiki. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7384) Research into reduce-side join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106407#comment-14106407 ] Lianhui Wang commented on HIVE-7384: I think the idea is the same as what you described before: like HIVE-7158, it would auto-calculate the number of reducers based on some input from Hive (upper/lower bound). Research into reduce-side join [Spark Branch] - Key: HIVE-7384 URL: https://issues.apache.org/jira/browse/HIVE-7384 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Szehon Ho Attachments: Hive on Spark Reduce Side Join.docx, sales_items.txt, sales_products.txt, sales_stores.txt Hive's join operator is very sophisticated, especially for reduce-side join. While we expect that other types of join, such as map-side join and SMB map-side join, will work out of the box with our design, there may be some complications in reduce-side join, which extensively utilizes key tag and shuffle behavior. Our design principle prefers making the Hive implementation work out of the box as well, which might require new functionality from Spark. The task is to research this area, identifying requirements for the Spark community and the work to be done on Hive to make reduce-side join work. A design doc might be needed for this. For more information, please refer to the overall design doc on wiki. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6179) OOM occurs when query spans to a large number of partitions
[ https://issues.apache.org/jira/browse/HIVE-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] perry wang updated HIVE-6179: - Description: When executing a query against a large number of partitions, such as select count(*) from table, OOM error may occur because Hive fetches the metadata for all partitions involved and tries to store it in memory. {code} 2014-01-09 13:14:17,090 ERROR metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(141)) - java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2367) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415) at java.lang.StringBuffer.append(StringBuffer.java:237) at org.apache.derby.impl.sql.conn.GenericStatementContext.appendErrorInfo(Unknown Source) at org.apache.derby.iapi.services.context.ContextManager.cleanupOnError(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source) at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source) at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source) at org.apache.derby.impl.jdbc.EmbedResultSet.closeOnTransactionError(Unknown Source) at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(Unknown Source) at org.apache.derby.impl.jdbc.EmbedResultSet.next(Unknown Source) at org.datanucleus.store.rdbms.query.ForwardQueryResult.nextResultSetElement(ForwardQueryResult.java:191) at org.datanucleus.store.rdbms.query.ForwardQueryResult$QueryResultIterator.next(ForwardQueryResult.java:379) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.loopJoinOrderedResult(MetaStoreDirectSql.java:641) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:410) at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitions(MetaStoreDirectSql.java:205) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsInternal(ObjectStore.java:1433) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:1420) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:122) at com.sun.proxy.$Proxy7.getPartitions(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions(HiveMetaStore.java:2128) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103) {code} The above error happened when executing select count(*) on a table with 40K partitions. was: When executing a query against a large number of partitions, such as select count(*) from table, OOM error may occur because Hive fetches the metadata for all partitions involved and tries to store it in memory. 
{code} 2014-01-09 13:14:17,090 ERROR metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(141)) - java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2367) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415) at java.lang.StringBuffer.append(StringBuffer.java:237) at org.apache.derby.impl.sql.conn.GenericStatementContext.appendErrorInfo(Unknown Source) at org.apache.derby.iapi.services.context.ContextManager.cleanupOnError(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source) at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source) at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source) at org.apache.derby.impl.jdbc.EmbedResultSet.closeOnTransactionError(Unknown Source) at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(Unknown Source) at
[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925394#comment-13925394 ] Gordon Wang commented on HIVE-4629: --- What is the status of this JIRA? Has anyone tried to rebase it onto the latest trunk? I think it is a useful feature, especially when testing HQL. HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4629-no_thrift.1.patch, HIVE-4629.1.patch, HIVE-4629.2.patch HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view
Adrian Wang created HIVE-6765: - Summary: ASTNodeOrigin unserializable leads to fail when join with view Key: HIVE-6765 URL: https://issues.apache.org/jira/browse/HIVE-6765 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Adrian Wang When a view contains a UDF and the view is used in a JOIN operation, Hive fails with a stack trace like: Caused by: java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin at java.lang.Class.newInstance0(Class.java:359) at java.lang.Class.newInstance(Class.java:327) at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view
[ https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949052#comment-13949052 ] Adrian Wang commented on HIVE-6765: --- I added a PersistenceDelegate in serializeObject() in the Utilities class, which resolved the problem. I'll attach the patch later. ASTNodeOrigin unserializable leads to fail when join with view -- Key: HIVE-6765 URL: https://issues.apache.org/jira/browse/HIVE-6765 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Adrian Wang When a view contains a UDF and the view is used in a JOIN operation, Hive fails with a stack trace like: Caused by: java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin at java.lang.Class.newInstance0(Class.java:359) at java.lang.Class.newInstance(Class.java:327) at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) -- This message was sent by Atlassian JIRA (v6.2#6252)
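The general technique mentioned in this comment, registering a PersistenceDelegate so that java.beans.XMLEncoder can handle a class without a no-argument constructor, can be sketched as follows. This is not the attached patch: the `Origin` class and its two properties are hypothetical and stand in for whatever the real class carries, and `encode` is a helper invented for this example. It uses the standard `DefaultPersistenceDelegate(String[])` constructor, which tells the encoder which read-only properties to pass to the constructor.

```java
import java.beans.DefaultPersistenceDelegate;
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class PersistenceDelegateSketch {
    // Hypothetical immutable bean: no no-arg constructor, getters only.
    public static final class Origin {
        private final String objectType;
        private final String objectName;

        public Origin(String objectType, String objectName) {
            this.objectType = objectType;
            this.objectName = objectName;
        }

        public String getObjectType() {
            return objectType;
        }

        public String getObjectName() {
            return objectName;
        }
    }

    // Encodes the bean to XML. Problems the encoder reports are collected in
    // 'errors' instead of being printed by the default exception listener.
    static String encode(Origin origin, boolean useDelegate, List<Exception> errors) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        XMLEncoder encoder = new XMLEncoder(out);
        encoder.setExceptionListener(errors::add);
        if (useDelegate) {
            // Recreate instances through the two-argument constructor, reading
            // the argument values from the matching getters.
            encoder.setPersistenceDelegate(Origin.class,
                    new DefaultPersistenceDelegate(new String[] {"objectType", "objectName"}));
        }
        encoder.writeObject(origin);
        encoder.close();
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Origin origin = new Origin("view", "v1");

        List<Exception> withoutDelegate = new ArrayList<>();
        encode(origin, false, withoutDelegate);
        System.out.println("errors without delegate: " + withoutDelegate.size());

        List<Exception> withDelegate = new ArrayList<>();
        String xml = encode(origin, true, withDelegate);
        System.out.println("errors with delegate: " + withDelegate.size());
        System.out.println(xml);
    }
}
```

Without the delegate the encoder tries to instantiate the class through a no-argument constructor and reports the failure to its exception listener. A listener that rethrows, as the Utilities frames in the stack traces in this thread suggest, would turn that report into a failed query.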
[jira] [Updated] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view
[ https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated HIVE-6765: -- Attachment: HIVE-6765.patch.1 ASTNodeOrigin unserializable leads to fail when join with view -- Key: HIVE-6765 URL: https://issues.apache.org/jira/browse/HIVE-6765 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Adrian Wang Attachments: HIVE-6765.patch.1 when a view contains a UDF, and the view comes into a JOIN operation, Hive will encounter a bug with stack trace like Caused by: java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin at java.lang.Class.newInstance0(Class.java:359) at java.lang.Class.newInstance(Class.java:327) at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view
[ https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949075#comment-13949075 ] Adrian Wang commented on HIVE-6765: --- Here's an example that reproduces the exception: CREATE TABLE t1 (a1 INT, b1 INT); CREATE VIEW v1 (x1) AS SELECT MAX(a1) FROM t1; SELECT s1.x1 FROM v1 s1 JOIN (SELECT MAX(a1) AS ma FROM t1) s2 ON s1.x1 = s2.ma; This is a bug on both Apache Hive and Tez; the query exits with return code 1 ... ASTNodeOrigin unserializable leads to fail when join with view -- Key: HIVE-6765 URL: https://issues.apache.org/jira/browse/HIVE-6765 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Adrian Wang Attachments: HIVE-6765.patch.1 When a view contains a UDF and the view is used in a JOIN operation, Hive fails with a stack trace like: Caused by: java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin at java.lang.Class.newInstance0(Class.java:359) at java.lang.Class.newInstance(Class.java:327) at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view
[ https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949076#comment-13949076 ] Adrian Wang commented on HIVE-6765: --- And I think this is just another drawback of using XMLEncoder to clone the plan. ASTNodeOrigin unserializable leads to fail when join with view -- Key: HIVE-6765 URL: https://issues.apache.org/jira/browse/HIVE-6765 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Adrian Wang Attachments: HIVE-6765.patch.1 When a view contains a UDF and the view is used in a JOIN operation, Hive fails with a stack trace like: Caused by: java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin at java.lang.Class.newInstance0(Class.java:359) at java.lang.Class.newInstance(Class.java:327) at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view
[ https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13949113#comment-13949113 ] Adrian Wang commented on HIVE-6765: --- Sorry, the previous example works on Tez with hive-0.13. But it fails when I run the query in Hive-0.12 in eclipse. ASTNodeOrigin unserializable leads to fail when join with view -- Key: HIVE-6765 URL: https://issues.apache.org/jira/browse/HIVE-6765 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Adrian Wang Attachments: HIVE-6765.patch.1 when a view contains a UDF, and the view comes into a JOIN operation, Hive will encounter a bug with stack trace like Caused by: java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin at java.lang.Class.newInstance0(Class.java:359) at java.lang.Class.newInstance(Class.java:327) at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view
[ https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13949118#comment-13949118 ] Adrian Wang commented on HIVE-6765: --- when I run the test case in hive command line(0.12-release), the full output is as follows, hive SELECT s1.x1 FROM v1 s1 JOIN (SELECT MAX(a1) AS ma FROM t1) s2 ON s1.x1 = s2.ma; java.lang.RuntimeException: Cannot serialize object at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:652) at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:361) at java.beans.XMLEncoder.writeObject(XMLEncoder.java:277) at org.apache.hadoop.hive.ql.exec.Utilities.serializeObject(Utilities.java:666) at org.apache.hadoop.hive.ql.exec.Utilities.clonePlan(Utilities.java:637) at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinTaskDispatcher.processCurrentTask(CommonJoinTaskDispatcher.java:505) at org.apache.hadoop.hive.ql.optimizer.physical.AbstractJoinTaskDispatcher.dispatch(AbstractJoinTaskDispatcher.java:182) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139) at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:79) at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:90) at org.apache.hadoop.hive.ql.parse.MapReduceCompiler.compile(MapReduceCompiler.java:300) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8410) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:441) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:888) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: java.lang.Exception: XMLEncoder: discarding statement XMLEncoder.writeObject(MapredWork); ... 29 more Caused by: java.lang.RuntimeException: Cannot serialize object at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:652) at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:267) at java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:408) at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:116) at java.beans.Encoder.writeObject(Encoder.java:74) at java.beans.XMLEncoder.writeObject(XMLEncoder.java:274) at java.beans.Encoder.writeExpression(Encoder.java:304) at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:389) at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:113) at java.beans.Encoder.writeObject(Encoder.java:74) at java.beans.XMLEncoder.writeObject(XMLEncoder.java:274) at java.beans.Encoder.writeObject1(Encoder.java:231) at java.beans.Encoder.cloneStatement(Encoder.java:244) at java.beans.Encoder.writeStatement(Encoder.java:275) at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:348) ... 
28 more Caused by: java.lang.RuntimeException: Cannot serialize object at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:652) at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:267) at java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:408) at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:116) at java.beans.Encoder.writeObject(Encoder.java:74) at java.beans.XMLEncoder.writeObject(XMLEncoder.java:274) at
[jira] [Updated] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view
[ https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated HIVE-6765: -- Component/s: (was: Query Processor) ASTNodeOrigin unserializable leads to fail when join with view -- Key: HIVE-6765 URL: https://issues.apache.org/jira/browse/HIVE-6765 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Adrian Wang Attachments: HIVE-6765.patch.1 when a view contains a UDF, and the view comes into a JOIN operation, Hive will encounter a bug with stack trace like Caused by: java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin at java.lang.Class.newInstance0(Class.java:359) at java.lang.Class.newInstance(Class.java:327) at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view
[ https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated HIVE-6765: -- Fix Version/s: 0.13.0 ASTNodeOrigin unserializable leads to fail when join with view -- Key: HIVE-6765 URL: https://issues.apache.org/jira/browse/HIVE-6765 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Adrian Wang Fix For: 0.13.0 Attachments: HIVE-6765.patch.1 when a view contains a UDF, and the view comes into a JOIN operation, Hive will encounter a bug with stack trace like Caused by: java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin at java.lang.Class.newInstance0(Class.java:359) at java.lang.Class.newInstance(Class.java:327) at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view
[ https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated HIVE-6765: -- Status: Patch Available (was: Open) ASTNodeOrigin unserializable leads to fail when join with view -- Key: HIVE-6765 URL: https://issues.apache.org/jira/browse/HIVE-6765 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Adrian Wang Fix For: 0.13.0 Attachments: HIVE-6765.patch.1 when a view contains a UDF, and the view comes into a JOIN operation, Hive will encounter a bug with stack trace like Caused by: java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin at java.lang.Class.newInstance0(Class.java:359) at java.lang.Class.newInstance(Class.java:327) at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-2777) ability to add and drop partitions atomically
[ https://issues.apache.org/jira/browse/HIVE-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Wang updated HIVE-2777: - Affects Version/s: 0.13.0 Status: Patch Available (was: Open) This is a rebased patch on top of Hive branch-0.13. Please review. ability to add and drop partitions atomically - Key: HIVE-2777 URL: https://issues.apache.org/jira/browse/HIVE-2777 Project: Hive Issue Type: New Feature Components: Metastore Affects Versions: 0.13.0 Reporter: Aniket Mokashi Assignee: Aniket Mokashi Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2777.D2271.1.patch Hive should have the ability to atomically add and drop partitions. This way admins can change partitions atomically without breaking running jobs. It allows an admin to merge several partitions into one. Essentially, we would like to have an API: add_drop_partitions(String db, String tbl_name, List<Partition> addParts, List<List<String>> dropParts, boolean deleteData); This JIRA covers the changes required for the metastore and Thrift. -- This message was sent by Atlassian JIRA (v6.2#6252)
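To illustrate the all-or-nothing behaviour requested in the description above, here is a minimal in-memory sketch. It is not the metastore implementation and does not reflect the attached patch: the class `AtomicPartitionSwapSketch`, its `addDropPartitions` method, and the representation of a partition as a list of partition values are all assumptions made for the example. The idea is to build the new partition set on a copy and publish it only if every drop and add succeeds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, in-memory stand-in for a table's partition list.
public class AtomicPartitionSwapSketch {
    // Each partition is identified by its list of partition values (an assumption).
    private final Set<List<String>> partitions = new HashSet<>();

    // Drops and adds partitions in one step: either every change is applied or none is.
    public synchronized void addDropPartitions(List<List<String>> addParts,
                                               List<List<String>> dropParts) {
        // Work on a copy so a failure leaves the visible state untouched.
        Set<List<String>> next = new HashSet<>(partitions);
        for (List<String> p : dropParts) {
            if (!next.remove(p)) {
                throw new IllegalArgumentException("No such partition: " + p);
            }
        }
        for (List<String> p : addParts) {
            if (!next.add(new ArrayList<>(p))) {
                throw new IllegalArgumentException("Partition already exists: " + p);
            }
        }
        // All checks passed: publish the new state.
        partitions.clear();
        partitions.addAll(next);
    }

    // Returns a read-only copy of the current partition set.
    public synchronized Set<List<String>> snapshot() {
        return Collections.unmodifiableSet(new HashSet<>(partitions));
    }

    public static void main(String[] args) {
        AtomicPartitionSwapSketch store = new AtomicPartitionSwapSketch();
        // Start with two daily partitions.
        store.addDropPartitions(
            Arrays.asList(Arrays.asList("2014-01-01"), Arrays.asList("2014-01-02")),
            Collections.emptyList());
        // Merge them into one monthly partition in a single atomic call.
        store.addDropPartitions(
            Collections.singletonList(Arrays.asList("2014-01")),
            Arrays.asList(Arrays.asList("2014-01-01"), Arrays.asList("2014-01-02")));
        System.out.println(store.snapshot());
    }
}
```

A reader calling `snapshot()` sees either the two daily partitions or the merged one, never a half-applied mix. A real metastore would need the same guarantee from a database transaction rather than a Java lock, and would also have to handle the `deleteData` flag, which this sketch leaves out.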
[jira] [Updated] (HIVE-2777) ability to add and drop partitions atomically
[ https://issues.apache.org/jira/browse/HIVE-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Wang updated HIVE-2777: - Attachment: hive-2777.patch ability to add and drop partitions atomically - Key: HIVE-2777 URL: https://issues.apache.org/jira/browse/HIVE-2777 Project: Hive Issue Type: New Feature Components: Metastore Affects Versions: 0.13.0 Reporter: Aniket Mokashi Assignee: Aniket Mokashi Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2777.D2271.1.patch, hive-2777.patch Hive should have the ability to atomically add and drop partitions. This way admins can change partitions atomically without breaking running jobs. It allows an admin to merge several partitions into one. Essentially, we would like to have an API: add_drop_partitions(String db, String tbl_name, List<Partition> addParts, List<List<String>> dropParts, boolean deleteData); This JIRA covers the changes required for the metastore and Thrift. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view
[ https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988648#comment-13988648 ] Adrian Wang commented on HIVE-6765: --- [~selinazh] Thanks for your comment! I'm glad that someone else noticed this too. Actually, I found that the problem only comes up when there is something like an aggregation function in the view. The problem results from cloning the plan: when joining with a view as described, the plan contains an ASTNodeOrigin node, which does not have a default constructor, so an exception is thrown when the plan is duplicated. Could you please try applying my patch here to see whether it resolves your problem? Thanks again. ASTNodeOrigin unserializable leads to fail when join with view -- Key: HIVE-6765 URL: https://issues.apache.org/jira/browse/HIVE-6765 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Adrian Wang Fix For: 0.13.0 Attachments: HIVE-6765.patch.1 when a view contains a UDF, and the view comes into a JOIN operation, Hive will encounter a bug with stack trace like Caused by: java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin at java.lang.Class.newInstance0(Class.java:359) at java.lang.Class.newInstance(Class.java:327) at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) -- This message was sent by Atlassian JIRA (v6.2#6252)
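The failure mode described in the comment above can be reproduced outside Hive. The sketch below uses two hypothetical stand-in classes, `OriginWithoutDefault` and `OriginWithDefault`, not Hive's actual `ASTNodeOrigin`. It shows that `Class.newInstance()`, the call at the top of the quoted stack trace, throws `InstantiationException` when a class has no no-argument constructor. It does not show what the attached patch changes.

```java
public class NoDefaultConstructorDemo {

    // Hypothetical stand-in for a plan node that only has a parameterized constructor.
    public static class OriginWithoutDefault {
        private final String objectName;

        public OriginWithoutDefault(String objectName) {
            this.objectName = objectName;
        }

        public String getObjectName() {
            return objectName;
        }
    }

    // Same data, but bean-style: a public no-arg constructor plus a setter.
    public static class OriginWithDefault {
        private String objectName;

        public OriginWithDefault() {
        }

        public String getObjectName() {
            return objectName;
        }

        public void setObjectName(String objectName) {
            this.objectName = objectName;
        }
    }

    // Tries to create an instance reflectively, the way bean-based cloning does.
    // Returns "ok" on success, otherwise the simple name of the exception thrown.
    @SuppressWarnings("deprecation") // Class.newInstance() is deprecated since Java 9
    public static String tryInstantiate(Class<?> type) {
        try {
            type.newInstance();
            return "ok";
        } catch (InstantiationException | IllegalAccessException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryInstantiate(OriginWithoutDefault.class));
        System.out.println(tryInstantiate(OriginWithDefault.class));
    }
}
```

Any cloning scheme that rebuilds objects through their no-argument constructor, as the `java.beans.XMLEncoder` frames elsewhere in this thread suggest, will hit the first case for every such class in the plan.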
[jira] [Commented] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view
[ https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989287#comment-13989287 ] Adrian Wang commented on HIVE-6765: --- [~cdrome] Good catch. I knew that serialization in Hive had been notorious for a long time, but I didn't know about the progress made there. Actually, I was really curious when I saw that my case worked with Tez on hive-0.13, though I never tried Apache's hive-0.13 since there was no official release. ASTNodeOrigin unserializable leads to fail when join with view -- Key: HIVE-6765 URL: https://issues.apache.org/jira/browse/HIVE-6765 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Adrian Wang Fix For: 0.13.0 Attachments: HIVE-6765.patch.1 when a view contains a UDF, and the view comes into a JOIN operation, Hive will encounter a bug with stack trace like Caused by: java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin at java.lang.Class.newInstance0(Class.java:359) at java.lang.Class.newInstance(Class.java:327) at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-2777) ability to add and drop partitions atomically
[ https://issues.apache.org/jira/browse/HIVE-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Wang updated HIVE-2777: - Attachment: (was: hive-2777.patch) ability to add and drop partitions atomically - Key: HIVE-2777 URL: https://issues.apache.org/jira/browse/HIVE-2777 Project: Hive Issue Type: New Feature Components: Metastore Affects Versions: 0.13.0 Reporter: Aniket Mokashi Assignee: Aniket Mokashi Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2777.D2271.1.patch Hive should have the ability to atomically add and drop partitions. This way admins can change partitions atomically without breaking running jobs. It allows an admin to merge several partitions into one. Essentially, we would like to have an API: add_drop_partitions(String db, String tbl_name, List<Partition> addParts, List<List<String>> dropParts, boolean deleteData); This JIRA covers the changes required for the metastore and Thrift. -- This message was sent by Atlassian JIRA (v6.2#6252)