[jira] [Updated] (HIVE-2095) auto convert map join bug
[ https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-2095:
---
Description:
1) When considering a table as the big-table candidate for a map join, if Hive can determine at compile time that the total known size of all the other tables (excluding the candidate under consideration) is bigger than a configured value, then this candidate is a bad one and should not be put into the plan. Otherwise, filtering it out at runtime may cost more time.
2) Added a null check for backup tasks. Without it, a NullPointerException is thrown.
3) CommonJoinResolver needs to know the full mapping of pathToAliases. Otherwise it will make the wrong decision.
4) Changes made to ConditionalResolverCommonJoin: added pathToAliases, aliasToSize (the alias's input size that is known at compile time, from inputSummary), and the intermediate dir path. The logic is to go over all of pathToAliases and, for each path that is from the intermediate dir path, add that path's size to all aliases. Finally, the big table is chosen based on the size information and other information such as aliasToTask.
5) The conditional task's children contain wrong options, which may cause the join to fail or return incorrect results. When getting all possible children for the conditional task, a whitelist of big tables should be used. Only tables in this whitelist can be considered as a big table. Here is the logic:

+ * Get a list of big table candidates. Only the tables in the returned set can
+ * be used as big table in the join operation.
+ *
+ * The logic here is to scan the join condition array from left to right. If
+ * see a inner join and the bigTableCandidates is empty, add both side of this
+ * inner join to big table candidates. If see a left outer join, and the
+ * bigTableCandidates is empty, add the left side to it, and if the
+ * bigTableCandidates is not empty, do nothing (which means the
+ * bigTableCandidates is from left side). If see a right outer join, clear the
+ * bigTableCandidates, and add right side to the bigTableCandidates, it means
+ * the right side of a right outer join always win. If see a full outer join,
+ * return null immediately (no one can be the big table, can not do a
+ * mapjoin).

was:
If auto convert join is set to true, it should fall back to common join if the input size of each join table is bigger than a configured value.

Summary: auto convert map join bug (was: auto convert map join should not be triggered if the input size is bigger than a configured value.)

auto convert map join bug
-
Key: HIVE-2095
URL: https://issues.apache.org/jira/browse/HIVE-2095
Project: Hive
Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
Attachments: HIVE-2095.1.patch
[jira] [Updated] (HIVE-2095) auto convert map join bug
[ https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-2095:
---
Attachment: HIVE-2095.1.patch

auto convert map join bug
-
Key: HIVE-2095
URL: https://issues.apache.org/jira/browse/HIVE-2095
Project: Hive
Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
Attachments: HIVE-2095.1.patch

1) When considering a table as the big-table candidate for a map join, if Hive can determine at compile time that the total known size of all the other tables (excluding the candidate under consideration) is bigger than a configured value, then this candidate is a bad one and should not be put into the plan. Otherwise, filtering it out at runtime may cost more time.
2) Added a null check for backup tasks. Without it, a NullPointerException is thrown.
3) CommonJoinResolver needs to know the full mapping of pathToAliases. Otherwise it will make the wrong decision.
4) Changes made to ConditionalResolverCommonJoin: added pathToAliases, aliasToSize (the alias's input size that is known at compile time, from inputSummary), and the intermediate dir path. The logic is to go over all of pathToAliases and, for each path that is from the intermediate dir path, add that path's size to all aliases. Finally, the big table is chosen based on the size information and other information such as aliasToTask.
5) The conditional task's children contain wrong options, which may cause the join to fail or return incorrect results. When getting all possible children for the conditional task, a whitelist of big tables should be used. Only tables in this whitelist can be considered as a big table. Here is the logic:

+ * Get a list of big table candidates. Only the tables in the returned set can
+ * be used as big table in the join operation.
+ *
+ * The logic here is to scan the join condition array from left to right. If
+ * see a inner join and the bigTableCandidates is empty, add both side of this
+ * inner join to big table candidates. If see a left outer join, and the
+ * bigTableCandidates is empty, add the left side to it, and if the
+ * bigTableCandidates is not empty, do nothing (which means the
+ * bigTableCandidates is from left side). If see a right outer join, clear the
+ * bigTableCandidates, and add right side to the bigTableCandidates, it means
+ * the right side of a right outer join always win. If see a full outer join,
+ * return null immediately (no one can be the big table, can not do a
+ * mapjoin).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
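The left-to-right scan described in item 5 of the HIVE-2095 description might look roughly like the sketch below. It is not the code from HIVE-2095.1.patch: the class name, the `JoinType` enum, and the `JoinCond` holder are hypothetical stand-ins, and tables are identified by integer position. The description does not say what happens for an inner join when the candidate set is already non-empty, so the sketch assumes nothing is added in that case.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the big-table whitelist scan; not Hive's actual classes.
class BigTableCandidateSketch {
    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    // One join condition between the table at position `left` and the one at `right`.
    static final class JoinCond {
        final int left;
        final int right;
        final JoinType type;

        JoinCond(int left, int right, JoinType type) {
            this.left = left;
            this.right = right;
            this.type = type;
        }
    }

    // Returns the positions that may act as the big table, or null if no map join is possible.
    static Set<Integer> getBigTableCandidates(List<JoinCond> conds) {
        Set<Integer> candidates = new LinkedHashSet<>();
        for (JoinCond c : conds) {
            switch (c.type) {
                case FULL_OUTER:
                    // No table can be the big table.
                    return null;
                case RIGHT_OUTER:
                    // The right side of a right outer join always wins.
                    candidates.clear();
                    candidates.add(c.right);
                    break;
                case LEFT_OUTER:
                    // Only seeds the set; an existing set already comes from the left side.
                    if (candidates.isEmpty()) {
                        candidates.add(c.left);
                    }
                    break;
                case INNER:
                    // Assumption: a non-empty set is left unchanged.
                    if (candidates.isEmpty()) {
                        candidates.add(c.left);
                        candidates.add(c.right);
                    }
                    break;
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<JoinCond> conds = Arrays.asList(
                new JoinCond(0, 1, JoinType.INNER),
                new JoinCond(1, 2, JoinType.LEFT_OUTER));
        System.out.println(getBigTableCandidates(conds)); // prints [0, 1]
    }
}
```

Returning null rather than an empty set mirrors the wording of the description ("return null immediately"), so a caller has to check for null before building the conditional task's children.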
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marquis Wang updated HIVE-1803:
---
Attachment: unit-tests.2.patch

New unit tests patch that should fix some more tests. John, I didn't see any failures in TestMTQueries even before adding this new patch. I'm not sure why that would be, but I definitely fixed some things in the other two tests.

Also, this patch only includes the unit tests, so you will need to include patch 11 as well.

Implement bitmap indexing in Hive
-
Key: HIVE-1803
URL: https://issues.apache.org/jira/browse/HIVE-1803
Project: Hive
Issue Type: New Feature
Components: Indexing
Reporter: Marquis Wang
Assignee: Marquis Wang
Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, unit-tests.patch

Implement bitmap index handler to complement compact indexing.
Re: How To Use Hive
Answered on the users list.

On Thu, Apr 7, 2011 at 12:00 AM, komara nagarjuna komaranagarj...@gmail.com wrote:

Sir,

I am new to Hadoop and Hive. I am now developing an application using Hadoop with Hive on a multi-node cluster. I installed and ran Hadoop successfully on the master and slave machines. Hive is installed on the master machine, and it also connects to its database through MySQL. In Hive, I created a table successfully. My problem is how to insert data into Hive tables. How does Hadoop communicate with Hive? How do I use the Hive data warehouse? What is the purpose of Hive? Please explain how to use Hive in real time.

*Thanks Regards*,
*Nagarjuna komara.*
[jira] [Created] (HIVE-2098) Make couple of convenience methods in EximUtil public
Make couple of convenience methods in EximUtil public
-
Key: HIVE-2098
URL: https://issues.apache.org/jira/browse/HIVE-2098
Project: Hive
Issue Type: Bug
Reporter: Krishna Kumar
Priority: Minor

readMetaData() and createExportDump() to be public
[jira] [Assigned] (HIVE-2098) Make couple of convenience methods in EximUtil public
[ https://issues.apache.org/jira/browse/HIVE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Kumar reassigned HIVE-2098:
---
Assignee: Krishna Kumar

Make couple of convenience methods in EximUtil public
-
Key: HIVE-2098
URL: https://issues.apache.org/jira/browse/HIVE-2098
Project: Hive
Issue Type: Bug
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor

readMetaData() and createExportDump() to be public
[jira] [Updated] (HIVE-2098) Make couple of convenience methods in EximUtil public
[ https://issues.apache.org/jira/browse/HIVE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Kumar updated HIVE-2098:
Attachment: HIVE.2098.patch.0.txt

Couple of methods made public for use outside of the package.

Make couple of convenience methods in EximUtil public
-
Key: HIVE-2098
URL: https://issues.apache.org/jira/browse/HIVE-2098
Project: Hive
Issue Type: Bug
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
Attachments: HIVE.2098.patch.0.txt

readMetaData() and createExportDump() to be public
[jira] [Updated] (HIVE-2098) Make couple of convenience methods in EximUtil public
[ https://issues.apache.org/jira/browse/HIVE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Kumar updated HIVE-2098:
Status: Patch Available (was: Open)

Make couple of convenience methods in EximUtil public
-
Key: HIVE-2098
URL: https://issues.apache.org/jira/browse/HIVE-2098
Project: Hive
Issue Type: Bug
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
Attachments: HIVE.2098.patch.0.txt

readMetaData() and createExportDump() to be public
[jira] [Commented] (HIVE-2095) auto convert map join bug
[ https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016871#comment-13016871 ]

Liyin Tang commented on HIVE-2095:
--
I will take a look

auto convert map join bug
-
Key: HIVE-2095
URL: https://issues.apache.org/jira/browse/HIVE-2095
Project: Hive
Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
Attachments: HIVE-2095.1.patch

1) When considering a table as the big-table candidate for a map join, if Hive can determine at compile time that the total known size of all the other tables (excluding the candidate under consideration) is bigger than a configured value, then this candidate is a bad one and should not be put into the plan. Otherwise, filtering it out at runtime may cost more time.
2) Added a null check for backup tasks. Without it, a NullPointerException is thrown.
3) CommonJoinResolver needs to know the full mapping of pathToAliases. Otherwise it will make the wrong decision.
4) Changes made to ConditionalResolverCommonJoin: added pathToAliases, aliasToSize (the alias's input size that is known at compile time, from inputSummary), and the intermediate dir path. The logic is to go over all of pathToAliases and, for each path that is from the intermediate dir path, add that path's size to all aliases. Finally, the big table is chosen based on the size information and other information such as aliasToTask.
5) The conditional task's children contain wrong options, which may cause the join to fail or return incorrect results. When getting all possible children for the conditional task, a whitelist of big tables should be used. Only tables in this whitelist can be considered as a big table. Here is the logic:

+ * Get a list of big table candidates. Only the tables in the returned set can
+ * be used as big table in the join operation.
+ *
+ * The logic here is to scan the join condition array from left to right. If
+ * see a inner join and the bigTableCandidates is empty, add both side of this
+ * inner join to big table candidates. If see a left outer join, and the
+ * bigTableCandidates is empty, add the left side to it, and if the
+ * bigTableCandidates is not empty, do nothing (which means the
+ * bigTableCandidates is from left side). If see a right outer join, clear the
+ * bigTableCandidates, and add right side to the bigTableCandidates, it means
+ * the right side of a right outer join always win. If see a full outer join,
+ * return null immediately (no one can be the big table, can not do a
+ * mapjoin).
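Item 4 of the HIVE-2095 description (combining compile-time alias sizes with the sizes of intermediate-directory paths) could be sketched as follows. This is an illustration under stated assumptions, not the patched ConditionalResolverCommonJoin: the class and method names are hypothetical, `pathToSize` stands in for whatever runtime lookup supplies a path's size, "all aliases" is read as the aliases mapped to that path, and the final step simply picks the largest alias, ignoring aliasToTask.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the size bookkeeping; names are not from the Hive code base.
class AliasSizeSketch {
    // Starts from sizes known at compile time and adds the size of every
    // intermediate-directory path to each alias that reads from that path.
    static Map<String, Long> resolveAliasSizes(
            Map<String, List<String>> pathToAliases,
            Map<String, Long> aliasToKnownSize,
            Map<String, Long> pathToSize,
            String intermediateDir) {
        Map<String, Long> sizes = new HashMap<>(aliasToKnownSize);
        for (Map.Entry<String, List<String>> entry : pathToAliases.entrySet()) {
            String path = entry.getKey();
            if (!path.startsWith(intermediateDir)) {
                continue; // size already known at compile time
            }
            long pathSize = pathToSize.getOrDefault(path, 0L);
            for (String alias : entry.getValue()) {
                sizes.merge(alias, pathSize, Long::sum);
            }
        }
        return sizes;
    }

    // Assumption: the big table is simply the alias with the largest resolved size.
    static String pickLargest(Map<String, Long> sizes) {
        String best = null;
        long bestSize = -1L;
        for (Map.Entry<String, Long> entry : sizes.entrySet()) {
            if (entry.getValue() > bestSize) {
                best = entry.getKey();
                bestSize = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, List<String>> pathToAliases = new HashMap<>();
        pathToAliases.put("/tmp/scratch/-mr-10002", Arrays.asList("a"));
        pathToAliases.put("/warehouse/t2", Arrays.asList("b"));
        Map<String, Long> known = new HashMap<>();
        known.put("b", 500L);
        Map<String, Long> pathToSize = new HashMap<>();
        pathToSize.put("/tmp/scratch/-mr-10002", 2000L);
        Map<String, Long> sizes = resolveAliasSizes(pathToAliases, known, pathToSize, "/tmp/scratch");
        System.out.println(pickLargest(sizes));
    }
}
```

The paths in `main` are made-up examples; the prefix test on `intermediateDir` is one simple way to decide whether a path "is from the intermediate dir path".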
[jira] [Resolved] (HIVE-2082) Reduce memory consumption in preparing MapReduce job
[ https://issues.apache.org/jira/browse/HIVE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain resolved HIVE-2082.
--
Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed. Thanks Ning

Reduce memory consumption in preparing MapReduce job
Key: HIVE-2082
URL: https://issues.apache.org/jira/browse/HIVE-2082
Project: Hive
Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
Attachments: HIVE-2082.patch, HIVE-2082.patch, HIVE-2082.patch

The Hive client side consumes a lot of memory when the number of input partitions is large. One reason is that each partition maintains a list of FieldSchema objects, which are intended to deal with schema evolution. However, they are not used currently, and Hive uses the table-level schema for all partitions. This will be fixed in HIVE-2050. The memory consumption of this part will be reduced by almost half (1.2GB to 700MB for 20k partitions).

Another large chunk of memory consumption is in the MapReduce job setup phase, when a PartitionDesc is created from each Partition object. A property object is maintained in PartitionDesc which contains a full list of columns and types. For the same reason, these should be the same as in the table-level schema. Also, the deserializer initialization takes a large amount of memory, which should be avoided. My initial testing of these optimizations cut the memory consumption in half (700MB to 300MB for 20k partitions).
[jira] [Updated] (HIVE-2093) inputs are outputs should be populated for create/drop database
[ https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-2093:
-
Status: Open (was: Patch Available)

inputs are outputs should be populated for create/drop database
---
Key: HIVE-2093
URL: https://issues.apache.org/jira/browse/HIVE-2093
Project: Hive
Issue Type: Bug
Reporter: Namit Jain
Assignee: Siying Dong
Attachments: HIVE.2093.1.patch

This is needed for many other things: concurrency, authorization etc. to work
[jira] [Commented] (HIVE-2093) inputs are outputs should be populated for create/drop database
[ https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017034#comment-13017034 ]

Namit Jain commented on HIVE-2093:
--
The changes to inputs/outputs look good. Yongqiang, can you confirm the authorization changes?

If we are supporting this, we should also support LOCK DATABASE DB_NAME in the same patch. Also, can you add a negative test with LOCK DATABASE ..?

inputs are outputs should be populated for create/drop database
---
Key: HIVE-2093
URL: https://issues.apache.org/jira/browse/HIVE-2093
Project: Hive
Issue Type: Bug
Reporter: Namit Jain
Assignee: Siying Dong
Attachments: HIVE.2093.1.patch

This is needed for many other things: concurrency, authorization etc. to work
[jira] [Commented] (HIVE-2095) auto convert map join bug
[ https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017049#comment-13017049 ]

Namit Jain commented on HIVE-2095:
--
Can you also create a review-board request?

auto convert map join bug
-
Key: HIVE-2095
URL: https://issues.apache.org/jira/browse/HIVE-2095
Project: Hive
Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
Attachments: HIVE-2095.1.patch

1) When considering a table as the big-table candidate for a map join, if Hive can determine at compile time that the total known size of all the other tables (excluding the candidate under consideration) is bigger than a configured value, then this candidate is a bad one and should not be put into the plan. Otherwise, filtering it out at runtime may cost more time.
2) Added a null check for backup tasks. Without it, a NullPointerException is thrown.
3) CommonJoinResolver needs to know the full mapping of pathToAliases. Otherwise it will make the wrong decision.
4) Changes made to ConditionalResolverCommonJoin: added pathToAliases, aliasToSize (the alias's input size that is known at compile time, from inputSummary), and the intermediate dir path. The logic is to go over all of pathToAliases and, for each path that is from the intermediate dir path, add that path's size to all aliases. Finally, the big table is chosen based on the size information and other information such as aliasToTask.
5) The conditional task's children contain wrong options, which may cause the join to fail or return incorrect results. When getting all possible children for the conditional task, a whitelist of big tables should be used. Only tables in this whitelist can be considered as a big table. Here is the logic:

+ * Get a list of big table candidates. Only the tables in the returned set can
+ * be used as big table in the join operation.
+ *
+ * The logic here is to scan the join condition array from left to right. If
+ * see a inner join and the bigTableCandidates is empty, add both side of this
+ * inner join to big table candidates. If see a left outer join, and the
+ * bigTableCandidates is empty, add the left side to it, and if the
+ * bigTableCandidates is not empty, do nothing (which means the
+ * bigTableCandidates is from left side). If see a right outer join, clear the
+ * bigTableCandidates, and add right side to the bigTableCandidates, it means
+ * the right side of a right outer join always win. If see a full outer join,
+ * return null immediately (no one can be the big table, can not do a
+ * mapjoin).
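The compile-time check in item 1 of the HIVE-2095 description amounts to a small filter over the candidate list. The sketch below is only a reading of that item: the class, the method names, and the `threshold` parameter are hypothetical (the description says only "a configured value"), and aliases whose size is unknown at compile time are assumed to be absent from `aliasToKnownSize`, so they contribute nothing to the sum.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the compile-time candidate filter; not code from the patch.
class CandidateFilterSketch {
    // Sums the known sizes of every alias except the candidate itself.
    static long knownSizeOfOthers(String candidate, Map<String, Long> aliasToKnownSize) {
        long sum = 0L;
        for (Map.Entry<String, Long> entry : aliasToKnownSize.entrySet()) {
            if (!entry.getKey().equals(candidate)) {
                sum += entry.getValue();
            }
        }
        return sum;
    }

    // Keeps only candidates whose small-table side fits under the threshold.
    static List<String> filterCandidates(
            List<String> candidates, Map<String, Long> aliasToKnownSize, long threshold) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (knownSizeOfOthers(candidate, aliasToKnownSize) <= threshold) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new HashMap<>();
        sizes.put("a", 10L);
        sizes.put("b", 20L);
        sizes.put("c", 1000L);
        // Only "c" leaves a small-table side (10 + 20) under the threshold of 100.
        System.out.println(filterCandidates(Arrays.asList("a", "b", "c"), sizes, 100L)); // prints [c]
    }
}
```

Dropping a bad candidate at compile time means no map-join plan is generated for it at all, which is the saving the description points to when it says filtering at runtime "may cause more time".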
Review Request: auto map join bug
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/559/
---

Review request for hive.

Summary
---
auto map join bug

Diffs
-
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 1088810
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 1088810
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinResolver.java 1088810
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java 1088810
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverCommonJoin.java 1088810
trunk/ql/src/test/queries/clientpositive/auto_join28.q PRE-CREATION
trunk/ql/src/test/queries/clientpositive/auto_join29.q PRE-CREATION
trunk/ql/src/test/queries/clientpositive/auto_join30.q PRE-CREATION
trunk/ql/src/test/results/clientpositive/auto_join12.q.out 1088810
trunk/ql/src/test/results/clientpositive/auto_join20.q.out 1088810
trunk/ql/src/test/results/clientpositive/auto_join21.q.out 1088810
trunk/ql/src/test/results/clientpositive/auto_join28.q.out PRE-CREATION
trunk/ql/src/test/results/clientpositive/auto_join29.q.out PRE-CREATION
trunk/ql/src/test/results/clientpositive/auto_join3.q.out 1088810
trunk/ql/src/test/results/clientpositive/auto_join30.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/559/diff

Testing
---
yes.

Thanks,
Yongqiang
[jira] [Commented] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017066#comment-13017066 ]

John Sichi commented on HIVE-1803:
--
OK, maybe the TestMTQueries failure was a side-effect of the other failures...I'll retry with your latest patch.

Implement bitmap indexing in Hive
-
Key: HIVE-1803
URL: https://issues.apache.org/jira/browse/HIVE-1803
Project: Hive
Issue Type: New Feature
Components: Indexing
Reporter: Marquis Wang
Assignee: Marquis Wang
Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, unit-tests.patch

Implement bitmap index handler to complement compact indexing.
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1803:
-
Status: Open (was: Patch Available)

Some other stuff got committed in between, which is causing conflicts when I try

patch -p0 < unit-tests.2.patch

Implement bitmap indexing in Hive
-
Key: HIVE-1803
URL: https://issues.apache.org/jira/browse/HIVE-1803
Project: Hive
Issue Type: New Feature
Components: Indexing
Reporter: Marquis Wang
Assignee: Marquis Wang
Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, unit-tests.patch

Implement bitmap index handler to complement compact indexing.
[jira] [Commented] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017069#comment-13017069 ]

Marquis Wang commented on HIVE-1803:
---
I re-pulled from trunk and made a new patch and there was no difference between the two. If you have the original unit-tests.patch applied then this patch will fail. Can you try patching HIVE-1803.11.patch followed by unit-tests.2.patch on a clean checkout?

Implement bitmap indexing in Hive
-
Key: HIVE-1803
URL: https://issues.apache.org/jira/browse/HIVE-1803
Project: Hive
Issue Type: New Feature
Components: Indexing
Reporter: Marquis Wang
Assignee: Marquis Wang
Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, unit-tests.patch

Implement bitmap index handler to complement compact indexing.
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marquis Wang updated HIVE-1803:
---
Status: Patch Available (was: Open)

Implement bitmap indexing in Hive
-
Key: HIVE-1803
URL: https://issues.apache.org/jira/browse/HIVE-1803
Project: Hive
Issue Type: New Feature
Components: Indexing
Reporter: Marquis Wang
Assignee: Marquis Wang
Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, unit-tests.patch

Implement bitmap index handler to complement compact indexing.
[jira] [Commented] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017088#comment-13017088 ]

John Sichi commented on HIVE-1803:
--
That's what I did, and the conflicts match files which were in very recent commits. Are you sure you did svn update? If you're using git, there may be some lag in the replica.

Implement bitmap indexing in Hive
-
Key: HIVE-1803
URL: https://issues.apache.org/jira/browse/HIVE-1803
Project: Hive
Issue Type: New Feature
Components: Indexing
Reporter: Marquis Wang
Assignee: Marquis Wang
Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, unit-tests.patch

Implement bitmap index handler to complement compact indexing.
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1803:
-
Status: Open (was: Patch Available)

Implement bitmap indexing in Hive
-
Key: HIVE-1803
URL: https://issues.apache.org/jira/browse/HIVE-1803
Project: Hive
Issue Type: New Feature
Components: Indexing
Reporter: Marquis Wang
Assignee: Marquis Wang
Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, unit-tests.patch

Implement bitmap index handler to complement compact indexing.
Review Request: for HIVE-2068
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/540/
---

Review request for hive and namit jain.

Summary
---
For HIVE-2068

This addresses bug HIVE-2068.
https://issues.apache.org/jira/browse/HIVE-2068

Diffs
-
trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 1086466
trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1086466
trunk/conf/hive-default.xml 1086466
trunk/hwi/src/java/org/apache/hadoop/hive/hwi/HWISessionItem.java 1086466
trunk/ql/src/java/org/apache/hadoop/hive/ql/CommandNeedRetryException.java PRE-CREATION
trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java 1086466
trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 1086466
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java 1086466
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java 1086466
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 1086466
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 1086466
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SamplePruner.java 1086466
trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1086466
trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1086466
trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java 1086466
trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1086466
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java 1086466
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/LimitDesc.java 1086466
trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessor.java 1086466
trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 1086466
trunk/ql/src/test/queries/clientpositive/global_limit.q PRE-CREATION
trunk/ql/src/test/results/clientpositive/global_limit.q.out PRE-CREATION
trunk/service/src/java/org/apache/hadoop/hive/service/HiveServer.java 1086466

Diff: https://reviews.apache.org/r/540/diff

Testing
---
added a test to test suite.

Thanks,
Siying
[jira] [Commented] (HIVE-2068) Speed up query select xx,xx from xxx LIMIT xxx if no filtering or aggregation
[ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13017092#comment-13017092 ] jirapos...@reviews.apache.org commented on HIVE-2068: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/540/ --- Review request for hive and namit jain. Summary --- For HIVE-2068 This addresses bug HIVE-2068. https://issues.apache.org/jira/browse/HIVE-2068 Diffs - trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 1086466 trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1086466 trunk/conf/hive-default.xml 1086466 trunk/hwi/src/java/org/apache/hadoop/hive/hwi/HWISessionItem.java 1086466 trunk/ql/src/java/org/apache/hadoop/hive/ql/CommandNeedRetryException.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java 1086466 trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 1086466 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java 1086466 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java 1086466 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 1086466 trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 1086466 trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SamplePruner.java 1086466 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1086466 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1086466 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java 1086466 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1086466 trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java 1086466 trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/LimitDesc.java 1086466 trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessor.java 1086466 trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 1086466 trunk/ql/src/test/queries/clientpositive/global_limit.q PRE-CREATION 
trunk/ql/src/test/results/clientpositive/global_limit.q.out PRE-CREATION trunk/service/src/java/org/apache/hadoop/hive/service/HiveServer.java 1086466 Diff: https://reviews.apache.org/r/540/diff Testing --- added a test to test suite. Thanks, Siying Speed up query select xx,xx from xxx LIMIT xxx if no filtering or aggregation --- Key: HIVE-2068 URL: https://issues.apache.org/jira/browse/HIVE-2068 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch Currently, select xx,xx from xxx where ...(only partition conditions) LIMIT xxx will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Build failed in Jenkins: Hive-trunk-h0.20 #659
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/659/ -- [...truncated 29862 lines...] [junit] OK [junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-04-07_12-29-45_398_101051018938832795/-mr-1 [junit] Total MapReduce jobs = 1 [junit] Launching Job 1 out of 1 [junit] Number of reduce tasks determined at compile time: 1 [junit] In order to change the average load for a reducer (in bytes): [junit] set hive.exec.reducers.bytes.per.reducer=number [junit] In order to limit the maximum number of reducers: [junit] set hive.exec.reducers.max=number [junit] In order to set a constant number of reducers: [junit] set mapred.reduce.tasks=number [junit] Job running in-process (local Hadoop) [junit] Hadoop job information for null: number of mappers: 0; number of reducers: 0 [junit] 2011-04-07 12:29:48,481 null map = 100%, reduce = 100% [junit] Ended Job = job_local_0001 [junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-04-07_12-29-45_398_101051018938832795/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201104071229_13744265.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable 
[junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: LOAD [junit] PREHOOK: Output: default@testhivedrivertable [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-04-07_12-29-49_964_8300355644500983998/-mr-1 [junit] POSTHOOK: query: select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-04-07_12-29-49_964_8300355644500983998/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201104071229_852872020.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE
[jira] [Commented] (HIVE-2093) inputs are outputs should be populated for create/drop database
[ https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017097#comment-13017097 ]

Siying Dong commented on HIVE-2093:

Namit, do you mean we should add LOCK DATABASE? Looks like we don't have the syntax at all.

inputs are outputs should be populated for create/drop database
Key: HIVE-2093
URL: https://issues.apache.org/jira/browse/HIVE-2093
Project: Hive
Issue Type: Bug
Reporter: Namit Jain
Assignee: Siying Dong
Attachments: HIVE.2093.1.patch

This is needed for many other things: concurrency, authorization etc. to work
Jenkins build is back to normal : Hive-0.7.0-h0.20 #69
See https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/69/
[jira] [Created] (HIVE-2099) GROUP BY rules not applied correctly for select *
GROUP BY rules not applied correctly for select *
Key: HIVE-2099
URL: https://issues.apache.org/jira/browse/HIVE-2099
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi

This fails as expected:

select foo, bar from pokes group by foo;

This succeeds, which is incorrect:

select * from pokes group by foo;

I verified this as far back as 0.6, so maybe it has always been this way.
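The check that appears to be skipped for select * can be pictured with the sketch below. It is not Hive's semantic analyzer: the class and method names are hypothetical, and it assumes that pokes has exactly the columns foo and bar. The idea is to expand * into the table's columns before testing each selected column against the GROUP BY keys, so that both queries in the report are rejected for the same reason.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hive: illustrates the GROUP BY rule only.
final class GroupByCheck {
    /**
     * Returns the selected columns that are not GROUP BY keys.
     * A "*" item is expanded to all table columns before the check.
     * Aggregate expressions are out of scope for this sketch.
     */
    static List<String> columnsNotInGroupBy(List<String> selectList,
                                            List<String> tableColumns,
                                            Set<String> groupByKeys) {
        List<String> invalid = new ArrayList<>();
        for (String item : selectList) {
            List<String> expanded =
                item.equals("*") ? tableColumns : Collections.singletonList(item);
            for (String column : expanded) {
                if (!groupByKeys.contains(column)) {
                    invalid.add(column);
                }
            }
        }
        return invalid;
    }
}
```

With tableColumns = [foo, bar] and groupByKeys = {foo}, both the select list [foo, bar] and the select list [*] report bar as invalid, which is the behaviour the report expects.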
[jira] [Commented] (HIVE-2093) inputs are outputs should be populated for create/drop database
[ https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017115#comment-13017115 ]

He Yongqiang commented on HIVE-2093:

Siying, can you add cleanup of db/user privilege in QTestUtils?

inputs are outputs should be populated for create/drop database
Key: HIVE-2093
URL: https://issues.apache.org/jira/browse/HIVE-2093
Project: Hive
Issue Type: Bug
Reporter: Namit Jain
Assignee: Siying Dong
Attachments: HIVE.2093.1.patch

This is needed for many other things: concurrency, authorization etc. to work
[jira] [Commented] (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017121#comment-13017121 ]

He Yongqiang commented on HIVE-2065:

Why is a minor.version needed in the metadata? Mostly looks good, but I prefer not to change the code too much, as some other external things depend on it.

RCFile issues
Key: HIVE-2065
URL: https://issues.apache.org/jira/browse/HIVE-2065
Project: Hive
Issue Type: Bug
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
Attachments: HIVE.2065.patch.0.txt, HIVE.2065.patch.1.txt, Slide1.png, proposal.png

Some potential issues with RCFile:

1. Remove unwanted synchronized modifiers on the methods of RCFile. As per yongqiang he, the class is not meant to be thread-safe (and it is not). Might as well get rid of the confusing and performance-impacting lock acquisitions.

2. Record length overstated for compressed files. IIUC, the key compression happens after we have written the record length.

{code}
int keyLength = key.getSize();
if (keyLength < 0) {
  throw new IOException("negative length keys not allowed: " + key);
}
out.writeInt(keyLength + valueLength); // total record length
out.writeInt(keyLength); // key portion length
if (!isCompressed()) {
  out.writeInt(keyLength);
  key.write(out); // key
} else {
  keyCompressionBuffer.reset();
  keyDeflateFilter.resetState();
  key.write(keyDeflateOut);
  keyDeflateOut.flush();
  keyDeflateFilter.finish();
  int compressedKeyLen = keyCompressionBuffer.getLength();
  out.writeInt(compressedKeyLen);
  out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
}
{code}

3. For sequence file compatibility, the compressed key length should be the field next to the record length, not the uncompressed key length.
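One possible reading of items 2 and 3 is sketched below. This is not the attached patch and not RCFile's actual writer: the class name, the method name and the exact field order are assumptions made for illustration, and java.util.zip stands in for whatever codec is configured. The point it shows is ordering: the key is compressed first, so the record length and the key length field that follows it can both be computed from the bytes that are really written.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;

// Hypothetical sketch, not RCFile code: compress the key before writing lengths.
final class RecordHeaderSketch {
    /**
     * Writes, in order: total record length, on-disk key length,
     * uncompressed key length, then the on-disk key bytes.
     * This field order is an assumption derived from items 2 and 3.
     */
    static byte[] writeKeySection(byte[] key, int valueLength, boolean compressed) {
        try {
            byte[] onDiskKey = key;
            if (compressed) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                // Closing the stream finishes the deflater and flushes all output.
                try (DeflaterOutputStream deflate = new DeflaterOutputStream(buffer)) {
                    deflate.write(key);
                }
                onDiskKey = buffer.toByteArray();
            }
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(onDiskKey.length + valueLength); // record length, no longer overstated
            out.writeInt(onDiskKey.length);               // compressed (on-disk) key length
            out.writeInt(key.length);                     // uncompressed key length
            out.write(onDiskKey);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            // In-memory streams are not expected to fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
    }
}
```

For an uncompressed file the two key length fields are equal, so a reader that only looks at the first two integers sees the same layout in both cases.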
Re: Review Request: HIVE-1644 Use filter pushdown for automatically accessing indexes
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/558/#review399 --- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/558/#comment748 For consistency with my review in HIVE-1694, I suggest hive.optimize.index.filter as the name for this configuration parameter. (In HIVE-1694 I suggested hive.optimize.index.groupby, and we want it to be possible to enable/disable them independently) common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/558/#comment749 In line with the previous comment, suggest hive.optimize.index.filter.compact.minSize/maxSize. Namit's suggestion for minSize was 5G. I think the default for maxSize should be infinity (I can't think of a case where we want it in effect by default). ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java https://reviews.apache.org/r/558/#comment750 HIVE-1803 is changing this to hive.index.blockfilter.file. Assuming that gets committed first, we should use that, since it's generic rather than tied to the index type. ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexHandler.java https://reviews.apache.org/r/558/#comment751 What are the units here? Also, don't use colon after parameter name. ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java https://reviews.apache.org/r/558/#comment752 The non-functional changes in this file are gonna conflict with HIVE-1803, so get rid of them. ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java https://reviews.apache.org/r/558/#comment755 Use HiveUtils.unparseIdentifier for quoting table names in generated SQL. ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java https://reviews.apache.org/r/558/#comment756 Isn't it incorrect to set properties on the original table scan here since this is only tentative? 
ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java https://reviews.apache.org/r/558/#comment757 Likewise, modifying inputs is incorrect before we have a definite plan. Some more work on the new HiveIndexHandler interface method is required for resolving this plus the residuals. ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java https://reviews.apache.org/r/558/#comment753 If searchConditions.size() == 0, it means we didn't find anything which could be handled by the index. In that case, we should bail out immediately and not try to do anything more with this index. ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java https://reviews.apache.org/r/558/#comment759 We collect the residual here, but we don't do anything with it. Don't we need to pass it back so that Hive can decide what to leave in the Filter operator? ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java https://reviews.apache.org/r/558/#comment760 The list actually contains index objects, not index table names. Also typo: is exists ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java https://reviews.apache.org/r/558/#comment761 Only cast once. ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java https://reviews.apache.org/r/558/#comment764 Indentation is wrong here. ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java https://reviews.apache.org/r/558/#comment763 In my review for HIVE-1694, I noted that we should not be swallowing exceptions. I think some of this code was copied from there. If we can't access the metastore during optimization, it should be treated as a fatal error. 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java https://reviews.apache.org/r/558/#comment765 The plan still looks wrong (there are two Stage-0's, one for the index scan, one for the final fetch), so the relabeling is still not quite working correctly. ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java https://reviews.apache.org/r/558/#comment766 no space after ! ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java https://reviews.apache.org/r/558/#comment767 Suggested rename for method: arePartitionsCoveredByIndex ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java https://reviews.apache.org/r/558/#comment768 This checks that the metadata matches. But it does not actually check that the index partitions exist. ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java https://reviews.apache.org/r/558/#comment769
[jira] [Updated] (HIVE-1644) use filter pushdown for automatically accessing indexes
[ https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1644:
Status: Open (was: Patch Available)

I added comments in review board.

use filter pushdown for automatically accessing indexes
Key: HIVE-1644
URL: https://issues.apache.org/jira/browse/HIVE-1644
Project: Hive
Issue Type: Improvement
Components: Indexing
Affects Versions: 0.8.0
Reporter: John Sichi
Assignee: Russell Melick
Attachments: HIVE-1644.1.patch, HIVE-1644.10.patch, HIVE-1644.11.patch, HIVE-1644.12.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch, HIVE-1644.7.patch, HIVE-1644.8.patch, HIVE-1644.9.patch

HIVE-1226 provides utilities for analyzing filters which have been pushed down to a table scan. The next step is to use these for selecting available indexes and generating access plans for those indexes.
[jira] [Commented] (HIVE-1644) use filter pushdown for automatically accessing indexes
[ https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017164#comment-13017164 ]

jirapos...@reviews.apache.org commented on HIVE-1644:

This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/558/#review399
[jira] [Updated] (HIVE-2095) auto convert map join bug
[ https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-2095:
Status: Open (was: Patch Available)

auto convert map join bug
Key: HIVE-2095
URL: https://issues.apache.org/jira/browse/HIVE-2095
Project: Hive
Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
Attachments: HIVE-2095.1.patch

1) When considering a table as the big table candidate for a map join, if Hive can determine at compile time that the total known size of all the other tables (excluding the candidate) is bigger than a configured value, the candidate is a bad one and should not be put into the plan. Otherwise, filtering it out at runtime may take more time.
2) Added a null check for backup tasks. Otherwise a NullPointerException will be seen.
3) CommonJoinResolver needs to know the full mapping of pathToAliases. Otherwise it will make the wrong decision.
4) Changes made to ConditionalResolverCommonJoin: added pathToAliases, aliasToSize (the alias's input size that is known at compile time, from inputSummary), and the intermediate dir path. The logic is to go over all the pathToAliases and, for each path that comes from the intermediate dir path, add this path's size to all aliases. Finally, the big table is chosen based on the size information and other information such as aliasToTask.
5) The conditional task's children contain wrong options, which may cause the join to fail or to produce incorrect results. When getting all possible children for the conditional task, a whitelist of big tables should be used. Only tables in this whitelist can be considered as a big table. Here is the logic:

+ * Get a list of big table candidates. Only the tables in the returned set can
+ * be used as big table in the join operation.
+ *
+ * The logic here is to scan the join condition array from left to right. If
+ * see a inner join and the bigTableCandidates is empty, add both side of this
+ * inner join to big table candidates. If see a left outer join, and the
+ * bigTableCandidates is empty, add the left side to it, and if the
+ * bigTableCandidates is not empty, do nothing (which means the
+ * bigTableCandidates is from left side). If see a right outer join, clear the
+ * bigTableCandidates, and add right side to the bigTableCandidates, it means
+ * the right side of a right outer join always win. If see a full outer join,
+ * return null immediately (no one can be the big table, can not do a
+ * mapjoin).
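The left-to-right scan described in that comment can be sketched as follows. This is an illustration written from the description only, not the code in the patch: JoinCond, JoinType and the method name are hypothetical stand-ins, tables are identified by their position in the join, and the behaviour for an inner join when the candidate set is already non-empty is assumed to be "do nothing" because the description does not state it.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the whitelist scan; types are stand-ins, not Hive classes.
final class BigTableCandidates {
    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    /** One join condition between two table positions. */
    static final class JoinCond {
        final int left;
        final int right;
        final JoinType type;

        JoinCond(int left, int right, JoinType type) {
            this.left = left;
            this.right = right;
            this.type = type;
        }
    }

    /** Returns the allowed big tables, or null if a map join is not possible. */
    static Set<Integer> getBigTableCandidates(JoinCond[] conds) {
        Set<Integer> candidates = new LinkedHashSet<>();
        for (JoinCond cond : conds) {
            switch (cond.type) {
                case FULL_OUTER:
                    return null; // no table can be the big table
                case LEFT_OUTER:
                    if (candidates.isEmpty()) {
                        candidates.add(cond.left);
                    }
                    break; // otherwise keep the candidates from the left side
                case RIGHT_OUTER:
                    candidates.clear(); // the right side always wins
                    candidates.add(cond.right);
                    break;
                case INNER:
                    if (candidates.isEmpty()) {
                        candidates.add(cond.left);
                        candidates.add(cond.right);
                    }
                    break; // non-empty case is unspecified above; assumed no-op
            }
        }
        return candidates;
    }
}
```

For a right outer join of tables 0 and 1 followed by an inner join of tables 1 and 2, the scan yields only table 1, so a conditional task built from this set would not offer tables 0 or 2 as the big table.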
Build failed in Jenkins: Hive-trunk-h0.20 #660
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/660/changes Changes: [namit] HIVE-2082 Reduce memory consumption in preparing MapReduce job (Ning Zhang via namit) -- [...truncated 29863 lines...] [junit] OK [junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-04-07_15-51-46_165_2513310220195627021/-mr-1 [junit] Total MapReduce jobs = 1 [junit] Launching Job 1 out of 1 [junit] Number of reduce tasks determined at compile time: 1 [junit] In order to change the average load for a reducer (in bytes): [junit] set hive.exec.reducers.bytes.per.reducer=number [junit] In order to limit the maximum number of reducers: [junit] set hive.exec.reducers.max=number [junit] In order to set a constant number of reducers: [junit] set mapred.reduce.tasks=number [junit] Job running in-process (local Hadoop) [junit] Hadoop job information for null: number of mappers: 0; number of reducers: 0 [junit] 2011-04-07 15:51:49,223 null map = 100%, reduce = 100% [junit] Ended Job = job_local_0001 [junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-04-07_15-51-46_165_2513310220195627021/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201104071551_1925633887.txt [junit] PREHOOK: query: drop table 
testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: LOAD [junit] PREHOOK: Output: default@testhivedrivertable [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-04-07_15-51-50_712_3119286579259358399/-mr-1 [junit] POSTHOOK: query: select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-04-07_15-51-50_712_3119286579259358399/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK 
[junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201104071551_1526356288.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output:
Review Request: review board for HIVE-2093
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/566/ --- Review request for hive, Yongqiang He and namit jain. Summary --- Still need to change some old tests' outputs. This addresses bug HIVE-2093. https://issues.apache.org/jira/browse/HIVE-2093 Diffs - trunk/ql/src/test/results/clientnegative/lockneg8.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/lockneg9.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/add_part_exist.q.out 1089697 trunk/ql/src/test/results/clientpositive/alter1.q.out 1089697 trunk/ql/src/test/results/clientnegative/lockneg7.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/lockneg6.q.out PRE-CREATION trunk/ql/src/test/queries/clientnegative/lockneg9.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/database.q 1089697 trunk/ql/src/test/results/clientnegative/authorization_fail_create_db.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/authorization_fail_drop_db.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/database_create_already_exists.q.out 1089697 trunk/ql/src/test/results/clientnegative/database_create_invalid_name.q.out 1089697 trunk/ql/src/test/results/clientnegative/database_drop_does_not_exist.q.out 1089697 trunk/ql/src/test/results/clientnegative/database_drop_not_empty.q.out 1089697 trunk/ql/src/test/queries/clientnegative/authorization_fail_create_db.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/lockneg6.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/lockneg7.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/lockneg8.q PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ShowLocksDesc.java 1089697 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1089697 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 1089697 trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/DDLWork.java 1089697 trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java 1089697 
trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1089697 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1089697 trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 1089697 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1089697 trunk/ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java 1089697 trunk/ql/src/java/org/apache/hadoop/hive/ql/hooks/WriteEntity.java 1089697 trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 1089697 trunk/ql/src/test/results/clientpositive/alter2.q.out 1089697 trunk/ql/src/test/results/clientpositive/alter3.q.out 1089697 trunk/ql/src/test/results/clientpositive/alter4.q.out 1089697 trunk/ql/src/test/results/clientpositive/authorization_5.q.out 1089697 trunk/ql/src/test/results/clientpositive/database.q.out 1089697 Diff: https://reviews.apache.org/r/566/diff Testing --- Thanks, Siying
[jira] [Commented] (HIVE-2093) inputs are outputs should be populated for create/drop database
[ https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017253#comment-13017253 ]

Siying Dong commented on HIVE-2093:

created review board: https://reviews.apache.org/r/566

inputs are outputs should be populated for create/drop database
Key: HIVE-2093
URL: https://issues.apache.org/jira/browse/HIVE-2093
Project: Hive
Issue Type: Bug
Reporter: Namit Jain
Assignee: Siying Dong
Attachments: HIVE.2093.1.patch, HIVE.2093.2.patch

This is needed for many other things: concurrency, authorization etc. to work
[jira] [Commented] (HIVE-2093) inputs are outputs should be populated for create/drop database
[ https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017254#comment-13017254 ]

jirapos...@reviews.apache.org commented on HIVE-2093:

This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/566/
[jira] [Commented] (HIVE-2095) auto convert map join bug
[ https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017291#comment-13017291 ]

He Yongqiang commented on HIVE-2095:
------------------------------------

Uploading a new patch to address namit's comments.

Note: there is an existing bug in Hive that causes the results of auto_join29.q to be incorrect. Let's file another JIRA for it. Basically, if the outer join filter is enabled, the query

SELECT /*+mapjoin(src1, src2)*/ * FROM src src1 RIGHT OUTER JOIN src src2 ON (src1.key = src2.key AND src1.key 10 AND src2.key 10) JOIN src src3 ON (src2.key = src3.key AND src3.key 10) SORT BY src1.key, src1.value, src2.key, src2.value, src3.key, src3.value;

will give wrong results in today's Hive.

auto convert map join bug
-------------------------

Key: HIVE-2095
URL: https://issues.apache.org/jira/browse/HIVE-2095
Project: Hive
Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
Attachments: HIVE-2095.1.patch, HIVE-2095.2.patch

1) When choosing a table as the big table candidate for a map join, if Hive can determine at compile time that the total known size of all the other tables (excluding the candidate) is bigger than a configured value, the candidate is a bad one and should not be put into the plan. Otherwise, filtering it out at runtime may take more time.
2) Added a null check for backup tasks. Otherwise a NullPointerException is thrown.
3) CommonJoinResolver needs to know the full mapping of pathToAliases. Otherwise it will make the wrong decision.
4) Changes made to ConditionalResolverCommonJoin: added pathToAliases, aliasToSize (the alias's input size that is known at compile time, from inputSummary), and the intermediate dir path. The logic is: go over all of pathToAliases and, for each path, if it is from the intermediate dir path, add this path's size to all aliases. Finally, choose the big table based on the size information and other data such as aliasToTask.
5) The conditional task's children contain wrong options, which may cause the join to fail or produce incorrect results. Basically, when getting all possible children for the conditional task, a whitelist of big tables should be used. Only tables in this whitelist can be considered as the big table. Here is the logic:

+ * Get a list of big table candidates. Only the tables in the returned set can
+ * be used as big table in the join operation.
+ *
+ * The logic here is to scan the join condition array from left to right. If
+ * see a inner join and the bigTableCandidates is empty, add both side of this
+ * inner join to big table candidates. If see a left outer join, and the
+ * bigTableCandidates is empty, add the left side to it, and if the
+ * bigTableCandidates is not empty, do nothing (which means the
+ * bigTableCandidates is from left side). If see a right outer join, clear the
+ * bigTableCandidates, and add right side to the bigTableCandidates, it means
+ * the right side of a right outer join always win. If see a full outer join,
+ * return null immediately (no one can be the big table, can not do a
+ * mapjoin).
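
The big table candidate rule described in item 5 can be sketched in Java roughly as follows. This is a minimal illustration, not the code from the patch: the class BigTableCandidates, the JoinType enum, and the JoinCond holder are hypothetical names, and tables are identified simply by their position in the join. The description only says what an inner join does when the candidate set is empty; the sketch assumes an inner join changes nothing when the set is already populated.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class BigTableCandidates {

    /** Hypothetical join kinds for this sketch; not Hive's own constants. */
    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    /** Hypothetical join condition: positions of the left and right tables. */
    static final class JoinCond {
        final int left;
        final int right;
        final JoinType type;

        JoinCond(int left, int right, JoinType type) {
            this.left = left;
            this.right = right;
            this.type = type;
        }
    }

    /**
     * Scans the join conditions from left to right and returns the positions
     * that may act as the big table, or null if no map join is possible.
     */
    static Set<Integer> getBigTableCandidates(List<JoinCond> conds) {
        Set<Integer> candidates = new LinkedHashSet<>();
        for (JoinCond cond : conds) {
            switch (cond.type) {
                case INNER:
                    // Assumption: only an empty set is seeded by an inner join.
                    if (candidates.isEmpty()) {
                        candidates.add(cond.left);
                        candidates.add(cond.right);
                    }
                    break;
                case LEFT_OUTER:
                    // Existing candidates already come from the left side.
                    if (candidates.isEmpty()) {
                        candidates.add(cond.left);
                    }
                    break;
                case RIGHT_OUTER:
                    // The right side of a right outer join always wins.
                    candidates.clear();
                    candidates.add(cond.right);
                    break;
                case FULL_OUTER:
                    // No table can be the big table; a map join is impossible.
                    return null;
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Shape of the query in the comment: t0 RIGHT OUTER JOIN t1, then JOIN t2.
        List<JoinCond> conds = Arrays.asList(
                new JoinCond(0, 1, JoinType.RIGHT_OUTER),
                new JoinCond(1, 2, JoinType.INNER));
        System.out.println(getBigTableCandidates(conds)); // prints [1]
    }
}
```

Under these assumptions, a query shaped like the auto_join29.q example (src1 RIGHT OUTER JOIN src2, then JOIN src3) leaves only position 1 (src2) in the whitelist, so a conditional-task child that treats src1 or src3 as the big table would not be generated.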