[jira] [Updated] (HIVE-2095) auto convert map join bug

2011-04-07 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2095:
---

Description: 
1) 
When considering a table as the big table candidate for a map join, if Hive 
can determine at compile time that the total known size of all other tables 
(excluding the candidate under consideration) is bigger than a configured 
value, the candidate is a bad one and should not be put into the plan. 
Otherwise, filtering it out at runtime may cost more time.

2)
Added a null check for backup tasks. Otherwise a NullPointerException is thrown.

3)
CommonJoinResolver needs to know the full mapping of pathToAliases. Otherwise it 
will make the wrong decision.

4)
Changes made to ConditionalResolverCommonJoin: added pathToAliases, 
aliasToSize (the alias's input size that is known at compile time, from 
inputSummary), and the intermediate dir path.
The logic is: go over all the pathToAliases, and for each path, if it is 
from the intermediate dir path, add this path's size to all aliases. Finally, 
choose the big table based on the size information and other data such as 
aliasToTask.

5)
The conditional task's children contain wrong options, which may cause the join 
to fail or produce incorrect results. When getting all possible children for the 
conditional task, we should use a whitelist of big tables. Only tables in this 
whitelist can be considered as a big table.
Here is the logic:

+   * Get a list of big table candidates. Only the tables in the returned set can
+   * be used as big table in the join operation.
+   * 
+   * The logic here is to scan the join condition array from left to right. If
+   * see a inner join and the bigTableCandidates is empty, add both side of this
+   * inner join to big table candidates. If see a left outer join, and the
+   * bigTableCandidates is empty, add the left side to it, and if the
+   * bigTableCandidates is not empty, do nothing (which means the
+   * bigTableCandidates is from left side). If see a right outer join, clear the
+   * bigTableCandidates, and add right side to the bigTableCandidates, it means
+   * the right side of a right outer join always win. If see a full outer join,
+   * return null immediately (no one can be the big table, can not do a
+   * mapjoin).
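
The scan described in the comment above can be sketched in Java as follows. This is a minimal illustration written only from that description, not code taken from the patch. The names BigTableSketch, JoinType and JoinCond, and the use of integer table positions, are assumptions made for the example. The comment does not say what happens on an inner join when the candidate set is already non-empty; the sketch assumes nothing is added in that case.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in types; the real Hive classes may look different.
class BigTableSketch {
    public enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    public static final class JoinCond {
        final int left;      // position of the left table in the join
        final int right;     // position of the right table in the join
        final JoinType type;

        public JoinCond(int left, int right, JoinType type) {
            this.left = left;
            this.right = right;
            this.type = type;
        }
    }

    // Returns the set of table positions that may act as the big table,
    // or null if a full outer join rules out a map join entirely.
    public static Set<Integer> getBigTableCandidates(JoinCond[] conds) {
        Set<Integer> candidates = new HashSet<>();
        for (JoinCond c : conds) {
            switch (c.type) {
                case INNER:
                    // Both sides qualify, but only if nothing was chosen yet.
                    if (candidates.isEmpty()) {
                        candidates.add(c.left);
                        candidates.add(c.right);
                    }
                    break;
                case LEFT_OUTER:
                    // A non-empty set already comes from the left side.
                    if (candidates.isEmpty()) {
                        candidates.add(c.left);
                    }
                    break;
                case RIGHT_OUTER:
                    // The right side of a right outer join always wins.
                    candidates.clear();
                    candidates.add(c.right);
                    break;
                case FULL_OUTER:
                    // No table can be the big table.
                    return null;
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        JoinCond[] conds = {
            new JoinCond(0, 1, JoinType.LEFT_OUTER),
            new JoinCond(1, 2, JoinType.RIGHT_OUTER)
        };
        System.out.println(getBigTableCandidates(conds));
    }
}
```

With a left outer join followed by a right outer join, as in main, only the right-most table remains a candidate, which matches the "right side always wins" rule in the comment.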


  was:If auto convert join is set to true, it should fall back to common join 
if the input size of each join table is bigger than a configured value.

Summary: auto convert map join bug  (was: auto convert map join should 
not be triggered if the input size is bigger than a configured value.)
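
Items 1) and 4) of the description above can be illustrated with a small Java sketch. It is written from the description only and is not the patch's code. The class KnownSizeSketch, the pathToSize map (standing in for whatever size lookup Hive performs on a path), and the method names are hypothetical. The sketch also assumes that "all aliases" means the aliases mapped to the path in question.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; names and signatures are assumptions for illustration.
class KnownSizeSketch {

    // Item 4: for every path under the intermediate directory, add that
    // path's size to each alias the path maps to. Returns a new map and
    // leaves the compile-time aliasToSize map untouched.
    public static Map<String, Long> addIntermediateSizes(
            Map<String, List<String>> pathToAliases,
            Map<String, Long> pathToSize,
            Map<String, Long> aliasToSize,
            String intermediateDir) {
        Map<String, Long> result = new HashMap<>(aliasToSize);
        for (Map.Entry<String, List<String>> e : pathToAliases.entrySet()) {
            String path = e.getKey();
            if (!path.startsWith(intermediateDir)) {
                continue; // size already known at compile time
            }
            long size = pathToSize.getOrDefault(path, 0L);
            for (String alias : e.getValue()) {
                result.merge(alias, size, Long::sum);
            }
        }
        return result;
    }

    // Item 1: a candidate is acceptable only if the known sizes of all
    // OTHER aliases together do not exceed the configured threshold.
    public static boolean isGoodBigTableCandidate(
            String candidate, Map<String, Long> aliasToSize, long threshold) {
        long others = 0L;
        for (Map.Entry<String, Long> e : aliasToSize.entrySet()) {
            if (!e.getKey().equals(candidate)) {
                others += e.getValue();
            }
        }
        return others <= threshold;
    }
}
```

Rejecting a candidate at compile time this way keeps it out of the plan altogether, which is the point made in item 1): the alternative is to discover the problem only at runtime.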

 auto convert map join bug
 -

 Key: HIVE-2095
 URL: https://issues.apache.org/jira/browse/HIVE-2095
 Project: Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-2095.1.patch



[jira] [Updated] (HIVE-2095) auto convert map join bug

2011-04-07 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2095:
---

Attachment: HIVE-2095.1.patch

 auto convert map join bug
 -

 Key: HIVE-2095
 URL: https://issues.apache.org/jira/browse/HIVE-2095
 Project: Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-2095.1.patch



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive

2011-04-07 Thread Marquis Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marquis Wang updated HIVE-1803:
---

Attachment: unit-tests.2.patch

New unit tests patch that should fix some more tests.

John, I didn't see any failures in TestMTQueries even before adding this new 
patch. I'm not sure why that would be, but I definitely fixed some things in 
the other two tests.

Also this patch only includes the unit tests, so you will need to include patch 
11 as well.

 Implement bitmap indexing in Hive
 -

 Key: HIVE-1803
 URL: https://issues.apache.org/jira/browse/HIVE-1803
 Project: Hive
  Issue Type: New Feature
  Components: Indexing
Reporter: Marquis Wang
Assignee: Marquis Wang
 Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, 
 HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, 
 HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, 
 HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, 
 bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, 
 unit-tests.patch


 Implement bitmap index handler to complement compact indexing.



Re: How To Use Hive

2011-04-07 Thread Geoff Howard
Answered on the users list.

On Thu, Apr 7, 2011 at 12:00 AM, komara nagarjuna
komaranagarj...@gmail.com wrote:
  Sir,

     I am new to Hadoop and Hive. I am now developing an application using
  Hadoop with Hive on a multi-node cluster. I installed and ran Hadoop
  successfully on the master and slave machines. Hive is installed on the
  master machine and it also connects to the database through MySQL.

     In Hive, I created a table successfully. My problem is how to insert data
  into Hive tables. How does Hadoop communicate with Hive? How do I use the
  Hive data warehouse? What is the purpose of Hive?

     Please explain how to use Hive in real time.


  *Thanks & Regards*,
  *Nagarjuna komara.*



[jira] [Created] (HIVE-2098) Make couple of convenience methods in EximUtil public

2011-04-07 Thread Krishna Kumar (JIRA)
Make couple of convenience methods in EximUtil public
-

 Key: HIVE-2098
 URL: https://issues.apache.org/jira/browse/HIVE-2098
 Project: Hive
  Issue Type: Bug
Reporter: Krishna Kumar
Priority: Minor


readMetaData() and createExportDump() to be public



[jira] [Assigned] (HIVE-2098) Make couple of convenience methods in EximUtil public

2011-04-07 Thread Krishna Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar reassigned HIVE-2098:
---

Assignee: Krishna Kumar

 Make couple of convenience methods in EximUtil public
 -

 Key: HIVE-2098
 URL: https://issues.apache.org/jira/browse/HIVE-2098
 Project: Hive
  Issue Type: Bug
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor

 readMetaData() and createExportDump() to be public



[jira] [Updated] (HIVE-2098) Make couple of convenience methods in EximUtil public

2011-04-07 Thread Krishna Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar updated HIVE-2098:


Attachment: HIVE.2098.patch.0.txt

Couple of methods made public for use outside of the package.

 Make couple of convenience methods in EximUtil public
 -

 Key: HIVE-2098
 URL: https://issues.apache.org/jira/browse/HIVE-2098
 Project: Hive
  Issue Type: Bug
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
 Attachments: HIVE.2098.patch.0.txt


 readMetaData() and createExportDump() to be public



[jira] [Updated] (HIVE-2098) Make couple of convenience methods in EximUtil public

2011-04-07 Thread Krishna Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar updated HIVE-2098:


Status: Patch Available  (was: Open)

 Make couple of convenience methods in EximUtil public
 -

 Key: HIVE-2098
 URL: https://issues.apache.org/jira/browse/HIVE-2098
 Project: Hive
  Issue Type: Bug
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
 Attachments: HIVE.2098.patch.0.txt


 readMetaData() and createExportDump() to be public



[jira] [Commented] (HIVE-2095) auto convert map join bug

2011-04-07 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016871#comment-13016871
 ] 

Liyin Tang commented on HIVE-2095:
--

I will take a look

 auto convert map join bug
 -

 Key: HIVE-2095
 URL: https://issues.apache.org/jira/browse/HIVE-2095
 Project: Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-2095.1.patch




[jira] [Resolved] (HIVE-2082) Reduce memory consumption in preparing MapReduce job

2011-04-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-2082.
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed. Thanks Ning

 Reduce memory consumption in preparing MapReduce job
 

 Key: HIVE-2082
 URL: https://issues.apache.org/jira/browse/HIVE-2082
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2082.patch, HIVE-2082.patch, HIVE-2082.patch


 The Hive client side consumes a lot of memory when the number of input 
 partitions is large. One reason is that each partition maintains a list of 
 FieldSchema objects, which are intended to deal with schema evolution. 
 However, they are not used currently and Hive uses the table-level schema 
 for all partitions. This will be fixed in HIVE-2050. The memory consumption 
 by this part will be reduced by almost half (1.2GB to 700MB for 20k 
 partitions). 
 Another large chunk of memory consumption is in the MapReduce job setup phase, 
 when a PartitionDesc is created from each Partition object. A property object 
 is maintained in PartitionDesc which contains a full list of columns and 
 types. For the same reason, these should be the same as in the table-level 
 schema. Also, the deserializer initialization takes a large amount of memory, 
 which should be avoided. My initial testing of these optimizations cut the 
 memory consumption in half (700MB to 300MB for 20k partitions). 



[jira] [Updated] (HIVE-2093) inputs are outputs should be populated for create/drop database

2011-04-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-2093:
-

Status: Open  (was: Patch Available)

 inputs are outputs should be populated for create/drop database
 ---

 Key: HIVE-2093
 URL: https://issues.apache.org/jira/browse/HIVE-2093
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Siying Dong
 Attachments: HIVE.2093.1.patch


 This is needed for many other things: concurrency, authorization etc. to work



[jira] [Commented] (HIVE-2093) inputs are outputs should be populated for create/drop database

2011-04-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017034#comment-13017034
 ] 

Namit Jain commented on HIVE-2093:
--

The changes to inputs/outputs look good - 
Yongqiang, can you confirm the authorization changes ?

If we are supporting this, we should also support
LOCK DATABASE DB_NAME in the same patch.

Also, can you add a negative test with LOCK DATABASE .. ?


 inputs are outputs should be populated for create/drop database
 ---

 Key: HIVE-2093
 URL: https://issues.apache.org/jira/browse/HIVE-2093
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Siying Dong
 Attachments: HIVE.2093.1.patch


 This is needed for many other things: concurrency, authorization etc. to work



[jira] [Commented] (HIVE-2095) auto convert map join bug

2011-04-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017049#comment-13017049
 ] 

Namit Jain commented on HIVE-2095:
--

Can you also create a review-board request ?

 auto convert map join bug
 -

 Key: HIVE-2095
 URL: https://issues.apache.org/jira/browse/HIVE-2095
 Project: Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-2095.1.patch




Review Request: auto map join bug

2011-04-07 Thread Yongqiang He

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/559/
---

Review request for hive.


Summary
---

auto map join bug


Diffs
-

  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 1088810 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 1088810 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinResolver.java 1088810 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java 1088810 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverCommonJoin.java 1088810 
  trunk/ql/src/test/queries/clientpositive/auto_join28.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/auto_join29.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/auto_join30.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/auto_join12.q.out 1088810 
  trunk/ql/src/test/results/clientpositive/auto_join20.q.out 1088810 
  trunk/ql/src/test/results/clientpositive/auto_join21.q.out 1088810 
  trunk/ql/src/test/results/clientpositive/auto_join28.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/auto_join29.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/auto_join3.q.out 1088810 
  trunk/ql/src/test/results/clientpositive/auto_join30.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/559/diff


Testing
---

yes.


Thanks,

Yongqiang



[jira] [Commented] (HIVE-1803) Implement bitmap indexing in Hive

2011-04-07 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017066#comment-13017066
 ] 

John Sichi commented on HIVE-1803:
--

OK, maybe the TestMTQueries failure was a side-effect of the other 
failures...I'll retry with your latest patch.


 Implement bitmap indexing in Hive
 -

 Key: HIVE-1803
 URL: https://issues.apache.org/jira/browse/HIVE-1803
 Project: Hive
  Issue Type: New Feature
  Components: Indexing
Reporter: Marquis Wang
Assignee: Marquis Wang
 Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, 
 HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, 
 HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, 
 HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, 
 bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, 
 unit-tests.patch


 Implement bitmap index handler to complement compact indexing.



[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive

2011-04-07 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1803:
-

Status: Open  (was: Patch Available)

Some other stuff got committed in between which is causing conflicts when I try 
patch -p0 < unit-tests.2.patch

 Implement bitmap indexing in Hive
 -

 Key: HIVE-1803
 URL: https://issues.apache.org/jira/browse/HIVE-1803
 Project: Hive
  Issue Type: New Feature
  Components: Indexing
Reporter: Marquis Wang
Assignee: Marquis Wang
 Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, 
 HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, 
 HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, 
 HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, 
 bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, 
 unit-tests.patch


 Implement bitmap index handler to complement compact indexing.



[jira] [Commented] (HIVE-1803) Implement bitmap indexing in Hive

2011-04-07 Thread Marquis Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017069#comment-13017069
 ] 

Marquis Wang commented on HIVE-1803:


I re-pulled from trunk and made a new patch and there was no difference between 
the two. If you have the original unit-tests.patch applied then this patch will 
fail. Can you try patching HIVE-1803.11.patch followed by unit-tests.2.patch on 
a clean checkout?

 Implement bitmap indexing in Hive
 -

 Key: HIVE-1803
 URL: https://issues.apache.org/jira/browse/HIVE-1803
 Project: Hive
  Issue Type: New Feature
  Components: Indexing
Reporter: Marquis Wang
Assignee: Marquis Wang
 Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, 
 HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, 
 HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, 
 HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, 
 bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, 
 unit-tests.patch


 Implement bitmap index handler to complement compact indexing.



[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive

2011-04-07 Thread Marquis Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marquis Wang updated HIVE-1803:
---

Status: Patch Available  (was: Open)

 Implement bitmap indexing in Hive
 -

 Key: HIVE-1803
 URL: https://issues.apache.org/jira/browse/HIVE-1803
 Project: Hive
  Issue Type: New Feature
  Components: Indexing
Reporter: Marquis Wang
Assignee: Marquis Wang
 Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, 
 HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, 
 HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, 
 HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, 
 bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, 
 unit-tests.patch


 Implement bitmap index handler to complement compact indexing.



[jira] [Commented] (HIVE-1803) Implement bitmap indexing in Hive

2011-04-07 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017088#comment-13017088
 ] 

John Sichi commented on HIVE-1803:
--

That's what I did, and the conflicts match files which were in very recent 
commits.  

Are you sure you did svn update?  If you're using git, there may be some lag in 
the replica.


 Implement bitmap indexing in Hive
 -

 Key: HIVE-1803
 URL: https://issues.apache.org/jira/browse/HIVE-1803
 Project: Hive
  Issue Type: New Feature
  Components: Indexing
Reporter: Marquis Wang
Assignee: Marquis Wang
 Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, 
 HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, 
 HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, 
 HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, 
 bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, 
 unit-tests.patch


 Implement bitmap index handler to complement compact indexing.



[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive

2011-04-07 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1803:
-

Status: Open  (was: Patch Available)

 Implement bitmap indexing in Hive
 -

 Key: HIVE-1803
 URL: https://issues.apache.org/jira/browse/HIVE-1803
 Project: Hive
  Issue Type: New Feature
  Components: Indexing
Reporter: Marquis Wang
Assignee: Marquis Wang
 Attachments: HIVE-1803.1.patch, HIVE-1803.10.patch, 
 HIVE-1803.11.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, 
 HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, 
 HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, 
 bitmap_index_2.png, javaewah.jar, javaewah.jar, unit-tests.2.patch, 
 unit-tests.patch


 Implement bitmap index handler to complement compact indexing.



Review Request: for HIVE-2068

2011-04-07 Thread Siying Dong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/540/
---

Review request for hive and namit jain.


Summary
---

For HIVE-2068


This addresses bug HIVE-2068.
https://issues.apache.org/jira/browse/HIVE-2068


Diffs
-

  trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 1086466 
  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1086466 
  trunk/conf/hive-default.xml 1086466 
  trunk/hwi/src/java/org/apache/hadoop/hive/hwi/HWISessionItem.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/CommandNeedRetryException.java PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SamplePruner.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/LimitDesc.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessor.java 1086466 
  trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 1086466 
  trunk/ql/src/test/queries/clientpositive/global_limit.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/global_limit.q.out PRE-CREATION 
  trunk/service/src/java/org/apache/hadoop/hive/service/HiveServer.java 1086466 

Diff: https://reviews.apache.org/r/540/diff


Testing
---

Added a test to the test suite.


Thanks,

Siying



[jira] [Commented] (HIVE-2068) Speed up query select xx,xx from xxx LIMIT xxx if no filtering or aggregation

2011-04-07 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13017092#comment-13017092
 ] 

jirapos...@reviews.apache.org commented on HIVE-2068:
-




 Speed up query select xx,xx from xxx LIMIT xxx if no filtering or 
 aggregation
 ---

 Key: HIVE-2068
 URL: https://issues.apache.org/jira/browse/HIVE-2068
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
 Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, 
 HIVE-2068.4.patch


 Currently, select xx,xx from xxx where ...(only partition conditions) LIMIT 
 xxx starts a MapReduce job whose input is the whole table or partition. 
 The latency can be huge if the table or partition is big. We could 
 reduce the number of input files to speed up such queries.



Build failed in Jenkins: Hive-trunk-h0.20 #659

2011-04-07 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/659/

--
[...truncated 29862 lines...]
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-04-07_12-29-45_398_101051018938832795/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=number
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=number
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=number
[junit] Job running in-process (local Hadoop)
[junit] Hadoop job information for null: number of mappers: 0; number of 
reducers: 0
[junit] 2011-04-07 12:29:48,481 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-04-07_12-29-45_398_101051018938832795/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201104071229_13744265.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-04-07_12-29-49_964_8300355644500983998/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-04-07_12-29-49_964_8300355644500983998/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201104071229_852872020.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE

[jira] [Commented] (HIVE-2093) inputs are outputs should be populated for create/drop database

2011-04-07 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13017097#comment-13017097
 ] 

Siying Dong commented on HIVE-2093:
---

Namit, do you mean we should add LOCK DATABASE? Looks like we don't have the 
syntax at all.

 inputs are outputs should be populated for create/drop database
 ---

 Key: HIVE-2093
 URL: https://issues.apache.org/jira/browse/HIVE-2093
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Siying Dong
 Attachments: HIVE.2093.1.patch


 This is needed for many other things: concurrency, authorization etc. to work



Jenkins build is back to normal : Hive-0.7.0-h0.20 #69

2011-04-07 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/69/




[jira] [Created] (HIVE-2099) GROUP BY rules not applied correctly for select *

2011-04-07 Thread John Sichi (JIRA)
GROUP BY rules not applied correctly for select *
-

 Key: HIVE-2099
 URL: https://issues.apache.org/jira/browse/HIVE-2099
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi


This fails as expected:

select foo, bar from pokes group by foo;

This succeeds, which is incorrect:

select * from pokes group by foo;

I verified this as far back as 0.6, so maybe it has always been this way.




[jira] [Commented] (HIVE-2093) inputs are outputs should be populated for create/drop database

2011-04-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13017115#comment-13017115
 ] 

He Yongqiang commented on HIVE-2093:


Siying, can you add cleanup of db/user privilege in QTestUtils?

 inputs are outputs should be populated for create/drop database
 ---

 Key: HIVE-2093
 URL: https://issues.apache.org/jira/browse/HIVE-2093
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Siying Dong
 Attachments: HIVE.2093.1.patch


 This is needed for many other things: concurrency, authorization etc. to work



[jira] [Commented] (HIVE-2065) RCFile issues

2011-04-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13017121#comment-13017121
 ] 

He Yongqiang commented on HIVE-2065:


Why is a minor.version needed in the metadata? 
Mostly looks good, but I prefer not to change the code too much, since some 
external things depend on it.

 RCFile issues
 -

 Key: HIVE-2065
 URL: https://issues.apache.org/jira/browse/HIVE-2065
 Project: Hive
  Issue Type: Bug
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
 Attachments: HIVE.2065.patch.0.txt, HIVE.2065.patch.1.txt, 
 Slide1.png, proposal.png


 Some potential issues with RCFile
 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per 
 yongqiang he, the class is not meant to be thread-safe (and it is not). Might 
 as well get rid of the confusing and performance-impacting lock acquisitions.
 2. Record Length overstated for compressed files. IIUC, the key compression 
 happens after we have written the record length.
 {code}
   int keyLength = key.getSize();
   if (keyLength < 0) {
     throw new IOException("negative length keys not allowed: " + key);
   }
   out.writeInt(keyLength + valueLength); // total record length
   out.writeInt(keyLength); // key portion length
   if (!isCompressed()) {
     out.writeInt(keyLength);
     key.write(out); // key
   } else {
     keyCompressionBuffer.reset();
     keyDeflateFilter.resetState();
     key.write(keyDeflateOut);
     keyDeflateOut.flush();
     keyDeflateFilter.finish();
     int compressedKeyLen = keyCompressionBuffer.getLength();
     out.writeInt(compressedKeyLen);
     out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
   }
 {code}
 3. For sequence file compatibility, the field immediately after the record 
 length should be the compressed key length, not the uncompressed key length.
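
To illustrate points 2 and 3, here is a minimal sketch of the ordering they ask for: compress the key first, then write lengths that describe the bytes actually stored. This is not RCFile's code or its real record layout. The class `RecordLengthSketch`, the method `writeKeyHeader`, and the simplified [record length][key length][key bytes] layout are assumptions made only for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;

/** Hypothetical illustration; not the RCFile writer. */
final class RecordLengthSketch {

    /**
     * Writes [record length][key length][key bytes] for a simplified record.
     * Both lengths are computed after compression, so they match what is stored.
     */
    static byte[] writeKeyHeader(byte[] key, int valueLength, boolean compressed) {
        try {
            byte[] storedKey = key;
            if (compressed) {
                // Compress the key before any length field is written.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                try (DeflaterOutputStream deflate = new DeflaterOutputStream(buf)) {
                    deflate.write(key);
                }
                storedKey = buf.toByteArray();
            }
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(storedKey.length + valueLength); // total record length
            out.writeInt(storedKey.length);               // key length as stored
            out.write(storedKey);                         // key bytes
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[1000];
        System.out.println(writeKeyHeader(key, 50, false).length); // 8 header bytes + 1000 key bytes
        System.out.println(writeKeyHeader(key, 50, true).length);  // smaller: the key is deflated
    }
}
```

With this ordering a reader can skip a record using the first field alone, whether or not the key is compressed.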



Re: Review Request: HIVE-1644 Use filter pushdown for automatically accessing indexes

2011-04-07 Thread John Sichi

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/558/#review399
---



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
https://reviews.apache.org/r/558/#comment748

For consistency with my review in HIVE-1694, I suggest 
hive.optimize.index.filter as the name for this configuration parameter.

(In HIVE-1694 I suggested hive.optimize.index.groupby, and we want it to be 
possible to enable/disable them independently)




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
https://reviews.apache.org/r/558/#comment749

In line with the previous comment, suggest 
hive.optimize.index.filter.compact.minSize/maxSize.

Namit's suggestion for minSize was 5G.

I think the default for maxSize should be infinity (I can't think of a case 
where we want it in effect by default).



ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java
https://reviews.apache.org/r/558/#comment750

HIVE-1803 is changing this to hive.index.blockfilter.file.  Assuming that 
gets committed first, we should use that, since it's generic rather than tied 
to the index type.



ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexHandler.java
https://reviews.apache.org/r/558/#comment751

What are the units here?  Also, don't use colon after parameter name.



ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java
https://reviews.apache.org/r/558/#comment752

The non-functional changes in this file are gonna conflict with HIVE-1803, 
so get rid of them.



ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java
https://reviews.apache.org/r/558/#comment755

Use HiveUtils.unparseIdentifier for quoting table names in generated SQL.



ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java
https://reviews.apache.org/r/558/#comment756

Isn't it incorrect to set properties on the original table scan here since 
this is only tentative?



ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java
https://reviews.apache.org/r/558/#comment757

Likewise, modifying inputs is incorrect before we have a definite plan.

Some more work on the new HiveIndexHandler interface method is required for 
resolving this plus the residuals.



ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java
https://reviews.apache.org/r/558/#comment753

If searchConditions.size() == 0, it means we didn't find anything which 
could be handled by the index.  In that case, we should bail out immediately 
and not try to do anything more with this index.




ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java
https://reviews.apache.org/r/558/#comment759

We collect the residual here, but we don't do anything with it.  Don't we 
need to pass it back so that Hive can decide what to leave in the Filter 
operator?
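
A rough sketch of the control flow the two comments above ask for: bail out when nothing is indexable, and otherwise hand the residual back to the caller. Everything here is hypothetical. `IndexPushdownSketch`, `Decomposed`, and the string-based predicate matching are stand-ins for illustration, not the Hive predicate analyzer or the HiveIndexHandler interface.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not Hive's index handler API. */
final class IndexPushdownSketch {

    /** What the index can evaluate versus what must stay in the Filter operator. */
    static final class Decomposed {
        final List<String> searchConditions;
        final List<String> residual;

        Decomposed(List<String> searchConditions, List<String> residual) {
            this.searchConditions = searchConditions;
            this.residual = residual;
        }
    }

    /** Splits conjuncts into those on the indexed column and everything else. */
    static Decomposed decompose(List<String> conjuncts, String indexedColumn) {
        List<String> search = new ArrayList<>();
        List<String> residual = new ArrayList<>();
        for (String conjunct : conjuncts) {
            if (conjunct.startsWith(indexedColumn + " ")) {
                search.add(conjunct);
            } else {
                residual.add(conjunct);
            }
        }
        return new Decomposed(search, residual);
    }

    /** Returns null when the index cannot help; otherwise returns both parts. */
    static Decomposed tryUseIndex(List<String> conjuncts, String indexedColumn) {
        Decomposed d = decompose(conjuncts, indexedColumn);
        if (d.searchConditions.isEmpty()) {
            return null; // nothing the index can handle: stop here
        }
        return d; // the caller decides what to leave in the Filter from d.residual
    }
}
```

Returning the residual together with the search conditions lets the caller decide what stays in the Filter operator without the handler modifying the plan itself.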




ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
https://reviews.apache.org/r/558/#comment760

The list actually contains index objects, not index table names.  Also 
typo: is exists



ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java
https://reviews.apache.org/r/558/#comment761

Only cast once.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
https://reviews.apache.org/r/558/#comment764

Indentation is wrong here.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
https://reviews.apache.org/r/558/#comment763

In my review for HIVE-1694, I noted that we should not be swallowing 
exceptions.  I think some of this code was copied from there.  If we can't 
access the metastore during optimization, it should be treated as a fatal error.
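
One possible shape for this, as a sketch only: wrap the metastore failure and rethrow it instead of logging and continuing. `IndexLookup`, `FatalOptimizationException`, and `indexesFor` are hypothetical names for this example; in Hive the real exception type and metastore call would differ.

```java
import java.util.List;

/** Hypothetical illustration; not Hive's metastore client. */
final class MetastoreAccessSketch {

    /** Stand-in for a metastore call that can fail. */
    interface IndexLookup {
        List<String> getIndexes(String table) throws Exception;
    }

    /** Stand-in for whatever fatal exception the optimizer should raise. */
    static final class FatalOptimizationException extends RuntimeException {
        FatalOptimizationException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Propagates lookup failures to the caller instead of swallowing them. */
    static List<String> indexesFor(IndexLookup lookup, String table) {
        try {
            return lookup.getIndexes(table);
        } catch (Exception e) {
            // Keep the original exception as the cause so the failure is traceable.
            throw new FatalOptimizationException("Could not read indexes for " + table, e);
        }
    }
}
```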



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
https://reviews.apache.org/r/558/#comment765

The plan still looks wrong (there are two Stage-0's, one for the index 
scan, one for the final fetch), so the relabeling is still not quite working 
correctly.




ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
https://reviews.apache.org/r/558/#comment766

no space after !




ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
https://reviews.apache.org/r/558/#comment767

Suggested rename for method:  arePartitionsCoveredByIndex



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
https://reviews.apache.org/r/558/#comment768

This checks that the metadata matches.  But it does not actually check that 
the index partitions exist.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java
https://reviews.apache.org/r/558/#comment769

[jira] [Updated] (HIVE-1644) use filter pushdown for automatically accessing indexes

2011-04-07 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1644:
-

Status: Open  (was: Patch Available)

I added comments in review board.


 use filter pushdown for automatically accessing indexes
 ---

 Key: HIVE-1644
 URL: https://issues.apache.org/jira/browse/HIVE-1644
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.8.0
Reporter: John Sichi
Assignee: Russell Melick
 Attachments: HIVE-1644.1.patch, HIVE-1644.10.patch, 
 HIVE-1644.11.patch, HIVE-1644.12.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, 
 HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch, HIVE-1644.7.patch, 
 HIVE-1644.8.patch, HIVE-1644.9.patch


 HIVE-1226 provides utilities for analyzing filters which have been pushed 
 down to a table scan.  The next step is to use these for selecting available 
 indexes and generating access plans for those indexes.



[jira] [Commented] (HIVE-1644) use filter pushdown for automatically accessing indexes

2011-04-07 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13017164#comment-13017164
 ] 

jirapos...@reviews.apache.org commented on HIVE-1644:
-



[jira] [Updated] (HIVE-2095) auto convert map join bug

2011-04-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-2095:
-

Status: Open  (was: Patch Available)

 auto convert map join bug
 -

 Key: HIVE-2095
 URL: https://issues.apache.org/jira/browse/HIVE-2095
 Project: Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-2095.1.patch


 1) 
 When considering a table as the big table candidate for a map join, if Hive 
 can determine at compile time that the total known size of all the other 
 tables (excluding the candidate) is bigger than a configured value, the 
 candidate is a bad one and should not be put into the plan. Otherwise, 
 filtering it out at runtime may take more time.
 2)
 Added a null check for backup tasks. Without it, a NullPointerException is 
 thrown.
 3)
 CommonJoinResolver needs to know the full mapping of pathToAliases. Otherwise 
 it will make the wrong decision.
 4)
 Changes made to ConditionalResolverCommonJoin: added pathToAliases, 
 aliasToSize (an alias's input size that is known at compile time, from 
 inputSummary), and the intermediate dir path.
 So the logic is: go over all the pathToAliases and, for each path, if it is 
 from the intermediate dir path, add this path's size to all aliases. Finally, 
 choose the big table based on the size information and other data such as 
 aliasToTask. 
 5)
 The conditional task's children contain wrong options, which may cause the 
 join to fail or return incorrect results. When getting all possible children 
 for the conditional task, a whitelist of big tables should be used. Only 
 tables in this whitelist can be considered as a big table.
 Here is the logic:
 +   * Get a list of big table candidates. Only the tables in the returned set can
 +   * be used as big table in the join operation.
 +   * 
 +   * The logic here is to scan the join condition array from left to right. If
 +   * see a inner join and the bigTableCandidates is empty, add both side of this
 +   * inner join to big table candidates. If see a left outer join, and the
 +   * bigTableCandidates is empty, add the left side to it, and if the
 +   * bigTableCandidates is not empty, do nothing (which means the
 +   * bigTableCandidates is from left side). If see a right outer join, clear the
 +   * bigTableCandidates, and add right side to the bigTableCandidates, it means
 +   * the right side of a right outer join always win. If see a full outer join,
 +   * return null immediately (no one can be the big table, can not do a
 +   * mapjoin).
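
The scan described in the comment above could look roughly like the sketch below. It follows only the wording of that comment and is not taken from the patch. `BigTableCandidateSketch`, `JoinCond`, and `JoinType` are hypothetical stand-ins for Hive's join descriptors, and doing nothing for an inner join once the set is non-empty is an assumption, since the comment does not cover that case.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration of the candidate scan; not the code from the patch. */
final class BigTableCandidateSketch {

    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    /** One join condition between two table positions. */
    static final class JoinCond {
        final int left;
        final int right;
        final JoinType type;

        JoinCond(int left, int right, JoinType type) {
            this.left = left;
            this.right = right;
            this.type = type;
        }
    }

    /** Returns the allowed big-table positions, or null if no map join is possible. */
    static Set<Integer> getBigTableCandidates(JoinCond[] conds) {
        Set<Integer> candidates = new HashSet<>();
        for (JoinCond cond : conds) { // scan left to right
            switch (cond.type) {
                case INNER:
                    if (candidates.isEmpty()) {
                        candidates.add(cond.left);
                        candidates.add(cond.right);
                    }
                    break; // assumption: a non-empty set is left unchanged
                case LEFT_OUTER:
                    if (candidates.isEmpty()) {
                        candidates.add(cond.left);
                    }
                    break; // non-empty: candidates already come from the left side
                case RIGHT_OUTER:
                    candidates.clear(); // the right side of a right outer join always wins
                    candidates.add(cond.right);
                    break;
                case FULL_OUTER:
                    return null; // no table can be the big table
            }
        }
        return candidates;
    }
}
```

For example, under these assumptions an inner join of tables 0 and 1 followed by a right outer join with table 2 leaves only table 2 as a candidate.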



Build failed in Jenkins: Hive-trunk-h0.20 #660

2011-04-07 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/660/changes

Changes:

[namit] HIVE-2082 Reduce memory consumption in preparing MapReduce job
(Ning Zhang via namit)

--
[...truncated 29863 lines...]
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-04-07_15-51-46_165_2513310220195627021/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=number
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=number
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=number
[junit] Job running in-process (local Hadoop)
[junit] Hadoop job information for null: number of mappers: 0; number of 
reducers: 0
[junit] 2011-04-07 15:51:49,223 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-04-07_15-51-46_165_2513310220195627021/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201104071551_1925633887.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-04-07_15-51-50_712_3119286579259358399/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-04-07_15-51-50_712_3119286579259358399/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201104071551_1526356288.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: 

Review Request: review board for HIVE-2093

2011-04-07 Thread Siying Dong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/566/
---

Review request for hive, Yongqiang He and namit jain.


Summary
---

Still need to change some old tests' outputs.


This addresses bug HIVE-2093.
https://issues.apache.org/jira/browse/HIVE-2093


Diffs
-

  trunk/ql/src/test/results/clientnegative/lockneg8.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/lockneg9.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/add_part_exist.q.out 1089697 
  trunk/ql/src/test/results/clientpositive/alter1.q.out 1089697 
  trunk/ql/src/test/results/clientnegative/lockneg7.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/lockneg6.q.out PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/lockneg9.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/database.q 1089697 
  trunk/ql/src/test/results/clientnegative/authorization_fail_create_db.q.out PRE-CREATION
  trunk/ql/src/test/results/clientnegative/authorization_fail_drop_db.q.out PRE-CREATION
  trunk/ql/src/test/results/clientnegative/database_create_already_exists.q.out 1089697
  trunk/ql/src/test/results/clientnegative/database_create_invalid_name.q.out 1089697
  trunk/ql/src/test/results/clientnegative/database_drop_does_not_exist.q.out 1089697
  trunk/ql/src/test/results/clientnegative/database_drop_not_empty.q.out 1089697
  trunk/ql/src/test/queries/clientnegative/authorization_fail_create_db.q PRE-CREATION
  trunk/ql/src/test/queries/clientnegative/lockneg6.q PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/lockneg7.q PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/lockneg8.q PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ShowLocksDesc.java 1089697 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1089697
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 1089697
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/DDLWork.java 1089697 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java 1089697 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1089697
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1089697
  trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 1089697 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1089697 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java 1089697 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/hooks/WriteEntity.java 1089697
  trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 1089697
  trunk/ql/src/test/results/clientpositive/alter2.q.out 1089697 
  trunk/ql/src/test/results/clientpositive/alter3.q.out 1089697 
  trunk/ql/src/test/results/clientpositive/alter4.q.out 1089697 
  trunk/ql/src/test/results/clientpositive/authorization_5.q.out 1089697 
  trunk/ql/src/test/results/clientpositive/database.q.out 1089697 

Diff: https://reviews.apache.org/r/566/diff


Testing
---


Thanks,

Siying



[jira] [Commented] (HIVE-2093) inputs are outputs should be populated for create/drop database

2011-04-07 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017253#comment-13017253
 ] 

Siying Dong commented on HIVE-2093:
---

created review board: https://reviews.apache.org/r/566

 inputs are outputs should be populated for create/drop database
 ---

 Key: HIVE-2093
 URL: https://issues.apache.org/jira/browse/HIVE-2093
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Siying Dong
 Attachments: HIVE.2093.1.patch, HIVE.2093.2.patch


 This is needed for many other things: concurrency, authorization etc. to work

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2093) inputs are outputs should be populated for create/drop database

2011-04-07 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017254#comment-13017254
 ] 

jirapos...@reviews.apache.org commented on HIVE-2093:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/566/
---

Review request for hive, Yongqiang He and namit jain.


Summary
---

Still need to change some old tests' outputs.


This addresses bug HIVE-2093.
https://issues.apache.org/jira/browse/HIVE-2093


Diffs
-

  trunk/ql/src/test/results/clientnegative/lockneg8.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/lockneg9.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/add_part_exist.q.out 1089697 
  trunk/ql/src/test/results/clientpositive/alter1.q.out 1089697 
  trunk/ql/src/test/results/clientnegative/lockneg7.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/lockneg6.q.out PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/lockneg9.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/database.q 1089697 
  trunk/ql/src/test/results/clientnegative/authorization_fail_create_db.q.out PRE-CREATION
  trunk/ql/src/test/results/clientnegative/authorization_fail_drop_db.q.out PRE-CREATION
  trunk/ql/src/test/results/clientnegative/database_create_already_exists.q.out 1089697
  trunk/ql/src/test/results/clientnegative/database_create_invalid_name.q.out 1089697
  trunk/ql/src/test/results/clientnegative/database_drop_does_not_exist.q.out 1089697
  trunk/ql/src/test/results/clientnegative/database_drop_not_empty.q.out 1089697
  trunk/ql/src/test/queries/clientnegative/authorization_fail_create_db.q PRE-CREATION
  trunk/ql/src/test/queries/clientnegative/lockneg6.q PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/lockneg7.q PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/lockneg8.q PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ShowLocksDesc.java 1089697 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1089697
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 1089697
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/DDLWork.java 1089697 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java 1089697 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1089697
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1089697
  trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 1089697 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1089697 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java 1089697 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/hooks/WriteEntity.java 1089697
  trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 1089697
  trunk/ql/src/test/results/clientpositive/alter2.q.out 1089697 
  trunk/ql/src/test/results/clientpositive/alter3.q.out 1089697 
  trunk/ql/src/test/results/clientpositive/alter4.q.out 1089697 
  trunk/ql/src/test/results/clientpositive/authorization_5.q.out 1089697 
  trunk/ql/src/test/results/clientpositive/database.q.out 1089697 

Diff: https://reviews.apache.org/r/566/diff


Testing
---


Thanks,

Siying





[jira] [Commented] (HIVE-2095) auto convert map join bug

2011-04-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017291#comment-13017291
 ] 

He Yongqiang commented on HIVE-2095:


Uploading a new patch to address namit's comments.

Note: there is an existing bug in Hive that causes the results of 
auto_join29.q to be incorrect. Let's file another JIRA for it.
Basically, if the outer join filter is enabled, the query SELECT 
/*+mapjoin(src1, src2)*/ * FROM src src1 RIGHT OUTER JOIN src src2 ON (src1.key 
= src2.key AND src1.key  10 AND src2.key  10) JOIN src src3 ON (src2.key = 
src3.key AND src3.key  10) SORT BY src1.key, src1.value, src2.key, src2.value, 
src3.key, src3.value; will give wrong results in today's Hive.

 auto convert map join bug
 -

 Key: HIVE-2095
 URL: https://issues.apache.org/jira/browse/HIVE-2095
 Project: Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-2095.1.patch, HIVE-2095.2.patch


 1) 
 When choosing a table as the big table candidate for a map join, if Hive can 
 determine at compile time that the total known size of all other tables 
 (excluding the big table under consideration) is bigger than a configured 
 value, this big table candidate is a bad one and should not be put into the 
 plan. Otherwise, filtering it out at runtime may take more time.
 2)
 Added a null check for backup tasks. Otherwise we will see a 
 NullPointerException.
 3)
 CommonJoinResolver needs to know the full mapping of pathToAliases. Otherwise 
 it will make the wrong decision.
 4)
 Changes made to ConditionalResolverCommonJoin: added pathToAliases, 
 aliasToSize (the alias's input size that is known at compile time, from 
 inputSummary), and the intermediate dir path.
 The logic is: go over all the pathToAliases, and for each path, if it is 
 from the intermediate dir path, add this path's size to all aliases. Finally, 
 choose the big table based on the size information and other data such as 
 aliasToTask. 
 5)
 The conditional task's children contain wrong options, which may cause the 
 join to fail or produce incorrect results. When getting all possible children 
 for the conditional task, a whitelist of big tables should be used. Only 
 tables in this whitelist can be considered as a big table.
 Here is the logic:
 +   * Get a list of big table candidates. Only the tables in the returned set can
 +   * be used as big table in the join operation.
 +   * 
 +   * The logic here is to scan the join condition array from left to right. If
 +   * see a inner join and the bigTableCandidates is empty, add both side of this
 +   * inner join to big table candidates. If see a left outer join, and the
 +   * bigTableCandidates is empty, add the left side to it, and if the
 +   * bigTableCandidates is not empty, do nothing (which means the
 +   * bigTableCandidates is from left side). If see a right outer join, clear the
 +   * bigTableCandidates, and add right side to the bigTableCandidates, it means
 +   * the right side of a right outer join always win. If see a full outer join,
 +   * return null immediately (no one can be the big table, can not do a
 +   * mapjoin).
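
 The scan described in the comment above can be sketched roughly as follows. 
 This is a minimal illustration, not the code from the patch: the class name 
 BigTableCandidates, the JoinType enum and the JoinCond holder are 
 hypothetical stand-ins, and tables are identified by their position in the 
 join. The comment does not say what an inner join does when the candidate 
 set is already non-empty, so the sketch assumes it leaves the set unchanged.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; names are illustrative and not taken from the Hive patch.
final class BigTableCandidates {

    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    // One entry of the join condition array: left/right table positions and the join type.
    static final class JoinCond {
        final int left;
        final int right;
        final JoinType type;

        JoinCond(int left, int right, JoinType type) {
            this.left = left;
            this.right = right;
            this.type = type;
        }
    }

    // Scans the conditions left to right and returns the allowed big tables,
    // or null when a full outer join rules out a map join entirely.
    static Set<Integer> find(List<JoinCond> conds) {
        Set<Integer> candidates = new LinkedHashSet<>();
        for (JoinCond c : conds) {
            switch (c.type) {
                case INNER:
                    // Assumption: only the empty case is described; otherwise do nothing.
                    if (candidates.isEmpty()) {
                        candidates.add(c.left);
                        candidates.add(c.right);
                    }
                    break;
                case LEFT_OUTER:
                    // Existing candidates already come from the left side.
                    if (candidates.isEmpty()) {
                        candidates.add(c.left);
                    }
                    break;
                case RIGHT_OUTER:
                    // The right side of a right outer join always wins.
                    candidates.clear();
                    candidates.add(c.right);
                    break;
                case FULL_OUTER:
                    return null;
            }
        }
        return candidates;
    }
}
```

 Under these assumptions, an inner join of tables 0 and 1 followed by a right 
 outer join with table 2 leaves only table 2 as a candidate, and any full 
 outer join in the chain yields null.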
