[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive

2011-03-22 Thread Marquis Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marquis Wang updated HIVE-1803:
---

Attachment: HIVE-1803.7.patch

New patch which I believe takes care of all the issues in the review for patch 
6.

 Implement bitmap indexing in Hive
 -

 Key: HIVE-1803
 URL: https://issues.apache.org/jira/browse/HIVE-1803
 Project: Hive
  Issue Type: New Feature
  Components: Indexing
Reporter: Marquis Wang
Assignee: Marquis Wang
 Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, 
 HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, 
 JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, 
 javaewah.jar


 Implement bitmap index handler to complement compact indexing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
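
For readers new to the feature: a bitmap index keeps, for each distinct value of the indexed column, one bitmap with one bit per row, so predicates are answered with bitwise operations. A conceptual sketch only, using java.util.BitSet as a stand-in for the EWAHCompressedBitmap class shipped in the attached javaewah.jar (the real class adds word-aligned compression):

{code}
import java.util.BitSet;

public class BitmapIndexSketch {
  public static void main(String[] args) {
    // One bitmap per distinct column value; bit i set => row i matches.
    BitSet colIsA = new BitSet();
    BitSet colIsB = new BitSet();
    colIsA.set(0); colIsA.set(3);
    colIsB.set(3); colIsB.set(7);

    BitSet and = (BitSet) colIsA.clone();
    and.and(colIsB);               // rows matching A AND B -> {3}
    BitSet or = (BitSet) colIsA.clone();
    or.or(colIsB);                 // rows matching A OR B -> {0, 3, 7}
    System.out.println(and + " " + or);
  }
}
{code}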


[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive

2011-03-22 Thread Marquis Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marquis Wang updated HIVE-1803:
---

Status: Patch Available  (was: Open)

 Implement bitmap indexing in Hive
 -

 Key: HIVE-1803
 URL: https://issues.apache.org/jira/browse/HIVE-1803
 Project: Hive
  Issue Type: New Feature
  Components: Indexing
Reporter: Marquis Wang
Assignee: Marquis Wang
 Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, 
 HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, 
 JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, 
 javaewah.jar


 Implement bitmap index handler to complement compact indexing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2042) In error scenario some opened streams may not closed in ExplainTask.java and Throttle.java

2011-03-22 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009567#comment-13009567
 ] 

Amareshwari Sriramadasu commented on HIVE-2042:
---

Running tests. Will commit if tests pass.

 In error scenario some opened streams may not closed in ExplainTask.java and 
 Throttle.java
 --

 Key: HIVE-2042
 URL: https://issues.apache.org/jira/browse/HIVE-2042
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
 Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2042.2.Patch, HIVE-2042.Patch


 1) In an error scenario the PrintStream may not be closed in execute() of 
 ExplainTask.java.
 2) In an error scenario the InputStream may not be closed in checkJobTracker() of 
 Throttle.java. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
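
The pattern the fix calls for is the standard one. A minimal sketch (hypothetical names, not the actual patch) where the stream is closed in a finally block so the error path cannot leak it:

{code}
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;

public class CloseOnErrorSketch {
  // Close the stream in finally so it is released on both the success
  // path and the error path.
  static int writeResult(String resFile, String text) {
    PrintStream out = null;
    try {
      out = new PrintStream(new FileOutputStream(resFile));
      out.print(text);
      return 0;
    } catch (IOException e) {
      System.err.println("Failed with exception " + e.getMessage());
      return 1;
    } finally {
      if (out != null) {
        out.close();
      }
    }
  }
}
{code}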


Work around for using OR in Joins

2011-03-22 Thread MIS
I want to use OR in the join expression, but it seems only AND is supported
as of now.
I have a workaround, though, using De Morgan's law {C1 OR C2 = !(!C1 AND
!C2)}, but it would be nice if somebody could point me to the location in
the code base that would need modification to support OR in the join
expression.

Thanks,
MIS.


Re: Work around for using OR in Joins

2011-03-22 Thread MIS
Found it at  *org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.java* line
no. 1122
There is some concern mentioned that supporting OR would lead to data
explosion. Is it discussed/documented in a little more detail somewhere? If
so, some pointers would be helpful.

Thanks,
MIS.

On Tue, Mar 22, 2011 at 1:19 PM, MIS misapa...@gmail.com wrote:

 I want to use OR in the join expression, but it seems only AND is supported
 as of now.
 I have a workaround, though, using De Morgan's law {C1 OR C2 = !(!C1 AND
 !C2)}, but it would be nice if somebody could point me to the location in
 the code base that would need modification to support OR in the join
 expression.

 Thanks,
 MIS.



[jira] [Commented] (HIVE-2031) Correct the exception message for the better traceability for the scenario load into the partitioned table having 2 partitions by specifying only one partition in the l

2011-03-22 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009586#comment-13009586
 ] 

Chinna Rao Lalam commented on HIVE-2031:


Updated the patch with test cases.

 Correct the exception message for the better traceability for the scenario 
 load into the partitioned table having 2  partitions by specifying only one 
 partition in the load statement. 
 

 Key: HIVE-2031
 URL: https://issues.apache.org/jira/browse/HIVE-2031
 Project: Hive
  Issue Type: Bug
  Components: Logging
Affects Versions: 0.7.0
 Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2031.2.patch, HIVE-2031.patch


  Load into the partitioned table having 2 partitions by specifying only one 
 partition in the load statement is failing and logging the following 
 exception message.
 {noformat}
  org.apache.hadoop.hive.ql.parse.SemanticException: line 1:91 Partition not 
 found '21Oct'
   at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.<init>(BaseSemanticAnalyzer.java:685)
   at 
 org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:196)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
   at 
 org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:151)
   at 
 org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:764)
   at 
 org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:742)
   at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:619)
 {noformat}
 This needs to be corrected so that the message conveys the actual root cause 
 of the failure.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
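
A minimal sketch of the direction the fix suggests (hypothetical names, not the actual patch): validate the partition spec against the table's partition columns up front, so the error names the real problem, an under-specified partition spec, instead of a misleading "Partition not found":

{code}
import java.util.List;
import java.util.Map;

class LoadPartitionSpecCheckSketch {
  static void validateSpec(List<String> partitionColumns, Map<String, String> spec) {
    if (spec.size() != partitionColumns.size()) {
      // Name the root cause instead of reporting "Partition not found".
      throw new IllegalArgumentException("partition spec " + spec
          + " specifies " + spec.size() + " column(s), but the table is"
          + " partitioned by " + partitionColumns);
    }
  }
}
{code}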


[jira] [Updated] (HIVE-2042) In error scenario some opened streams may not closed in ExplainTask.java and Throttle.java

2011-03-22 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-2042:
--

   Resolution: Fixed
Fix Version/s: 0.8.0
   Status: Resolved  (was: Patch Available)

I just committed this. Thanks Chinna.

 In error scenario some opened streams may not closed in ExplainTask.java and 
 Throttle.java
 --

 Key: HIVE-2042
 URL: https://issues.apache.org/jira/browse/HIVE-2042
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
 Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Fix For: 0.8.0

 Attachments: HIVE-2042.2.Patch, HIVE-2042.Patch


 1) In an error scenario the PrintStream may not be closed in execute() of 
 ExplainTask.java.
 2) In an error scenario the InputStream may not be closed in checkJobTracker() of 
 Throttle.java. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2042) In error scenario some opened streams may not closed

2011-03-22 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-2042:
--

Summary: In error scenario some opened streams may not closed  (was: In 
error scenario some opened streams may not closed in ExplainTask.java and 
Throttle.java)

 In error scenario some opened streams may not closed
 

 Key: HIVE-2042
 URL: https://issues.apache.org/jira/browse/HIVE-2042
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
 Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Fix For: 0.8.0

 Attachments: HIVE-2042.2.Patch, HIVE-2042.Patch


 1) In an error scenario the PrintStream may not be closed in execute() of 
 ExplainTask.java.
 2) In an error scenario the InputStream may not be closed in checkJobTracker() of 
 Throttle.java. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2063) jdbc return only 1 collumn

2011-03-22 Thread Alexey Diomin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009623#comment-13009623
 ] 

Alexey Diomin commented on HIVE-2063:
-

Wait, this bug is very interesting.

It reproduces on hadoop-0.20.2, but on Cloudera CDH3B4 the bug does not 
reproduce, and applying the patch breaks correct parsing of the input row, 
since the delimiter in the input row has code '1' (the default).



 jdbc return only 1 collumn
 --

 Key: HIVE-2063
 URL: https://issues.apache.org/jira/browse/HIVE-2063
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.7.0
Reporter: Alexey Diomin
Assignee: Alexey Diomin
Priority: Critical
 Attachments: HIVE-2063.patch, HIVE-2063.patch


 We do not set the separator for the data, so all data is returned in the first 
 column and all other fields are set to NULL.
 In addition we get WARNING: Missing fields! Expected 27 fields but only got 1! 
 Ignoring similar problems.
 It's a regression after HIVE-1378.
 The bug:
 the server side uses the delimiter '\t' for fields
 the client side uses the default delimiter with code '1'

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
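
To make the delimiter mismatch concrete, a small illustration (not Hive code): if the server emits '\t'-separated fields but the client splits on the default delimiter, code 1 ('\u0001'), every row collapses into a single column:

{code}
public class DelimiterMismatchDemo {
  public static void main(String[] args) {
    String row = "1\tAlice\t2011-03-22";            // server used '\t'
    System.out.println(row.split("\u0001").length); // 1: whole row in one column
    System.out.println(row.split("\t").length);     // 3: the intended columns
  }
}
{code}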


[jira] [Updated] (HIVE-1538) FilterOperator is applied twice with ppd on.

2011-03-22 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-1538:
--

Attachment: patch-1538-2.txt

Added configuration hive.ppd.remove.duplicatefilters, with a default value of 
true. Updated the ppd tests to run with the configuration both off and on.

 FilterOperator is applied twice with ppd on.
 

 Key: HIVE-1538
 URL: https://issues.apache.org/jira/browse/HIVE-1538
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: patch-1538-1.txt, patch-1538-2.txt, patch-1538.txt


 With hive.optimize.ppd set to true, FilterOperator is applied twice, and it 
 seems the second operator is always filtering zero rows.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2049) Push down partition pruning to JDO filtering for a subset of partition predicates

2011-03-22 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-2049:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Ning

 Push down partition pruning to JDO filtering for a subset of partition 
 predicates
 -

 Key: HIVE-2049
 URL: https://issues.apache.org/jira/browse/HIVE-2049
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2049.2.patch, HIVE-2049.3.patch, HIVE-2049.4.patch, 
 HIVE-2049.patch


 Several tasks:
   - expose HiveMetaStoreClient.listPartitionsByFilter() to Hive.java so that 
 PartitionPruner can use that for certain partition predicates. 
   - figure out a safe subset of partition predicates that can be pushed down 
 to JDO filtering. 
 My initial testing for the 2nd part shows that equality queries with AND/OR can 
 be pushed down and return correct results. However, range queries on partition 
 columns gave an NPE from the JDO execute() function. This might be a bug in the 
 JDO query string itself, but we need to figure it out and heavily test all 
 cases. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.

2011-03-22 Thread Krishna Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar updated HIVE-2003:


Status: Patch Available  (was: Open)

Please review asap as there are lots of changes to q.out files and any delay 
may cause another conflict/resolution cycle.

 LOAD compilation does not set the outputs during semantic analysis resulting 
 in no authorization checks being done for it.
 --

 Key: HIVE-2003
 URL: https://issues.apache.org/jira/browse/HIVE-2003
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
 Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt


 The table/partition being loaded is not being added to outputs in the 
 LoadSemanticAnalyzer.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.

2011-03-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009739#comment-13009739
 ] 

Namit Jain commented on HIVE-2003:
--

I will take a look right away

 LOAD compilation does not set the outputs during semantic analysis resulting 
 in no authorization checks being done for it.
 --

 Key: HIVE-2003
 URL: https://issues.apache.org/jira/browse/HIVE-2003
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
 Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt


 The table/partition being loaded is not being added to outputs in the 
 LoadSemanticAnalyzer.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.

2011-03-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009772#comment-13009772
 ] 

Namit Jain commented on HIVE-2003:
--

   Instead of adding a new configuration parameter which is being checked in
   EnforceReadOnlyTables, it might be easier to remove EnforceReadOnlyTables
   from the hive.exec.pre.hooks at creation time. But, this can be done in a
   follow-up also (if other things look good).

Will commit if tests pass; please file a follow-up JIRA for the cleanup 
mentioned above.

 LOAD compilation does not set the outputs during semantic analysis resulting 
 in no authorization checks being done for it.
 --

 Key: HIVE-2003
 URL: https://issues.apache.org/jira/browse/HIVE-2003
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
 Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt


 The table/partition being loaded is not being added to outputs in the 
 LoadSemanticAnalyzer.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Build failed in Jenkins: Hive-0.7.0-h0.20 #50

2011-03-22 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/50/

--
[...truncated 27029 lines...]
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Copying file: 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 INTO TABLE src
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.src
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 INTO TABLE src
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt'
 INTO TABLE src1
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt
[junit] Copying file: 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt
[junit] Loading data to table default.src1
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt'
 INTO TABLE src1
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src1
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq'
 INTO TABLE src_sequencefile
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq
[junit] Copying file: 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq
[junit] Loading data to table default.src_sequencefile
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq'
 INTO TABLE src_sequencefile
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_sequencefile
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq'
 INTO TABLE src_thrift
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq
[junit] Copying file: 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq
[junit] Loading data to table default.src_thrift
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq'
 INTO TABLE src_thrift
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_thrift
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt'
 INTO TABLE src_json
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt
[junit] Copying file: 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt
[junit] Loading data to table default.src_json
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt'
 INTO TABLE src_json
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_json
[junit] OK
[junit] diff 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/ql/test/logs/negative/wrong_distinct1.q.out
 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/ql/src/test/results/compiler/errors/wrong_distinct1.q.out
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/ql/tmp/hive_job_log_hudson_201103221207_951496250.txt
[junit] Done query: wrong_distinct1.q
[junit] Begin query: wrong_distinct2.q
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/ql/tmp/hive_job_log_hudson_201103221207_1182796463.txt
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 OVERWRITE INTO TABLE srcpart PARTITION (ds='2008-04-08',hr='11')
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt
[junit] Copying file: 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.srcpart partition (ds=2008-04-08, 
hr=11)
[junit] POSTHOOK: query: LOAD DATA 

Build failed in Jenkins: Hive-trunk-h0.20 #632

2011-03-22 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/632/

--
[...truncated 28061 lines...]
[junit] POSTHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-22_12-11-03_852_331101495470767803/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=number
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=number
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=number
[junit] Job running in-process (local Hadoop)
[junit] Hadoop job information for null: number of mappers: 0; number of 
reducers: 0
[junit] 2011-03-22 12:11:06,951 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-22_12-11-03_852_331101495470767803/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103221211_333569579.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-22_12-11-08_486_3558028685216045491/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-22_12-11-08_486_3558028685216045491/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103221211_1627999456.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)

[jira] [Created] (HIVE-2069) NullPointerException on getSchemas

2011-03-22 Thread Bennie Schut (JIRA)
NullPointerException on getSchemas
--

 Key: HIVE-2069
 URL: https://issues.apache.org/jira/browse/HIVE-2069
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.8.0
Reporter: Bennie Schut
Assignee: Bennie Schut
 Fix For: 0.8.0


Calling getSchemas will cause a NullPointerException.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2069) NullPointerException on getSchemas

2011-03-22 Thread Bennie Schut (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009825#comment-13009825
 ] 

Bennie Schut commented on HIVE-2069:


java.lang.NullPointerException
at java.util.ArrayList.<init>(ArrayList.java:131)
at 
org.apache.hadoop.hive.jdbc.HiveMetaDataResultSet.<init>(HiveMetaDataResultSet.java:32)
at 
org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData$3.<init>(HiveDatabaseMetaData.java:481)
at 
org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.getSchemas(HiveDatabaseMetaData.java:480)
at 
org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.getSchemas(HiveDatabaseMetaData.java:475)
at 
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetSchemas(TestJdbcDriver.java:488)

Probably introduced by HIVE-1126. getCatalogs works correctly, but this case 
wasn't tested.

 NullPointerException on getSchemas
 --

 Key: HIVE-2069
 URL: https://issues.apache.org/jira/browse/HIVE-2069
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.8.0
Reporter: Bennie Schut
Assignee: Bennie Schut
 Fix For: 0.8.0


 Calling getSchemas will cause a NullPointerException.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
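
From the trace, the constructor of HiveMetaDataResultSet passes a null collection to new ArrayList. A minimal sketch of the likely guard (hypothetical names, not the actual patch):

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MetaDataResultSetSketch {
  private final List<String> columnNames;

  MetaDataResultSetSketch(List<String> columnNames) {
    // new ArrayList<String>(null) throws NullPointerException, so
    // tolerate a null argument by substituting an empty list.
    this.columnNames = (columnNames == null)
        ? Collections.<String>emptyList()
        : new ArrayList<String>(columnNames);
  }
}
{code}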


[jira] [Updated] (HIVE-2069) NullPointerException on getSchemas

2011-03-22 Thread Bennie Schut (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bennie Schut updated HIVE-2069:
---

Attachment: HIVE-2069.1.patch.txt

This patch includes a fix and a test which can be used to reproduce the 
NullPointerException.

 NullPointerException on getSchemas
 --

 Key: HIVE-2069
 URL: https://issues.apache.org/jira/browse/HIVE-2069
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.8.0
Reporter: Bennie Schut
Assignee: Bennie Schut
 Fix For: 0.8.0

 Attachments: HIVE-2069.1.patch.txt


 Calling getSchemas will cause a NullPointerException.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2069) NullPointerException on getSchemas

2011-03-22 Thread Bennie Schut (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bennie Schut updated HIVE-2069:
---

Release Note: Fix for NullPointerException on the jdbc driver on getSchemas
  Status: Patch Available  (was: Open)

 NullPointerException on getSchemas
 --

 Key: HIVE-2069
 URL: https://issues.apache.org/jira/browse/HIVE-2069
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.8.0
Reporter: Bennie Schut
Assignee: Bennie Schut
 Fix For: 0.8.0

 Attachments: HIVE-2069.1.patch.txt


 Calling getSchemas will cause a NullPointerException.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2069) NullPointerException on getSchemas

2011-03-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009841#comment-13009841
 ] 

Ning Zhang commented on HIVE-2069:
--

+1. will commit if tests pass. 

 NullPointerException on getSchemas
 --

 Key: HIVE-2069
 URL: https://issues.apache.org/jira/browse/HIVE-2069
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.8.0
Reporter: Bennie Schut
Assignee: Bennie Schut
 Fix For: 0.8.0

 Attachments: HIVE-2069.1.patch.txt


 Calling getSchemas will cause a NullPointerException.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.

2011-03-22 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-2003:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Krishna

 LOAD compilation does not set the outputs during semantic analysis resulting 
 in no authorization checks being done for it.
 --

 Key: HIVE-2003
 URL: https://issues.apache.org/jira/browse/HIVE-2003
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
 Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt


 The table/partition being loaded is not being added to outputs in the 
 LoadSemanticAnalyzer.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Review Request: HIVE-2050. batch processing partition pruning process

2011-03-22 Thread Ning Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/522/
---

Review request for hive.


Summary
---

Introducing a new metastore API to retrieve a list of partitions in batch. 


Diffs
-

  trunk/metastore/if/hive_metastore.thrift 1084243 
  trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 1084243 
  trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp 1084243 
  
trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp 
1084243 
  
trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java
 1084243 
  trunk/metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php 
1084243 
  
trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote 
1084243 
  trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 
1084243 
  trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb 1084243 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
1084243 
  
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
 1084243 
  
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 
1084243 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
1084243 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 
1084243 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1084243 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartExprEvalUtils.java
 1084243 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 
1084243 

Diff: https://reviews.apache.org/r/522/diff


Testing
---


Thanks,

Ning



[jira] [Updated] (HIVE-2050) batch processing partition pruning process

2011-03-22 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2050:
-

Status: Patch Available  (was: Open)

 batch processing partition pruning process
 --

 Key: HIVE-2050
 URL: https://issues.apache.org/jira/browse/HIVE-2050
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2050.patch


 For partition predicates that cannot be pushed down to JDO filtering 
 (HIVE-2049), we should fall back to the old approach of listing all partition 
 names first and using Hive's expression evaluation engine to select the correct 
 partitions. Then the partition pruner should hand Hive a list of partition 
 names and get back a list of Partition objects (this should be added to the Hive 
 API). 
 A possible optimization is that the partition pruner should give Hive a 
 set of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]), and 
 the JDO query should be formulated as range queries. Range queries are 
 possible because the first step lists all partition names in sorted order. 
 It's easy to come up with a range, and it is guaranteed that the JDO range 
 query results will be equivalent to the query with a list of partition 
 names. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
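
A sketch of the fallback path described above, with hypothetical interfaces standing in for the metastore client and Hive's expression evaluator: list all partition names, select the matches on the client, then fetch the surviving partitions in one batched call rather than one call per partition:

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

interface MetaStoreSketch {
  List<String> listPartitionNames(String db, String table);
  List<Object> getPartitionsByNames(String db, String table, List<String> names);
}

class PartitionPrunerSketch {
  static List<Object> prune(MetaStoreSketch ms, String db, String table,
                            Predicate<String> predicate) {
    List<String> selected = new ArrayList<>();
    for (String name : ms.listPartitionNames(db, table)) {
      if (predicate.test(name)) {   // Hive's expression evaluator in reality
        selected.add(name);
      }
    }
    return ms.getPartitionsByNames(db, table, selected); // one batched call
  }
}
{code}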


[jira] [Updated] (HIVE-2050) batch processing partition pruning process

2011-03-22 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2050:
-

Attachment: HIVE-2050.patch

Uploading a new patch for review. Still running tests. The review board 
request: https://reviews.apache.org/r/522/

 batch processing partition pruning process
 --

 Key: HIVE-2050
 URL: https://issues.apache.org/jira/browse/HIVE-2050
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2050.patch


 For partition predicates that cannot be pushed down to JDO filtering 
 (HIVE-2049), we should fall back to the old approach of listing all partition 
 names first and using Hive's expression evaluation engine to select the correct 
 partitions. Then the partition pruner should hand Hive a list of partition 
 names and get back a list of Partition objects (this should be added to the Hive 
 API). 
 A possible optimization is that the partition pruner should give Hive a 
 set of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]), and 
 the JDO query should be formulated as range queries. Range queries are 
 possible because the first step lists all partition names in sorted order. 
 It's easy to come up with a range, and it is guaranteed that the JDO range 
 query results will be equivalent to the query with a list of partition 
 names. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Meanings of privileges

2011-03-22 Thread Jonathan Natkins
Hi all,

I'm trying to understand the meaning of some of the privileges in the
system, and I'm a bit stumped on what some of them actually do.

Privileges that confuse me:
INDEX - my best guess is that this allows me to create/drop indexes on a
table?  Is it the case that if I have select access on a table, I can use
any index that exists on a table?
LOCK - Presumably this allows users to lock or unlock a table, so maybe a
better question is: are these locks like mutexes, where only I can access
the table, or is this literally locking down the table, so it can't be
modified in any way?
SHOW_DATABASE - I'm not sure what the scope of this one is: if I don't have
show_database access, can I not use the show database command? Or does this
extend to not being able to see the tables within a database?

It seems like you can grant some privileges on objects that don't have a lot
of meaning, i.e. create access on a table doesn't seem to have a lot of
semantic value, unless Hive requires that permission to create indexes on a
table, or something along those lines.  Similarly, I'm having a hard time
rationalizing why I can grant SHOW_DATABASE on a table.

Thanks a lot,
Jon


Re: Meanings of privileges

2011-03-22 Thread yongqiang he
INDEX - my best guess is that this allows me to create/drop indexes on a
table?
Yes. It is there for this purpose.

 Is it the case that if I have select access on a table, I can use
any index that exists on a table?
No. An index is also a table now, so you need to have access to both of them.

LOCK - Presumably this allows users to lock or unlock a table, so maybe a
better question is: are these locks like mutexes, where only I can access
the table, or is this literally locking down the table, so it can't be
modified in any way?

Yes. If you are the only one with the lock privilege on this table and
concurrency is enabled, no one else will be able to run anything against the table.

SHOW_DATABASE - I'm not sure what the scope of this one is: if I don't have
show_database access, can I not use the show database command?

If you don't have show_database access, you should not be able to use
the show database command. I do not think this privilege is supported
today.

 create access on a table doesn't seem to have a lot of semantic value
I think create on a table means create partition.

Similarly, I'm having a hard time rationalizing why I can grant SHOW_DATABASE 
on a table.
This should be a bug. Basically each privilege has its own set of scopes
(it can apply at the db, table, column, or user level, non-exclusively).

Thanks
Yongqiang
On Tue, Mar 22, 2011 at 6:30 PM, Jonathan Natkins na...@cloudera.com wrote:
 Hi all,

 I'm trying to understand the meaning of some of the privileges in the
 system, and I'm a bit stumped on what some of them actually do.

 Privileges that confuse me:
 INDEX - my best guess is that this allows me to create/drop indexes on a
 table?  Is it the case that if I have select access on a table, I can use
 any index that exists on a table?
 LOCK - Presumably this allows users to lock or unlock a table, so maybe a
 better question is: are these locks like mutexes, where only I can access
 the table, or is this literally locking down the table, so it can't be
 modified in any way?
 SHOW_DATABASE - I'm not sure what the scope of this one is: if I don't have
 show_database access, can I not use the show database command? Or does this
 extend to not being able to see the tables within a database?

 It seems like you can grant some privileges on objects that don't have a lot
 of meaning, i.e. create access on a table doesn't seem to have a lot of
 semantic value, unless Hive requires that permission to create indexes on a
 table, or something along those lines.  Similarly, I'm having a hard time
 rationalizing why I can grant SHOW_DATABASE on a table.

 Thanks a lot,
 Jon



[jira] [Commented] (HIVE-2065) RCFile issues

2011-03-22 Thread Krishna Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009981#comment-13009981
 ] 

Krishna Kumar commented on HIVE-2065:
-

Hmm. #3 is taking me a bit further than I originally thought. I assume being 
able to read an RCFile as a SequenceFile is required, while being able to write 
an RCFile via the SequenceFile interface is desirable.

Having made changes so that the record length is correctly set, the following 
changes are also required, IIUC, to make sure that the RCFile is handled 
correctly as a sequence file.

 - the second field should be the key length (4 + compressed/plain key contents)
 - the key class (KeyBuffer) must be made responsible for reading/writing the 
next field - plain key contents length - as well as compression/decompression 
of the key contents
 - the value class (ValueBuffer) related changes will be trickier. Since the 
value is not compressed as a unit, we cannot use the record-compressed format. 
We need to mark the records as plain records and move the codec to a metadata 
entry. Then the ValueBuffer class will work correctly with the SequenceFile 
implementation.

Thoughts? Worth it?


 RCFile issues
 -

 Key: HIVE-2065
 URL: https://issues.apache.org/jira/browse/HIVE-2065
 Project: Hive
  Issue Type: Bug
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
 Attachments: Slide1.png, proposal.png


 Some potential issues with RCFile
 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per 
 yongqiang he, the class is not meant to be thread-safe (and it is not). Might 
 as well get rid of the confusing and performance-impacting lock acquisitions.
 2. Record Length overstated for compressed files. IIUC, the key compression 
 happens after we have written the record length.
 {code}
   int keyLength = key.getSize();
   if (keyLength < 0) {
 throw new IOException("negative length keys not allowed: " + key);
   }
   out.writeInt(keyLength + valueLength); // total record length
   out.writeInt(keyLength); // key portion length
   if (!isCompressed()) {
 out.writeInt(keyLength);
 key.write(out); // key
   } else {
 keyCompressionBuffer.reset();
 keyDeflateFilter.resetState();
 key.write(keyDeflateOut);
 keyDeflateOut.flush();
 keyDeflateFilter.finish();
 int compressedKeyLen = keyCompressionBuffer.getLength();
 out.writeInt(compressedKeyLen);
 out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
   }
 {code}
 3. For sequence file compatibility, the compressed key length should be the 
 next field to record length, not the uncompressed key length.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
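
A sketch of the framing implied by issues #2 and #3 (illustrative, not the RCFile source): once the key is compressed, the record-length and key-length fields should carry the compressed size, which is also what a SequenceFile reader expects:

{code}
import java.io.DataOutputStream;
import java.io.IOException;

class RecordFramingSketch {
  static void writeRecord(DataOutputStream out, byte[] compressedKey,
                          int valueLength) throws IOException {
    out.writeInt(compressedKey.length + valueLength); // total record length
    out.writeInt(compressedKey.length);               // key portion length
    out.write(compressedKey, 0, compressedKey.length);
    // ... valueLength bytes of value data follow ...
  }
}
{code}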


[jira] [Commented] (HIVE-2065) RCFile issues

2011-03-22 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009982#comment-13009982
 ] 

He Yongqiang commented on HIVE-2065:


If being compatible with SequenceFile does not break RCFile's backward 
compatibility, it should be OK. But even after that, Hive still won't be able 
to process it as a sequence file because of Hive's SerDe layer.

 RCFile issues
 -

 Key: HIVE-2065
 URL: https://issues.apache.org/jira/browse/HIVE-2065
 Project: Hive
  Issue Type: Bug
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
 Attachments: Slide1.png, proposal.png


 Some potential issues with RCFile
 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per 
 yongqiang he, the class is not meant to be thread-safe (and it is not). Might 
 as well get rid of the confusing and performance-impacting lock acquisitions.
 2. Record Length overstated for compressed files. IIUC, the key compression 
 happens after we have written the record length.
 {code}
   int keyLength = key.getSize();
   if (keyLength < 0) {
 throw new IOException("negative length keys not allowed: " + key);
   }
   out.writeInt(keyLength + valueLength); // total record length
   out.writeInt(keyLength); // key portion length
   if (!isCompressed()) {
 out.writeInt(keyLength);
 key.write(out); // key
   } else {
 keyCompressionBuffer.reset();
 keyDeflateFilter.resetState();
 key.write(keyDeflateOut);
 keyDeflateOut.flush();
 keyDeflateFilter.finish();
 int compressedKeyLen = keyCompressionBuffer.getLength();
 out.writeInt(compressedKeyLen);
 out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
   }
 {code}
 3. For sequence file compatibility, the compressed key length should be the 
 next field to record length, not the uncompressed key length.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2050) batch processing partition pruning process

2011-03-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009993#comment-13009993
 ] 

Ning Zhang commented on HIVE-2050:
--

passed all unit tests.

 batch processing partition pruning process
 --

 Key: HIVE-2050
 URL: https://issues.apache.org/jira/browse/HIVE-2050
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2050.patch


 For partition predicates that cannot be pushed down to JDO filtering 
 (HIVE-2049), we should fall back to the old approach of listing all partition 
 names first and using Hive's expression evaluation engine to select the correct 
 partitions. Then the partition pruner should hand Hive a list of partition 
 names and get back a list of Partition objects (this should be added to the Hive 
 API). 
 A possible optimization is that the partition pruner should give Hive a 
 set of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]), and 
 the JDO query should be formulated as range queries. Range queries are 
 possible because the first step lists all partition names in sorted order. 
 It's easy to come up with a range, and it is guaranteed that the JDO range 
 query results will be equivalent to the query with a list of partition 
 names. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Work around for using OR in Joins

2011-03-22 Thread Ning Zhang
Joins with OR conditions are not supported by Hive currently. I think even 
if you rewrite the condition to use NOT and AND only, the results may be 
wrong. 
 
It is quite hard to implement joins with OR conditions in a MapReduce 
framework. It is straightforward to implement in a nested-loop join, but due to 
the nature of distributed processing, nested-loop join cannot be implemented in 
an efficient and scalable way in MapReduce: each mapper needs to join a split 
of the LHS table with the whole RHS table, which could be terabytes. 

The regular (reduce-side) join in Hive is essentially a sort-merge join 
operator. With that in mind, it's hard to implement OR conditions in the 
sort-merge join. 

One exception is the map-side join, which assumes the RHS table is small and 
will be read fully into each mapper. Currently map-side join in Hive is a 
hash-based join operator. You can implement a nested-loop map-side join 
operator to enable any join conditions including OR. 
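
A toy nested-loop join in that shape (illustrative names only, not Hive code): with the whole RHS in memory, an arbitrary predicate, including OR, can be evaluated per pair:

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

class NestedLoopJoinSketch {
  static <L, R> List<Object[]> join(List<L> lhsSplit, List<R> rhsSmallTable,
                                    BiPredicate<L, R> on) {
    List<Object[]> result = new ArrayList<>();
    for (L l : lhsSplit) {          // one mapper's split of the big table
      for (R r : rhsSmallTable) {   // entire small table, held in memory
        if (on.test(l, r)) {        // any condition works, e.g. c1 OR c2
          result.add(new Object[] { l, r });
        }
      }
    }
    return result;
  }
}
{code}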

On Mar 22, 2011, at 1:39 AM, MIS wrote:

 Found it at  *org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.java* line
 no. 1122
 There is some concern mentioned that supporting OR would lead to data
 explosion. Is it discussed/documented in a little more detail somewhere? If
 so, some pointers would be helpful.
 
 Thanks,
 MIS.
 
 On Tue, Mar 22, 2011 at 1:19 PM, MIS misapa...@gmail.com wrote:
 
 I want to use OR in the join expression, but it seems only AND is supported
 as of now.
 I have a workaround, though, using De Morgan's law {C1 OR C2 = !(!C1 AND
 !C2)}, but it would be nice if somebody could point me to the location in
 the code base that would need modification to support OR in the join
 expression.
 
 Thanks,
 MIS.
 



Bug in using columns with leading underscores in subqueries

2011-03-22 Thread Marquis Wang
Hi,

I believe I've found a bug in the semantic analyzer (or maybe something else?). 
It occurs when using a column with a leading underscore in a subquery.

 create table temp (`_col` int, key int);
 select key from temp;
 select `_col` from temp;
 select key from (select key from temp) t;

The above queries all work fine. 

 select `_col` from (select `_col` from temp) t;
 

This query fails with FAILED: Error in semantic analysis: line 1:7 Invalid 
Table Alias or Column Reference `_col`

The following query works in lieu of the above.

 select col as `_col` from (select `_col` as col from temp) t;
 

Thanks,
Marquis Wang
HMC Computer Science '11






Review Request: HIVE-2069: NullPointerException on getSchemas

2011-03-22 Thread Bennie Schut

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/521/
---

Review request for hive.


Summary
---

HIVE-2069: NullPointerException on getSchemas


This addresses bug HIVE-2069.
https://issues.apache.org/jira/browse/HIVE-2069


Diffs
-

  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveMetaDataResultSet.java 
1083926 
  trunk/jdbc/src/test/org/apache/hadoop/hive/jdbc/TestJdbcDriver.java 1083926 

Diff: https://reviews.apache.org/r/521/diff


Testing
---


Thanks,

Bennie



[jira] [Created] (HIVE-2071) enforcereadonlytables hook should not check a configuration variable

2011-03-22 Thread Namit Jain (JIRA)
enforcereadonlytables hook should not check a configuration variable


 Key: HIVE-2071
 URL: https://issues.apache.org/jira/browse/HIVE-2071
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Krishna Kumar


Instead of adding a new configuration parameter which is being checked in
EnforceReadOnlyTables, it might be easier to remove EnforceReadOnlyTables
from the hive.exec.pre.hooks at creation time.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
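
A minimal sketch of the proposed cleanup (hypothetical helper, not Hive code): strip one hook class from a comma-separated hook list such as hive.exec.pre.hooks, instead of having the hook itself consult a configuration flag:

{code}
import java.util.ArrayList;
import java.util.List;

class HookListSketch {
  static String removeHook(String hooks, String className) {
    List<String> kept = new ArrayList<>();
    for (String h : hooks.split(",")) {
      String trimmed = h.trim();
      if (!trimmed.isEmpty() && !trimmed.equals(className)) {
        kept.add(trimmed);
      }
    }
    return String.join(",", kept);
  }
}
{code}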


Re: Review Request: HIVE-1803: Implement bitmap indexing in Hive (new review starting from patch 6)

2011-03-22 Thread Marquis Wang


 On None, John Sichi wrote:
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/HiveCompactIndexInputFormat.java,
   line 45
  https://reviews.apache.org/r/481/diff/1/?file=13771#file13771line45
 
  I'm confused about how the backwards compatibility works for the index 
  filename property...who uses this property name?

The property name is set on the command line when the index query is run (see 
the index_compact.q tests). This String is how the class knows where the index 
filename is stored.


 On None, John Sichi wrote:
  ql/build.xml, line 187
  https://reviews.apache.org/r/481/diff/1/?file=13758#file13758line187
 
  Why do you need to unpack the .jar?  And why to json/classes?

I was getting java.lang.NoClassDefFoundError: javaewah/EWAHCompressedBitmap 
errors at runtime without unpacking it. I guess I forgot to change the 
destination to something else when I copied that line. Is unpacking the .jar 
unnecessary? I'm not really familiar with how ivy(?) handles these libraries.


- Marquis


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/481/#review315
---


On 2011-03-08 16:27:50, John Sichi wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/481/
 ---
 
 (Updated 2011-03-08 16:27:50)
 
 
 Review request for hive.
 
 
 Summary
 ---
 
 Review board was giving me grief trying to update the old patch, so I'm 
 creating a fresh review request for HIVE-1803.6
 
 
 This addresses bug HIVE-1803.
 https://issues.apache.org/jira/browse/HIVE-1803
 
 
 Diffs
 -
 
   lib/README 1c2f0b1 
   lib/javaewah-0.2.jar PRE-CREATION 
   ql/build.xml 50c604e 
   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java ba222f3 
   ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java ff74f08 
   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 308d985 
   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/index/IndexMetadataChangeTask.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/index/IndexMetadataChangeWork.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapObjectInput.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapObjectOutput.java 
 PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 
 1f01446 
   
 ql/src/java/org/apache/hadoop/hive/ql/index/compact/HiveCompactIndexInputFormat.java
  6c320c5 
   
 ql/src/java/org/apache/hadoop/hive/ql/index/compact/HiveCompactIndexResult.java
  0c9ccea 
   
 ql/src/java/org/apache/hadoop/hive/ql/index/compact/IndexMetadataChangeTask.java
  eac168f 
   
 ql/src/java/org/apache/hadoop/hive/ql/index/compact/IndexMetadataChangeWork.java
  26beb4e 
   ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java 
 391e5de 
   ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java 77220a1 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java 30714b8 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/AbstractGenericUDFEWAHBitmapOp.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEWAHBitmap.java 
 PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapAnd.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapEmpty.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapOr.java 
 PRE-CREATION 
   ql/src/test/queries/clientpositive/index_bitmap.q PRE-CREATION 
   ql/src/test/queries/clientpositive/index_bitmap1.q PRE-CREATION 
   ql/src/test/queries/clientpositive/index_bitmap2.q PRE-CREATION 
   ql/src/test/queries/clientpositive/index_bitmap3.q PRE-CREATION 
   ql/src/test/queries/clientpositive/index_compact.q 6547a52 
   ql/src/test/queries/clientpositive/index_compact_1.q 6d59353 
   ql/src/test/queries/clientpositive/index_compact_2.q 358b5e9 
   ql/src/test/queries/clientpositive/index_compact_3.q ee8abda 
   ql/src/test/queries/clientpositive/udf_bitmap_and.q PRE-CREATION 
   ql/src/test/queries/clientpositive/udf_bitmap_or.q PRE-CREATION 
   ql/src/test/results/clientpositive/index_bitmap.q.out PRE-CREATION 
   ql/src/test/results/clientpositive/index_bitmap1.q.out PRE-CREATION 
   ql/src/test/results/clientpositive/index_bitmap2.q.out PRE-CREATION 
   

Re: Review Request: HIVE-2054: fix for IOException on the jdbc driver on windows.

2011-03-22 Thread Bennie Schut

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/513/
---

(Updated 2011-03-21 12:50:40.422997)


Review request for hive.


Changes
---

New patch because of changes from HIVE-2062


Summary
---

HIVE-2054: fix for IOException on the jdbc driver on windows.


This addresses bug HIVE-2054.
https://issues.apache.org/jira/browse/HIVE-2054


Diffs (updated)
-

  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java 1083914 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HivePreparedStatement.java 
1083914 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveStatement.java 1083914 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/JdbcSessionState.java 1083914 

Diff: https://reviews.apache.org/r/513/diff


Testing
---


Thanks,

Bennie



Review Request: Patch for HIVE-2003: Load analysis should add table/partition to the outputs

2011-03-22 Thread Krishna

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/518/
---

Review request for hive.


Summary
---

Patch for HIVE-2003: Load analysis should add table/partition to the outputs


Diffs
-

  contrib/src/test/results/clientpositive/serde_regex.q.out c8b2dac
  contrib/src/test/results/clientpositive/serde_s3.q.out 95cc726
  ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 1ff9ea3
  ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 892e759
  ql/src/test/org/apache/hadoop/hive/ql/hooks/EnforceReadOnlyTables.java 86a6d49
  ql/src/test/queries/clientnegative/load_exist_part_authfail.q PRE-CREATION
  ql/src/test/queries/clientnegative/load_nonpart_authfail.q PRE-CREATION
  ql/src/test/queries/clientnegative/load_part_authfail.q PRE-CREATION
  ql/src/test/queries/clientnegative/load_part_nospec.q PRE-CREATION
  ql/src/test/queries/clientpositive/load_exist_part_authsuccess.q PRE-CREATION
  ql/src/test/queries/clientpositive/load_nonpart_authsuccess.q PRE-CREATION
  ql/src/test/queries/clientpositive/load_part_authsuccess.q PRE-CREATION
  ql/src/test/results/clientnegative/exim_00_unsupported_schema.q.out 119510d
  ql/src/test/results/clientnegative/exim_01_nonpart_over_loaded.q.out 242da6c
  ql/src/test/results/clientnegative/exim_02_all_part_over_overlap.q.out b8b019b
  ql/src/test/results/clientnegative/exim_03_nonpart_noncompat_colschema.q.out 420eade
  ql/src/test/results/clientnegative/exim_04_nonpart_noncompat_colnumber.q.out 8b89284
  ql/src/test/results/clientnegative/exim_05_nonpart_noncompat_coltype.q.out a07fb62
  ql/src/test/results/clientnegative/exim_06_nonpart_noncompat_storage.q.out c7638d2
  ql/src/test/results/clientnegative/exim_07_nonpart_noncompat_ifof.q.out 3062dbe
  ql/src/test/results/clientnegative/exim_08_nonpart_noncompat_serde.q.out f229498
  ql/src/test/results/clientnegative/exim_09_nonpart_noncompat_serdeparam.q.out 92c27ad
  ql/src/test/results/clientnegative/exim_10_nonpart_noncompat_bucketing.q.out a98f4f9
  ql/src/test/results/clientnegative/exim_11_nonpart_noncompat_sorting.q.out 1fe4b50
  ql/src/test/results/clientnegative/exim_13_nonnative_import.q.out 4c4297e
  ql/src/test/results/clientnegative/exim_14_nonpart_part.q.out 04fa808
  ql/src/test/results/clientnegative/exim_15_part_nonpart.q.out e1c67bb
  ql/src/test/results/clientnegative/exim_16_part_noncompat_schema.q.out 2393918
  ql/src/test/results/clientnegative/exim_17_part_spec_underspec.q.out 7f29cb6
  ql/src/test/results/clientnegative/exim_18_part_spec_missing.q.out 7f29cb6
  ql/src/test/results/clientnegative/exim_19_external_over_existing.q.out 0711b89
  ql/src/test/results/clientnegative/exim_20_managed_location_over_existing.q.out 3ad0ad5
  ql/src/test/results/clientnegative/exim_21_part_managed_external.q.out 42c7600
  ql/src/test/results/clientnegative/exim_23_import_exist_authfail.q.out 8372910
  ql/src/test/results/clientnegative/exim_24_import_part_authfail.q.out 0d82700
  ql/src/test/results/clientnegative/exim_25_import_nonexist_authfail.q.out 3814e14
  ql/src/test/results/clientnegative/fetchtask_ioexception.q.out b9dd07c
  ql/src/test/results/clientnegative/load_exist_part_authfail.q.out PRE-CREATION
  ql/src/test/results/clientnegative/load_nonpart_authfail.q.out PRE-CREATION
  ql/src/test/results/clientnegative/load_part_authfail.q.out PRE-CREATION
  ql/src/test/results/clientnegative/load_part_nospec.q.out PRE-CREATION
  ql/src/test/results/clientnegative/load_wrong_fileformat.q.out 645e143
  ql/src/test/results/clientnegative/load_wrong_fileformat_rc_seq.q.out 4809d31
  ql/src/test/results/clientnegative/load_wrong_fileformat_txt_seq.q.out 9b1ea48
  ql/src/test/results/clientnegative/protectmode_part2.q.out daaae80
  ql/src/test/results/clientpositive/alter3.q.out e6e5b49
  ql/src/test/results/clientpositive/alter_merge.q.out 789ca14
  ql/src/test/results/clientpositive/alter_merge_stats.q.out 5c9d387
  ql/src/test/results/clientpositive/auto_join_filters.q.out 167c4b0
  ql/src/test/results/clientpositive/auto_join_nulls.q.out 4ced637
  ql/src/test/results/clientpositive/binarysortable_1.q.out a2e540e
  ql/src/test/results/clientpositive/bucketizedhiveinputformat.q.out cd3489e
  ql/src/test/results/clientpositive/bucketmapjoin1.q.out da27428
  ql/src/test/results/clientpositive/bucketmapjoin2.q.out 4aeb731
  ql/src/test/results/clientpositive/bucketmapjoin3.q.out 1109aae
  ql/src/test/results/clientpositive/bucketmapjoin4.q.out a45b625
  ql/src/test/results/clientpositive/bucketmapjoin5.q.out 3858ae0
  ql/src/test/results/clientpositive/bucketmapjoin_negative.q.out c5b4a9c
  ql/src/test/results/clientpositive/bucketmapjoin_negative2.q.out b320252
  ql/src/test/results/clientpositive/count.q.out 0b4032c
  

Re: skew join optimization

2011-03-22 Thread Ted Yu
How about linking to http://imageshack.us/ or TinyPic?

Thanks

On Sun, Mar 20, 2011 at 7:56 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 On Sun, Mar 20, 2011 at 10:30 AM, Ted Yu yuzhih...@gmail.com wrote:
  Can someone re-attach the missing figures for that wiki?
 
  Thanks
 
  On Sun, Mar 20, 2011 at 7:15 AM, bharath vissapragada
  bharathvissapragada1...@gmail.com wrote:
 
  Hi Igor,
 
  See http://wiki.apache.org/hadoop/Hive/JoinOptimization and HIVE-1642,
  which automatically converts a normal join into a map-join (otherwise
  you can specify MAPJOIN hints in the query itself). Because your 'S'
  table is very small, it can be replicated across all the mappers and
  the reduce phase can be avoided. This can greatly reduce the runtime
  (see the results section on that page for details).
 
  Hope this helps.
 
  Thanks
 
 
  On Sun, Mar 20, 2011 at 6:37 PM, Jov zhao6...@gmail.com wrote:
   2011/3/20 Igor Tatarinov i...@decide.com:
   I have the following join that takes 4.5 hours (with 12 nodes) mostly
   because of a single reduce task that gets the bulk of the work:
   SELECT ...
   FROM T
   LEFT OUTER JOIN S
   ON T.timestamp = S.timestamp and T.id = S.id
   This is a 1:0/1 join so the size of the output is exactly the same as
   the
   size of T (500M records). S is actually very small (5K).
   I've tried:
   - switching the order of the join conditions
   - using a different hash function setting (jenkins instead of murmur)
    - using SET hive.auto.convert.join = true;
  
    are you sure your query is converted to a map-join? If not, try an
    explicit MAPJOIN hint.
  
  
   - using SET hive.optimize.skewjoin = true;
   but nothing helped :(
   Anything else I can try?
   Thanks!
  
 
 
 
  --
  Regards,
  Bharath .V
  w:http://research.iiit.ac.in/~bharath.v
 
 

 The wiki does not allow images; Confluence does, but we have not moved
 there yet.
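
For the record, the two suggestions in this thread amount to the following
minimal sketch, based on Igor's query shape; the MAPJOIN hint and the settings
are standard Hive options, but the skew threshold value shown is illustrative:

    -- Option 1: an explicit map-join hint on the small table S
    SELECT /*+ MAPJOIN(S) */ ...
    FROM T
    LEFT OUTER JOIN S
    ON T.timestamp = S.timestamp AND T.id = S.id;

    -- Option 2: automatic conversion plus skew handling
    SET hive.auto.convert.join = true;
    SET hive.optimize.skewjoin = true;
    SET hive.skewjoin.key = 100000;  -- rows with the same key before it counts as skewed

With a 5K-row S replicated to every mapper, the map-join path drops the reduce
phase entirely, which is what removes the single overloaded reducer.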




[jira] [Commented] (HIVE-2050) batch processing partition pruning process

2011-03-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010001#comment-13010001
 ] 

Ning Zhang commented on HIVE-2050:
--

Note that this patch implements a simple API that passes a list of partition 
names rather than a range of partition names. My performance testing indicates 
that the bottleneck is not in the JDO query itself: the JDO query that gets 
the list of all MPartitions takes about 5 seconds for 20k partitions, but 
converting those 20k MPartitions to Partitions took about 3 minutes, and 
committing the transaction took another 3 minutes.

Note that converting MPartitions to Partitions and committing transactions are 
common operations; even if we use JDO pushdown (HIVE-2048) or range queries, 
these costs are still there. We need to optimize them away in the next step.

 batch processing partition pruning process
 --

 Key: HIVE-2050
 URL: https://issues.apache.org/jira/browse/HIVE-2050
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2050.patch


 For partition predicates that cannot be pushed down to JDO filtering 
 (HIVE-2049), we should fall back to the old approach of listing all partition 
 names first and using Hive's expression evaluation engine to select the 
 correct partitions. The partition pruner should then hand Hive a list of 
 partition names and get back a list of Partition objects (this call should 
 be added to the Hive API). 
 A possible optimization is for the partition pruner to give Hive a set of 
 ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]) and formulate 
 the JDO query as range queries. Range queries are possible because the first 
 step lists all partition names in sorted order, so it is easy to come up with 
 ranges, and the JDO range query results are guaranteed to be equivalent to 
 the query with the list of partition names. 
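
 For what it's worth, a minimal sketch of the range-construction step described 
 above, assuming the pruner already holds the complete sorted name list and the 
 selected subset; the class and method names are hypothetical, not the actual 
 Hive API:

   // Hypothetical helper, not the actual Hive API: collapse the selected
   // partition names into closed ranges over the full sorted name list.
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Set;

   public class PartitionNameRanges {
     // sortedAllNames: every partition name, in sorted order (the first step).
     // selected: the names chosen by Hive's expression evaluation engine.
     // Returns [start, end] pairs; scanning each closed range over the same
     // sorted universe yields exactly the selected names.
     public static List<String[]> collapse(List<String> sortedAllNames,
                                           Set<String> selected) {
       List<String[]> ranges = new ArrayList<String[]>();
       String start = null;
       String end = null;
       for (String name : sortedAllNames) {
         if (selected.contains(name)) {
           if (start == null) {
             start = name;            // open a new run of selected names
           }
           end = name;                // extend the current run
         } else if (start != null) {
           ranges.add(new String[] { start, end });  // close the run
           start = null;
         }
       }
       if (start != null) {
         ranges.add(new String[] { start, end });    // close a trailing run
       }
       return ranges;
     }
   }

 Each [start, end] pair would then map to a JDO filter of the form 
 partitionName >= start && partitionName <= end; the equivalence holds because 
 each run is built from the same sorted list that the range query scans. 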

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Build failed in Jenkins: Hive-trunk-h0.20 #634

2011-03-22 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/634/changes

Changes:

[namit] HIVE-2003 LOAD compilation does not set the outputs during semantic 
analysis resulting
  in no authorization checks being done for it (Krishna Kumar via namit)

--
[...truncated 34300 lines...]
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-22_22-31-53_647_1696713168865305108/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=number
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=number
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=number
[junit] Job running in-process (local Hadoop)
[junit] Hadoop job information for null: number of mappers: 0; number of 
reducers: 0
[junit] 2011-03-22 22:31:56,771 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-22_22-31-53_647_1696713168865305108/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_20110331_1564066846.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-22_22-31-58_373_6901690555475865206/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-22_22-31-58_373_6901690555475865206/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_20110331_922641636.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit]