Workaround for using OR in Joins

2011-03-22 Thread MIS
I want to use OR in the join expression, but it seems only AND is supported
as of now.
I have a workaround, though, using De Morgan's law {C1 OR C2 = !(!C1 AND
!C2)}, but it would be nice if somebody could point me to the location in
the code base that would need modification to support OR in the join
expression.
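
For illustration, a hedged sketch of an alternative workaround for inner joins
(table and column names here are hypothetical, and this assumes the grammar
accepts a join without an ON clause; it degenerates to a cross product plus a
filter, so it can be very expensive):

{code}
SELECT a.*, b.*
FROM a JOIN b
WHERE a.k1 = b.k1 OR a.k2 = b.k2;
{code}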

Thanks,
MIS.


Re: Workaround for using OR in Joins

2011-03-22 Thread MIS
Found it at *org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.java*, line
1122.
There is some concern mentioned there that supporting OR would lead to data
explosion. Is it discussed/documented in a little more detail somewhere? If
so, some pointers would be helpful.

Thanks,
MIS.

On Tue, Mar 22, 2011 at 1:19 PM, MIS  wrote:

> I want to use OR in the join expression, but it seems only AND is supported
> as of now.
> I have a workaround, though, using De Morgan's law {C1 OR C2 = !(!C1 AND
> !C2)}, but it would be nice if somebody could point me to the location in
> the code base that would need modification to support OR in the join
> expression.
>
> Thanks,
> MIS.
>


[jira] [Commented] (HIVE-2031) Correct the exception message for the better traceability for the scenario load into the partitioned table having 2 partitions by specifying only one partition in the load statement

2011-03-22 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009586#comment-13009586
 ] 

Chinna Rao Lalam commented on HIVE-2031:


Updated the patch with test cases.
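
For context, a hypothetical statement of the kind described below (the table,
path, and partition names are made up): a table partitioned by (ds, hr) loaded
with only ds specified triggers the exception.

{code}
LOAD DATA LOCAL INPATH '/tmp/kv1.txt'
OVERWRITE INTO TABLE srcpart PARTITION (ds='2008-04-08');
{code}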

> Correct the exception message for the better traceability for the scenario 
> load into the partitioned table having 2  partitions by specifying only one 
> partition in the load statement. 
> 
>
> Key: HIVE-2031
> URL: https://issues.apache.org/jira/browse/HIVE-2031
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 0.7.0
> Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise 
> Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-2031.2.patch, HIVE-2031.patch
>
>
>  Load into the partitioned table having 2 partitions by specifying only one 
> partition in the load statement is failing and logging the following 
> exception message.
> {noformat}
>  org.apache.hadoop.hive.ql.parse.SemanticException: line 1:91 Partition not 
> found '21Oct'
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.<init>(BaseSemanticAnalyzer.java:685)
>   at 
> org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:196)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
>   at 
> org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:151)
>   at 
> org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:764)
>   at 
> org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:742)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:619)
> {noformat}
> This needs to be corrected in such a way that the message conveys the actual 
> root cause.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2042) In error scenario some opened streams may not closed in ExplainTask.java and Throttle.java

2011-03-22 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-2042:
--

   Resolution: Fixed
Fix Version/s: 0.8.0
   Status: Resolved  (was: Patch Available)

I just committed this. Thanks Chinna.

> In error scenario some opened streams may not closed in ExplainTask.java and 
> Throttle.java
> --
>
> Key: HIVE-2042
> URL: https://issues.apache.org/jira/browse/HIVE-2042
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.7.0
> Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise 
> Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
> Fix For: 0.8.0
>
> Attachments: HIVE-2042.2.Patch, HIVE-2042.Patch
>
>
> 1) In an error scenario the PrintStream may not be closed in execute() of 
> ExplainTask.java.
> 2) In an error scenario the InputStream may not be closed in checkJobTracker() 
> of Throttle.java.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2042) In error scenario some opened streams may not closed

2011-03-22 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-2042:
--

Summary: In error scenario some opened streams may not closed  (was: In 
error scenario some opened streams may not closed in ExplainTask.java and 
Throttle.java)

> In error scenario some opened streams may not closed
> 
>
> Key: HIVE-2042
> URL: https://issues.apache.org/jira/browse/HIVE-2042
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.7.0
> Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise 
> Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
> Fix For: 0.8.0
>
> Attachments: HIVE-2042.2.Patch, HIVE-2042.Patch
>
>
> 1) In an error scenario the PrintStream may not be closed in execute() of 
> ExplainTask.java.
> 2) In an error scenario the InputStream may not be closed in checkJobTracker() 
> of Throttle.java.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2063) jdbc return only 1 collumn

2011-03-22 Thread Alexey Diomin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009623#comment-13009623
 ] 

Alexey Diomin commented on HIVE-2063:
-

Wait, this bug is very interesting.

It reproduces on hadoop-0.20.2, but

on Cloudera CDH3B4 it does not reproduce, and applying the patch there breaks 
correct parsing of the input row, since the delimiter in the input row has 
code '1' (the default).



> jdbc return only 1 collumn
> --
>
> Key: HIVE-2063
> URL: https://issues.apache.org/jira/browse/HIVE-2063
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 0.7.0
>Reporter: Alexey Diomin
>Assignee: Alexey Diomin
>Priority: Critical
> Attachments: HIVE-2063.patch, HIVE-2063.patch
>
>
> We do not set the separator for the data, so all data is returned in the first 
> column and all other fields are set to NULL.
> In addition we get: WARNING: Missing fields! Expected 27 fields but only got 1! 
> Ignoring similar problems.
> It's a regression after HIVE-1378.
> Bug:
> the server side uses delimiter '\t' for fields
> the client side uses the default delimiter with code '1'

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-1538) FilterOperator is applied twice with ppd on.

2011-03-22 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-1538:
--

Attachment: patch-1538-2.txt

Added configuration hive.ppd.remove.duplicatefilters, with a default value of 
true. Updated the ppd tests to run with the configuration both off and on.
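
A minimal sketch of toggling the flag from a Hive session (the flag name is
taken from the comment above):

{code}
-- revert to the old plan with the duplicate FilterOperator:
set hive.ppd.remove.duplicatefilters=false;
-- default with this patch:
set hive.ppd.remove.duplicatefilters=true;
{code}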

> FilterOperator is applied twice with ppd on.
> 
>
> Key: HIVE-1538
> URL: https://issues.apache.org/jira/browse/HIVE-1538
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1538-1.txt, patch-1538-2.txt, patch-1538.txt
>
>
> With hive.optimize.ppd set to true, FilterOperator is applied twice. And it 
> seems second operator is always filtering zero rows.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2031) Correct the exception message for the better traceability for the scenario load into the partitioned table having 2 partitions by specifying only one partition in the load statement

2011-03-22 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009643#comment-13009643
 ] 

Amareshwari Sriramadasu commented on HIVE-2031:
---

+1. Will commit if tests pass.

> Correct the exception message for the better traceability for the scenario 
> load into the partitioned table having 2  partitions by specifying only one 
> partition in the load statement. 
> 
>
> Key: HIVE-2031
> URL: https://issues.apache.org/jira/browse/HIVE-2031
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 0.7.0
> Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise 
> Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-2031.2.patch, HIVE-2031.patch
>
>
>  Load into the partitioned table having 2 partitions by specifying only one 
> partition in the load statement is failing and logging the following 
> exception message.
> {noformat}
>  org.apache.hadoop.hive.ql.parse.SemanticException: line 1:91 Partition not 
> found '21Oct'
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.<init>(BaseSemanticAnalyzer.java:685)
>   at 
> org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:196)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
>   at 
> org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:151)
>   at 
> org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:764)
>   at 
> org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:742)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:619)
> {noformat}
> This needs to be corrected in such a way that the message conveys the actual 
> root cause.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2049) Push down partition pruning to JDO filtering for a subset of partition predicates

2011-03-22 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-2049:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Ning

> Push down partition pruning to JDO filtering for a subset of partition 
> predicates
> -
>
> Key: HIVE-2049
> URL: https://issues.apache.org/jira/browse/HIVE-2049
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2049.2.patch, HIVE-2049.3.patch, HIVE-2049.4.patch, 
> HIVE-2049.patch
>
>
> Several tasks:
>   - expose HiveMetaStoreClient.listPartitionsByFilter() to Hive.java so that 
> PartitionPruner can use that for certain partition predicates. 
>   - figure out a safe subset of partition predicates that can be pushed down 
> to JDO filtering. 
> From my initial testing of the 2nd part, equality queries with AND/OR can be 
> pushed down and return correct results. However, range queries on partition 
> columns gave an NPE from the JDO execute() function. This might be a bug in the 
> JDO query string itself, but we need to figure it out and heavily test all 
> cases. 
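
To make the pushable subset concrete, a hypothetical illustration (t and its
partition column ds are made-up names):

{code}
-- Equality predicates on a partition column combined with AND/OR:
-- per the testing described above, these can be pushed down to JDO.
SELECT * FROM t WHERE ds = '2011-03-21' OR ds = '2011-03-22';

-- A range predicate on a partition column reportedly triggered an NPE
-- in the JDO execute() call, so it would stay on the old pruning path.
SELECT * FROM t WHERE ds > '2011-03-01';
{code}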

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HIVE-2050) batch processing partition pruning process

2011-03-22 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang reassigned HIVE-2050:


Assignee: Ning Zhang

> batch processing partition pruning process
> --
>
> Key: HIVE-2050
> URL: https://issues.apache.org/jira/browse/HIVE-2050
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ning Zhang
>Assignee: Ning Zhang
>
> For partition predicates that cannot be pushed down to JDO filtering 
> (HIVE-2049), we should fall back to the old approach of listing all partition 
> names first and using Hive's expression evaluation engine to select the correct 
> partitions. Then the partition pruner should hand Hive a list of partition 
> names and get back a list of Partition objects (this should be added to the 
> Hive API). 
> A possible optimization is that the partition pruner should give Hive a 
> set of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]), and 
> the JDO query should be formulated as range queries. Range queries are 
> possible because the first step lists all partition names in sorted order. 
> It's easy to come up with a range, and it is guaranteed that the JDO range 
> query results will be equivalent to the query with a list of partition 
> names. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-1983) Bundle Log4j configuration files in Hive JARs

2011-03-22 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1983:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Carl

> Bundle Log4j configuration files in Hive JARs
> -
>
> Key: HIVE-1983
> URL: https://issues.apache.org/jira/browse/HIVE-1983
> Project: Hive
>  Issue Type: Sub-task
>  Components: Configuration
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Attachments: HIVE-1983.1.patch.txt
>
>
> Splitting this off as a subtask so that it can be resolved independently of 
> the hive-default.xml issue.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.

2011-03-22 Thread Krishna Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar updated HIVE-2003:


Status: Patch Available  (was: Open)

Please review asap as there are lots of changes to q.out files and any delay 
may cause another conflict/resolution cycle.

> LOAD compilation does not set the outputs during semantic analysis resulting 
> in no authorization checks being done for it.
> --
>
> Key: HIVE-2003
> URL: https://issues.apache.org/jira/browse/HIVE-2003
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt
>
>
> The table/partition being loaded is not being added to outputs in the 
> LoadSemanticAnalyzer.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.

2011-03-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009739#comment-13009739
 ] 

Namit Jain commented on HIVE-2003:
--

I will take a look right away

> LOAD compilation does not set the outputs during semantic analysis resulting 
> in no authorization checks being done for it.
> --
>
> Key: HIVE-2003
> URL: https://issues.apache.org/jira/browse/HIVE-2003
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt
>
>
> The table/partition being loaded is not being added to outputs in the 
> LoadSemanticAnalyzer.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.

2011-03-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009772#comment-13009772
 ] 

Namit Jain commented on HIVE-2003:
--

   Instead of adding a new configuration parameter which is being checked in
   EnforceReadOnlyTables, it might be easier to remove EnforceReadOnlyTables
   from the hive.exec.pre.hooks at creation time. But, this can be done in a
   follow-up also (if other things look good).

Will commit if tests pass; please file a follow-up jira for the cleanup 
mentioned above.

> LOAD compilation does not set the outputs during semantic analysis resulting 
> in no authorization checks being done for it.
> --
>
> Key: HIVE-2003
> URL: https://issues.apache.org/jira/browse/HIVE-2003
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt
>
>
> The table/partition being loaded is not being added to outputs in the 
> LoadSemanticAnalyzer.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Build failed in Jenkins: Hive-0.7.0-h0.20 #50

2011-03-22 Thread Apache Hudson Server
See 

--
[...truncated 27029 lines...]
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Copying file: 

[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Loading data to table default.src
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src1
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Copying file: 

[junit] Loading data to table default.src1
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src1
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src1
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src_sequencefile
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Copying file: 

[junit] Loading data to table default.src_sequencefile
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src_sequencefile
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_sequencefile
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src_thrift
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Copying file: 

[junit] Loading data to table default.src_thrift
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src_thrift
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_thrift
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src_json
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Copying file: 

[junit] Loading data to table default.src_json
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src_json
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_json
[junit] OK
[junit] diff 

 

[junit] Hive history 
file=
[junit] Done query: wrong_distinct1.q
[junit] Begin query: wrong_distinct2.q
[junit] Hive history 
file=
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'
 OVERWRITE INTO TABLE srcpart PARTITION (ds='2008-04-08',hr='11')
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Copying file: 

[junit] Loading data to table default.srcpart partition (ds=200

Build failed in Jenkins: Hive-trunk-h0.20 #632

2011-03-22 Thread Apache Hudson Server
See 

--
[...truncated 28061 lines...]
[junit] POSTHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-22_12-11-03_852_331101495470767803/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=
[junit] Job running in-process (local Hadoop)
[junit] Hadoop job information for null: number of mappers: 0; number of 
reducers: 0
[junit] 2011-03-22 12:11:06,951 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-22_12-11-03_852_331101495470767803/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-22_12-11-08_486_3558028685216045491/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-22_12-11-08_486_3558028685216045491/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] P

[jira] [Created] (HIVE-2069) NullPointerException on getSchemas

2011-03-22 Thread Bennie Schut (JIRA)
NullPointerException on getSchemas
--

 Key: HIVE-2069
 URL: https://issues.apache.org/jira/browse/HIVE-2069
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.8.0
Reporter: Bennie Schut
Assignee: Bennie Schut
 Fix For: 0.8.0


Calling getSchemas will cause a NullPointerException.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2069) NullPointerException on getSchemas

2011-03-22 Thread Bennie Schut (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009825#comment-13009825
 ] 

Bennie Schut commented on HIVE-2069:


java.lang.NullPointerException
at java.util.ArrayList.<init>(ArrayList.java:131)
at 
org.apache.hadoop.hive.jdbc.HiveMetaDataResultSet.<init>(HiveMetaDataResultSet.java:32)
at 
org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData$3.<init>(HiveDatabaseMetaData.java:481)
at 
org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.getSchemas(HiveDatabaseMetaData.java:480)
at 
org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.getSchemas(HiveDatabaseMetaData.java:475)
at 
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetSchemas(TestJdbcDriver.java:488)

Probably introduced by HIVE-1126. getCatalogs works correctly, but getSchemas 
wasn't tested.

> NullPointerException on getSchemas
> --
>
> Key: HIVE-2069
> URL: https://issues.apache.org/jira/browse/HIVE-2069
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 0.8.0
>Reporter: Bennie Schut
>Assignee: Bennie Schut
> Fix For: 0.8.0
>
>
> Calling getSchemas will cause a NullPointerException.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2069) NullPointerException on getSchemas

2011-03-22 Thread Bennie Schut (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bennie Schut updated HIVE-2069:
---

Attachment: HIVE-2069.1.patch.txt

This patch includes a fix and a test which can be used to reproduce the 
NullPointerException.

> NullPointerException on getSchemas
> --
>
> Key: HIVE-2069
> URL: https://issues.apache.org/jira/browse/HIVE-2069
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 0.8.0
>Reporter: Bennie Schut
>Assignee: Bennie Schut
> Fix For: 0.8.0
>
> Attachments: HIVE-2069.1.patch.txt
>
>
> Calling getSchemas will cause a NullPointerException.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2069) NullPointerException on getSchemas

2011-03-22 Thread Bennie Schut (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bennie Schut updated HIVE-2069:
---

Release Note: Fix for NullPointerException on the jdbc driver on getSchemas
  Status: Patch Available  (was: Open)

> NullPointerException on getSchemas
> --
>
> Key: HIVE-2069
> URL: https://issues.apache.org/jira/browse/HIVE-2069
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 0.8.0
>Reporter: Bennie Schut
>Assignee: Bennie Schut
> Fix For: 0.8.0
>
> Attachments: HIVE-2069.1.patch.txt
>
>
> Calling getSchemas will cause a NullPointerException.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2069) NullPointerException on getSchemas

2011-03-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009841#comment-13009841
 ] 

Ning Zhang commented on HIVE-2069:
--

+1. will commit if tests pass. 

> NullPointerException on getSchemas
> --
>
> Key: HIVE-2069
> URL: https://issues.apache.org/jira/browse/HIVE-2069
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 0.8.0
>Reporter: Bennie Schut
>Assignee: Bennie Schut
> Fix For: 0.8.0
>
> Attachments: HIVE-2069.1.patch.txt
>
>
> Calling getSchemas will cause a NullPointerException.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-2070) SHOW GRANT grantTime field should be a human-readable timestamp

2011-03-22 Thread Jonathan Natkins (JIRA)
SHOW GRANT grantTime field should be a human-readable timestamp
---

 Key: HIVE-2070
 URL: https://issues.apache.org/jira/browse/HIVE-2070
 Project: Hive
  Issue Type: Improvement
Reporter: Jonathan Natkins


Unix timestamps are not super useful when trying to interpret metadata:

hive> show grant user foo on table bar;
databasedefault 
table   bar 
principalName   foo 
principalType   USER
privilege   Select  
grantTime   1300828549  
grantor natty   
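
A hedged illustration of decoding the value by hand with the built-in
from_unixtime UDF (the FROM table is a placeholder, since Hive requires one):

{code}
-- 1300828549 is the grantTime printed above; from_unixtime renders it
-- as 'yyyy-MM-dd HH:mm:ss'.
SELECT from_unixtime(1300828549) FROM some_table LIMIT 1;
{code}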


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Build failed in Jenkins: Hive-trunk-h0.20 #633

2011-03-22 Thread Apache Hudson Server
See 

Changes:

[namit] HIVE-1983 Bundle Log4j configuration files in Hive JARs
(Carl Steinbach via namit)

[namit] HIVE-2049 Push down partition pruning to JDO filtering for a subset
  of partition predicates (Ning Zhang via namit)

--
[...truncated 28056 lines...]
[junit] POSTHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-22_15-36-43_835_7815733213958374623/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=
[junit] Job running in-process (local Hadoop)
[junit] Hadoop job information for null: number of mappers: 0; number of 
reducers: 0
[junit] 2011-03-22 15:36:47,016 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-22_15-36-43_835_7815733213958374623/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-22_15-36-48_513_5063512059106812506/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-22_15-36-48_513_5063512059106812506/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
  

[jira] [Updated] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.

2011-03-22 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-2003:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Krishna

> LOAD compilation does not set the outputs during semantic analysis resulting 
> in no authorization checks being done for it.
> --
>
> Key: HIVE-2003
> URL: https://issues.apache.org/jira/browse/HIVE-2003
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt
>
>
> The table/partition being loaded is not being added to outputs in the 
> LoadSemanticAnalyzer.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Review Request: HIVE-2050. batch processing partition pruning process

2011-03-22 Thread Ning Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/522/
---

Review request for hive.


Summary
---

Introducing a new metastore API to retrieve a list of partitions in batch. 


Diffs
-

  trunk/metastore/if/hive_metastore.thrift 1084243 
  trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 1084243 
  trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp 1084243 
  
trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp 
1084243 
  
trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java
 1084243 
  trunk/metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php 
1084243 
  
trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote 
1084243 
  trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 
1084243 
  trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb 1084243 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
1084243 
  
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
 1084243 
  
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 
1084243 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
1084243 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 
1084243 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1084243 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartExprEvalUtils.java
 1084243 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 
1084243 

Diff: https://reviews.apache.org/r/522/diff


Testing
---


Thanks,

Ning



[jira] [Updated] (HIVE-2050) batch processing partition pruning process

2011-03-22 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2050:
-

Status: Patch Available  (was: Open)

> batch processing partition pruning process
> --
>
> Key: HIVE-2050
> URL: https://issues.apache.org/jira/browse/HIVE-2050
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2050.patch
>
>
> For partition predicates that cannot be pushed down to JDO filtering 
> (HIVE-2049), we should fall back to the old approach of listing all partition 
> names first and using Hive's expression evaluation engine to select the correct 
> partitions. Then the partition pruner should hand Hive a list of partition 
> names and get back a list of Partition objects (this should be added to the 
> Hive API). 
> A possible optimization is that the partition pruner should give Hive a 
> set of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]), and 
> the JDO query should be formulated as range queries. Range queries are 
> possible because the first step lists all partition names in sorted order. 
> It's easy to come up with a range, and it is guaranteed that the JDO range 
> query results will be equivalent to the query with a list of partition 
> names. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2050) batch processing partition pruning process

2011-03-22 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2050:
-

Attachment: HIVE-2050.patch

Uploading a new patch for review. Still running tests. The review board 
request: https://reviews.apache.org/r/522/

> batch processing partition pruning process
> --
>
> Key: HIVE-2050
> URL: https://issues.apache.org/jira/browse/HIVE-2050
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2050.patch
>
>
> For partition predicates that cannot be pushed down to JDO filtering 
> (HIVE-2049), we should fall back to the old approach of listing all partition 
> names first and using Hive's expression evaluation engine to select the correct 
> partitions. Then the partition pruner should hand Hive a list of partition 
> names and get back a list of Partition objects (this should be added to the 
> Hive API). 
> A possible optimization is that the partition pruner should give Hive a 
> set of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]), and 
> the JDO query should be formulated as range queries. Range queries are 
> possible because the first step lists all partition names in sorted order. 
> It's easy to come up with a range, and it is guaranteed that the JDO range 
> query results will be equivalent to the query with a list of partition 
> names. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Meanings of privileges

2011-03-22 Thread Jonathan Natkins
Hi all,

I'm trying to understand the meaning of some of the privileges in the
system, and I'm a bit stumped on what some of them actually do.

Privileges that confuse me:
INDEX - my best guess is that this allows me to create/drop indexes on a
table?  Is it the case that if I have select access on a table, I can use
any index that exists on a table?
LOCK - Presumably this allows users to lock or unlock a table, so maybe a
better question is: are these locks like mutexes, where only I can access
the table, or is this literally locking down the table, so it can't be
modified in any way?
SHOW_DATABASE - I'm not sure what the scope of this one is: if I don't have
show_database access, can I not use the show database command? Or does this
extend to not being able to see the tables within a database?

It seems like you can grant some privileges on objects where they don't have a
lot of meaning; e.g., create access on a table doesn't seem to have a lot of
semantic value, unless Hive requires that permission to create indexes on a
table, or something along those lines.  Similarly, I'm having a hard time
rationalizing why I can grant SHOW_DATABASE on a table.

Thanks a lot,
Jon


Re: Meanings of privileges

2011-03-22 Thread yongqiang he
>>INDEX - my best guess is that this allows me to create/drop indexes on a
table?
Yes. It is there for this purpose.

>> Is it the case that if I have select access on a table, I can use
any index that exists on a table?
No. An index is also a table now, so you need to have access to both of them.

>>LOCK - Presumably this allows users to lock or unlock a table, so maybe a
better question is: are these locks like mutexes, where only I can access
the table, or is this literally locking down the table, so it can't be
modified in any way?

Yes. If only you have the lock privilege on this table, and concurrency is
enabled, no one else will be able to run anything against the table.

>>SHOW_DATABASE - I'm not sure what the scope of this one is: if I don't have
show_database access, can I not use the show database command?

If you don't have show_database access, you should not be able to use
the show databases command. I do not think this privilege is supported
today.

>> create access on a table doesn't seem to have a lot of semantic value
I think create on a table means create partition.

>>Similarly, I'm having a hard time rationalizing why I can grant SHOW_DATABASE 
>>on a table.
This should be a bug. Basically each privilege has its own set of scopes
(it can apply at the db, table, column, or user level, non-exclusively).
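
To anchor the discussion, a few hypothetical statements of the kind being
described (table t and user bob are placeholders):

{code}
-- INDEX: per the above, governs creating/dropping indexes on t.
GRANT INDEX ON TABLE t TO USER bob;
-- LOCK: governs locking/unlocking t when concurrency is enabled.
GRANT LOCK ON TABLE t TO USER bob;
-- Inspect what was granted:
SHOW GRANT USER bob ON TABLE t;
{code}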

Thanks
Yongqiang
On Tue, Mar 22, 2011 at 6:30 PM, Jonathan Natkins  wrote:
> Hi all,
>
> I'm trying to understand the meaning of some of the privileges in the
> system, and I'm a bit stumped on what some of them actually do.
>
> Privileges that confuse me:
> INDEX - my best guess is that this allows me to create/drop indexes on a
> table?  Is it the case that if I have select access on a table, I can use
> any index that exists on a table?
> LOCK - Presumably this allows users to lock or unlock a table, so maybe a
> better question is: are these locks like mutexes, where only I can access
> the table, or is this literally locking down the table, so it can't be
> modified in any way?
> SHOW_DATABASE - I'm not sure what the scope of this one is: if I don't have
> show_database access, can I not use the show database command? Or does this
> extend to not being able to see the tables within a database?
>
> It seems like you can grant some privileges on objects where they don't have a
> lot of meaning; e.g., create access on a table doesn't seem to have a lot of
> semantic value, unless Hive requires that permission to create indexes on a
> table, or something along those lines.  Similarly, I'm having a hard time
> rationalizing why I can grant SHOW_DATABASE on a table.
>
> Thanks a lot,
> Jon
>


[jira] [Commented] (HIVE-2065) RCFile issues

2011-03-22 Thread Krishna Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009981#comment-13009981
 ] 

Krishna Kumar commented on HIVE-2065:
-

Hmm. #3 is taking me a bit further than I originally thought. I assume being 
able to read an RCFile as a SequenceFile is required, while being able to write 
an RCFile via the SequenceFile interface is desirable.

Having made changes so that the record length is correctly set, the following 
changes are also required, IIUC, to make sure that the RCFile is handled 
correctly as a sequence file:

 - the second field should be the key length (4 + compressed/plain key contents)
 - the key class (KeyBuffer) must be made responsible for reading/writing the 
next field - the plain key contents length - as well as compression/decompression 
of the key contents
 - the value class (ValueBuffer) related changes will be trickier. Since the 
value is not compressed as a unit, we cannot use the record-compressed format. We 
need to mark the records as plain records, and move the codec to a metadata 
entry. Then the ValueBuffer class will work correctly with the sequencefile 
implementation.

Thoughts? Worth it?


> RCFile issues
> -
>
> Key: HIVE-2065
> URL: https://issues.apache.org/jira/browse/HIVE-2065
> Project: Hive
>  Issue Type: Bug
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: Slide1.png, proposal.png
>
>
> Some potential issues with RCFile
> 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per 
> yongqiang he, the class is not meant to be thread-safe (and it is not). Might 
> as well get rid of the confusing and performance-impacting lock acquisitions.
> 2. Record Length overstated for compressed files. IIUC, the key compression 
> happens after we have written the record length.
> {code}
>   int keyLength = key.getSize();
>   if (keyLength < 0) {
> throw new IOException("negative length keys not allowed: " + key);
>   }
>   out.writeInt(keyLength + valueLength); // total record length
>   out.writeInt(keyLength); // key portion length
>   if (!isCompressed()) {
> out.writeInt(keyLength);
> key.write(out); // key
>   } else {
> keyCompressionBuffer.reset();
> keyDeflateFilter.resetState();
> key.write(keyDeflateOut);
> keyDeflateOut.flush();
> keyDeflateFilter.finish();
> int compressedKeyLen = keyCompressionBuffer.getLength();
> out.writeInt(compressedKeyLen);
> out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
>   }
> {code}
> 3. For sequence file compatibility, the field following the record length 
> should be the compressed key length, not the uncompressed key length.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2065) RCFile issues

2011-03-22 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009982#comment-13009982
 ] 

He Yongqiang commented on HIVE-2065:


If being compatible with sequencefile does not break the RCFile's backward 
compatibility, it should be ok. But even after that, Hive still won't be able 
to process it as a sequence file because of Hive's serde layer.

> RCFile issues
> -
>
> Key: HIVE-2065
> URL: https://issues.apache.org/jira/browse/HIVE-2065
> Project: Hive
>  Issue Type: Bug
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: Slide1.png, proposal.png
>
>
> Some potential issues with RCFile
> 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per 
> yongqiang he, the class is not meant to be thread-safe (and it is not). Might 
> as well get rid of the confusing and performance-impacting lock acquisitions.
> 2. Record Length overstated for compressed files. IIUC, the key compression 
> happens after we have written the record length.
> {code}
>   int keyLength = key.getSize();
>   if (keyLength < 0) {
> throw new IOException("negative length keys not allowed: " + key);
>   }
>   out.writeInt(keyLength + valueLength); // total record length
>   out.writeInt(keyLength); // key portion length
>   if (!isCompressed()) {
> out.writeInt(keyLength);
> key.write(out); // key
>   } else {
> keyCompressionBuffer.reset();
> keyDeflateFilter.resetState();
> key.write(keyDeflateOut);
> keyDeflateOut.flush();
> keyDeflateFilter.finish();
> int compressedKeyLen = keyCompressionBuffer.getLength();
> out.writeInt(compressedKeyLen);
> out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
>   }
> {code}
> 3. For sequence file compatibility, the field following the record length 
> should be the compressed key length, not the uncompressed key length.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2050) batch processing partition pruning process

2011-03-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009993#comment-13009993
 ] 

Ning Zhang commented on HIVE-2050:
--

passed all unit tests.

> batch processing partition pruning process
> --
>
> Key: HIVE-2050
> URL: https://issues.apache.org/jira/browse/HIVE-2050
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2050.patch
>
>
> For partition predicates that cannot be pushed down to JDO filtering 
> (HIVE-2049), we should fall back to the old approach of listing all partition 
> names first and using Hive's expression evaluation engine to select the correct 
> partitions. Then the partition pruner should hand Hive a list of partition 
> names and get back a list of Partition objects (this should be added to the 
> Hive API). 
> A possible optimization is that the partition pruner should give Hive a 
> set of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]), and 
> the JDO query should be formulated as range queries. Range queries are 
> possible because the first step lists all partition names in sorted order. 
> It's easy to come up with a range, and it is guaranteed that the JDO range 
> query results will be equivalent to the query with a list of partition 
> names. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Workaround for using OR in Joins

2011-03-22 Thread Ning Zhang
Joins with OR conditions are not supported by Hive currently. I think even 
if you rewrite the condition to use NOT and AND only, the results may be 
wrong. 
 
It is quite hard to implement joins of arbitrary tables with OR conditions in a 
MapReduce framework. It is straightforward to implement in a nested-loop join, 
but due to the nature of distributed processing, a nested-loop join cannot be 
implemented in an efficient and scalable way in MapReduce: each mapper would 
need to join a split of the LHS table with the whole RHS table, which could be 
terabytes. 

The regular (reduce-side) join in Hive is essentially a sort-merge join 
operator. With that in mind, it's hard to implement OR conditions in the 
sort-merge join. 

One exception is the map-side join, which assumes the RHS table is small and 
will be read fully into each mapper. Currently the map-side join in Hive is a 
hash-based join operator. You could implement a nested-loop map-side join 
operator to enable arbitrary join conditions, including OR. 
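
For reference, a hedged sketch of the two operators being contrasted (big_t and 
small_t are placeholder names):

{code}
-- Today's hash-based map-side join; small_t is assumed small enough to
-- be read fully into each mapper, and only equality conditions work:
SELECT /*+ MAPJOIN(small_t) */ big_t.*
FROM big_t JOIN small_t ON (big_t.k = small_t.k);

-- A nested-loop map-side join operator, as suggested above, could in
-- principle evaluate arbitrary conditions such as
--   big_t.k1 = small_t.k1 OR big_t.k2 = small_t.k2
{code}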

On Mar 22, 2011, at 1:39 AM, MIS wrote:

> Found it at *org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.java*, line
> 1122.
> There is some concern mentioned there that supporting OR would lead to data
> explosion. Is it discussed/documented in a little more detail somewhere? If
> so, some pointers would be helpful.
> 
> Thanks,
> MIS.
> 
> On Tue, Mar 22, 2011 at 1:19 PM, MIS  wrote:
> 
>> I want to use OR in the join expression, but it seems only AND is supported
>> as of now.
>> I have a workaround, though, using De Morgan's law {C1 OR C2 = !(!C1 AND
>> !C2)}, but it would be nice if somebody could point me to the location in
>> the code base that would need modification to support OR in the join
>> expression.
>> 
>> Thanks,
>> MIS.
>> 



Bug in using columns with leading underscores in subqueries

2011-03-22 Thread Marquis Wang
Hi,

I believe I've found a bug in the semantic analyzer (or maybe something else?). 
It occurs when using a column with a leading underscore in a subquery.

> create table temp (`_col` int, key int);
> select key from temp;
> select `_col` from temp;
> select key from (select key from temp) t;

The above queries all work fine. 

> select `_col` from (select `_col` from temp) t;
> 

This query fails with "FAILED: Error in semantic analysis: line 1:7 Invalid 
Table Alias or Column Reference `_col`"

The following query works in lieu of the above.

> select col as `_col` from (select `_col` as col from temp) t;
> 

Thanks,
Marquis Wang
HMC Computer Science '11






Review Request: HIVE-2069: NullPointerException on getSchemas

2011-03-22 Thread Bennie Schut

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/521/
---

Review request for hive.


Summary
---

HIVE-2069: NullPointerException on getSchemas


This addresses bug HIVE-2069.
https://issues.apache.org/jira/browse/HIVE-2069


Diffs
-

  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveMetaDataResultSet.java 
1083926 
  trunk/jdbc/src/test/org/apache/hadoop/hive/jdbc/TestJdbcDriver.java 1083926 

Diff: https://reviews.apache.org/r/521/diff


Testing
---


Thanks,

Bennie



[jira] [Created] (HIVE-2071) enforcereadonlytables hook should not check a configuration variable

2011-03-22 Thread Namit Jain (JIRA)
enforcereadonlytables hook should not check a configuration variable


 Key: HIVE-2071
 URL: https://issues.apache.org/jira/browse/HIVE-2071
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Krishna Kumar


Instead of adding a new configuration parameter which is being checked in
EnforceReadOnlyTables, it might be easier to remove EnforceReadOnlyTables
from the hive.exec.pre.hooks at creation time.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review Request: HIVE-1803: Implement bitmap indexing in Hive (new review starting from patch 6)

2011-03-22 Thread Marquis Wang


> On None, John Sichi wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/index/compact/HiveCompactIndexInputFormat.java,
> >  line 45
> > 
> >
> > I'm confused about how the backwards compatibility works for the index 
> > filename property...who uses this property name?

The property name is set on the command line when the index query is run (see 
the index_compact.q tests). This String is how the class knows where the index 
filename is stored.


> On None, John Sichi wrote:
> > ql/build.xml, line 187
> > 
> >
> > Why do you need to unpack the .jar?  And why to json/classes?

I was getting "java.lang.NoClassDefFoundError: javaewah/EWAHCompressedBitmap" 
errors at runtime without unpacking it. I guess I forgot to change the 
destination to something else when I copied that line. Is unpacking the .jar 
unnecessary? I'm not really familiar with how ivy(?) handles these libraries.


- Marquis


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/481/#review315
---


On 2011-03-08 16:27:50, John Sichi wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/481/
> ---
> 
> (Updated 2011-03-08 16:27:50)
> 
> 
> Review request for hive.
> 
> 
> Summary
> ---
> 
> Review board was giving me grief trying to update the old patch, so I'm 
> creating a fresh review request for HIVE-1803.6
> 
> 
> This addresses bug HIVE-1803.
> https://issues.apache.org/jira/browse/HIVE-1803
> 
> 
> Diffs
> -
> 
>   lib/README 1c2f0b1 
>   lib/javaewah-0.2.jar PRE-CREATION 
>   ql/build.xml 50c604e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java ba222f3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java ff74f08 
>   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 308d985 
>   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/index/IndexMetadataChangeTask.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/index/IndexMetadataChangeWork.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapObjectInput.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapObjectOutput.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 
> 1f01446 
>   
> ql/src/java/org/apache/hadoop/hive/ql/index/compact/HiveCompactIndexInputFormat.java
>  6c320c5 
>   
> ql/src/java/org/apache/hadoop/hive/ql/index/compact/HiveCompactIndexResult.java
>  0c9ccea 
>   
> ql/src/java/org/apache/hadoop/hive/ql/index/compact/IndexMetadataChangeTask.java
>  eac168f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/index/compact/IndexMetadataChangeWork.java
>  26beb4e 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java 
> 391e5de 
>   ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java 77220a1 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java 30714b8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/AbstractGenericUDFEWAHBitmapOp.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEWAHBitmap.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapAnd.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapEmpty.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapOr.java 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/index_bitmap.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/index_bitmap1.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/index_bitmap2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/index_bitmap3.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/index_compact.q 6547a52 
>   ql/src/test/queries/clientpositive/index_compact_1.q 6d59353 
>   ql/src/test/queries/clientpositive/index_compact_2.q 358b5e9 
>   ql/src/test/queries/clientpositive/index_compact_3.q ee8abda 
>   ql/src/test/queries/clientpositive/udf_bitmap_and.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/udf_bitmap_or.q PRE-CREATION 
>   ql/src/test/results/clientpositive/index_bitmap.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/index_bi

Re: Review Request: HIVE-2054: fix for IOException on the jdbc driver on windows.

2011-03-22 Thread Bennie Schut

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/513/
---

(Updated 2011-03-21 12:50:40.422997)


Review request for hive.


Changes
---

New patch because of changes from HIVE-2062


Summary
---

HIVE-2054: fix for IOException on the jdbc driver on windows.


This addresses bug HIVE-2054.
https://issues.apache.org/jira/browse/HIVE-2054


Diffs (updated)
-

  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java 1083914 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HivePreparedStatement.java 
1083914 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveStatement.java 1083914 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/JdbcSessionState.java 1083914 

Diff: https://reviews.apache.org/r/513/diff


Testing
---


Thanks,

Bennie



Review Request: Patch for HIVE-2003: Load analysis should add table/partition to the outputs

2011-03-22 Thread Krishna

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/518/
---

Review request for hive.


Summary
---

Patch for HIVE-2003: Load analysis should add table/partition to the outputs


Diffs
-

  contrib/src/test/results/clientpositive/serde_regex.q.out c8b2dac 
  contrib/src/test/results/clientpositive/serde_s3.q.out 95cc726 
  ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 1ff9ea3 
  ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 892e759 
  ql/src/test/org/apache/hadoop/hive/ql/hooks/EnforceReadOnlyTables.java 
86a6d49 
  ql/src/test/queries/clientnegative/load_exist_part_authfail.q PRE-CREATION 
  ql/src/test/queries/clientnegative/load_nonpart_authfail.q PRE-CREATION 
  ql/src/test/queries/clientnegative/load_part_authfail.q PRE-CREATION 
  ql/src/test/queries/clientnegative/load_part_nospec.q PRE-CREATION 
  ql/src/test/queries/clientpositive/load_exist_part_authsuccess.q PRE-CREATION 
  ql/src/test/queries/clientpositive/load_nonpart_authsuccess.q PRE-CREATION 
  ql/src/test/queries/clientpositive/load_part_authsuccess.q PRE-CREATION 
  ql/src/test/results/clientnegative/exim_00_unsupported_schema.q.out 119510d 
  ql/src/test/results/clientnegative/exim_01_nonpart_over_loaded.q.out 242da6c 
  ql/src/test/results/clientnegative/exim_02_all_part_over_overlap.q.out 
b8b019b 
  ql/src/test/results/clientnegative/exim_03_nonpart_noncompat_colschema.q.out 
420eade 
  ql/src/test/results/clientnegative/exim_04_nonpart_noncompat_colnumber.q.out 
8b89284 
  ql/src/test/results/clientnegative/exim_05_nonpart_noncompat_coltype.q.out 
a07fb62 
  ql/src/test/results/clientnegative/exim_06_nonpart_noncompat_storage.q.out 
c7638d2 
  ql/src/test/results/clientnegative/exim_07_nonpart_noncompat_ifof.q.out 
3062dbe 
  ql/src/test/results/clientnegative/exim_08_nonpart_noncompat_serde.q.out 
f229498 
  ql/src/test/results/clientnegative/exim_09_nonpart_noncompat_serdeparam.q.out 
92c27ad 
  ql/src/test/results/clientnegative/exim_10_nonpart_noncompat_bucketing.q.out 
a98f4f9 
  ql/src/test/results/clientnegative/exim_11_nonpart_noncompat_sorting.q.out 
1fe4b50 
  ql/src/test/results/clientnegative/exim_13_nonnative_import.q.out 4c4297e 
  ql/src/test/results/clientnegative/exim_14_nonpart_part.q.out 04fa808 
  ql/src/test/results/clientnegative/exim_15_part_nonpart.q.out e1c67bb 
  ql/src/test/results/clientnegative/exim_16_part_noncompat_schema.q.out 
2393918 
  ql/src/test/results/clientnegative/exim_17_part_spec_underspec.q.out 7f29cb6 
  ql/src/test/results/clientnegative/exim_18_part_spec_missing.q.out 7f29cb6 
  ql/src/test/results/clientnegative/exim_19_external_over_existing.q.out 
0711b89 
  
ql/src/test/results/clientnegative/exim_20_managed_location_over_existing.q.out 
3ad0ad5 
  ql/src/test/results/clientnegative/exim_21_part_managed_external.q.out 
42c7600 
  ql/src/test/results/clientnegative/exim_23_import_exist_authfail.q.out 
8372910 
  ql/src/test/results/clientnegative/exim_24_import_part_authfail.q.out 0d82700 
  ql/src/test/results/clientnegative/exim_25_import_nonexist_authfail.q.out 
3814e14 
  ql/src/test/results/clientnegative/fetchtask_ioexception.q.out b9dd07c 
  ql/src/test/results/clientnegative/load_exist_part_authfail.q.out 
PRE-CREATION 
  ql/src/test/results/clientnegative/load_nonpart_authfail.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/load_part_authfail.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/load_part_nospec.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/load_wrong_fileformat.q.out 645e143 
  ql/src/test/results/clientnegative/load_wrong_fileformat_rc_seq.q.out 4809d31 
  ql/src/test/results/clientnegative/load_wrong_fileformat_txt_seq.q.out 
9b1ea48 
  ql/src/test/results/clientnegative/protectmode_part2.q.out daaae80 
  ql/src/test/results/clientpositive/alter3.q.out e6e5b49 
  ql/src/test/results/clientpositive/alter_merge.q.out 789ca14 
  ql/src/test/results/clientpositive/alter_merge_stats.q.out 5c9d387 
  ql/src/test/results/clientpositive/auto_join_filters.q.out 167c4b0 
  ql/src/test/results/clientpositive/auto_join_nulls.q.out 4ced637 
  ql/src/test/results/clientpositive/binarysortable_1.q.out a2e540e 
  ql/src/test/results/clientpositive/bucketizedhiveinputformat.q.out cd3489e 
  ql/src/test/results/clientpositive/bucketmapjoin1.q.out da27428 
  ql/src/test/results/clientpositive/bucketmapjoin2.q.out 4aeb731 
  ql/src/test/results/clientpositive/bucketmapjoin3.q.out 1109aae 
  ql/src/test/results/clientpositive/bucketmapjoin4.q.out a45b625 
  ql/src/test/results/clientpositive/bucketmapjoin5.q.out 3858ae0 
  ql/src/test/results/clientpositive/bucketmapjoin_negative.q.out c5b4a9c 
  ql/src/test/results/clientpositive/bucketmapjoin_negative2.q.out b320252 
  ql/src/test/results/clientpositive/count.q.out 0b4032c 
  ql/src/test/results/clientpositive

Re: skew join optimization

2011-03-22 Thread Ted Yu
How about linking to http://imageshack.us/ or TinyPic?

Thanks

On Sun, Mar 20, 2011 at 7:56 AM, Edward Capriolo wrote:

> On Sun, Mar 20, 2011 at 10:30 AM, Ted Yu  wrote:
> > Can someone re-attach the missing figures for that wiki?
> >
> > Thanks
> >
> > On Sun, Mar 20, 2011 at 7:15 AM, bharath vissapragada
> >  wrote:
> >>
> >> Hi Igor,
> >>
> >> See http://wiki.apache.org/hadoop/Hive/JoinOptimization and JIRA
> >> HIVE-1642, which automatically converts a normal join into a map join
> >> (otherwise you can specify the MAPJOIN hint in the query itself).
> >> Because your 'S' table is very small, it can be replicated across all
> >> the mappers and the reduce phase can be avoided. This can greatly
> >> reduce the runtime. (See the results section on that page for details.)
> >>
> >> Hope this helps.
> >>
> >> Thanks
> >>
> >>
> >> On Sun, Mar 20, 2011 at 6:37 PM, Jov  wrote:
> >> > 2011/3/20 Igor Tatarinov :
> >> >> I have the following join that takes 4.5 hours (with 12 nodes) mostly
> >> >> because of a single reduce task that gets the bulk of the work:
> >> >> SELECT ...
> >> >> FROM T
> >> >> LEFT OUTER JOIN S
> >> >> ON T.timestamp = S.timestamp and T.id = S.id
> >> >> This is a 1:0/1 join so the size of the output is exactly the same as
> >> >> the
> >> >> size of T (500M records). S is actually very small (5K).
> >> >> I've tried:
> >> >> - switching the order of the join conditions
> >> >> - using a different hash function setting (jenkins instead of murmur)
> >> >> - using SET set hive.auto.convert.join = true;
> >> >
> >> > Are you sure your query converts to a map join? If not, try an
> >> > explicit MAPJOIN hint.
> >> >
> >> >
> >> >> - using SET hive.optimize.skewjoin = true;
> >> >> but nothing helped :(
> >> >> Anything else I can try?
> >> >> Thanks!
> >> >
> >>
> >>
> >>
> >> --
> >> Regards,
> >> Bharath .V
> >> w:http://research.iiit.ac.in/~bharath.v
> >
> >
>
> The wiki does not allow images; Confluence does, but we have not moved
> there yet.
>


Re: skew join optimization

2011-03-22 Thread Ted Yu
Can someone re-attach the missing figures for that wiki?

Thanks

On Sun, Mar 20, 2011 at 7:15 AM, bharath vissapragada <
bharathvissapragada1...@gmail.com> wrote:

> Hi Igor,
>
> See http://wiki.apache.org/hadoop/Hive/JoinOptimization and JIRA HIVE-1642,
> which automatically converts a normal join into a map join (otherwise you
> can specify the MAPJOIN hint in the query itself). Because your 'S' table
> is very small, it can be replicated across all the mappers and the reduce
> phase can be avoided. This can greatly reduce the runtime. (See the
> results section on that page for details.)
>
> Hope this helps.
>
> Thanks
>
>
> On Sun, Mar 20, 2011 at 6:37 PM, Jov  wrote:
> > 2011/3/20 Igor Tatarinov :
> >> I have the following join that takes 4.5 hours (with 12 nodes) mostly
> >> because of a single reduce task that gets the bulk of the work:
> >> SELECT ...
> >> FROM T
> >> LEFT OUTER JOIN S
> >> ON T.timestamp = S.timestamp and T.id = S.id
> >> This is a 1:0/1 join so the size of the output is exactly the same as
> the
> >> size of T (500M records). S is actually very small (5K).
> >> I've tried:
> >> - switching the order of the join conditions
> >> - using a different hash function setting (jenkins instead of murmur)
> >> - using SET set hive.auto.convert.join = true;
> >
> > Are you sure your query converts to a map join? If not, try an explicit
> > MAPJOIN hint.
> >
> >
> >> - using SET hive.optimize.skewjoin = true;
> >> but nothing helped :(
> >> Anything else I can try?
> >> Thanks!
> >
>
>
>
> --
> Regards,
> Bharath .V
> w:http://research.iiit.ac.in/~bharath.v
>
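
For reference, a minimal sketch of the explicit map-join hint suggested in 
this thread, reusing the T and S tables from the original query (illustrative 
only; the elided select list is kept as-is):

    SELECT /*+ MAPJOIN(S) */ ...
    FROM T
    LEFT OUTER JOIN S
      ON T.timestamp = S.timestamp AND T.id = S.id;

With S map-joined, its 5K rows are loaded into an in-memory hash table and 
replicated to every mapper, so the skewed single-reducer phase disappears. 
Note that for a LEFT OUTER JOIN the small (hashed) table must be the 
right-hand table, which is the case here.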


[jira] [Commented] (HIVE-2050) batch processing partition pruning process

2011-03-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010001#comment-13010001
 ] 

Ning Zhang commented on HIVE-2050:
--

Note that this patch implements a simple API that passes a list of partition 
names rather than a range of partition names. My performance testing indicates 
that the bottleneck is not in the JDO query itself: the JDO query that gets 
the list of all MPartitions takes about 5 seconds for a list of 20k partitions. 
However, converting those 20k MPartitions to Partitions took about 3 minutes, 
and committing the transaction took another 3 minutes.

Note that converting MPartitions to Partitions and committing transactions are 
common operations: even if we use JDO pushdown (HIVE-2048) or range queries, 
these costs are still there. We need to optimize them away in the next step.

> batch processing partition pruning process
> --
>
> Key: HIVE-2050
> URL: https://issues.apache.org/jira/browse/HIVE-2050
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2050.patch
>
>
> For partition predicates that cannot be pushed down to JDO filtering 
> (HIVE-2049), we should fall back to the old approach of listing all partition 
> names first and use Hive's expression evaluation engine to select the correct 
> partitions. Then the partition pruner should hand Hive a list of partition 
> names and return a list of Partition Object (this should be added to the Hive 
> API). 
> A possible optimization is that the partition pruner should give Hive a 
> set of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]), and 
> the JDO query should be formulated as range queries. Range queries are 
> possible because the first step lists all partition names in sorted order, 
> so it is easy to come up with a range, and the JDO range query results are 
> guaranteed to be equivalent to the query with a list of partition names.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
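
For reference, a hedged sketch of the list-based lookup the comment describes 
(not the actual patch; the PersistenceManager handle and the MPartition field 
name partitionName are assumptions for illustration):

    import java.util.List;
    import javax.jdo.PersistenceManager;
    import javax.jdo.Query;

    class PartitionLookupSketch {
      // JDOQL's collection-contains idiom: fetch every MPartition whose
      // partitionName appears in the supplied list with a single query.
      @SuppressWarnings("unchecked")
      static List<MPartition> partitionsByNames(PersistenceManager pm,
                                                List<String> names) {
        Query query = pm.newQuery(MPartition.class,
            ":names.contains(partitionName)");
        return (List<MPartition>) query.execute(names);
      }
    }

The costs the comment measures (materializing each MPartition as a Partition 
and committing the transaction) are incurred after this query returns, which 
is why they persist regardless of the filtering strategy.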


Build failed in Jenkins: Hive-trunk-h0.20 #634

2011-03-22 Thread Apache Hudson Server
See 

Changes:

[namit] HIVE-2003 LOAD compilation does not set the outputs during semantic 
analysis resulting
  in no authorization checks being done for it (Krishna Kumar via namit)

--
[...truncated 34300 lines...]
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-22_22-31-53_647_1696713168865305108/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=
[junit] Job running in-process (local Hadoop)
[junit] Hadoop job information for null: number of mappers: 0; number of 
reducers: 0
[junit] 2011-03-22 22:31:56,771 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-22_22-31-53_647_1696713168865305108/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 

[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-22_22-31-58_373_6901690555475865206/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-22_22-31-58_373_6901690555475865206/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK

[jira] [Commented] (HIVE-1434) Cassandra Storage Handler

2011-03-22 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010009#comment-13010009
 ] 

John Sichi commented on HIVE-1434:
--

Hey Ed, I'm out on vacation so I just saw this.  A couple of corrections:

* When the HBase handler was originally committed, tests were running fine.  We 
hadn't yet realized the dynamic ports problem because the test machines used by 
committers didn't have a lot of random ports open.  Only recently, those 
machines (at Facebook) started getting some service changes which caused the 
port problem to show up.  So the problem wasn't actually the HBase handler; it 
was that people started seeing test failures and then committing anyway because 
they assumed it was just the HBase test flaking.  Once we finally tracked it 
down, we fixed the dynamic ports problem.  Now that we're aware of the problem, 
it would be a bad idea to repeat it.

* Regarding ivy and HIVE-1235:  when the HBase Handler was committed, HBase and 
its dependencies weren't yet available in ivy.  We got that kicked off and then 
started using it once available.

Keep up the good work with this one; we'll get it in.


> Cassandra Storage Handler
> -
>
> Key: HIVE-1434
> URL: https://issues.apache.org/jira/browse/HIVE-1434
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Attachments: cas-handle.tar.gz, cass_handler.diff, hive-1434-1.txt, 
> hive-1434-2-patch.txt, hive-1434-2011-02-26.patch.txt, 
> hive-1434-2011-03-07.patch.txt, hive-1434-2011-03-07.patch.txt, 
> hive-1434-2011-03-14.patch.txt, hive-1434-3-patch.txt, hive-1434-4-patch.txt, 
> hive-1434-5.patch.txt, hive-1434.2011-02-27.diff.txt, 
> hive-cassandra.2011-02-25.txt, hive.diff
>
>
> Add a cassandra storage handler.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1434) Cassandra Storage Handler

2011-03-22 Thread Amr Awadallah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010010#comment-13010010
 ] 

Amr Awadallah commented on HIVE-1434:
-

I am out of office on a business trip this week and will be slower
than usual in responding to emails. If this is urgent then please call
my cell phone (or send an SMS), otherwise I will reply to your email
when I get back.

Thanks for your patience,

-- amr


> Cassandra Storage Handler
> -
>
> Key: HIVE-1434
> URL: https://issues.apache.org/jira/browse/HIVE-1434
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Attachments: cas-handle.tar.gz, cass_handler.diff, hive-1434-1.txt, 
> hive-1434-2-patch.txt, hive-1434-2011-02-26.patch.txt, 
> hive-1434-2011-03-07.patch.txt, hive-1434-2011-03-07.patch.txt, 
> hive-1434-2011-03-14.patch.txt, hive-1434-3-patch.txt, hive-1434-4-patch.txt, 
> hive-1434-5.patch.txt, hive-1434.2011-02-27.diff.txt, 
> hive-cassandra.2011-02-25.txt, hive.diff
>
>
> Add a cassandra storage handler.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2031) Correct the exception message for the better traceability for the scenario load into the partitioned table having 2 partitions by specifying only one partition in the l

2011-03-22 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010011#comment-13010011
 ] 

Amareshwari Sriramadasu commented on HIVE-2031:
---

Committed. Thanks Chinna.

> Correct the exception message for the better traceability for the scenario 
> load into the partitioned table having 2  partitions by specifying only one 
> partition in the load statement. 
> 
>
> Key: HIVE-2031
> URL: https://issues.apache.org/jira/browse/HIVE-2031
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 0.7.0
> Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise 
> Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-2031.2.patch, HIVE-2031.patch
>
>
>  Load into the partitioned table having 2 partitions by specifying only one 
> partition in the load statement is failing and logging the following 
> exception message.
> {noformat}
>  org.apache.hadoop.hive.ql.parse.SemanticException: line 1:91 Partition not 
> found '21Oct'
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.(BaseSemanticAnalyzer.java:685)
>   at 
> org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:196)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
>   at 
> org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:151)
>   at 
> org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:764)
>   at 
> org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:742)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:619)
> {noformat}
> This needs to be corrected so that the message reflects the actual root 
> cause of the failure.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-1644) use filter pushdown for automatically accessing indexes

2011-03-22 Thread Russell Melick (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Melick updated HIVE-1644:
-

Attachment: HIVE-1644.10.patch

Yongqiang, I think I fixed the double processing by using a different regular 
expression: I use FIL%SEL% instead of just FIL%.  I have also addressed the 
other review comments.  Once you've looked it over, I think it's ready for 
Review Board.
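
For context, a hedged sketch of how such a rule pattern is registered with 
Hive's operator-graph walker (using RuleRegExp, DefaultRuleDispatcher, and 
DefaultGraphWalker from org.apache.hadoop.hive.ql.lib); the processor and 
context variables are illustrative, not the names used in the patch:

    Map<Rule, NodeProcessor> opRules = new LinkedHashMap<Rule, NodeProcessor>();
    // "FIL%SEL%" matches a FilterOperator followed by a SelectOperator, so
    // the processor fires once per FIL-SEL pair rather than firing again on
    // the bare FIL prefix, which caused the double processing.
    opRules.put(new RuleRegExp("R1", "FIL%SEL%"), indexWhereProcessor);
    Dispatcher disp = new DefaultRuleDispatcher(null, opRules, procCtx);
    GraphWalker walker = new DefaultGraphWalker(disp);
    walker.startWalking(topNodes, null);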

> use filter pushdown for automatically accessing indexes
> ---
>
> Key: HIVE-1644
> URL: https://issues.apache.org/jira/browse/HIVE-1644
> Project: Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Russell Melick
> Attachments: HIVE-1644.1.patch, HIVE-1644.10.patch, 
> HIVE-1644.2.patch, HIVE-1644.3.patch, HIVE-1644.4.patch, HIVE-1644.5.patch, 
> HIVE-1644.6.patch, HIVE-1644.7.patch, HIVE-1644.8.patch, HIVE-1644.9.patch
>
>
> HIVE-1226 provides utilities for analyzing filters which have been pushed 
> down to a table scan.  The next step is to use these for selecting available 
> indexes and generating access plans for those indexes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira