Query regarding Metastore(Derby) SDS table data

2011-02-03 Thread Chinna
Hi all,

 While creating a Hive table using the create table command, the code flow will
also insert the HDFS location into the SDS table present in Derby.

For example, on executing:
create table sample(rate int) stored as textfile;


The SDS table (the metadata table present in Derby) then contains the following
entry corresponding to the Hive table sample, like:
http://{IP}:9001/user/hive/warehouse/sample  

where 9001 is the configured port,
and {HDFS-URL}/user/hive/warehouse is the configured warehouse directory.

Now I'm interested to know the intention behind maintaining the full HDFS
path in the SDS table: why not a relative path, given that Hive is anyhow
capable of constructing the full HDFS URL path?

Any design inputs on it will surely help.

Thanks,
Chinna Rao Lalam





 

***

This e-mail and attachments contain confidential information
from HUAWEI, which is intended only for the person or entity whose address
is listed above. Any use of the information contained herein in any way
(including, but not limited to, total or partial disclosure, reproduction,
or dissemination) by persons other than the intended recipient(s) is
prohibited. If you receive this e-mail in error, please notify the sender by
phone or email immediately and delete it!




[jira] Updated: (HIVE-1716) make TestHBaseCliDriver use dynamic ports to avoid conflicts with already-running services

2011-02-03 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1716:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks John

 make TestHBaseCliDriver use dynamic ports to avoid conflicts with 
 already-running services
 --

 Key: HIVE-1716
 URL: https://issues.apache.org/jira/browse/HIVE-1716
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.7.0
Reporter: Ning Zhang
Assignee: John Sichi
 Fix For: 0.7.0

 Attachments: HIVE-1716.1.patch


 ant test -Dhadoop.version=0.20.0 -Dtestcase=TestHBaseCliDriver:
  
[junit] org.apache.hadoop.hbase.client.NoServerForRegionException: Timed 
 out trying to locate root region
 [junit] at 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:976)
 [junit] at 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:625)
 [junit] at 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:607)
 [junit] at 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:738)
 [junit] at 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:634)
 [junit] at 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
 [junit] at 
 org.apache.hadoop.hbase.client.HTable.init(HTable.java:128)
 [junit] at 
 org.apache.hadoop.hive.hbase.HBaseTestSetup.setUpFixtures(HBaseTestSetup.java:87)
 [junit] at 
 org.apache.hadoop.hive.hbase.HBaseTestSetup.preTest(HBaseTestSetup.java:59)
 [junit] at 
 org.apache.hadoop.hive.hbase.HBaseQTestUtil.init(HBaseQTestUtil.java:31)
 [junit] at 
 org.apache.hadoop.hive.cli.TestHBaseCliDriver.setUp(TestHBaseCliDriver.java:43)
 [junit] at junit.framework.TestCase.runBare(TestCase.java:125)
 [junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
 [junit] at 
 junit.framework.TestResult.runProtected(TestResult.java:124)
 [junit] at junit.framework.TestResult.run(TestResult.java:109)
 [junit] at junit.framework.TestCase.run(TestCase.java:118)
 [junit] at junit.framework.TestSuite.runTest(TestSuite.java:208)
 [junit] at junit.framework.TestSuite.run(TestSuite.java:203)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (HIVE-1950) Block merge for RCFile

2011-02-03 Thread He Yongqiang (JIRA)
Block merge for RCFile
--

 Key: HIVE-1950
 URL: https://issues.apache.org/jira/browse/HIVE-1950
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: He Yongqiang


In our environment, there are a lot of small files inside one partition/table. In
order to reduce the namenode load, we have a dedicated housekeeping job running to
merge these files. Right now the merge is an 'insert overwrite' in Hive, which
requires decompressing and recompressing the data. This jira is to add a command
in Hive to do the merge without decompressing and recompressing the data.

Something like: alter table tbl_name [partition ()] merge files. In this jira
the new command will only support RCFile, since it needs some new APIs in the
file format.
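The container-level merge described above can be illustrated with gzip, whose multi-member format allows compressed streams to be concatenated without any decompress/recompress cycle. This is an analogy only; RCFile's block structure and the proposed command differ.

```python
import gzip

# Two small files compressed independently, like two small partition files.
part1 = gzip.compress(b"rows 1-100\n")
part2 = gzip.compress(b"rows 101-200\n")

# Merge by concatenating the compressed streams -- no decompression involved.
# gzip defines a multi-member format, so the result is itself a valid stream.
merged = part1 + part2

# Readers see the logical concatenation of both inputs.
assert gzip.decompress(merged) == b"rows 1-100\nrows 101-200\n"
print(len(merged), "compressed bytes, merged without recompression")
```

The same principle (stitching compressed blocks at the container level) is what avoids the CPU cost of the 'insert overwrite' approach.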

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




Build failed in Hudson: Hive-trunk-h0.20 #530

2011-02-03 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/530/

--
[...truncated 22570 lines...]
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt'
 INTO TABLE srcbucket
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt
[junit] Loading data to table srcbucket
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt'
 INTO TABLE srcbucket
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] PREHOOK: query: CREATE TABLE srcbucket2(key int, value string) 
CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: CREATE TABLE srcbucket2(key int, value string) 
CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt'
 INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt'
 INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt'
 INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt'
 INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 INTO TABLE src
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table src
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 INTO TABLE src
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt'
 INTO TABLE src1
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt
[junit] Loading data to table src1
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt'
 INTO TABLE src1
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src1
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 

Re: Query regarding Metastore(Derby) SDS table data

2011-02-03 Thread Namit Jain
Different partitions can have different paths:
a partition's path need not be a subdirectory of the table's path.
At Facebook, we use this regularly, especially for external tables.

So it simplifies things if the full path is stored for each partition in
the metastore.


Thanks,
-namit
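The trade-off can be sketched in a few lines of Python (hypothetical paths and names, not Hive's actual metastore code): with full URIs stored, an external partition living outside the table directory resolves correctly, whereas a path derived from warehouse + table + partition spec would not.

```python
# Sketch: resolving a partition's location when the metastore stores
# full URIs vs. deriving them from the warehouse and table paths.
# All paths below are hypothetical examples, not real metastore entries.

WAREHOUSE = "hdfs://namenode:9001/user/hive/warehouse"

# What storing full URIs looks like, one entry per partition.
partitions = {
    ("sample", "ds=2011-02-01"): WAREHOUSE + "/sample/ds=2011-02-01",
    # An external partition living outside the table's directory:
    ("sample", "ds=2011-02-02"): "hdfs://othernn:9001/archive/sample/ds=2011-02-02",
}

def location(table, part_spec):
    """With full URIs stored, lookup is a plain read and always correct."""
    return partitions[(table, part_spec)]

def derived_location(table, part_spec):
    """What a relative-path scheme would reconstruct -- wrong for the
    external partition, which is why the full path is stored."""
    return f"{WAREHOUSE}/{table}/{part_spec}"

assert location("sample", "ds=2011-02-01") == derived_location("sample", "ds=2011-02-01")
assert location("sample", "ds=2011-02-02") != derived_location("sample", "ds=2011-02-02")
```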


On 2/3/11 2:22 AM, Chinna chinna...@huawei.com wrote:

Hi all,

 While creating hive table using create table command, the code flow will
also insert the HDFS location in SDS table present in Derby

For Example on executing:-
create table sample(rate int) stored as textfile;


The SDS table (meta table present in Derby) contains following entry
corresponding to Hive table sample like
http://{IP}:9001/user/hive/warehouse/sample

say 9001 is the port configured,
and {HDFS-URL}/user/hive/warehouse is the warehouse configured.

Now here I'm interested to know the intention of maintaining full path of
the HDFS location in SDS Table, say why not a relative path, anyhow Hive
is
capable of constructing full HDFS URL Path.

Any design inputs on it, will surely help.

Thanks,
Chinna Rao Lalam





 





[jira] Updated: (HIVE-1952) fix some outputs and make some tests deterministic

2011-02-03 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1952:
-

Attachment: hive.1952.1.patch

 fix some outputs and make some tests deterministic
 --

 Key: HIVE-1952
 URL: https://issues.apache.org/jira/browse/HIVE-1952
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1952.1.patch


 Some of the tests are non-deterministic, and are causing intermittent diffs

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HIVE-1952) fix some outputs and make some tests deterministic

2011-02-03 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1952:
-

Status: Patch Available  (was: Open)

 fix some outputs and make some tests deterministic
 --

 Key: HIVE-1952
 URL: https://issues.apache.org/jira/browse/HIVE-1952
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1952.1.patch


 Some of the tests are non-deterministic, and are causing intermittent diffs

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (HIVE-1952) fix some outputs and make some tests deterministic

2011-02-03 Thread Namit Jain (JIRA)
fix some outputs and make some tests deterministic
--

 Key: HIVE-1952
 URL: https://issues.apache.org/jira/browse/HIVE-1952
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1952.1.patch

Some of the tests are non-deterministic, and are causing intermittent diffs

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HIVE-1952) fix some outputs and make some tests deterministic

2011-02-03 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990256#comment-12990256
 ] 

He Yongqiang commented on HIVE-1952:


+1, running tests.

 fix some outputs and make some tests deterministic
 --

 Key: HIVE-1952
 URL: https://issues.apache.org/jira/browse/HIVE-1952
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1952.1.patch


 Some of the tests are non-deterministic, and are causing intermittent diffs

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HIVE-1939) Fix test failure in TestContribCliDriver/url_hook.q

2011-02-03 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1939:
-

Fix Version/s: 0.7.0

 Fix test failure in TestContribCliDriver/url_hook.q
 ---

 Key: HIVE-1939
 URL: https://issues.apache.org/jira/browse/HIVE-1939
 Project: Hive
  Issue Type: Bug
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.7.0




-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HIVE-1939) Fix test failure in TestContribCliDriver/url_hook.q

2011-02-03 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990293#comment-12990293
 ] 

John Sichi commented on HIVE-1939:
--

I did some bisection on svn commits and found that the commit for HIVE-1636 
seems to be the point where this broke.

http://svn.apache.org/viewvc?view=rev&rev=1063549


 Fix test failure in TestContribCliDriver/url_hook.q
 ---

 Key: HIVE-1939
 URL: https://issues.apache.org/jira/browse/HIVE-1939
 Project: Hive
  Issue Type: Bug
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.7.0




-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




Hive queries consuming 100% cpu

2011-02-03 Thread Vijay
Hi,

The simplest of Hive queries seem to be consuming 100% CPU. This is
with a small 4-node cluster. The machines are pretty beefy (16 cores
per machine, tons of RAM, 16 map+reduce maximum tasks configured, 1GB RAM
for mapred.child.java.opts, etc.). An example is a simple query like
"select count(1) from events", where the events table has daily partitions
of log files in gzipped format. While this is probably too generic a
question and there is a bunch of investigation we need to do, are there any
specific areas for me to look at? Has anyone seen anything like this
before? Also, are there any tools or easy options to profile Hive
query execution?

Thanks in advance,
Vijay


[jira] Updated: (HIVE-1951) input16_cc.q is failing in testminimrclidriver

2011-02-03 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1951:
---

Attachment: HIVE-1951.1.patch

Changing the test file for a quick fix.

Will open a new jira for the real problem.

The problem here is that Hive should process all comments in CliDriver. A Hive
comment can be followed by any other command (not just a Hive query command).

 input16_cc.q is failing in testminimrclidriver
 --

 Key: HIVE-1951
 URL: https://issues.apache.org/jira/browse/HIVE-1951
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: He Yongqiang
 Attachments: HIVE-1951.1.patch




-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HIVE-1951) input16_cc.q is failing in testminimrclidriver

2011-02-03 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990307#comment-12990307
 ] 

He Yongqiang commented on HIVE-1951:


opened jira https://issues.apache.org/jira/browse/HIVE-1953 for the real 
problem.

 input16_cc.q is failing in testminimrclidriver
 --

 Key: HIVE-1951
 URL: https://issues.apache.org/jira/browse/HIVE-1951
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: He Yongqiang
 Attachments: HIVE-1951.1.patch




-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HIVE-1939) Fix test failure in TestContribCliDriver/url_hook.q

2011-02-03 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990309#comment-12990309
 ] 

Carl Steinbach commented on HIVE-1939:
--

@John: Yup, you're right. The problem is that HIVE-1636 modified
'SHOW TABLES IN db' to throw an error when db doesn't exist.
Previously in this situation the SHOW TABLES command just returned
an empty result set.

url_hook.q points the MetaStore to a new JDO URL and then runs
'SHOW TABLES'. In the past this caused the metastore to initialize
a new metastore schema, but without creating the 'default' database.
Since 'SHOW TABLES' wasn't checking for the existence of the
default database the command succeeded with an empty result set.

I think the correct fix for this problem is to make sure that the metastore
creates the 'default' database if it does not already exist.
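The fix amounts to a get-or-create at metastore initialization, which can be sketched schematically (hypothetical names; the real metastore code differs, with a dict standing in for the backing store):

```python
# Schematic get-or-create for the 'default' database at init time.
# The dict stands in for the metastore backing store (hypothetical).

def ensure_default_database(metastore: dict) -> None:
    """Create the 'default' database on init if it does not already exist."""
    metastore.setdefault("default", {"tables": {}})

store = {}                      # fresh schema, as after switching the JDO URL
ensure_default_database(store)  # init now guarantees 'default' exists

# 'SHOW TABLES' against the default database no longer errors out.
assert "default" in store
print(sorted(store["default"]["tables"]))  # prints []
```

Because `setdefault` is a no-op when the key exists, running this on an already-initialized schema leaves existing tables untouched.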


 Fix test failure in TestContribCliDriver/url_hook.q
 ---

 Key: HIVE-1939
 URL: https://issues.apache.org/jira/browse/HIVE-1939
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.7.0




-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HIVE-1922) semantic analysis error, when using group by and order by together

2011-02-03 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1922:
-

Priority: Critical  (was: Blocker)

 semantic analysis error, when using group by and order by together
 --

 Key: HIVE-1922
 URL: https://issues.apache.org/jira/browse/HIVE-1922
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.3.0, 0.4.0, 0.4.1, 0.5.0, 0.6.0, 0.7.0
 Environment: Ubuntu Karmic, hadoop 0.20.0, hive 0.7.0
Reporter: Hongwei
Priority: Critical
   Original Estimate: 168h
  Remaining Estimate: 168h

 When I tried queries like 'select t.c from t group by t.c sort by t.c;',
 Hive reported the error 'FAILED: Error in semantic analysis: line 1:40 Invalid
 Table Alias or Column Reference t'.
 But 'select t.c from t group by t.c' and 'select t.c from t sort by t.c;'
 are OK.
 'select t.c from t group by t.c sort by c;' is OK too.
 The Hive server gives a stack trace like:
 11/01/20 03:07:34 INFO parse.SemanticAnalyzer: Get metadata for subqueries
 11/01/20 03:07:34 INFO parse.SemanticAnalyzer: Get metadata for destination 
 tables
 11/01/20 03:07:34 INFO parse.SemanticAnalyzer: Completed getting MetaData in 
 Semantic Analysis
 FAILED: Error in semantic analysis: line 1:40 Invalid Table Alias or Column 
 Reference t
 11/01/20 03:07:34 ERROR ql.Driver: FAILED: Error in semantic analysis: line 
 1:40 Invalid Table Alias or Column Reference t
 org.apache.hadoop.hive.ql.parse.SemanticException: line 1:40 Invalid Table 
 Alias or Column Reference t
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:6743)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:4288)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5446)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6007)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6583)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:343)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:731)
   at 
 org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:116)
   at 
 org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:699)
   at 
 org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:677)
   at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HIVE-1951) input16_cc.q is failing in testminimrclidriver

2011-02-03 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1951:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Yongqiang

 input16_cc.q is failing in testminimrclidriver
 --

 Key: HIVE-1951
 URL: https://issues.apache.org/jira/browse/HIVE-1951
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: He Yongqiang
 Attachments: HIVE-1951.1.patch




-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Resolved: (HIVE-1559) Contrib tests not run as part of 'ant test'

2011-02-03 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach resolved HIVE-1559.
--

Resolution: Invalid

I think this was fixed a while ago as part of some other
ticket. I see contrib/build.xml listed in the filelist of
the 'iterate-test' Ant macro.

 Contrib tests not run as part of 'ant test'
 ---

 Key: HIVE-1559
 URL: https://issues.apache.org/jira/browse/HIVE-1559
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Reporter: Namit Jain

 Copying from https://issues.apache.org/jira/browse/HIVE-1556
  BTW, if I run 'ant test' in hive's root directory, it seems the 
  TestContrib* were not tested. Is it expected?
 TestContribCliDriver should be run as part of 'ant test'

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms

2011-02-03 Thread Anja Gruenheid (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990331#comment-12990331
 ] 

Anja Gruenheid commented on HIVE-1940:
--

I have set up the last stable version, but as far as I understand, some
features have been added during the current iteration which have also had an
impact on the design of the MetaStore. Is there an up-to-date overview of the
MetaStore somewhere, or should I retrace the updates that have been made since
the last release?

If I can collect all the data that I need, I'll create the model.

 Query Optimization Using Column Metadata and Histograms
 ---

 Key: HIVE-1940
 URL: https://issues.apache.org/jira/browse/HIVE-1940
 Project: Hive
  Issue Type: New Feature
  Components: Metastore, Query Processor
Reporter: Anja Gruenheid

 The current basis for cost-based query optimization in Hive is information 
 gathered on tables and partitions. To make further improvements in query 
 optimization possible, the next step is to develop and implement 
 possibilities to gather information on columns as discussed in issue HIVE-33. 
 After that, an implementation of histograms is a possible option to use and 
 collect run-time statistics. Next to the actual implementation of these 
 features, it is also necessary to develop a consistent storage model for the 
 MetaStore.
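As an illustration of what column histograms buy a cost-based optimizer, here is a minimal equi-width histogram with a selectivity estimate for a range predicate. This is a sketch only, not Hive's design; all names are invented for the example.

```python
# Minimal equi-width histogram for a numeric column, plus a selectivity
# estimate for `col <= x`. Illustrative only -- not Hive's actual design.

def build_histogram(values, num_buckets):
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets or 1  # guard against a constant column
    counts = [0] * num_buckets
    for v in values:
        i = min(int((v - lo) / width), num_buckets - 1)
        counts[i] += 1
    return lo, width, counts

def estimate_le(hist, x):
    """Fraction of rows estimated to satisfy col <= x (uniform within buckets)."""
    lo, width, counts = hist
    total = sum(counts)
    covered = 0.0
    for i, c in enumerate(counts):
        b_lo = lo + i * width
        b_hi = b_lo + width
        if x >= b_hi:
            covered += c                        # bucket fully below x
        elif x > b_lo:
            covered += c * (x - b_lo) / width   # partial bucket
    return covered / total

hist = build_histogram(list(range(100)), 10)
print(round(estimate_le(hist, 49), 2))  # roughly 0.5 for a uniform 0..99 column
```

An optimizer would use such estimates to pick join orders or decide whether a predicate is selective enough to justify an index-like access path.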

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (HIVE-1954) Allow CLI to connect to Hive server and execute commands remotely

2011-02-03 Thread Ning Zhang (JIRA)
Allow CLI to connect to Hive server and execute commands remotely
-

 Key: HIVE-1954
 URL: https://issues.apache.org/jira/browse/HIVE-1954
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Reporter: Ning Zhang
Assignee: Ning Zhang


Currently the Hive CLI runs the client-side code (compilation, metastore
operations, etc.) on the local machine. We should extend the CLI to connect to a
Hive server and execute commands remotely.

Benefits include:
  * the client-side memory requirement is alleviated.
  * better security control on the Hive server side.
  * possible use of a metastore cache layer on the Hive server side, etc.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




Build failed in Hudson: Hive-trunk-h0.20 #531

2011-02-03 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/531/changes

Changes:

[namit] HIVE-1716 Make TestHBaseCliDriver use dynamic ports to avoid conflicts 
with
already-running services (John Sichi via namit)

--
[...truncated 22563 lines...]
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt'
 INTO TABLE srcbucket
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt
[junit] Loading data to table srcbucket
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt'
 INTO TABLE srcbucket
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] PREHOOK: query: CREATE TABLE srcbucket2(key int, value string) 
CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: CREATE TABLE srcbucket2(key int, value string) 
CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt'
 INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt'
 INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt'
 INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt'
 INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 INTO TABLE src
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table src
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 INTO TABLE src
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt'
 INTO TABLE src1
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt
[junit] Loading data to table src1
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt'
 INTO TABLE src1
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: 

Re: Hive queries consuming 100% cpu

2011-02-03 Thread Viral Bajaria
Hey Vijay,

You can go to the mapred ui, normally it runs on port 50030 of the namenode
and see how many map jobs got created for your submitted query.

You said that the events table has daily partitions, but the example query
that you have does not prune the partitions by specifying a WHERE clause. So
I have the following questions:
1) how big is the table (you can just do a hadoop dfs -dus
<hdfs-dir-for-table>)? how many partitions?
2) do you really intend to count the number of events across all days ?
3) could you build a query which computes over 1-5 day(s) and persists the
data in a separate table for consumption later on ?

Based on your node configuration, I am just guessing the amount of data to
process is too large and hence the high CPU.

Thanks,
Viral
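
For readers following the advice above, pruning the daily partitions would look
something like this (a sketch only; the partition column name `ds` and the date
values are illustrative assumptions, not taken from the thread):

```sql
-- Hypothetical sketch: restrict the scan to a few daily partitions of
-- the events table via a WHERE clause on the (assumed) partition
-- column ds, instead of scanning every partition.
SELECT count(1)
FROM events
WHERE ds >= '2011-01-01' AND ds <= '2011-01-05';
```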

On Thu, Feb 3, 2011 at 12:49 PM, Vijay tec...@gmail.com wrote:

 Hi,

 The simplest of hive queries seem to be consuming 100% cpu. This is
 with a small 4-node cluster. The machines are pretty beefy (16 cores
 per machine, tons of RAM, 16 M+R maximum tasks configured, 1GB RAM for
 mapred.child.java.opts, etc). A simple query like "select count(1)
 from events" (where the events table has daily partitions of log files
 in gzipped file format) shows this. While this is probably too generic
 a question and there is a bunch of investigation we need to do, are
 there any specific areas for me to look at? Has anyone seen anything
 like this before? Also, are there any tools or easy options to profile
 hive query execution?

 Thanks in advance,
 Vijay



[jira] Commented: (HIVE-1954) Allow CLI to connect to Hive server and execute commands remotely

2011-02-03 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990346#comment-12990346
 ] 

Edward Capriolo commented on HIVE-1954:
---

This might be a dupe of https://issues.apache.org/jira/browse/HIVE-818

 Allow CLI to connect to Hive server and execute commands remotely
 -

 Key: HIVE-1954
 URL: https://issues.apache.org/jira/browse/HIVE-1954
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Reporter: Ning Zhang
Assignee: Ning Zhang

 Currently Hive CLI runs the client side code (compilation and metastore 
 operations etc) in local machine. We should extend CLI to connect to Hive 
 server and execute commands remotely. 
 Benefits include: 
   * client side memory requirement is alleviated.
   * better security control on Hive server side.
   * possible use of metastore cache layer in Hive server side, etc. 

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (HIVE-1955) Support non-constant expressions for array indexes.

2011-02-03 Thread Adam Kramer (JIRA)
Support non-constant expressions for array indexes.
---

 Key: HIVE-1955
 URL: https://issues.apache.org/jira/browse/HIVE-1955
 Project: Hive
  Issue Type: Improvement
Reporter: Adam Kramer


FAILED: Error in semantic analysis: line 4:8 Non Constant Expressions for Array 
Indexes not Supported dut

...just wrote my own UDF to do this, and it is trivial. We should support this 
natively.
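
A minimal illustration of the limitation being reported (table and column names
here are hypothetical; at the time, only constant array indexes were accepted):

```sql
-- Accepted: a constant array index
SELECT actions[0] FROM user_actions;

-- Rejected with the semantic-analysis error quoted above:
-- a non-constant expression used as the array index
SELECT actions[idx] FROM user_actions;
```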

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HIVE-1954) Allow CLI to connect to Hive server and execute commands remotely

2011-02-03 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990372#comment-12990372
 ] 

Ning Zhang commented on HIVE-1954:
--

Cool. I should have searched the JIRA first. Are you working on this right now?

 Allow CLI to connect to Hive server and execute commands remotely
 -

 Key: HIVE-1954
 URL: https://issues.apache.org/jira/browse/HIVE-1954
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Reporter: Ning Zhang
Assignee: Ning Zhang

 Currently Hive CLI runs the client side code (compilation and metastore 
 operations etc) in local machine. We should extend CLI to connect to Hive 
 server and execute commands remotely. 
 Benefits include: 
   * client side memory requirement is alleviated.
   * better security control on Hive server side.
   * possible use of metastore cache layer in Hive server side, etc. 

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Hive queries consuming 100% cpu

2011-02-03 Thread Vijay
Sorry, I should've given more details.

The query was limited by a partition range; I just omitted the WHERE
clause in the mail.
The table is not that big. For each day, there is one gzipped file.
The largest file is about 250MB (close to 2GB uncompressed).
I did intend to count and that was just to test since I wanted to run
a query that did the most minimal logic/processing.

Here's a test I ran now. The query is getting count(1) for 8 days. It
spawned 8 maps as expected. The maps run for anywhere between 42 and 69
seconds (which may or may not be right; I need to check that). It
spawned only one reduce task. The reducer ran for 117 seconds, which
seems long for this query.
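
One easy built-in option for inspecting what a query like this will do is
Hive's EXPLAIN statement (the query below is a sketch based on the thread;
the partition column name `ds` is an assumption):

```sql
-- EXPLAIN shows the plan (map/reduce stages, aggregations);
-- EXPLAIN EXTENDED additionally lists the input partitions that
-- survive pruning, which helps confirm the WHERE clause worked.
EXPLAIN EXTENDED
SELECT count(1) FROM events WHERE ds >= '2011-01-25';
```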

On Thu, Feb 3, 2011 at 2:31 PM, Viral Bajaria viral.baja...@gmail.com wrote:
 Hey Vijay,
 You can go to the mapred ui, normally it runs on port 50030 of the namenode
 and see how many map jobs got created for your submitted query.
 You said that the events table has daily partitions, but the example query
 that you have does not prune the partitions by specifying a WHERE clause. So
 I have the following questions:
 1) how big is the table (you can just do a hadoop dfs -dus
 <hdfs-dir-for-table>)? how many partitions?
 2) do you really intend to count the number of events across all days ?
 3) could you build a query which computes over 1-5 day(s) and persists the
 data in a separate table for consumption later on ?
 Based on your node configuration, I am just guessing the amount of data to
 process is too large and hence the high CPU.
 Thanks,
 Viral
 On Thu, Feb 3, 2011 at 12:49 PM, Vijay tec...@gmail.com wrote:

 Hi,

 The simplest of hive queries seem to be consuming 100% cpu. This is
 with a small 4-node cluster. The machines are pretty beefy (16 cores
 per machine, tons of RAM, 16 M+R maximum tasks configured, 1GB RAM for
 mapred.child.java.opts, etc). A simple query like "select count(1)
 from events" (where the events table has daily partitions of log files
 in gzipped file format) shows this. While this is probably too generic
 a question and there is a bunch of investigation we need to do, are
 there any specific areas for me to look at? Has anyone seen anything
 like this before? Also, are there any tools or easy options to profile
 hive query execution?

 Thanks in advance,
 Vijay




[jira] Updated: (HIVE-1952) fix some outputs and make some tests deterministic

2011-02-03 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1952:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed! Thanks Namit!

 fix some outputs and make some tests deterministic
 --

 Key: HIVE-1952
 URL: https://issues.apache.org/jira/browse/HIVE-1952
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1952.1.patch


 Some of the tests are un-deterministic, and are causing intermediate diffs

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HIVE-1950) Block merge for RCFile

2011-02-03 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1950:
---

Attachment: HIVE-1950.1.patch

A patch for review. 

The code is now fairly clean. Comments about how to make it cleaner are 
welcome!

 Block merge for RCFile
 --

 Key: HIVE-1950
 URL: https://issues.apache.org/jira/browse/HIVE-1950
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1950.1.patch


 In our env, there are a lot of small files inside one partition/table. In 
 order to reduce the namenode load, we have one dedicated housekeeping job 
 running to merge these files. Right now the merge is an 'insert overwrite' in 
 hive, and requires decompressing the data and recompressing it. This jira is 
 to add a command in Hive to do the merge without decompressing and 
 recompressing the data, something like "alter table tbl_name [partition ()] 
 merge files". In this jira the new command will only support RCFile, since it 
 needs some new APIs in the file format.
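
The command proposed in the description might be invoked roughly as follows
(a sketch only; the exact syntax is whatever the patch settles on, and the
table name and partition spec here are made up for illustration):

```sql
-- Hypothetical usage of the proposed merge command: concatenate the
-- small RCFile blocks of one partition without decompressing and
-- recompressing the data.
ALTER TABLE clicks PARTITION (ds='2011-02-03') MERGE FILES;
```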

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-03 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990389#comment-12990389
 ] 

He Yongqiang commented on HIVE-1950:


review board:
https://reviews.apache.org/r/388/


 Block merge for RCFile
 --

 Key: HIVE-1950
 URL: https://issues.apache.org/jira/browse/HIVE-1950
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1950.1.patch


 In our env, there are a lot of small files inside one partition/table. In 
 order to reduce the namenode load, we have one dedicated housekeeping job 
 running to merge these files. Right now the merge is an 'insert overwrite' in 
 hive, and requires decompressing the data and recompressing it. This jira is 
 to add a command in Hive to do the merge without decompressing and 
 recompressing the data, something like "alter table tbl_name [partition ()] 
 merge files". In this jira the new command will only support RCFile, since it 
 needs some new APIs in the file format.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




Review Request: HIVE-1950

2011-02-03 Thread Yongqiang He

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/388/
---

Review request for hive.


Summary
---

early review


This addresses bug HIVE-1950.
https://issues.apache.org/jira/browse/HIVE-1950


Diffs
-

  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ConditionalTask.java 1067036 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1067036 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 1067036 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java 
PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHook.java 
PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Throttle.java 1067036 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 1067036 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java 1067036 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java 
PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/MergeWork.java 
PRE-CREATION 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeInputFormat.java
 PRE-CREATION 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeOutputFormat.java
 PRE-CREATION 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java
 PRE-CREATION 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java
 PRE-CREATION 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java
 PRE-CREATION 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileValueBufferWrapper.java
 PRE-CREATION 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/AlterTablePartMergeFilesDesc.java
 PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
1067036 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1067036 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 
1067036 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/DDLWork.java 1067036 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java 1067036 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1067036 
  trunk/ql/src/test/queries/clientpositive/alter_merge.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/alter_merge.q.out PRE-CREATION 
  trunk/shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java 
1067036 
  trunk/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java 
1067036 
  trunk/shims/src/common/java/org/apache/hadoop/hive/shims/CombineHiveKey.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/388/diff


Testing
---


Thanks,

Yongqiang



[jira] Created: (HIVE-1956) Provide DFS initialization script for Hive

2011-02-03 Thread JIRA
Provide DFS initialization script for Hive
---

 Key: HIVE-1956
 URL: https://issues.apache.org/jira/browse/HIVE-1956
 Project: Hive
  Issue Type: Improvement
  Components: Configuration, Server Infrastructure
Affects Versions: 0.7.0
Reporter: Bruno Mahé
Priority: Trivial
 Fix For: 0.7.0
 Attachments: HIVE-1956.patch

This script automates the creation of the Hive warehouse and scratch 
directories on DFS

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HIVE-1956) Provide DFS initialization script for Hive

2011-02-03 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Mahé updated HIVE-1956:
-

Attachment: HIVE-1956.patch

 Provide DFS initialization script for Hive
 ---

 Key: HIVE-1956
 URL: https://issues.apache.org/jira/browse/HIVE-1956
 Project: Hive
  Issue Type: Improvement
  Components: Configuration, Server Infrastructure
Affects Versions: 0.7.0
Reporter: Bruno Mahé
Priority: Trivial
 Fix For: 0.7.0

 Attachments: HIVE-1956.patch


 This script automates the creation of the Hive warehouse and scratch 
 directories on DFS

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HIVE-1956) Provide DFS initialization script for Hive

2011-02-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990395#comment-12990395
 ] 

Bruno Mahé commented on HIVE-1956:
--

Review request: https://reviews.apache.org/r/389/

 Provide DFS initialization script for Hive
 ---

 Key: HIVE-1956
 URL: https://issues.apache.org/jira/browse/HIVE-1956
 Project: Hive
  Issue Type: Improvement
  Components: Configuration, Server Infrastructure
Affects Versions: 0.7.0
Reporter: Bruno Mahé
Priority: Trivial
 Fix For: 0.7.0

 Attachments: HIVE-1956.patch


 This script automates the creation of the Hive warehouse and scratch 
 directories on DFS

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




Review Request: HIVE-1941: support explicit view partitioning

2011-02-03 Thread John Sichi

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/390/
---

Review request for hive.


Summary
---

review request from JVS


This addresses bug HIVE-1941.
https://issues.apache.org/jira/browse/HIVE-1941


Diffs
-

  
http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 1067043 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 1067043 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java
 1067043 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
 1067043 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ErrorMsg.java
 1067043 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g
 1067043 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
 1067043 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java
 1067043 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/AddPartitionDesc.java
 1067043 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableDesc.java
 1067043 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CreateViewDesc.java
 1067043 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/DropTableDesc.java
 1067043 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/alter_view_failure2.q
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/alter_view_failure3.q
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/alter_view_failure4.q
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/alter_view_failure5.q
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/analyze_view.q
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_view_failure6.q
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_view_failure7.q
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_view_failure8.q
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_view_failure9.q
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/create_view_partitioned.q
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/alter_view_failure.q.out
 1067043 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/alter_view_failure2.q.out
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/alter_view_failure3.q.out
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/alter_view_failure4.q.out
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/alter_view_failure5.q.out
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/analyze_view.q.out
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_view_failure6.q.out
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_view_failure7.q.out
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_view_failure8.q.out
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_view_failure9.q.out
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/create_view_partitioned.q.out
 PRE-CREATION 

Diff: https://reviews.apache.org/r/390/diff


Testing
---


Thanks,

John



[jira] Updated: (HIVE-1941) support explicit view partitioning

2011-02-03 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1941:
-

Status: Patch Available  (was: Open)

https://reviews.apache.org/r/390/


 support explicit view partitioning
 --

 Key: HIVE-1941
 URL: https://issues.apache.org/jira/browse/HIVE-1941
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: John Sichi
 Attachments: HIVE-1941.1.patch, HIVE-1941.2.patch


 Allow creation of a view with an explicit partitioning definition, and 
 support ALTER VIEW ADD/DROP PARTITION for instantiating partitions.
 For more information, see
 http://wiki.apache.org/hadoop/Hive/PartitionedViews
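
The feature in this JIRA might be exercised roughly as follows (a sketch
based only on the description above; the view, column, table, and partition
names are made up):

```sql
-- Create a view with an explicit partitioning definition, then
-- instantiate one partition via ALTER VIEW ADD PARTITION, per the
-- description above.
CREATE VIEW daily_keys PARTITIONED ON (ds)
AS SELECT key, ds FROM src_part;

ALTER VIEW daily_keys ADD PARTITION (ds='2011-02-03');
```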

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms

2011-02-03 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990411#comment-12990411
 ] 

John Sichi commented on HIVE-1940:
--

If you just svn update to the tip of trunk and build/install from there, you'll 
get the latest metastore.  Substantial additions since 0.6 include support for 
indexes, authorization, and various database properties.

 Query Optimization Using Column Metadata and Histograms
 ---

 Key: HIVE-1940
 URL: https://issues.apache.org/jira/browse/HIVE-1940
 Project: Hive
  Issue Type: New Feature
  Components: Metastore, Query Processor
Reporter: Anja Gruenheid

 The current basis for cost-based query optimization in Hive is information 
 gathered on tables and partitions. To make further improvements in query 
 optimization possible, the next step is to develop and implement 
 possibilities to gather information on columns as discussed in issue HIVE-33. 
 After that, an implementation of histograms is a possible option to use and 
 collect run-time statistics. Next to the actual implementation of these 
 features, it is also necessary to develop a consistent storage model for the 
 MetaStore.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HIVE-1956) Provide DFS initialization script for Hive

2011-02-03 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Mahé updated HIVE-1956:
-

Status: Patch Available  (was: Open)

 Provide DFS initialization script for Hive
 ---

 Key: HIVE-1956
 URL: https://issues.apache.org/jira/browse/HIVE-1956
 Project: Hive
  Issue Type: Improvement
  Components: Configuration, Server Infrastructure
Affects Versions: 0.7.0
Reporter: Bruno Mahé
Priority: Trivial
 Fix For: 0.7.0

 Attachments: HIVE-1956.patch


 This script automates the creation of the Hive warehouse and scratch 
 directories on DFS

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-02-03 Thread John Sichi

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/392/
---

Review request for hive.


Summary
---

Preliminary review.


This addresses bug HIVE-1694.
https://issues.apache.org/jira/browse/HIVE-1694


Diffs
-

  http://svn.apache.org/repos/asf/hive/trunk/build.xml 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyCtx.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteIndexSubqueryCtx.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteIndexSubqueryProcFactory.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteRemoveGroupbyCtx.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteRemoveGroupbyProcFactory.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java
 1067048 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/fatal.q
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out
 PRE-CREATION 

Diff: https://reviews.apache.org/r/392/diff


Testing
---


Thanks,

John



[jira] Updated: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-02-03 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1694:
-

Description: 
The index building patch (Hive-417) is checked into trunk, this JIRA issue 
tracks supporting indexes in Hive compiler & execution engine for SELECT 
queries.

This is in ref. to John's comment at


https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869

on creating separate JIRA issue for tracking index usage in optimizer & query 
execution.

The aim of this effort is to use indexes to accelerate query execution (for 
certain class of queries). E.g.
- Filters and range scans (already being worked on by He Yongqiang as part of 
HIVE-417?)
- Joins (index based joins)
- Group By, Order By and other misc cases

The proposal is multi-step:
1. Building index based operators, compiler and execution engine changes
2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
between index scans, full table scans etc.)

This JIRA initially focuses on the first step. This JIRA is expected to hold 
the information about index based plans & operator implementations for above 
mentioned cases. 

  was:

The index building patch (Hive-417) is checked into trunk, this JIRA issue 
tracks supporting indexes in Hive compiler & execution engine for SELECT 
queries.

This is in ref. to John's comment at


https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869

on creating separate JIRA issue for tracking index usage in optimizer & query 
execution.

The aim of this effort is to use indexes to accelerate query execution (for 
certain class of queries). E.g.
- Filters and range scans (already being worked on by He Yongqiang as part of 
HIVE-417?)
- Joins (index based joins)
- Group By, Order By and other misc cases

The proposal is multi-step:
1. Building index based operators, compiler and execution engine changes
2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
between index scans, full table scans etc.)

This JIRA initially focuses on the first step. This JIRA is expected to hold 
the information about index based plans & operator implementations for above 
mentioned cases. 

Summary: Accelerate GROUP BY execution using indexes  (was: Accelerate 
query execution using indexes)

 Accelerate GROUP BY execution using indexes
 ---

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Nikhil Deshpande
 Attachments: HIVE-1694.1.patch.txt, HIVE-1694_2010-10-28.diff, 
 demo_q1.hql, demo_q2.hql


 The index building patch (Hive-417) is checked into trunk, this JIRA issue 
  tracks supporting indexes in Hive compiler & execution engine for SELECT 
 queries.
 This is in ref. to John's comment at
  https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
  on creating separate JIRA issue for tracking index usage in optimizer & query 
 execution.
 The aim of this effort is to use indexes to accelerate query execution (for 
 certain class of queries). E.g.
 - Filters and range scans (already being worked on by He Yongqiang as part of 
 HIVE-417?)
 - Joins (index based joins)
 - Group By, Order By and other misc cases
 The proposal is multi-step:
 1. Building index based operators, compiler and execution engine changes
 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
 between index scans, full table scans etc.)
 This JIRA initially focuses on the first step. This JIRA is expected to hold 
  the information about index based plans & operator implementations for above 
 mentioned cases. 

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




Build failed in Hudson: Hive-trunk-h0.20 #532

2011-02-03 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/532/changes

Changes:

[namit] HIVE-1951 input16_cc.q is failing in testminimrclidriver
(He Yongqiang via namit)

--
[...truncated 22598 lines...]
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt'
 INTO TABLE srcbucket
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt
[junit] Loading data to table srcbucket
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt'
 INTO TABLE srcbucket
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] PREHOOK: query: CREATE TABLE srcbucket2(key int, value string) 
CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: CREATE TABLE srcbucket2(key int, value string) 
CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt'
 INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt'
 INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt'
 INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt'
 INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 INTO TABLE src
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table src
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 INTO TABLE src
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt'
 INTO TABLE src1
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt
[junit] Loading data to table src1
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt'
 INTO TABLE src1
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src1
[junit] OK

[jira] Commented: (HIVE-1956) Provide DFS initialization script for Hive

2011-02-03 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990429#comment-12990429
 ] 

Namit Jain commented on HIVE-1956:
--

+1

 Provide DFS initialization script for Hive
 ---

 Key: HIVE-1956
 URL: https://issues.apache.org/jira/browse/HIVE-1956
 Project: Hive
  Issue Type: Improvement
  Components: Configuration, Server Infrastructure
Affects Versions: 0.7.0
Reporter: Bruno Mahé
Priority: Trivial
 Fix For: 0.7.0

 Attachments: HIVE-1956.patch


 This script automates the creation of the Hive warehouse and scratch 
 directories on DFS

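 (The actual script is the HIVE-1956.patch attachment; the sketch below is a
 hypothetical illustration of the steps such a script performs. The directory
 paths and the `hadoop` command name are assumptions, overridable via
 environment variables. The function only prints the commands; pipe its
 output to `sh` to actually run them against a cluster.)

```shell
#!/bin/sh
# Hypothetical sketch of a DFS initialization script for Hive
# (illustrative only; the real script is attached to HIVE-1956).
HADOOP="${HADOOP:-hadoop}"
WAREHOUSE_DIR="${WAREHOUSE_DIR:-/user/hive/warehouse}"
SCRATCH_DIR="${SCRATCH_DIR:-/tmp/hive}"

init_hive_dfs() {
  # Create the warehouse and scratch directories and open group write
  # so all Hive users can create tables and scratch files.
  for dir in "$WAREHOUSE_DIR" "$SCRATCH_DIR"; do
    echo "$HADOOP fs -mkdir $dir"
    echo "$HADOOP fs -chmod g+w $dir"
  done
}

# Print the commands; run `init_hive_dfs | sh` to execute them for real.
init_hive_dfs
```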
-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




Build failed in Hudson: Hive-trunk-h0.20 #533

2011-02-03 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/533/changes

Changes:

[heyongqiang] HIVE-1952. fix some outputs and make some tests deterministic 
(namit via He Yongqiang)

--
[...truncated 21915 lines...]
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt'
 INTO TABLE srcbucket
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt
[junit] Loading data to table srcbucket
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt'
 INTO TABLE srcbucket
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] PREHOOK: query: CREATE TABLE srcbucket2(key int, value string) 
CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: CREATE TABLE srcbucket2(key int, value string) 
CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt'
 INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt'
 INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt'
 INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt'
 INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 INTO TABLE src
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table src
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 INTO TABLE src
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt'
 INTO TABLE src1
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt
[junit] Loading data to table src1
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt'
 INTO TABLE src1
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src1

[jira] Updated: (HIVE-1956) Provide DFS initialization script for Hive

2011-02-03 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1956:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Bruno

 Provide DFS initialization script for Hive
 ---

 Key: HIVE-1956
 URL: https://issues.apache.org/jira/browse/HIVE-1956
 Project: Hive
  Issue Type: Improvement
  Components: Configuration, Server Infrastructure
Affects Versions: 0.7.0
Reporter: Bruno Mahé
Priority: Trivial
 Fix For: 0.7.0

 Attachments: HIVE-1956.patch


 This script automates the creation of the Hive warehouse and scratch 
 directories on DFS

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-02-03 Thread Prajakta Kalmegh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990466#comment-12990466
 ] 

Prajakta Kalmegh commented on HIVE-1694:


Thanks John. We will ensure that henceforth. 

 Accelerate GROUP BY execution using indexes
 ---

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Nikhil Deshpande
 Attachments: HIVE-1694.1.patch.txt, HIVE-1694_2010-10-28.diff, 
 demo_q1.hql, demo_q2.hql


 The index building patch (Hive-417) is checked into trunk, this JIRA issue 
 tracks supporting indexes in Hive compiler & execution engine for SELECT 
 queries.
 This is in ref. to John's comment at
 https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
 on creating separate JIRA issue for tracking index usage in optimizer & query 
 execution.
 The aim of this effort is to use indexes to accelerate query execution (for 
 certain class of queries). E.g.
 - Filters and range scans (already being worked on by He Yongqiang as part of 
 HIVE-417?)
 - Joins (index based joins)
 - Group By, Order By and other misc cases
 The proposal is multi-step:
 1. Building index based operators, compiler and execution engine changes
 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
 between index scans, full table scans etc.)
 This JIRA initially focuses on the first step. This JIRA is expected to hold 
 the information about index based plans & operator implementations for above 
 mentioned cases. 

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HIVE-1948) Have audit logging in the Metastore

2011-02-03 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HIVE-1948:
--

Attachment: audit-log.1.patch

A slightly updated patch.

 Have audit logging in the Metastore
 ---

 Key: HIVE-1948
 URL: https://issues.apache.org/jira/browse/HIVE-1948
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.7.0

 Attachments: audit-log.1.patch, audit-log.patch


 It would be good to have audit logging in the metastore, similar to Hadoop's 
 NameNode audit logging. This would allow administrators to dig into details 
 about which user performed metadata operations (like create/drop 
 tables/partitions) and from where (IP address).

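 (The exact line format written by the audit-log.1.patch is not shown here;
 the lines, field names, and helper below are hypothetical, sketched only to
 illustrate the kind of per-user, per-IP analysis such a log enables.)

```shell
#!/bin/sh
# Hypothetical metastore audit log lines; the real format produced by the
# HIVE-1948 patch may differ. Each line carries user (ugi=), source ip=,
# and the metastore operation cmd=.
audit_sample='2011-02-03 10:01:02 ugi=alice ip=10.0.0.5 cmd=create_table
2011-02-03 10:02:10 ugi=bob ip=10.0.0.9 cmd=drop_partition
2011-02-03 10:03:33 ugi=alice ip=10.0.0.5 cmd=get_table'

# Count operations per user@ip -- the "who did what, from where" question
# the audit log is meant to answer.
summarize_audit() {
  awk '{
    user = ""; ip = "";
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^ugi=/) user = substr($i, 5);
      if ($i ~ /^ip=/)  ip = substr($i, 4);
    }
    count[user "@" ip]++;
  } END { for (k in count) print k, count[k] }'
}

printf '%s\n' "$audit_sample" | summarize_audit | sort
```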
-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira