[jira] Commented: (HIVE-2053) Hive can't find the Plan

2011-03-17 Thread Aaron Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007839#comment-13007839
 ] 

Aaron Guo commented on HIVE-2053:
-

Zhangning, this isn't a bug. Sorry for your time.

 Hive can't find the Plan
 

 Key: HIVE-2053
 URL: https://issues.apache.org/jira/browse/HIVE-2053
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Aaron Guo
Priority: Critical
 Attachments: patch-1.patch


 When I execute this SQL: select count(1) from table1;
 the MapReduce job fails because it can't find the plan file in the local path.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (HIVE-1815) The class HiveResultSet should implement batch fetching.

2011-03-17 Thread Bennie Schut (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bennie Schut updated HIVE-1815:
---

Attachment: HIVE-1815.2.patch.txt

Updated to use an iterator instead of deleting items.
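For illustration, a minimal sketch of the change described above, assuming the usual buffered-result pattern (the RowBuffer class and its methods are invented for this example, not the actual HiveQueryResultSet code): the fetched-row buffer is consumed through an Iterator instead of deleting each consumed item, avoiding the O(n) cost of remove(0) on an ArrayList.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not the actual Hive JDBC code: the fetched-row
// buffer is walked with an Iterator and replaced wholesale when a new
// batch arrives, instead of being shrunk with repeated remove(0) calls.
class RowBuffer {
    private List<String> fetched = new ArrayList<String>();
    private Iterator<String> it = fetched.iterator();

    // Swap in a freshly fetched batch of rows.
    void refill(List<String> batch) {
        fetched = new ArrayList<String>(batch);
        it = fetched.iterator();
    }

    // Next buffered row, or null once the current batch is drained.
    String next() {
        return it.hasNext() ? it.next() : null;
    }
}
```

Each remove(0) on an ArrayList shifts every remaining element left, so draining a batch of n rows that way costs O(n^2) overall; the iterator walk is O(n).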

 The class HiveResultSet should implement batch fetching.
 

 Key: HIVE-1815
 URL: https://issues.apache.org/jira/browse/HIVE-1815
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.8.0
 Environment: Custom Java application using the Hive JDBC driver to 
 connect to a Hive server, execute a Hive query and process the results.
Reporter: Guy le Mar
 Fix For: 0.8.0

 Attachments: HIVE-1815.1.patch.txt, HIVE-1815.2.patch.txt


 When using the Hive JDBC driver, you can execute a Hive query and obtain a 
 HiveResultSet instance that contains the results of the query.
 Unfortunately, HiveResultSet can then only fetch a single row of these 
 results from the Hive server at a time. As a consequence, it's extremely slow 
 to fetch a resultset of anything other than a trivial size.
 It would be nice for the HiveResultSet to be able to fetch N rows from the 
 server at a time, so that performance is suitable to support applications 
 that provide human interaction. 
 (From memory, I think it took me around 20 minutes to fetch 4000 rows.)
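The request above can be sketched schematically. Everything here (the RowSource interface, BatchingClient, and fetchN) is invented for illustration and is not the Hive JDBC API; the point is simply that the client pays one round trip per batch of N rows instead of one per row, serving next() calls from a local buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical batching client: one "RPC" (fetchN call) per batch,
// with next() served from a local buffer until it runs dry.
class BatchingClient {
    interface RowSource { List<String> fetchN(int n); } // one round trip per call

    private final RowSource server;
    private final int batchSize;
    private final List<String> buffer = new ArrayList<String>();
    private int pos = 0;
    int roundTrips = 0; // exposed so the effect is measurable

    BatchingClient(RowSource server, int batchSize) {
        this.server = server;
        this.batchSize = batchSize;
    }

    // Next row, refilling the buffer with one round trip when it is empty.
    String next() {
        if (pos == buffer.size()) {
            buffer.clear();
            pos = 0;
            roundTrips++;
            buffer.addAll(server.fetchN(batchSize));
            if (buffer.isEmpty()) return null; // end of result set
        }
        return buffer.get(pos++);
    }
}
```

With a batch size of 100, fetching 4000 rows would take about 40 round trips rather than 4000, which is where the reported 20-minute fetch time goes.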



Build failed in Jenkins: Hive-trunk-h0.20 #615

2011-03-17 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/615/changes

Changes:

[jvs] HIVE-2059. Add datanucleus.identifierFactory property to HiveConf to avoid
unintentional MetaStore Schema corruption
(Carl Steinbach via jvs)

--
[...truncated 27861 lines...]
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-17_01-17-25_028_5624071252066739812/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=number
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=number
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=number
[junit] Job running in-process (local Hadoop)
[junit] Hadoop job information for null: number of mappers: 0; number of 
reducers: 0
[junit] 2011-03-17 01:17:28,126 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-17_01-17-25_028_5624071252066739812/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103170117_1121386497.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-17_01-17-29_660_4426750602106202109/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-17_01-17-29_660_4426750602106202109/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: 

[jira] Commented: (HIVE-2054) Exception on windows when using the jdbc driver. IOException: The system cannot find the path specified

2011-03-17 Thread Bennie Schut (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007848#comment-13007848
 ] 

Bennie Schut commented on HIVE-2054:


Yes, setting hive.querylog.location makes it work.

At the very least we should remove the extends SessionState, since it
introduces a link to the hive server code which makes no sense at this point in
time. However, I have a preference for removing it altogether since it
currently adds no value. On the jdbc side I would expect the HiveConnection to
hold the state, which is what it is actually doing right now.
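The design point above is composition over inheritance. A schematic sketch (the class names are illustrative stand-ins, not the real HiveConnection or SessionState): the connection holds its per-connection state in a field rather than extending a server-side session class and inheriting machinery it does not need.

```java
// Illustrative stand-ins only, not the actual Hive classes.
class ConnState {               // per-connection state the driver needs
    final String id;
    ConnState(String id) { this.id = id; }
}

class Conn {                    // has-a state field, not "extends ConnState"
    private final ConnState state;
    Conn(String id) { this.state = new ConnState(id); }
    String sessionId() { return state.id; }
}
```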



 Exception on windows when using the jdbc driver. IOException: The system 
 cannot find the path specified
 -

 Key: HIVE-2054
 URL: https://issues.apache.org/jira/browse/HIVE-2054
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.8.0
Reporter: Bennie Schut
Assignee: Bennie Schut
Priority: Minor
 Fix For: 0.8.0

 Attachments: HIVE-2054.1.patch.txt


 It seems something recently changed in the jdbc driver which causes this
 IOException on Windows.
 java.lang.RuntimeException: java.io.IOException: The system cannot find the 
 path specified
   at 
 org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:237)
   at 
 org.apache.hadoop.hive.jdbc.HiveConnection.init(HiveConnection.java:73)
   at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:110)



Build failed in Jenkins: Hive-0.7.0-h0.20 #41

2011-03-17 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/41/changes

Changes:

[jvs] HIVE-2059. Add datanucleus.identifierFactory property to HiveConf to avoid
unintentional MetaStore Schema corruption
(Carl Steinbach via jvs)

--
[...truncated 27312 lines...]
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-17_01-47-58_661_1540025728397251083/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=number
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=number
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=number
[junit] Job running in-process (local Hadoop)
[junit] 2011-03-17 01:48:01,677 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-17_01-47-58_661_1540025728397251083/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103170148_942168971.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-17_01-48-03_350_385696872197569180/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-17_01-48-03_350_385696872197569180/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table 

Root/ Fetch Stage

2011-03-17 Thread Joerg Schad

Hi,
when exploring the Hive EXPLAIN statement we were wondering about the different
stages. Here are a few questions regarding the EXPLAIN output below:
1. Why are there two root stages? What exactly does a root stage mean (I assume
it means there are no predecessors)?
2. What exactly is a Fetch Stage? Is it an actual MapReduce stage?
3. Where can I find additional information about these stages in general?

Thanks a lot for your support,
JS
P.S. This has already been posted to the user mailing list, but we
unfortunately received no reply there...


hive> EXPLAIN SELECT l_orderkey, o_shippingpriority, sum(l_extendedprice)
FROM orders JOIN lineitem ON (lineitem.l_orderkey = orders.o_orderkey) GROUP BY
l_orderkey, o_shippingpriority;
OK
ABSTRACT SYNTAX TREE:
 (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF orders) (TOK_TABREF lineitem) (= (. 
(TOK_TABLE_OR_COL lineitem) l_orderkey) (. (TOK_TABLE_OR_COL orders) 
o_orderkey (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT 
(TOK_SELEXPR (TOK_TABLE_OR_COL l_orderkey)) (TOK_SELEXPR (TOK_TABLE_OR_COL 
o_shippingpriority)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_TABLE_OR_COL 
l_extendedprice (TOK_GROUPBY (TOK_TABLE_OR_COL l_orderkey) 
(TOK_TABLE_OR_COL o_shippingpriority

STAGE DEPENDENCIES:
 Stage-1 is a root stage
 Stage-2 depends on stages: Stage-1
 Stage-0 is a root stage

STAGE PLANS:
 Stage: Stage-1
 Map Reduce
 Alias - Map Operator Tree:
 lineitem 
 TableScan
 alias: lineitem
 Reduce Output Operator
 key expressions:
 expr: l_orderkey
 type: int
 sort order: +
 Map-reduce partition columns:
 expr: l_orderkey
 type: int
 tag: 1
 value expressions:
 expr: l_orderkey
 type: int
 expr: l_extendedprice
 type: int
 orders 
 TableScan
 alias: orders
 Reduce Output Operator
 key expressions:
 expr: o_orderkey
 type: int
 sort order: +
 Map-reduce partition columns:
 expr: o_orderkey
 type: int
 tag: 0
 value expressions:
 expr: o_shippingpriority
 type: int
 Reduce Operator Tree:
 Join Operator
 condition map:
 Inner Join 0 to 1
 condition expressions:
 0 {VALUE._col1}
 1 {VALUE._col0} {VALUE._col1}
 handleSkewJoin: false
 outputColumnNames: _col1, _col3, _col4
 Select Operator
 expressions:
 expr: _col3
 type: int
 expr: _col1
 type: int
 expr: _col4
 type: int
 outputColumnNames: _col3, _col1, _col4
 Group By Operator
 aggregations:
 expr: sum(_col4)
 bucketGroup: false
 keys:
 expr: _col3
 type: int
 expr: _col1
 type: int
 mode: hash
 outputColumnNames: _col0, _col1, _col2
 File Output Operator
 compressed: false
 GlobalTableId: 0
 table:
 input format: org.apache.hadoop.mapred.SequenceFileInputFormat
 output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

 Stage: Stage-2
 Map Reduce
 Alias - Map Operator Tree:
 
hdfs://localhost:9000/tmp/hive-joergschad/hive_2011-03-14_17-22-14_249_1239673786236436657/10002
 
 Reduce Output Operator
 key expressions:
 expr: _col0
 type: int
 expr: _col1
 type: int
 sort order: ++
 Map-reduce partition columns:
 expr: _col0
 type: int
 expr: _col1
 type: int
 tag: -1
 value expressions:
 expr: _col2
 type: bigint
 Reduce Operator Tree:
 Group By Operator
 aggregations:
 expr: sum(VALUE._col0)
 bucketGroup: false
 keys:
 expr: KEY._col0
 type: int
 expr: KEY._col1
 type: int
 mode: mergepartial
 outputColumnNames: _col0, _col1, _col2
 Select Operator
 expressions:
 expr: _col0
 type: int
 expr: _col1
 type: int
 expr: _col2
 type: bigint
 outputColumnNames: _col0, _col1, _col2
 File Output Operator
 compressed: false
 GlobalTableId: 0
 table:
 input format: org.apache.hadoop.mapred.TextInputFormat
 output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

 Stage: Stage-0
 Fetch Operator
 limit: -1


Re: Review Request: HIVE-1815: The class HiveResultSet should implement batch fetching.

2011-03-17 Thread Bennie Schut

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/514/
---

(Updated 2011-03-17 01:06:34.734673)


Review request for hive.


Changes
---

Updated to use an iterator instead of deleting items.


Summary
---

HIVE-1815: The class HiveResultSet should implement batch fetching.


This addresses bug HIVE-1815.
https://issues.apache.org/jira/browse/HIVE-1815


Diffs (updated)
-

  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveQueryResultSet.java 
1081785 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveStatement.java 1081785 
  trunk/jdbc/src/test/org/apache/hadoop/hive/jdbc/TestJdbcDriver.java 1081785 

Diff: https://reviews.apache.org/r/514/diff


Testing
---


Thanks,

Bennie



[jira] Created: (HIVE-2060) CLI local mode hit NPE when exiting by ^D

2011-03-17 Thread Ning Zhang (JIRA)
CLI local mode hit NPE when exiting by ^D
-

 Key: HIVE-2060
 URL: https://issues.apache.org/jira/browse/HIVE-2060
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.8.0
Reporter: Ning Zhang
Assignee: Ning Zhang
Priority: Minor
 Fix For: 0.8.0


The CLI gets an NPE when running in local mode and ^D is hit to exit it.
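An illustrative sketch of this failure mode (not the actual Hive CLI code; ReadLoop and its method are invented for the example): a console read loop receives null from readLine() at end-of-input, which is what ^D produces, and any method call on that null line throws NullPointerException. Checking for null in the loop condition exits cleanly instead.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical sketch of a CLI read loop that survives ^D (EOF).
class ReadLoop {
    // Process lines until EOF; returns how many lines were read.
    static int run(BufferedReader in) throws IOException {
        int processed = 0;
        String line;
        while ((line = in.readLine()) != null) { // null signals EOF (^D)
            processed++;                         // 'line' is safely non-null here
        }
        return processed;                        // reached on ^D without an NPE
    }
}
```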



[jira] Updated: (HIVE-2060) CLI local mode hit NPE when exiting by ^D

2011-03-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2060:
-

Attachment: HIVE-2060.patch

 CLI local mode hit NPE when exiting by ^D
 -

 Key: HIVE-2060
 URL: https://issues.apache.org/jira/browse/HIVE-2060
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.8.0
Reporter: Ning Zhang
Assignee: Ning Zhang
Priority: Minor
 Fix For: 0.8.0

 Attachments: HIVE-2060.patch


 CLI gets an NPE when running in local mode and hit an ^D to exit it. 



[jira] Updated: (HIVE-2060) CLI local mode hit NPE when exiting by ^D

2011-03-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2060:
-

Status: Patch Available  (was: Open)

 CLI local mode hit NPE when exiting by ^D
 -

 Key: HIVE-2060
 URL: https://issues.apache.org/jira/browse/HIVE-2060
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.8.0
Reporter: Ning Zhang
Assignee: Ning Zhang
Priority: Minor
 Fix For: 0.8.0

 Attachments: HIVE-2060.patch


 CLI gets an NPE when running in local mode and hit an ^D to exit it. 



[jira] Commented: (HIVE-2060) CLI local mode hit NPE when exiting by ^D

2011-03-17 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008065#comment-13008065
 ] 

He Yongqiang commented on HIVE-2060:


+1, running tests

 CLI local mode hit NPE when exiting by ^D
 -

 Key: HIVE-2060
 URL: https://issues.apache.org/jira/browse/HIVE-2060
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.8.0
Reporter: Ning Zhang
Assignee: Ning Zhang
Priority: Minor
 Fix For: 0.8.0

 Attachments: HIVE-2060.patch


 CLI gets an NPE when running in local mode and hit an ^D to exit it. 



[jira] Commented: (HIVE-1384) HiveServer should run as the user who submitted the query.

2011-03-17 Thread Ankita Bakshi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008077#comment-13008077
 ] 

Ankita Bakshi commented on HIVE-1384:
-

This is required to use the hive authorization infrastructure.

 HiveServer should run as the user who submitted the query.
 --

 Key: HIVE-1384
 URL: https://issues.apache.org/jira/browse/HIVE-1384
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Server Infrastructure
Reporter: He Yongqiang
Assignee: He Yongqiang





Build failed in Jenkins: Hive-0.7.0-h0.20 #42

2011-03-17 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/42/

--
[...truncated 27035 lines...]
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 INTO TABLE src
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt
[junit] Copying file: 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.src
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 INTO TABLE src
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] Copying file: 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt'
 INTO TABLE src1
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt
[junit] Loading data to table default.src1
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt'
 INTO TABLE src1
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src1
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq'
 INTO TABLE src_sequencefile
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq
[junit] Copying file: 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq
[junit] Loading data to table default.src_sequencefile
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq'
 INTO TABLE src_sequencefile
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_sequencefile
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq'
 INTO TABLE src_thrift
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq
[junit] Copying file: 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq
[junit] Loading data to table default.src_thrift
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq'
 INTO TABLE src_thrift
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_thrift
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt'
 INTO TABLE src_json
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt
[junit] Copying file: 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt
[junit] Loading data to table default.src_json
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt'
 INTO TABLE src_json
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_json
[junit] OK
[junit] diff 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/ql/test/logs/negative/wrong_distinct1.q.out
 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/ql/src/test/results/compiler/errors/wrong_distinct1.q.out
[junit] Done query: wrong_distinct1.q
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/ql/tmp/hive_job_log_hudson_201103171212_142244274.txt
[junit] Begin query: wrong_distinct2.q
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/ql/tmp/hive_job_log_hudson_201103171212_1172891919.txt
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 OVERWRITE INTO TABLE srcpart PARTITION (ds='2008-04-08',hr='11')
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt
[junit] Copying file: 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.srcpart partition (ds=2008-04-08, 
hr=11)
[junit] POSTHOOK: query: LOAD DATA 

Build failed in Jenkins: Hive-trunk-h0.20 #616

2011-03-17 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/616/

--
[...truncated 27983 lines...]
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-17_12-15-12_194_1579804070209208471/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=number
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=number
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=number
[junit] Job running in-process (local Hadoop)
[junit] Hadoop job information for null: number of mappers: 0; number of 
reducers: 0
[junit] 2011-03-17 12:15:15,304 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-17_12-15-12_194_1579804070209208471/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103171215_555658050.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-17_12-15-16_806_158600829949742058/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-17_12-15-16_806_158600829949742058/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 

[jira] Updated: (HIVE-1959) Potential memory leak when same connection used for long time. TaskInfo and QueryInfo objects are getting accumulated on executing more queries on the same connection.

2011-03-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1959:
-

   Resolution: Fixed
Fix Version/s: 0.8.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed. Thanks Chinna!

 Potential memory leak when same connection used for long time. TaskInfo and 
 QueryInfo objects are getting accumulated on executing more queries on the 
 same connection.
 ---

 Key: HIVE-1959
 URL: https://issues.apache.org/jira/browse/HIVE-1959
 Project: Hive
  Issue Type: Bug
  Components: Server Infrastructure
Affects Versions: 0.8.0
 Environment: Hadoop 0.20.1, Hive0.5.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Fix For: 0.8.0

 Attachments: HIVE-1959.patch


 *org.apache.hadoop.hive.ql.history.HiveHistory$TaskInfo* and 
 *org.apache.hadoop.hive.ql.history.HiveHistory$QueryInfo* these two objects 
 are getting accumulated on executing more number of queries on the same 
 connection. These objects are getting released only when the connection is 
 closed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (HIVE-1815) The class HiveResultSet should implement batch fetching.

2011-03-17 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008205#comment-13008205
 ] 

Ning Zhang commented on HIVE-1815:
--

+1. Will commit if tests pass. 

 The class HiveResultSet should implement batch fetching.
 

 Key: HIVE-1815
 URL: https://issues.apache.org/jira/browse/HIVE-1815
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.8.0
 Environment: Custom Java application using the Hive JDBC driver to 
 connect to a Hive server, execute a Hive query and process the results.
Reporter: Guy le Mar
 Fix For: 0.8.0

 Attachments: HIVE-1815.1.patch.txt, HIVE-1815.2.patch.txt


 When using the Hive JDBC driver, you can execute a Hive query and obtain a 
 HiveResultSet instance that contains the results of the query.
 Unfortunately, HiveResultSet can then only fetch a single row of these 
 results from the Hive server at a time. As a consequence, it's extremely slow 
 to fetch a resultset of anything other than a trivial size.
 It would be nice for the HiveResultSet to be able to fetch N rows from the 
 server at a time, so that performance is suitable to support applications 
 that provide human interaction. 
 (From memory, I think it took me around 20 minutes to fetch 4000 rows.)
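The slowness described above is dominated by round trips, not row size. A minimal sketch (the `roundTrips` helper is hypothetical, not part of the Hive JDBC driver) of how fetching N rows per call shrinks the number of client/server round trips a HiveResultSet-style client would make:

```java
public class BatchFetchDemo {
    // Counts simulated client/server round trips needed to drain `total`
    // rows when each call to the server returns at most `batchSize` rows.
    static int roundTrips(int total, int batchSize) {
        int trips = 0;
        int fetched = 0;
        while (fetched < total) {
            trips++;                                    // one round trip
            fetched += Math.min(batchSize, total - fetched);
        }
        return trips;
    }

    public static void main(String[] args) {
        // 4000 rows: one row per trip vs. 100 rows per trip.
        System.out.println(roundTrips(4000, 1));    // 4000 round trips
        System.out.println(roundTrips(4000, 100));  // 40 round trips
    }
}
```

With one row per fetch, 4000 rows cost 4000 round trips; batching 100 rows per fetch cuts that to 40, which is why per-row latency dominates at interactive scale.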

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Jenkins build is back to normal : Hive-trunk-h0.20 #617

2011-03-17 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/617/changes




Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-03-17 Thread John Sichi

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/505/#review339
---



ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/505/#comment684

Suggestion is to make this configurable (via IDXPROPERTIES) to save space 
when column is known NOT NULL.  (Also later to allow for specification of other 
aggregates.)



ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java
https://reviews.apache.org/r/505/#comment686

Indentation is messed up here.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java
https://reviews.apache.org/r/505/#comment685

Please eliminate all TODO's, and don't use printStackTrace.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java
https://reviews.apache.org/r/505/#comment687

Instead of downcasting over and over, you should probably be doing it just 
once in the calling method (and asserting that you got the right type, since 
otherwise generateOperatorTree is not going to have the desired effect).
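The downcast-once suggestion can be sketched as follows; Operator and TableScanOperator here are hypothetical stand-ins for illustration, not Hive's actual operator classes:

```java
public class DowncastOnce {
    // Hypothetical operator hierarchy standing in for Hive's operators.
    abstract static class Operator {}

    static class TableScanOperator extends Operator {
        String table() { return "t1"; }
    }

    // Downcast once at the entry point, check the type explicitly, then
    // pass the narrowed reference along instead of re-casting in every callee.
    static String describe(Operator op) {
        if (!(op instanceof TableScanOperator)) {
            throw new IllegalArgumentException("expected a TableScanOperator");
        }
        TableScanOperator scan = (TableScanOperator) op;
        return scan.table();
    }

    public static void main(String[] args) {
        System.out.println(describe(new TableScanOperator()));
    }
}
```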




ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java
https://reviews.apache.org/r/505/#comment688

Hive naming convention for variables is camelCase, not under_score.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java
https://reviews.apache.org/r/505/#comment690

I see query_has_distinct being written but never read.  Why do we need it?  
I don't think we want to be relying on the parse tree at all.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java
https://reviews.apache.org/r/505/#comment689

Don't swallow exceptions like this.


- John


On 2011-03-13 20:00:28, Prajakta Kalmegh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/505/
 ---
 
 (Updated 2011-03-13 20:00:28)
 
 
 Review request for hive.
 
 
 Summary
 ---
 
 New Review starting from patch 3.
 
 
 This addresses bug HIVE-1694.
 https://issues.apache.org/jira/browse/HIVE-1694
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 46739b7 
   ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 308d985 
   ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 916b235 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 50db44c 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
 PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteIndexSubqueryCtx.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteIndexSubqueryProcFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteRemoveGroupbyCtx.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteRemoveGroupbyProcFactory.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
 04f560f 
   ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
   ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/505/diff
 
 
 Testing
 ---
 
 
 Thanks,
 
 Prajakta
 




[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-03-17 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008235#comment-13008235
 ] 

John Sichi commented on HIVE-1694:
--

I added a few review board comments; there are a lot of places where the 
exception handling is still wrong; I didn't comment on all of those but they 
need to be fixed.

We still need to reconcile with HIVE-1803, but I'll ask Namit and Yongqiang to 
take a look now to get their comments on the rewrite implementation.


 Accelerate GROUP BY execution using indexes
 ---

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Prajakta Kalmegh
 Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, 
 HIVE-1694.3.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql


 The index building patch (HIVE-417) is checked into trunk; this JIRA issue 
 tracks supporting indexes in the Hive compiler & execution engine for SELECT 
 queries.
 This is in ref. to John's comment at
 https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
 on creating separate JIRA issue for tracking index usage in optimizer  query 
 execution.
 The aim of this effort is to use indexes to accelerate query execution (for 
 certain class of queries). E.g.
 - Filters and range scans (already being worked on by He Yongqiang as part of 
 HIVE-417?)
 - Joins (index based joins)
 - Group By, Order By and other misc cases
 The proposal is multi-step:
 1. Building index based operators, compiler and execution engine changes
 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
 between index scans, full table scans etc.)
 This JIRA initially focuses on the first step. This JIRA is expected to hold 
 the information about index-based plans & operator implementations for the 
 above-mentioned cases. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (HIVE-2052) PostHook and PreHook API to add flag to indicate it is pre or post hook plus cache for content summary

2011-03-17 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008238#comment-13008238
 ] 

Joydeep Sen Sarma commented on HIVE-2052:
-

small nits:
- setinputpathtocontentsummary is called twice on the same hookcontext object
- we are setting the hook type again and again (can do it once before calling 
postexecute)

should the inputpathtocontentsummary be marked final in the hook and passed 
along with the constructor? (why would we ever change the map to a new one?).

 PostHook and PreHook API to add flag to indicate it is pre or post hook plus 
 cache for content summary
 --

 Key: HIVE-2052
 URL: https://issues.apache.org/jira/browse/HIVE-2052
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-2052.1.patch, HIVE-2052.2.patch


 This will allow hooks to share some information better and reduce their 
 latency

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel

2011-03-17 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2051:
--

Attachment: HIVE-2051.4.patch

 getInputSummary() to call FileSystem.getContentSummary() in parallel
 

 Key: HIVE-2051
 URL: https://issues.apache.org/jira/browse/HIVE-2051
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch, 
 HIVE-2051.4.patch


 getInputSummary() now calls FileSystem.getContentSummary() one by one, which 
 can be extremely slow when the number of input paths is huge. By calling 
 those functions in parallel, we can cut latency in most cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (HIVE-2052) PostHook and PreHook API to add flag to indicate it is pre or post hook plus cache for content summary

2011-03-17 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2052:
--

Attachment: HIVE-2051.3.patch

Modified per Joydeep's comments.

 PostHook and PreHook API to add flag to indicate it is pre or post hook plus 
 cache for content summary
 --

 Key: HIVE-2052
 URL: https://issues.apache.org/jira/browse/HIVE-2052
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-2051.3.patch, HIVE-2052.1.patch, HIVE-2052.2.patch


 This will allow hooks to share some information better and reduce their 
 latency

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (HIVE-1644) use filter pushdown for automatically accessing indexes

2011-03-17 Thread Russell Melick (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Melick updated HIVE-1644:
-

Attachment: HIVE-1644.8.patch

HIVE-1644.8.patch

Fixed unit tests per John and Yonqiang.  Cleaned up comments.  Ready for 
review.  I still have a few questions that are probably best answered in the 
review:

 * When we have multiple indexes, and we get different tasks lists from 
querying each index, what should we do?  Right now we use all tasks 
(IndexWhereProcessor.java:57)
 * Is it possible to improve the regex we use so that it only matches WHERE 
clauses?  Right now we use FIL to get to the WHERE 
(IndexWhereTaskDispatcher.java:141)
 * What comparison operators should we support?  Right now it's only <, >, and 
=.  We don't have <= or >= (CompactIndexHandler.java:272)

Should I put this into the reviewboard?

 use filter pushdown for automatically accessing indexes
 ---

 Key: HIVE-1644
 URL: https://issues.apache.org/jira/browse/HIVE-1644
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: Russell Melick
 Attachments: HIVE-1644.1.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, 
 HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch, HIVE-1644.7.patch, 
 HIVE-1644.8.patch


 HIVE-1226 provides utilities for analyzing filters which have been pushed 
 down to a table scan.  The next step is to use these for selecting available 
 indexes and generating access plans for those indexes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Assigned: (HIVE-1815) The class HiveResultSet should implement batch fetching.

2011-03-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang reassigned HIVE-1815:


Assignee: Bennie Schut

 The class HiveResultSet should implement batch fetching.
 

 Key: HIVE-1815
 URL: https://issues.apache.org/jira/browse/HIVE-1815
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.8.0
 Environment: Custom Java application using the Hive JDBC driver to 
 connect to a Hive server, execute a Hive query and process the results.
Reporter: Guy le Mar
Assignee: Bennie Schut
 Fix For: 0.8.0

 Attachments: HIVE-1815.1.patch.txt, HIVE-1815.2.patch.txt


 When using the Hive JDBC driver, you can execute a Hive query and obtain a 
 HiveResultSet instance that contains the results of the query.
 Unfortunately, HiveResultSet can then only fetch a single row of these 
 results from the Hive server at a time. As a consequence, it's extremely slow 
 to fetch a resultset of anything other than a trivial size.
 It would be nice for the HiveResultSet to be able to fetch N rows from the 
 server at a time, so that performance is suitable to support applications 
 that provide human interaction. 
 (From memory, I think it took me around 20 minutes to fetch 4000 rows.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (HIVE-1815) The class HiveResultSet should implement batch fetching.

2011-03-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1815:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Bennie!

 The class HiveResultSet should implement batch fetching.
 

 Key: HIVE-1815
 URL: https://issues.apache.org/jira/browse/HIVE-1815
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.8.0
 Environment: Custom Java application using the Hive JDBC driver to 
 connect to a Hive server, execute a Hive query and process the results.
Reporter: Guy le Mar
Assignee: Bennie Schut
 Fix For: 0.8.0

 Attachments: HIVE-1815.1.patch.txt, HIVE-1815.2.patch.txt


 When using the Hive JDBC driver, you can execute a Hive query and obtain a 
 HiveResultSet instance that contains the results of the query.
 Unfortunately, HiveResultSet can then only fetch a single row of these 
 results from the Hive server at a time. As a consequence, it's extremely slow 
 to fetch a resultset of anything other than a trivial size.
 It would be nice for the HiveResultSet to be able to fetch N rows from the 
 server at a time, so that performance is suitable to support applications 
 that provide human interaction. 
 (From memory, I think it took me around 20 minutes to fetch 4000 rows.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (HIVE-1644) use filter pushdown for automatically accessing indexes

2011-03-17 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008323#comment-13008323
 ] 

John Sichi commented on HIVE-1644:
--

Russell, the plan still looks wrong.  It shows two stage 1's, with a dependency 
from one to the other.  The stage numbers should be unique, so probably this is 
due to the way we merge the two queries?

 use filter pushdown for automatically accessing indexes
 ---

 Key: HIVE-1644
 URL: https://issues.apache.org/jira/browse/HIVE-1644
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: Russell Melick
 Attachments: HIVE-1644.1.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, 
 HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch, HIVE-1644.7.patch, 
 HIVE-1644.8.patch


 HIVE-1226 provides utilities for analyzing filters which have been pushed 
 down to a table scan.  The next step is to use these for selecting available 
 indexes and generating access plans for those indexes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel

2011-03-17 Thread MIS (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008328#comment-13008328
 ] 

MIS commented on HIVE-2051:
---

Yes, it is necessary to terminate the executor once jobs have been submitted 
to it, even though the submitted jobs may already have completed. 

However, after the executor is shut down, we do not need to await termination; 
that is redundant, since all jobs submitted to the executor will already have 
completed by the time we shut it down. This is what result.get() ensures, 
i.e., the following piece of code is not required:
+  do {
+try {
+  executor.awaitTermination(Integer.MAX_VALUE, TimeUnit.SECONDS);
+  executorDone = true;
+} catch (InterruptedException e) {
+}
+  } while (!executorDone);
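The point about result.get() can be illustrated with a minimal, self-contained sketch; the submitted tasks below are hypothetical stand-ins for the FileSystem.getContentSummary() calls. Once every Future.get() has returned, all submitted work is complete, so shutdown() alone suffices:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSummaryDemo {
    // Submits `tasks` dummy jobs and sums their results. Each Future.get()
    // blocks until its task has finished, so after the loop all work is done.
    static long sumInParallel(int tasks) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        List<Future<Long>> results = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            final long n = i;
            // Stand-in for a FileSystem.getContentSummary() call.
            results.add(executor.submit(() -> n * 10));
        }
        long total = 0;
        for (Future<Long> f : results) {
            total += f.get(); // blocks until this task completes
        }
        // Every get() has returned, so no task is still running;
        // shutdown() alone is enough, and no awaitTermination loop is needed.
        executor.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumInParallel(8));
    }
}
```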

 getInputSummary() to call FileSystem.getContentSummary() in parallel
 

 Key: HIVE-2051
 URL: https://issues.apache.org/jira/browse/HIVE-2051
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch, 
 HIVE-2051.4.patch


 getInputSummary() now calls FileSystem.getContentSummary() one by one, which 
 can be extremely slow when the number of input paths is huge. By calling 
 those functions in parallel, we can cut latency in most cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel

2011-03-17 Thread MIS (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008331#comment-13008331
 ] 

MIS commented on HIVE-2051:
---

The solution to this issue resembles that of HIVE-2026, so we can follow a 
similar approach.

 getInputSummary() to call FileSystem.getContentSummary() in parallel
 

 Key: HIVE-2051
 URL: https://issues.apache.org/jira/browse/HIVE-2051
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch, 
 HIVE-2051.4.patch


 getInputSummary() now calls FileSystem.getContentSummary() one by one, which 
 can be extremely slow when the number of input paths is huge. By calling 
 those functions in parallel, we can cut latency in most cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (HIVE-2054) Exception on windows when using the jdbc driver. IOException: The system cannot find the path specified

2011-03-17 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008333#comment-13008333
 ] 

Ning Zhang commented on HIVE-2054:
--

Sounds reasonable to me. 

Raghu, what do you think?

 Exception on windows when using the jdbc driver. IOException: The system 
 cannot find the path specified
 -

 Key: HIVE-2054
 URL: https://issues.apache.org/jira/browse/HIVE-2054
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.8.0
Reporter: Bennie Schut
Assignee: Bennie Schut
Priority: Minor
 Fix For: 0.8.0

 Attachments: HIVE-2054.1.patch.txt


 It seems something recently changed on the jdbc driver which causes this 
 IOException on windows.
 java.lang.RuntimeException: java.io.IOException: The system cannot find the 
 path specified
   at 
 org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:237)
   at 
 org.apache.hadoop.hive.jdbc.HiveConnection.<init>(HiveConnection.java:73)
   at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:110)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Created: (HIVE-2061) Create a hive_contrib.jar symlink to hive-contrib-{version}.jar for backward compatibility

2011-03-17 Thread Ning Zhang (JIRA)
Create a hive_contrib.jar symlink to hive-contrib-{version}.jar for backward 
compatibility
--

 Key: HIVE-2061
 URL: https://issues.apache.org/jira/browse/HIVE-2061
 Project: Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
Priority: Minor


We have seen a use case where the user's script runs 'add jar 
hive_contrib.jar'. Since Hive has moved the jar file to 
hive-contrib-{version}.jar, this introduced a backward incompatibility. If we 
ask the user to change the script, then whenever Hive bumps the version again 
the user will need to change the script once more. Creating a symlink seems to 
be the best solution. 
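On POSIX systems the symlink approach can be sketched with java.nio.file; the versioned jar name below is hypothetical, since the real name depends on the release:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ContribSymlink {
    // Creates an unversioned hive_contrib.jar symlink pointing at a
    // versioned jar in the same directory, and returns the link's target.
    static String demo() throws Exception {
        Path dir = Files.createTempDirectory("hivelib");
        // Hypothetical versioned jar standing in for the real release artifact.
        Files.createFile(dir.resolve("hive-contrib-0.8.0.jar"));
        Path link = dir.resolve("hive_contrib.jar");
        // Use a relative target so the link survives moving the lib directory.
        Files.createSymbolicLink(link, Paths.get("hive-contrib-0.8.0.jar"));
        return Files.readSymbolicLink(link).toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

Because the link target is relative, old scripts keep resolving hive_contrib.jar to whatever versioned jar the current release ships, with no script changes on upgrade.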

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (HIVE-2061) Create a hive_contrib.jar symlink to hive-contrib-{version}.jar for backward compatibility

2011-03-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2061:
-

Attachment: HIVE-2061.patch

 Create a hive_contrib.jar symlink to hive-contrib-{version}.jar for backward 
 compatibility
 --

 Key: HIVE-2061
 URL: https://issues.apache.org/jira/browse/HIVE-2061
 Project: Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
Priority: Minor
 Attachments: HIVE-2061.patch


 We have seen a use case where the user's script runs 'add jar 
 hive_contrib.jar'. Since Hive has moved the jar file to 
 hive-contrib-{version}.jar, this introduced a backward incompatibility. If we 
 ask the user to change the script, then whenever Hive bumps the version again 
 the user will need to change the script once more. Creating a symlink seems 
 to be the best solution. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira