[jira] Commented: (HIVE-2053) Hive can't find the Plan
[ https://issues.apache.org/jira/browse/HIVE-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007839#comment-13007839 ] Aaron Guo commented on HIVE-2053: - Zhangning, this isn't a bug. Sorry for your time. Hive can't find the Plan Key: HIVE-2053 URL: https://issues.apache.org/jira/browse/HIVE-2053 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.0 Reporter: Aaron Guo Priority: Critical Attachments: patch-1.patch When I execute this SQL: select count(1) from table1; the MapReduce job can't run because it can't find the plan file in the local path. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1815) The class HiveResultSet should implement batch fetching.
[ https://issues.apache.org/jira/browse/HIVE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bennie Schut updated HIVE-1815: --- Attachment: HIVE-1815.2.patch.txt Updated to use an iterator instead of deleting items. The class HiveResultSet should implement batch fetching. Key: HIVE-1815 URL: https://issues.apache.org/jira/browse/HIVE-1815 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.8.0 Environment: Custom Java application using the Hive JDBC driver to connect to a Hive server, execute a Hive query and process the results. Reporter: Guy le Mar Fix For: 0.8.0 Attachments: HIVE-1815.1.patch.txt, HIVE-1815.2.patch.txt When using the Hive JDBC driver, you can execute a Hive query and obtain a HiveResultSet instance that contains the results of the query. Unfortunately, HiveResultSet can then only fetch a single row of these results from the Hive server at a time. As a consequence, it's extremely slow to fetch a resultset of anything other than a trivial size. It would be nice for the HiveResultSet to be able to fetch N rows from the server at a time, so that performance is suitable to support applications that provide human interaction. (From memory, I think it took me around 20 minutes to fetch 4000 rows.) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
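A batch-fetching result set along these lines can be sketched as a thin client-side buffer: keep a queue of already-fetched rows and refill it with one server round trip per N rows. This is only an illustration of the idea; `ServerStub` and `fetchN` are hypothetical stand-ins, not the actual Hive Thrift client API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/**
 * Sketch of client-side batch fetching: rows are pulled from the server
 * in batches of fetchSize, so callers of next() pay one round trip per
 * batch instead of one per row. ServerStub/fetchN are illustrative names.
 */
class BatchingResultSet {
    interface ServerStub {
        List<String> fetchN(int maxRows); // one round trip; empty list means end of data
    }

    private final Deque<String> buffer = new ArrayDeque<>();
    private final ServerStub server;
    private final int fetchSize;

    BatchingResultSet(ServerStub server, int fetchSize) {
        this.server = server;
        this.fetchSize = fetchSize;
    }

    /** Returns the next row, refilling the buffer in batches; null at end of data. */
    String next() {
        if (buffer.isEmpty()) {
            buffer.addAll(server.fetchN(fetchSize));
        }
        return buffer.pollFirst();
    }
}
```

With fetchSize = 1 this degenerates to the current row-at-a-time behaviour; at fetchSize = 100, the 4000-row case described in the issue would take 40 round trips instead of 4000.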
Build failed in Jenkins: Hive-trunk-h0.20 #615
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/615/changes Changes: [jvs] HIVE-2059. Add datanucleus.identifierFactory property HiveConf to avoid unintentional MetaStore Schema corruption (Carl Steinbach via jvs) -- [...truncated 27861 lines...] [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-03-17_01-17-25_028_5624071252066739812/-mr-1 [junit] Total MapReduce jobs = 1 [junit] Launching Job 1 out of 1 [junit] Number of reduce tasks determined at compile time: 1 [junit] In order to change the average load for a reducer (in bytes): [junit] set hive.exec.reducers.bytes.per.reducer=number [junit] In order to limit the maximum number of reducers: [junit] set hive.exec.reducers.max=number [junit] In order to set a constant number of reducers: [junit] set mapred.reduce.tasks=number [junit] Job running in-process (local Hadoop) [junit] Hadoop job information for null: number of mappers: 0; number of reducers: 0 [junit] 2011-03-17 01:17:28,126 null map = 100%, reduce = 100% [junit] Ended Job = job_local_0001 
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-03-17_01-17-25_028_5624071252066739812/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103170117_1121386497.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: QUERY [junit] 
PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-03-17_01-17-29_660_4426750602106202109/-mr-1 [junit] POSTHOOK: query: select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-03-17_01-17-29_660_4426750602106202109/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK:
[jira] Commented: (HIVE-2054) Exception on windows when using the jdbc driver. IOException: The system cannot find the path specified
[ https://issues.apache.org/jira/browse/HIVE-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007848#comment-13007848 ] Bennie Schut commented on HIVE-2054: Yes, setting hive.querylog.location makes it work. At the very least we should remove the extends SessionState, since it introduces a link to the Hive server code which makes no sense at this point in time. However, I have a preference for removing it altogether, since it currently adds no value. On the JDBC side I would expect the HiveConnection to hold the state, which is what it is actually doing right now. Exception on windows when using the jdbc driver. IOException: The system cannot find the path specified - Key: HIVE-2054 URL: https://issues.apache.org/jira/browse/HIVE-2054 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.8.0 Reporter: Bennie Schut Assignee: Bennie Schut Priority: Minor Fix For: 0.8.0 Attachments: HIVE-2054.1.patch.txt It seems something recently changed in the JDBC driver which causes this IOException on Windows. java.lang.RuntimeException: java.io.IOException: The system cannot find the path specified at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:237) at org.apache.hadoop.hive.jdbc.HiveConnection.init(HiveConnection.java:73) at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:110) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
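As a concrete form of the workaround mentioned above, hive.querylog.location can be pointed at a directory that actually exists on the Windows machine, for example in hive-site.xml (the path below is illustrative, not a required location):

```xml
<!-- hive-site.xml: point the query log at an existing local directory.
     C:\temp\hive is an illustrative path; any writable directory works. -->
<property>
  <name>hive.querylog.location</name>
  <value>C:\temp\hive</value>
</property>
```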
Build failed in Jenkins: Hive-0.7.0-h0.20 #41
See https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/41/changes Changes: [jvs] HIVE-2059. Add datanucleus.identifierFactory property HiveConf to avoid unintentional MetaStore Schema corruption (Carl Steinbach via jvs) -- [...truncated 27312 lines...] [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-03-17_01-47-58_661_1540025728397251083/-mr-1 [junit] Total MapReduce jobs = 1 [junit] Launching Job 1 out of 1 [junit] Number of reduce tasks determined at compile time: 1 [junit] In order to change the average load for a reducer (in bytes): [junit] set hive.exec.reducers.bytes.per.reducer=number [junit] In order to limit the maximum number of reducers: [junit] set hive.exec.reducers.max=number [junit] In order to set a constant number of reducers: [junit] set 
mapred.reduce.tasks=number [junit] Job running in-process (local Hadoop) [junit] 2011-03-17 01:48:01,677 null map = 100%, reduce = 100% [junit] Ended Job = job_local_0001 [junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-03-17_01-47-58_661_1540025728397251083/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103170148_942168971.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: LOAD 
[junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-03-17_01-48-03_350_385696872197569180/-mr-1 [junit] POSTHOOK: query: select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-03-17_01-48-03_350_385696872197569180/-mr-1 [junit] OK [junit] PREHOOK: query: drop table
Root/ Fetch Stage
Hi, when exploring the Hive EXPLAIN statement we were wondering about the different stages, so here are three questions regarding the EXPLAIN output below:

1. Why are there two root stages? What exactly does "root stage" mean (I assume it means there are no predecessors)?
2. What exactly is a Fetch stage? Is it an actual MapReduce stage?
3. Where can I find additional information about these stages in general?

Thanks a lot for your support, JS

P.S. This has already been posted to the user mailing list, but unfortunately we received no reply there...

hive> EXPLAIN SELECT l_orderkey, o_shippingpriority, sum(l_extendedprice) FROM orders JOIN lineitem ON (lineitem.l_orderkey = orders.o_orderkey) GROUP BY l_orderkey, o_shippingpriority;
OK
ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF orders) (TOK_TABREF lineitem) (= (. (TOK_TABLE_OR_COL lineitem) l_orderkey) (. (TOK_TABLE_OR_COL orders) o_orderkey (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL l_orderkey)) (TOK_SELEXPR (TOK_TABLE_OR_COL o_shippingpriority)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_TABLE_OR_COL l_extendedprice (TOK_GROUPBY (TOK_TABLE_OR_COL l_orderkey) (TOK_TABLE_OR_COL o_shippingpriority
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: lineitem TableScan alias: lineitem Reduce Output Operator key expressions: expr: l_orderkey type: int sort order: + Map-reduce partition columns: expr: l_orderkey type: int tag: 1 value expressions: expr: l_orderkey type: int expr: l_extendedprice type: int orders TableScan alias: orders Reduce Output Operator key expressions: expr: o_orderkey type: int sort order: + Map-reduce partition columns: expr: o_orderkey type: int tag: 0 value expressions: expr: o_shippingpriority type: int Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col1} 1
{VALUE._col0} {VALUE._col1} handleSkewJoin: false outputColumnNames: _col1, _col3, _col4 Select Operator expressions: expr: _col3 type: int expr: _col1 type: int expr: _col4 type: int outputColumnNames: _col3, _col1, _col4 Group By Operator aggregations: expr: sum(_col4) bucketGroup: false keys: expr: _col3 type: int expr: _col1 type: int mode: hash outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat Stage: Stage-2 Map Reduce Alias - Map Operator Tree: hdfs://localhost:9000/tmp/hive-joergschad/hive_2011-03-14_17-22-14_249_1239673786236436657/10002 Reduce Output Operator key expressions: expr: _col0 type: int expr: _col1 type: int sort order: ++ Map-reduce partition columns: expr: _col0 type: int expr: _col1 type: int tag: -1 value expressions: expr: _col2 type: bigint Reduce Operator Tree: Group By Operator aggregations: expr: sum(VALUE._col0) bucketGroup: false keys: expr: KEY._col0 type: int expr: KEY._col1 type: int mode: mergepartial outputColumnNames: _col0, _col1, _col2 Select Operator expressions: expr: _col0 type: int expr: _col1 type: int expr: _col2 type: bigint outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-0 Fetch Operator limit: -1
Re: Review Request: HIVE-1815: The class HiveResultSet should implement batch fetching.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/514/ --- (Updated 2011-03-17 01:06:34.734673) Review request for hive. Changes --- Updated to use an iterator instead of deleting items. Summary --- HIVE-1815: The class HiveResultSet should implement batch fetching. This addresses bug HIVE-1815. https://issues.apache.org/jira/browse/HIVE-1815 Diffs (updated) - trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveQueryResultSet.java 1081785 trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveStatement.java 1081785 trunk/jdbc/src/test/org/apache/hadoop/hive/jdbc/TestJdbcDriver.java 1081785 Diff: https://reviews.apache.org/r/514/diff Testing --- Thanks, Bennie
[jira] Created: (HIVE-2060) CLI local mode hit NPE when exiting by ^D
CLI local mode hit NPE when exiting by ^D - Key: HIVE-2060 URL: https://issues.apache.org/jira/browse/HIVE-2060 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.8.0 Reporter: Ning Zhang Assignee: Ning Zhang Priority: Minor Fix For: 0.8.0 The CLI gets an NPE when running in local mode and ^D is hit to exit it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-2060) CLI local mode hit NPE when exiting by ^D
[ https://issues.apache.org/jira/browse/HIVE-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-2060: - Attachment: HIVE-2060.patch CLI local mode hit NPE when exiting by ^D - Key: HIVE-2060 URL: https://issues.apache.org/jira/browse/HIVE-2060 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.8.0 Reporter: Ning Zhang Assignee: Ning Zhang Priority: Minor Fix For: 0.8.0 Attachments: HIVE-2060.patch The CLI gets an NPE when running in local mode and ^D is hit to exit it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-2060) CLI local mode hit NPE when exiting by ^D
[ https://issues.apache.org/jira/browse/HIVE-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-2060: - Status: Patch Available (was: Open) CLI local mode hit NPE when exiting by ^D - Key: HIVE-2060 URL: https://issues.apache.org/jira/browse/HIVE-2060 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.8.0 Reporter: Ning Zhang Assignee: Ning Zhang Priority: Minor Fix For: 0.8.0 Attachments: HIVE-2060.patch The CLI gets an NPE when running in local mode and ^D is hit to exit it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-2060) CLI local mode hit NPE when exiting by ^D
[ https://issues.apache.org/jira/browse/HIVE-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008065#comment-13008065 ] He Yongqiang commented on HIVE-2060: +1, running tests CLI local mode hit NPE when exiting by ^D - Key: HIVE-2060 URL: https://issues.apache.org/jira/browse/HIVE-2060 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.8.0 Reporter: Ning Zhang Assignee: Ning Zhang Priority: Minor Fix For: 0.8.0 Attachments: HIVE-2060.patch The CLI gets an NPE when running in local mode and ^D is hit to exit it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1384) HiveServer should run as the user who submitted the query.
[ https://issues.apache.org/jira/browse/HIVE-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008077#comment-13008077 ] Ankita Bakshi commented on HIVE-1384: - This is required in order to use the Hive authorization infrastructure. HiveServer should run as the user who submitted the query. -- Key: HIVE-1384 URL: https://issues.apache.org/jira/browse/HIVE-1384 Project: Hive Issue Type: Improvement Components: Metastore, Server Infrastructure Reporter: He Yongqiang Assignee: He Yongqiang -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Build failed in Jenkins: Hive-0.7.0-h0.20 #42
See https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/42/ -- [...truncated 27035 lines...] [junit] POSTHOOK: Output: default@srcbucket2 [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' INTO TABLE src [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt [junit] Copying file: https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table default.src [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' INTO TABLE src [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@src [junit] OK [junit] Copying file: https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt' INTO TABLE src1 [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt [junit] Loading data to table default.src1 [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt' INTO TABLE src1 [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@src1 [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq' INTO TABLE src_sequencefile [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq [junit] Copying file: https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq [junit] Loading data to table default.src_sequencefile [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq' INTO TABLE src_sequencefile [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@src_sequencefile [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq' INTO TABLE src_thrift [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq [junit] Copying file: https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq [junit] Loading data to table default.src_thrift [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq' INTO TABLE src_thrift [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@src_thrift [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt' INTO TABLE src_json [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt [junit] Copying file: https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt [junit] Loading data to table default.src_json [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt' INTO TABLE src_json [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@src_json [junit] OK [junit] diff https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/ql/test/logs/negative/wrong_distinct1.q.out https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/ql/src/test/results/compiler/errors/wrong_distinct1.q.out [junit] Done query: wrong_distinct1.q [junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/ql/tmp/hive_job_log_hudson_201103171212_142244274.txt 
[junit] Begin query: wrong_distinct2.q [junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/ql/tmp/hive_job_log_hudson_201103171212_1172891919.txt [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' OVERWRITE INTO TABLE srcpart PARTITION (ds='2008-04-08',hr='11') [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt [junit] Copying file: https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table default.srcpart partition (ds=2008-04-08, hr=11) [junit] POSTHOOK: query: LOAD DATA
Build failed in Jenkins: Hive-trunk-h0.20 #616
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/616/ -- [...truncated 27983 lines...] [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-03-17_12-15-12_194_1579804070209208471/-mr-1 [junit] Total MapReduce jobs = 1 [junit] Launching Job 1 out of 1 [junit] Number of reduce tasks determined at compile time: 1 [junit] In order to change the average load for a reducer (in bytes): [junit] set hive.exec.reducers.bytes.per.reducer=number [junit] In order to limit the maximum number of reducers: [junit] set hive.exec.reducers.max=number [junit] In order to set a constant number of reducers: [junit] set mapred.reduce.tasks=number [junit] Job running in-process (local Hadoop) [junit] Hadoop job information for null: number of mappers: 0; number of reducers: 0 [junit] 2011-03-17 12:15:15,304 null map = 100%, reduce = 100% [junit] Ended Job = job_local_0001 [junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] 
POSTHOOK: Output: file:/tmp/hudson/hive_2011-03-17_12-15-12_194_1579804070209208471/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103171215_555658050.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-03-17_12-15-16_806_158600829949742058/-mr-1 [junit] POSTHOOK: query: 
select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-03-17_12-15-16_806_158600829949742058/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output:
[jira] Updated: (HIVE-1959) Potential memory leak when same connection used for long time. TaskInfo and QueryInfo objects are getting accumulated on executing more queries on the same connection.
[ https://issues.apache.org/jira/browse/HIVE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1959: - Resolution: Fixed Fix Version/s: 0.8.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed. Thanks, Chinna! Potential memory leak when same connection used for long time. TaskInfo and QueryInfo objects are getting accumulated on executing more queries on the same connection. --- Key: HIVE-1959 URL: https://issues.apache.org/jira/browse/HIVE-1959 Project: Hive Issue Type: Bug Components: Server Infrastructure Affects Versions: 0.8.0 Environment: Hadoop 0.20.1, Hive 0.5.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5). Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: 0.8.0 Attachments: HIVE-1959.patch *org.apache.hadoop.hive.ql.history.HiveHistory$TaskInfo* and *org.apache.hadoop.hive.ql.history.HiveHistory$QueryInfo* objects accumulate as more queries are executed on the same connection; they are released only when the connection is closed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
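The idea behind this kind of fix can be sketched in a few lines: release per-query bookkeeping when the query ends rather than holding it until the connection closes. `QueryHistory` and its method names below are illustrative stand-ins, not the actual `HiveHistory` API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of per-query bookkeeping that is released at query end instead
 * of at connection close, so a long-lived connection does not accumulate
 * one entry per executed query. Names are illustrative stand-ins.
 */
class QueryHistory {
    static final class QueryInfo {
        final String queryId;
        QueryInfo(String queryId) { this.queryId = queryId; }
    }

    private final Map<String, QueryInfo> live = new HashMap<>();

    void startQuery(String queryId) {
        live.put(queryId, new QueryInfo(queryId));
    }

    void endQuery(String queryId) {
        // Removing the entry here is what prevents the unbounded growth
        // described in the issue.
        live.remove(queryId);
    }

    int liveQueries() {
        return live.size();
    }
}
```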
[jira] Commented: (HIVE-1815) The class HiveResultSet should implement batch fetching.
[ https://issues.apache.org/jira/browse/HIVE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008205#comment-13008205 ] Ning Zhang commented on HIVE-1815: -- +1. Will commit if tests pass. The class HiveResultSet should implement batch fetching. Key: HIVE-1815 URL: https://issues.apache.org/jira/browse/HIVE-1815 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.8.0 Environment: Custom Java application using the Hive JDBC driver to connect to a Hive server, execute a Hive query and process the results. Reporter: Guy le Mar Fix For: 0.8.0 Attachments: HIVE-1815.1.patch.txt, HIVE-1815.2.patch.txt When using the Hive JDBC driver, you can execute a Hive query and obtain a HiveResultSet instance that contains the results of the query. Unfortunately, HiveResultSet can then only fetch a single row of these results from the Hive server at a time. As a consequence, it's extremely slow to fetch a resultset of anything other than a trivial size. It would be nice for the HiveResultSet to be able to fetch N rows from the server at a time, so that performance is suitable to support applications that provide human interaction. (From memory, I think it took me around 20 minutes to fetch 4000 rows.) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Jenkins build is back to normal : Hive-trunk-h0.20 #617
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/617/changes
Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/505/#review339 --- ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java https://reviews.apache.org/r/505/#comment684 Suggestion is to make this configurable (via IDXPROPERTIES) to save space when the column is known NOT NULL. (Also later to allow for specification of other aggregates.) ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java https://reviews.apache.org/r/505/#comment686 Indentation is messed up here. ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java https://reviews.apache.org/r/505/#comment685 Please eliminate all TODOs, and don't use printStackTrace. ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java https://reviews.apache.org/r/505/#comment687 Instead of downcasting over and over, you should probably do it just once in the calling method (and assert that you got the right type, since otherwise generateOperatorTree is not going to have the desired effect). ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java https://reviews.apache.org/r/505/#comment688 Hive naming convention for variables is camelCase, not under_score. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java https://reviews.apache.org/r/505/#comment690 I see query_has_distinct being written but never read. Why do we need it? I don't think we want to be relying on the parse tree at all. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java https://reviews.apache.org/r/505/#comment689 Don't swallow exceptions like this. - John On 2011-03-13 20:00:28, Prajakta Kalmegh wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/505/ --- (Updated 2011-03-13 20:00:28) Review request for hive. Summary --- New review starting from patch 3. This addresses bug HIVE-1694. 
https://issues.apache.org/jira/browse/HIVE-1694 Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 46739b7 ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 308d985 ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 916b235 ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 50db44c ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteIndexSubqueryCtx.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteIndexSubqueryProcFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteRemoveGroupbyCtx.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteRemoveGroupbyProcFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 04f560f ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION Diff: https://reviews.apache.org/r/505/diff Testing --- Thanks, Prajakta
[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008235#comment-13008235 ] John Sichi commented on HIVE-1694: -- I added a few review board comments; there are a lot of places where the exception handling is still wrong; I didn't comment on all of those but they need to be fixed. We still need to reconcile with HIVE-1803, but I'll ask Namit and Yongqiang to take a look now to get their comments on the rewrite implementation. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. 
This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-2052) PostHook and PreHook API to add flag to indicate it is pre or post hook plus cache for content summary
[ https://issues.apache.org/jira/browse/HIVE-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008238#comment-13008238 ] Joydeep Sen Sarma commented on HIVE-2052: - small nits: - setinputpathtocontentsummary is called twice on the same hookcontext object - we are setting the hook type again and again (can do it once before calling postexecute) should the inputpathtocontentsummary be marked final in the hook and passed along with the constructor? (why would we ever change the map to a new one?). PostHook and PreHook API to add flag to indicate it is pre or post hook plus cache for content summary -- Key: HIVE-2052 URL: https://issues.apache.org/jira/browse/HIVE-2052 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2052.1.patch, HIVE-2052.2.patch This will allow hooks to share some information better and reduce their latency -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
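The design point in Joydeep's last question above — mark the content-summary map `final` and pass it in through the constructor — can be sketched as follows. This `HookContext` is a simplified stand-in for illustration, not the real Hive class, and the map's value type here is arbitrary.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a hook context whose content-summary map is set exactly once.
// Simplified stand-in for illustration; not the actual Hive HookContext.
public class HookContext {
    // final: hooks may mutate the map's contents, but nothing can ever
    // swap the context over to a different map instance.
    private final Map<String, Long> inputPathToContentSummary;

    public HookContext(Map<String, Long> inputPathToContentSummary) {
        this.inputPathToContentSummary = inputPathToContentSummary;
    }

    public Map<String, Long> getInputPathToContentSummary() {
        return inputPathToContentSummary;
    }

    public static void main(String[] args) {
        HookContext ctx = new HookContext(new ConcurrentHashMap<>());
        ctx.getInputPathToContentSummary().put("/warehouse/t1", 42L);
        System.out.println(ctx.getInputPathToContentSummary().size()); // 1
    }
}
```

Making the field final (with no setter) removes the need to call a setter twice on the same object, which was the first nit above, and makes the sharing contract between pre- and post-hooks explicit.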
[jira] Updated: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel
[ https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2051: -- Attachment: HIVE-2051.4.patch getInputSummary() to call FileSystem.getContentSummary() in parallel Key: HIVE-2051 URL: https://issues.apache.org/jira/browse/HIVE-2051 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch, HIVE-2051.4.patch getInputSummary() now calls FileSystem.getContentSummary() one path at a time, which can be extremely slow when the number of input paths is huge. By calling these functions in parallel, we can cut latency in most cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
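The parallelization pattern proposed here — submit one getContentSummary-style call per input path to a thread pool, then collect the results via Future.get() — can be sketched as below. The `summarize()` method is a stand-in for the real blocking `FileSystem.getContentSummary()` call, which this sketch does not invoke; the pool size and method names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of issuing per-path summary calls in parallel instead of one by one.
public class ParallelSummary {
    // Stand-in for FileSystem.getContentSummary(): pretend a path's
    // "summary" is just its length.
    static long summarize(String path) {
        return path.length();
    }

    static long totalSummary(List<String> paths, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per path; all run concurrently on the pool.
            List<Future<Long>> futures = new ArrayList<>();
            for (String p : paths) {
                futures.add(executor.submit(() -> summarize(p)));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // blocks until that one task finishes
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(totalSummary(List.of("/a", "/bb", "/ccc"), 2)); // 9
    }
}
```

With many input paths and slow per-path calls, wall-clock time drops roughly by the pool size, since the calls overlap instead of running back to back.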
[jira] Updated: (HIVE-2052) PostHook and PreHook API to add flag to indicate it is pre or post hook plus cache for content summary
[ https://issues.apache.org/jira/browse/HIVE-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2052: -- Attachment: HIVE-2051.3.patch Modified per Joydeep's comments. PostHook and PreHook API to add flag to indicate it is pre or post hook plus cache for content summary -- Key: HIVE-2052 URL: https://issues.apache.org/jira/browse/HIVE-2052 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2051.3.patch, HIVE-2052.1.patch, HIVE-2052.2.patch This will allow hooks to share some information better and reduce their latency -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1644) use filter pushdown for automatically accessing indexes
[ https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Melick updated HIVE-1644: - Attachment: HIVE-1644.8.patch HIVE-1644.8.patch Fixed unit tests per John and Yongqiang. Cleaned up comments. Ready for review. I still have a few questions that are probably best answered in the review: * When we have multiple indexes, and we get different task lists from querying each index, what should we do? Right now we use all tasks (IndexWhereProcessor.java:57) * Is it possible to improve the regex we use so that it only matches WHERE clauses? Right now we use FIL to get to the WHERE (IndexWhereTaskDispatcher.java:141) * What comparison operators should we support? Right now it's only <, >, and =. We don't have <= or >= (CompactIndexHandler.java:272) Should I put this into the review board? use filter pushdown for automatically accessing indexes --- Key: HIVE-1644 URL: https://issues.apache.org/jira/browse/HIVE-1644 Project: Hive Issue Type: Improvement Components: Indexing Affects Versions: 0.7.0 Reporter: John Sichi Assignee: Russell Melick Attachments: HIVE-1644.1.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch, HIVE-1644.7.patch, HIVE-1644.8.patch HIVE-1226 provides utilities for analyzing filters which have been pushed down to a table scan. The next step is to use these for selecting available indexes and generating access plans for those indexes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Assigned: (HIVE-1815) The class HiveResultSet should implement batch fetching.
[ https://issues.apache.org/jira/browse/HIVE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang reassigned HIVE-1815: Assignee: Bennie Schut The class HiveResultSet should implement batch fetching. Key: HIVE-1815 URL: https://issues.apache.org/jira/browse/HIVE-1815 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.8.0 Environment: Custom Java application using the Hive JDBC driver to connect to a Hive server, execute a Hive query and process the results. Reporter: Guy le Mar Assignee: Bennie Schut Fix For: 0.8.0 Attachments: HIVE-1815.1.patch.txt, HIVE-1815.2.patch.txt When using the Hive JDBC driver, you can execute a Hive query and obtain a HiveResultSet instance that contains the results of the query. Unfortunately, HiveResultSet can then only fetch a single row of these results from the Hive server at a time. As a consequence, it's extremely slow to fetch a resultset of anything other than a trivial size. It would be nice for the HiveResultSet to be able to fetch N rows from the server at a time, so that performance is suitable to support applications that provide human interaction. (From memory, I think it took me around 20 minutes to fetch 4000 rows.) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1815) The class HiveResultSet should implement batch fetching.
[ https://issues.apache.org/jira/browse/HIVE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1815: - Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed. Thanks Bennie! The class HiveResultSet should implement batch fetching. Key: HIVE-1815 URL: https://issues.apache.org/jira/browse/HIVE-1815 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.8.0 Environment: Custom Java application using the Hive JDBC driver to connect to a Hive server, execute a Hive query and process the results. Reporter: Guy le Mar Assignee: Bennie Schut Fix For: 0.8.0 Attachments: HIVE-1815.1.patch.txt, HIVE-1815.2.patch.txt When using the Hive JDBC driver, you can execute a Hive query and obtain a HiveResultSet instance that contains the results of the query. Unfortunately, HiveResultSet can then only fetch a single row of these results from the Hive server at a time. As a consequence, it's extremely slow to fetch a resultset of anything other than a trivial size. It would be nice for the HiveResultSet to be able to fetch N rows from the server at a time, so that performance is suitable to support applications that provide human interaction. (From memory, I think it took me around 20 minutes to fetch 4000 rows.) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1644) use filter pushdown for automatically accessing indexes
[ https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008323#comment-13008323 ] John Sichi commented on HIVE-1644: -- Russell, the plan still looks wrong. It shows two stage 1's, with a dependency from one to the other. The stage numbers should be unique, so probably this is due to the way we merge the two queries? use filter pushdown for automatically accessing indexes --- Key: HIVE-1644 URL: https://issues.apache.org/jira/browse/HIVE-1644 Project: Hive Issue Type: Improvement Components: Indexing Affects Versions: 0.7.0 Reporter: John Sichi Assignee: Russell Melick Attachments: HIVE-1644.1.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch, HIVE-1644.7.patch, HIVE-1644.8.patch HIVE-1226 provides utilities for analyzing filters which have been pushed down to a table scan. The next step is to use these for selecting available indexes and generating access plans for those indexes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel
[ https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008328#comment-13008328 ] MIS commented on HIVE-2051: --- Yes, the executor does need to be shut down once jobs have been submitted to it, even though the submitted jobs may already have completed. However, after the executor is shut down, we do not need to wait for termination to complete: that wait is redundant, because all submitted jobs are already finished by the time we shut down the executor. This is exactly what result.get() ensures, i.e., the following piece of code is not required.

do {
  try {
    executor.awaitTermination(Integer.MAX_VALUE, TimeUnit.SECONDS);
    executorDone = true;
  } catch (InterruptedException e) {
  }
} while (!executorDone);

getInputSummary() to call FileSystem.getContentSummary() in parallel Key: HIVE-2051 URL: https://issues.apache.org/jira/browse/HIVE-2051 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch, HIVE-2051.4.patch getInputSummary() now calls FileSystem.getContentSummary() one path at a time, which can be extremely slow when the number of input paths is huge. By calling these functions in parallel, we can cut latency in most cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
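The point of the comment above can be demonstrated in isolation: once Future.get() has returned for every submitted task, all work is done, so after shutdown() an awaitTermination retry loop has nothing left to wait for. A minimal sketch (pool size and task contents are arbitrary choices, unrelated to the actual patch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Sketch: after draining every Future via get(), shutdown() leaves nothing
// running, so awaitTermination succeeds immediately and a retry loop is moot.
public class ShutdownDemo {
    static int sumOfSquares(int n) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final int k = i;
                results.add(executor.submit(() -> k * k));
            }
            int sum = 0;
            for (Future<Integer> f : results) {
                sum += f.get(); // every task is complete once this loop ends
            }
            executor.shutdown();
            // Returns true right away: no queued or running tasks remain.
            boolean terminated = executor.awaitTermination(1, TimeUnit.SECONDS);
            if (!terminated) {
                throw new IllegalStateException("pool should already be idle");
            }
            return sum;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(8)); // 140
    }
}
```

A single bounded awaitTermination call after shutdown() is still reasonable as a sanity check, but the unbounded retry loop quoted above adds nothing once every result has been fetched.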
[jira] Commented: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel
[ https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008331#comment-13008331 ] MIS commented on HIVE-2051: --- The solution to this issue resembles that of HIVE-2026, so we can follow a similar approach. getInputSummary() to call FileSystem.getContentSummary() in parallel Key: HIVE-2051 URL: https://issues.apache.org/jira/browse/HIVE-2051 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch, HIVE-2051.4.patch getInputSummary() now calls FileSystem.getContentSummary() one path at a time, which can be extremely slow when the number of input paths is huge. By calling these functions in parallel, we can cut latency in most cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-2054) Exception on windows when using the jdbc driver. IOException: The system cannot find the path specified
[ https://issues.apache.org/jira/browse/HIVE-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008333#comment-13008333 ] Ning Zhang commented on HIVE-2054: -- Sounds reasonable to me. Raghu, what do you think? Exception on windows when using the jdbc driver. IOException: The system cannot find the path specified - Key: HIVE-2054 URL: https://issues.apache.org/jira/browse/HIVE-2054 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.8.0 Reporter: Bennie Schut Assignee: Bennie Schut Priority: Minor Fix For: 0.8.0 Attachments: HIVE-2054.1.patch.txt It seems something recently changed on the jdbc driver which causes this IOException on windows. java.lang.RuntimeException: java.io.IOException: The system cannot find the path specified at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:237) at org.apache.hadoop.hive.jdbc.HiveConnection.init(HiveConnection.java:73) at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:110) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (HIVE-2061) Create a hive_contrib.jar symlink to hive-contrib-{version}.jar for backward compatibility
Create a hive_contrib.jar symlink to hive-contrib-{version}.jar for backward compatibility -- Key: HIVE-2061 URL: https://issues.apache.org/jira/browse/HIVE-2061 Project: Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang Priority: Minor We have seen a use case where the user's script runs 'add jar hive_contrib.jar'. Since Hive moved the jar file to hive-contrib-{version}.jar, this introduced a backward incompatibility. If we ask the user to change the script, then when Hive upgrades versions again, the user will need to change the script again. Creating a symlink seems to be the best solution. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-2061) Create a hive_contrib.jar symlink to hive-contrib-{version}.jar for backward compatibility
[ https://issues.apache.org/jira/browse/HIVE-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-2061: - Attachment: HIVE-2061.patch Create a hive_contrib.jar symlink to hive-contrib-{version}.jar for backward compatibility -- Key: HIVE-2061 URL: https://issues.apache.org/jira/browse/HIVE-2061 Project: Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang Priority: Minor Attachments: HIVE-2061.patch We have seen a use case where the user's script runs 'add jar hive_contrib.jar'. Since Hive moved the jar file to hive-contrib-{version}.jar, this introduced a backward incompatibility. If we ask the user to change the script, then when Hive upgrades versions again, the user will need to change the script again. Creating a symlink seems to be the best solution. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira